complex aggregator based on http://datasketches.github.io #1897
fjy merged 6 commits into apache:master
Branch updated from 2cf7712 to d4ddc5e.
What algorithm is actually being run behind the covers? The name sketchMerge is a bit confusing.
I think a theta sketch, which is a more general version of KMV, is being used. Is that true?
If so, can we call the aggregators thetaIngest and theta?
The names here have some historical significance: they have been used since the inception of this module, and many people already rely on them.
That said, I think it is possible to introduce new names while still supporting the old ones at the same time, so that most of our client code does not break.
However, I believe the names should contain "build" or "merge", so that it is clear whether an aggregator builds a fresh sketch or merges existing sketches. For example, sketchMerge is used at ingestion time when the user has already produced sketches in his/her batch pipeline, so the input to Druid already contains sketches. Also, having "ingest" in the name could be misleading, because the sketchMerge aggregator is used both at ingestion time and at query time.
Yes, the algorithm used is the theta sketch, a variant of KMV.
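The build/merge distinction and the KMV idea can be illustrated with a toy K-Minimum-Values sketch. This is a minimal sketch under stated assumptions, not the DataSketches theta sketch implementation: the SHA-1-based hash, the nominal size `K`, and the helper names `build`, `merge`, and `estimate` are hypothetical choices for the example. `build` plays the role of an aggregator that consumes raw values, and `merge` the role of one that combines sketches that already exist (for example, sketches produced by a batch pipeline).

```python
import hashlib

K = 1024  # hypothetical nominal size: how many minimum hash values are kept


def _hash(value: str) -> float:
    # Map a value to a pseudo-uniform float in [0, 1) using 53 bits of SHA-1.
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def build(values, k=K):
    # "Build": consume raw values and keep the k smallest distinct hashes.
    return sorted({_hash(v) for v in values})[:k]


def merge(sketches, k=K):
    # "Merge": combine existing sketches without the raw values by keeping
    # the k smallest hashes across all inputs.
    return sorted(set().union(*sketches))[:k]


def estimate(sketch, k=K):
    # Fewer than k retained hashes means nothing was discarded: the count is exact.
    if len(sketch) < k:
        return float(len(sketch))
    # Otherwise apply the KMV estimator (k - 1) / U_k, where U_k is the
    # k-th smallest normalized hash.
    return (k - 1) / sketch[k - 1]


if __name__ == "__main__":
    part1 = [f"user{i}" for i in range(0, 60_000)]
    part2 = [f"user{i}" for i in range(40_000, 100_000)]
    merged = merge([build(part1), build(part2)])
    # Merging per-partition sketches yields the same sketch as building over all rows.
    assert merged == build(part1 + part2)
    print(round(estimate(merged)))  # approximates the 100000 distinct users
```

Merging works because the k smallest hashes of a union are always among the k smallest hashes of its parts, which is why the same merge step can serve both ingestion of pre-built sketches and query-time combination.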
You have an extra ) here.
approxiate => approximate
This looks cool overall, but the test coverage looks really sparse at first glance.
Branch updated from 7905c2b to 9201e44.
I think a high-level description of when to use the aggregators and the post aggregators is required.
Please also provide an example of how to ingest data with the theta sketch.
👍 after the comments about documenting usage are addressed
Branch updated from 9201e44 to 4823b12.
👍
Branch updated from 4823b12 to b1768c0.
@fjy I updated the doc with more explanation and examples. I believe this is ready to merge now.
Branch updated from b1768c0 to 0262961.
The old names are still valid, though, to stay backwards compatible for now.
Branch updated from 0262961 to 7788f7c.
Will merge after Travis passes.
Complex aggregator based on http://datasketches.github.io.
These aggregators are similar to hyperUnique in functionality, but they also provide arbitrary set operations on the underlying sketches via a post aggregator.
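Set operations of the kind a post aggregator can perform on theta sketches can be sketched in a few lines. The toy code follows the general theta-sketch idea (each sketch carries a threshold theta plus the retained hashes below it, and a set operation first restricts both inputs to the smaller theta), but it is a simplified illustration under assumptions, not the DataSketches or Druid code. `ThetaSketch`, `build`, `union`, `intersect`, `a_not_b`, and `K` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

K = 4096  # hypothetical nominal sketch size


def _hash(value: str) -> float:
    # Map a value to a pseudo-uniform float in [0, 1) using 53 bits of SHA-1.
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


@dataclass(frozen=True)
class ThetaSketch:
    theta: float        # sampling threshold in (0, 1]; 1.0 means the sketch is exact
    entries: frozenset  # retained hashes, all strictly below theta

    def estimate(self) -> float:
        # Retained count scaled up by the sampling rate theta.
        return len(self.entries) / self.theta


def _trim(hashes, k):
    # Keep the k smallest hashes; the (k+1)-th smallest becomes the new theta.
    ordered = sorted(hashes)
    if len(ordered) <= k:
        return ThetaSketch(1.0, frozenset(ordered))
    return ThetaSketch(ordered[k], frozenset(ordered[:k]))


def build(values, k=K):
    return _trim({_hash(v) for v in values}, k)


def _below(sketch, theta):
    # Entries of a sketch restricted to a (possibly smaller) common theta.
    return {h for h in sketch.entries if h < theta}


def union(a, b, k=K):
    theta = min(a.theta, b.theta)
    merged = _trim(_below(a, theta) | _below(b, theta), k)
    # If trimming happened, merged.theta is already below theta.
    return ThetaSketch(min(theta, merged.theta), merged.entries)


def intersect(a, b):
    theta = min(a.theta, b.theta)
    return ThetaSketch(theta, frozenset(_below(a, theta) & _below(b, theta)))


def a_not_b(a, b):
    theta = min(a.theta, b.theta)
    return ThetaSketch(theta, frozenset(_below(a, theta) - _below(b, theta)))


if __name__ == "__main__":
    a = build(f"user{i}" for i in range(0, 60_000))
    b = build(f"user{i}" for i in range(40_000, 100_000))
    print(round(union(a, b).estimate()))      # approximates 100000
    print(round(intersect(a, b).estimate()))  # approximates 20000
    print(round(a_not_b(a, b).estimate()))    # approximates 40000
```

Restricting both inputs to the smaller theta is what makes intersection and difference valid: below that threshold each sketch holds every hash of its set, so the retained hashes of the result are an unbiased sample at rate theta. Intersection and difference estimates get noisier when the result is small relative to the inputs, because fewer retained hashes survive the operation.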
We will formally announce it with a blog post some time in November.