Skip to content

complex aggregator based on http://datasketches.github.io#1897

Merged
fjy merged 6 commits intoapache:masterfrom
himanshug:new_sketch_aggregation
Nov 12, 2015
Merged

complex aggregator based on http://datasketches.github.io#1897
fjy merged 6 commits intoapache:masterfrom
himanshug:new_sketch_aggregation

Conversation

@himanshug
Copy link
Copy Markdown
Contributor

these aggregators are similar to hyperUnique in terms of functionality, but also provide arbitrary set operations on underlying sketches via a post aggregator.

We will formally announce it with a blog post some time in november .

@himanshug himanshug force-pushed the new_sketch_aggregation branch from 2cf7712 to d4ddc5e Compare November 1, 2015 14:26
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what algorithm is actually being run behind hte covers? sketchMerge is a bit confusing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a theta sketch, which is a more general version of KMV is being done, is that true?

if so, can we call the aggregators thetaIngest and theta?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

names here have some historical significance as they were used since the inception of this module with many ppl using those.
That said, I think, it will be possible to have new names (with support for old names at the same time so that most of our client code does not break).
but I believe, names should have build or merge in them so that it is clear whether they build a fresh sketch or just merge sketches (e.g. sketchMerge at ingestion time is used when user has already produced sketches as part of his/her batch pipeline, so input to druid already contains sketches) . Also having ingest in the name might be misleading some time e.g. sketchMerge aggregator is used both at ingestion time and query time.

yes, algorithm used is theta sketch a variant of KMV.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have an extra ) here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approxiate => approximate

@drcrallen
Copy link
Copy Markdown
Contributor

This looks cool overall but the test coverage looks really sparse at first glance.

@himanshug himanshug force-pushed the new_sketch_aggregation branch 2 times, most recently from 7905c2b to 9201e44 Compare November 10, 2015 08:06
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: approximate

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the URL also stopped working

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a description of high level when to use the aggregators and post aggregators is required

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also provide an example of how to ingest data with theta sketch

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 10, 2015

👍 after comments around documenting usage are fixed

@himanshug himanshug force-pushed the new_sketch_aggregation branch from 9201e44 to 4823b12 Compare November 11, 2015 05:48
@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Nov 12, 2015

👍

@himanshug himanshug force-pushed the new_sketch_aggregation branch from 4823b12 to b1768c0 Compare November 12, 2015 05:58
@himanshug
Copy link
Copy Markdown
Contributor Author

@fjy updated the doc with more explanation and examples. I believe, this is ready to merge now.

@himanshug himanshug force-pushed the new_sketch_aggregation branch from b1768c0 to 0262961 Compare November 12, 2015 06:01
@himanshug himanshug force-pushed the new_sketch_aggregation branch from 0262961 to 7788f7c Compare November 12, 2015 06:04
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 12, 2015

will merge after travis

@fjy fjy closed this Nov 12, 2015
@fjy fjy reopened this Nov 12, 2015
fjy added a commit that referenced this pull request Nov 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants