Skip to content

Reset sketch combiner in AggregatorCombiner#8368

Merged
gianm merged 5 commits intoapache:masterfrom
a2l007:fix_sketch_reset
Aug 23, 2019
Merged

Reset sketch combiner in AggregatorCombiner#8368
gianm merged 5 commits intoapache:masterfrom
a2l007:fix_sketch_reset

Conversation

@a2l007
Copy link
Copy Markdown
Contributor

@a2l007 a2l007 commented Aug 21, 2019

Fixes #7741
This PR fixes an issue where the AggregateCombiner.reset() within SketchAggregatorFactorydoes not clear the Sketch union object. As a result, during index merging if the sketches are empty for a set of rows to be combined, the combined sketch value from the previous set of merged rows would be reused in the current combine thereby corrupting the data.
This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths.
  • been tested in a test Druid cluster.

@leventov Please review.

@jihoonson jihoonson added the Bug label Aug 21, 2019
@jihoonson jihoonson added this to the 0.16.0 milestone Aug 21, 2019
@a2l007 a2l007 changed the title Reset union in AggregateCombiner Reset sketch union in AggregatorCombiner Aug 21, 2019
@a2l007 a2l007 changed the title Reset sketch union in AggregatorCombiner Reset sketch combiner in AggregatorCombiner Aug 21, 2019
@himanshug
Copy link
Copy Markdown
Contributor

LGTM but build failure is genuine

@leventov
Copy link
Copy Markdown
Member

For context, #7329 could be used to test AggregatorCombiner.reset() in all aggregators

@a2l007
Copy link
Copy Markdown
Contributor Author

a2l007 commented Aug 22, 2019

That's weird. I don't seem to hit this issue when I run the tests locally. Let me look into it.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Aug 23, 2019

coverage/coveralls — Coverage decreased (-2.8%) to 65.21%

Coveralls errored with a -2.8% coverage decrease. This seems incorrect: only one line of prod code was changed, and there is a test for it. So I'll merge the patch anyway.

@gianm gianm merged commit 661976f into apache:master Aug 23, 2019
@a2l007
Copy link
Copy Markdown
Contributor Author

a2l007 commented Aug 23, 2019

@clintropolis Would it be worth calling this patch out in the Release Notes? It would be useful for the datasketch users to know that existing theta sketch aggregates generated with releases since 0.13.0 may be invalid and would have to be recomputed in 0.16.0

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis Would it be worth calling this patch out in the Release Notes? It would be useful for the datasketch users to know that existing theta sketch aggregates generated with releases since 0.13.0 may be invalid and would have to be recomputed in 0.16.0

Yeah, I think so, I'll be sure to add mention of this, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Indexing tasks containing thetaSketches results in incorrect sketch values

6 participants