add bloom filter fallback aggregator when types are unknown #7719
Merged: gianm merged 1 commit into apache:master on Jun 6, 2019.
gianm approved these changes on Jun 6, 2019.
gianm pushed two commits to implydata/druid-public that referenced this pull request on Jul 3, 2019.
clintropolis added a commit that referenced this pull request on Jul 24, 2019.
I discovered an issue similar to #7660 while working on #7718: the bloom filter aggregator behaved even more strictly than the quantiles aggregator, and did not work at all if `ColumnCapabilities` were not available. This PR remedies the issue by adding a fallback aggregator, `ObjectBloomFilterAggregator`, which examines the objects and aggregates them to the best of its ability.

This aggregator (and many others) could perhaps be improved by using something like a functional interface inside `bufferAdd`: the initial version of the function would check types, and after the first non-null value it would lock in a function specialized for the selector. However, since I'm unsure whether the cost of the `if` is insignificant compared to the rest of the work, and since this is not the only aggregator that does this per-row check, I'll leave exploring this optimization to future work revisiting complex value aggregators as a whole.

The added test only works for group by v2, because the bloom filter aggregator only has stub methods for its `ComplexMetricSerde`. Group by v1 requires the serde to be more fully implemented to perform nested queries, and currently produces somewhat confusing `Bloom filter aggregators are query-time only` error messages, which should probably be fixed in a follow-up PR.
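To illustrate the fallback idea of examining each object and aggregating it as best it can, here is a minimal sketch of per-row type dispatch. It is not the actual `ObjectBloomFilterAggregator` code: `BloomSink`, `RecordingFilter`, and `addObject` are hypothetical names, `RecordingFilter` merely records calls in place of a real bloom filter, and treating `List` values as multi-value rows and stringifying any other type are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a fallback "add" when column types are unknown:
// inspect each row's runtime type and route it to a type-specific method.
public class FallbackBloomSketch {

  /** Hypothetical sink with one add method per supported value type. */
  interface BloomSink {
    void addLong(long v);
    void addDouble(double v);
    void addString(String v);
    void addNull();
  }

  /** Test double that records each call instead of setting bloom filter bits. */
  static class RecordingFilter implements BloomSink {
    final List<String> calls = new ArrayList<>();
    public void addLong(long v) { calls.add("long:" + v); }
    public void addDouble(double v) { calls.add("double:" + v); }
    public void addString(String v) { calls.add("string:" + v); }
    public void addNull() { calls.add("null"); }
  }

  /** Examines one row's value and adds it using the best-matching method. */
  static void addObject(BloomSink sink, Object value) {
    if (value == null) {
      sink.addNull();
    } else if (value instanceof Long || value instanceof Integer) {
      sink.addLong(((Number) value).longValue());
    } else if (value instanceof Double || value instanceof Float) {
      sink.addDouble(((Number) value).doubleValue());
    } else if (value instanceof List) {
      // Assumption: a list is a multi-value row, so add each element separately.
      for (Object element : (List<?>) value) {
        addObject(sink, element);
      }
    } else {
      // Assumption: any other type falls back to its string form.
      sink.addString(value.toString());
    }
  }

  public static void main(String[] args) {
    RecordingFilter filter = new RecordingFilter();
    Object[] rows = {42L, 1.5, "foo", null, List.of("a", "b")};
    for (Object row : rows) {
      addObject(filter, row);
    }
    // prints [long:42, double:1.5, string:foo, null, string:a, string:b]
    System.out.println(filter.calls);
  }
}
```

Every row pays for the chain of `instanceof` checks, which is exactly the per-row cost the description questions.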
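The `bufferAdd` optimization floated in the description (check types at first, then lock in a specialized function after the first non-null value) could look roughly like the following sketch. It assumes every non-null value in the column shares one runtime type; `LockInAdderSketch`, `probe`, and the string-recording `added` list are hypothetical stand-ins, not Druid APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: the first non-null row picks a specialized adder, and
// later rows call it directly instead of re-running the instanceof checks.
// The "added" list stands in for a real bloom filter.
public class LockInAdderSketch {
  final List<String> added = new ArrayList<>();
  int typeChecks = 0; // number of rows that went through the type-checking probe

  // Starts out as the probing function; replaced after the first non-null value.
  private Consumer<Object> adder = this::probe;

  void add(Object value) {
    adder.accept(value);
  }

  private void probe(Object value) {
    typeChecks++;
    if (value == null) {
      // A null says nothing about the column type, so keep probing.
      added.add("null");
      return;
    }
    if (value instanceof Long || value instanceof Integer) {
      adder = v -> added.add(v == null ? "null" : "long:" + ((Number) v).longValue());
    } else if (value instanceof Double || value instanceof Float) {
      adder = v -> added.add(v == null ? "null" : "double:" + ((Number) v).doubleValue());
    } else {
      adder = v -> added.add(v == null ? "null" : "string:" + v);
    }
    // Route the current row through the newly locked-in function.
    adder.accept(value);
  }

  public static void main(String[] args) {
    LockInAdderSketch sketch = new LockInAdderSketch();
    for (Object row : new Object[] {null, 5L, 6, null}) {
      sketch.add(row);
    }
    // prints [null, long:5, long:6, null] typeChecks=2
    System.out.println(sketch.added + " typeChecks=" + sketch.typeChecks);
  }
}
```

One trade-off of this design: the type check disappears after the first non-null row, but a later row of an incompatible type (for example a string after a numeric lock-in) would throw a `ClassCastException`, whereas the per-row check handles mixed types safely.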