Add aggregatorMergeStrategy property in SegmentMetadata queries #14560
Merged
zachjsh merged 5 commits into apache:master on Jul 13, 2023
- Adds a new property aggregatorMergeStrategy to the segmentMetadata query. aggregatorMergeStrategy currently supports three merge strategies: the legacy strict and lenient strategies, and the new latest strategy.
- When aggregators from different segments conflict during a merge, the latest strategy picks the aggregator from the most recent segment by time order.
- Deprecates the lenientAggregatorMerge property. The API validates that the new and old properties are not both set, and returns an exception if they are.
- When merging segments as part of a segmentMetadata query, the merged segment now has a more elaborate id in the <datasource>_<interval>_merged_<partition_number> format, similar to the name format that segments usually have. Previously it was simply "merged".
- Adjusts unit tests to cover the latest strategy and, for completeness, to assert the complete returned SegmentAnalysis object instead of just the aggregators.
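To make the strategy semantics above concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the class name `AggregatorMergeSketch`, the `resolve` and `merge` helpers, and the use of plain strings in place of aggregator factories are all hypothetical. It also assumes that strict yields a null result on any conflict, that lenient nulls out only the conflicting column, that the legacy boolean maps to lenient/strict, and that strict is the default. Segments with unknown aggregators are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Illustrative only: hypothetical names, not Druid's actual classes.
final class AggregatorMergeSketch {
    enum Strategy { STRICT, LENIENT, LATEST }

    // Resolve the effective strategy from the new and the deprecated property.
    // null means "not set". Setting both is rejected, as the description states.
    static Strategy resolve(String aggregatorMergeStrategy, Boolean lenientAggregatorMerge) {
        if (aggregatorMergeStrategy != null && lenientAggregatorMerge != null) {
            throw new IllegalArgumentException(
                "Set either aggregatorMergeStrategy or lenientAggregatorMerge, not both");
        }
        if (aggregatorMergeStrategy != null) {
            return Strategy.valueOf(aggregatorMergeStrategy.toUpperCase(Locale.ROOT));
        }
        // Assumption: legacy true -> lenient, legacy false or unset -> strict.
        return Boolean.TRUE.equals(lenientAggregatorMerge) ? Strategy.LENIENT : Strategy.STRICT;
    }

    // Merge per-segment maps of column name -> aggregator description.
    // The list must be ordered from the oldest segment to the newest.
    static Map<String, String> merge(List<Map<String, String>> segmentsOldestFirst,
                                     Strategy strategy) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> segment : segmentsOldestFirst) {
            for (Map.Entry<String, String> entry : segment.entrySet()) {
                String column = entry.getKey();
                if (!merged.containsKey(column)) {
                    merged.put(column, entry.getValue());
                    continue;
                }
                if (Objects.equals(merged.get(column), entry.getValue())) {
                    continue; // same aggregator, no conflict
                }
                switch (strategy) {
                    case STRICT:
                        return null; // any conflict invalidates the whole result
                    case LENIENT:
                        merged.put(column, null); // only this column becomes unknown
                        break;
                    case LATEST:
                        merged.put(column, entry.getValue()); // newer segment wins
                        break;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, String>> segments = List.of(
            Map.of("total", "longSum"),     // older segment
            Map.of("total", "doubleSum"));  // newer segment, conflicting aggregator
        for (Strategy strategy : Strategy.values()) {
            System.out.println(strategy + " -> " + merge(segments, strategy));
        }
    }
}
```

Keeping the strategy as an enum rather than a boolean is what leaves room for further strategies later without another property change.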
Comment on lines +858 to +864

    FACTORY.mergeRunners(
        Execs.directExecutor(),
        Lists.newArrayList(
            toolChest.preMergeQueryDecoration(runner1),
            toolChest.preMergeQueryDecoration(runner2)
        )
    )

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
24c1c2a to 7e926ab
7e926ab to 47ae161
jon-wei approved these changes on Jul 11, 2023
zachjsh reviewed on Jul 12, 2023
ektravel reviewed on Jul 12, 2023
ektravel (Contributor) left a comment:
Left a few suggestions related to the documentation.
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
abhishekrb19 commented on Jul 12, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request on Jul 21, 2023
…ache#14560)
* Add aggregatorMergeStrategy property to SegmentMetadaQuery.
  - Adds a new property aggregatorMergeStrategy to segmentMetadata query. aggregatorMergeStrategy currently supports three types of merge strategies - the legacy strict and lenient strategies, and the new latest strategy.
  - The latest strategy considers the latest aggregator from the latest segment by time order when there's a conflict when merging aggregators from different segments.
  - Deprecate lenientAggregatorMerge property; The API validates that both the new and old properties are not set, and returns an exception.
  - When merging segments as part of segmentMetadata query, the segments have a more elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to the name format that segments usually contain. Previously it was simply "merged".
  - Adjust unit tests to test the latest strategy, to assert the returned complete SegmentAnalysis object instead of just the aggregators for completeness.
* Don't explicitly set strict strategy in tests
* Apply suggestions from code review
  Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/querying/segmentmetadataquery.md
* Apply suggestions from code review
  Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
---------
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Motivation
SegmentMetadata queries currently support two aggregator merge strategies, strict and lenient, when the "aggregators" analysis type is enabled. Users often want something more permissive than the lenient strategy, where the most recent aggregator is selected for a column in an evolving data model. A strict strategy is best suited once the data model is locked in.
This PR:
- Adds a new property, aggregatorMergeStrategy, to segmentMetadata queries. Please see docs/querying/segmentmetadataquery.md for more information. This also allows us to define an earliest strategy (similar to the latest strategy) or more sophisticated merge strategies as needed.
- Deprecates the lenientAggregatorMerge boolean property in favor of aggregatorMergeStrategy.
- Defaults aggregatorMergeStrategy to strict when the aggregators analysis type is enabled.
- Changes the merged segment id from merged to the <datasource>_<interval>_merged_<partition_number> format.
- Adjusts unit tests to assert the complete SegmentAnalysis object instead of just the aggregators. Also adds tests for the latest aggregator merge and the backwards compatibility logic.

Release note
The lenientAggregatorMerge property in segment metadata queries is deprecated in favor of a new property, aggregatorMergeStrategy. aggregatorMergeStrategy also supports a latest strategy in addition to the existing strict and lenient strategies from lenientAggregatorMerge.

Key changed/added classes in this PR
- AggregatorMergeStrategy.java
- SegmentMetadataQuery.java
- SegmentMetadataQueryQueryToolChest.java
- SegmentMetadataQueryTest.java
- SegmentMetadataQueryQueryToolChestTest.java

This PR has: