Support overlapping segment intervals in auto compaction #12062
Merged
maytasm merged 8 commits into apache:master on Jan 4, 2022
maytasm
commented
Dec 15, 2021
{
final ISOChronology chrono = ISOChronology.getInstance(DateTimes.inferTzFromString("America/Los_Angeles"));
Map<String, Object> specs = ImmutableMap.of("%%GRANULARITYSPEC%%", new UniformGranularitySpec(Granularities.WEEK, Granularities.NONE, false, ImmutableList.of(new Interval("2013-08-31/2013-09-02", chrono))));
// Create WEEK segment with 2013-08-26 to 2013-09-20
Contributor
Author
2013-08-26 to 2013-09-02
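The corrected range follows from aligning the input interval's start (2013-08-31, a Saturday) to a week boundary. A minimal sketch of that arithmetic, assuming Monday-based ISO weeks and ignoring the America/Los_Angeles chronology used in the test; `WeekBucketSketch` and its methods are hypothetical names for illustration, not Druid APIs:

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class WeekBucketSketch {
    // Start of the Monday-based week that contains the given date (assumption: ISO weeks).
    static LocalDate weekStart(LocalDate date) {
        return date.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    // Exclusive end of that week bucket.
    static LocalDate weekEnd(LocalDate date) {
        return weekStart(date).plusWeeks(1);
    }

    public static void main(String[] args) {
        LocalDate inputStart = LocalDate.of(2013, 8, 31);
        // prints 2013-08-26/2013-09-02
        System.out.println(weekStart(inputStart) + "/" + weekEnd(inputStart));
    }
}
```

Because the input interval ends at 2013-09-02 (exclusive), it falls entirely inside this single week bucket.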
maytasm
commented
Dec 15, 2021
if (config.getGranularitySpec() == null || config.getGranularitySpec().getSegmentGranularity() == null) {
// Determines segmentGranularity from the segmentsToCompact
// Each batch of segmentToCompact from CompactionSegmentIterator will contains a single time chunk
boolean allSegmentsOverlapped = true;
Contributor
Author
all segments have same interval -> no need to do union
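The shortcut described in this comment can be sketched roughly as follows. This is not the PR's code: `UmbrellaIntervalSketch`, its `Interval` record, and the method names are hypothetical, and the sketch only assumes that a batch of segments either shares one interval or needs the smallest interval covering all of them.

```java
import java.time.Instant;
import java.util.List;

public class UmbrellaIntervalSketch {
    // Hypothetical stand-in for a segment interval: [start, end).
    record Interval(Instant start, Instant end) {}

    // True when every interval in the (non-empty) batch equals the first one.
    static boolean allSameInterval(List<Interval> intervals) {
        Interval first = intervals.get(0);
        return intervals.stream().allMatch(first::equals);
    }

    // Smallest interval that covers every interval in the (non-empty) batch.
    static Interval umbrella(List<Interval> intervals) {
        Instant start = intervals.get(0).start();
        Instant end = intervals.get(0).end();
        for (Interval i : intervals) {
            if (i.start().isBefore(start)) {
                start = i.start();
            }
            if (i.end().isAfter(end)) {
                end = i.end();
            }
        }
        return new Interval(start, end);
    }

    // Skip the union when all intervals are identical; otherwise compute it.
    static Interval intervalToCompact(List<Interval> intervals) {
        return allSameInterval(intervals) ? intervals.get(0) : umbrella(intervals);
    }
}
```

A segment granularity could then be inferred from the length of the resulting interval; that step is left out here because it depends on Druid's granularity types.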
Description
This PR fixes two problems that occur when Druid compacts overlapping segment intervals via auto compaction.
Imagine we have a segment with interval 2016-07-01T00:00:00.000Z/2016-08-01T00:00:00.000Z (MONTH segmentGranularity) and another segment with interval 2016-06-27T00:00:00.000Z/2016-07-04T00:00:00.000Z (WEEK segmentGranularity).
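For these two example segments, a hash computed over only the MONTH segment's ID necessarily differs from one computed over both IDs, which is the kind of mismatch that arises when a spec's sha256OfSortedSegmentIds covers fewer segments than the task later finds in the interval. A minimal sketch, assuming the hash is a SHA-256 over the sorted ID strings; the class name, the helper, and the segment ID strings are hypothetical and do not reproduce Druid's implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SegmentIdHashSketch {
    // Hex-encoded SHA-256 over the segment IDs, fed to the digest in sorted order.
    static String sha256OfSortedIds(List<String> segmentIds) {
        List<String> sorted = new ArrayList<>(segmentIds);
        Collections.sort(sorted);
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String id : sorted) {
                digest.update(id.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical segment IDs built from the two example intervals.
        String month = "ds_2016-07-01T00:00:00.000Z_2016-08-01T00:00:00.000Z_v1";
        String week = "ds_2016-06-27T00:00:00.000Z_2016-07-04T00:00:00.000Z_v1";

        String hashInSpec = sha256OfSortedIds(List.of(month));       // what the iterator saw
        String hashInTask = sha256OfSortedIds(List.of(month, week)); // what the task finds
        System.out.println(hashInSpec.equals(hashInTask));
    }
}
```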
CompactionSegmentIterator only returns segments from a single time chunk bucket. For example, NewestSegmentFirstIterator would return the interval 2016-07-01T00:00:00.000Z/2016-08-01T00:00:00.000Z and submit a compaction task for that interval. However, the segments returned by the iterator would contain only the MONTH segment, so the sha256OfSortedSegmentIds calculated by auto compaction covers only the MONTH segment (2016-07-01T00:00:00.000Z/2016-08-01T00:00:00.000Z). This causes the compaction task to fail when it starts running: the task fetches all segments marked as used in the interval, which includes both the WEEK segment and the MONTH segment, computes their sha256, and compares it with the sha256 in the compaction spec. The two values differ because the sha256 in the spec covers only the MONTH segment. This issue is fixed by removing sha256OfSortedSegmentIds from the compaction task spec created by auto compaction. sha256OfSortedSegmentIds was added in "Use hash of Segment IDs instead of a list of explicit segments in auto compaction" (#8571) to enforce a limit on the number of segments in one compaction task. However, this is no longer necessary, as a compaction task can use a parallel ingestion task.
The second problem concerns the segmentGranularity used by the compaction task. To fix this issue, auto compaction now determines the segmentGranularity for the compaction task from the segments returned by its CompactionSegmentIterator, ensuring that the same bucketing/chunking of segments is preserved.
This PR has: