Skip to content

Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity #11009

Merged
maytasm merged 8 commits intoapache:masterfrom
maytasm:IMPLY-6411
Mar 20, 2021
Merged

Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity #11009
maytasm merged 8 commits intoapache:masterfrom
maytasm:IMPLY-6411

Conversation

@maytasm
Copy link
Copy Markdown
Contributor

@maytasm maytasm commented Mar 18, 2021

Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity

Description

Auto-compaction now supports setting SegmentGranularity. However, it currently submits compaction tasks unnecessary for segments that was compacted (lastCompactionState != null) but lastCompactionState.getGranularitySpec() == null (i.e. compacted when the auto compaction spec does not set SegmentGranularity) and the segments' SegmentGranularity is already the same as the SegmentGranularity configured in the auto compaction spec. This change will make auto compaction checks the existing segments' SegmentGranularity and determine if compaction is needed or not even when SegmentGranularity is not stored in lastCompactionState.

Note that, with this change, we actually never need to check SegmentGranularity in lastCompactionState to determine if segments need compaction or not. However, I still keep that check there for the following reasons:

  1. If lastCompactionState has SegmentGranularity, then it is faster to check and compare using that.
  2. If we add support for queryGranularity, we will need to check it in lastCompactionState

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

maytasm added 3 commits March 17, 2021 18:07
…ready have the configured segmentGranularity
…ready have the configured segmentGranularity
…ready have the configured segmentGranularity
}
} else if (!config.getGranularitySpec().getSegmentGranularity().equals(existingSegmentGranularity)) {
log.info(
"Configured granularitySpec[%s] is different from the one[%s] of segments. Needs compaction",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe you can just log the segment granularity instead of the granularity spec?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.map(segment -> GranularityType.fromPeriod(segment.getInterval().toPeriod()).getDefaultGranularity())
.collect(Collectors.toSet());
if (segmentGranularities.size() != 1 || !segmentGranularities.contains(config.getGranularitySpec().getSegmentGranularity())) {
needsCompaction = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a log statement here as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// We need to check if all segments have the same segment granularity and if it is the same
// as the configured segment granularity.
Set<Granularity> segmentGranularities = candidates.segments.stream()
.map(segment -> GranularityType.fromPeriod(segment.getInterval().toPeriod()).getDefaultGranularity())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GranularityType.fromPeriod() can throw an exception if the segment interval is not one of the enums in GranularityType. I think we should probably compare the duration of the granularity as well as their timezone and origin.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just catch the exception and set needsCompaction to true in that case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since that means it must not be the same as the configured segmentGranularity in the auto compaction spec. Since it if it is the same, then it must be resolvable to a Granularity

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I think this should work

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please address my comment before merging this PR. The CI seems flaky, I restarted failed jobs.

// Candidate segments were all compacted without segment granularity set.
// We need to check if all segments have the same segment granularity as the configured segment granularity.
needsCompaction = candidates.segments.stream()
.anyMatch(segment -> !config.getGranularitySpec().getSegmentGranularity().isAligned(segment.getInterval()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to print a log in this case too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I accidentally removed the logging message. Added it back in.

@maytasm maytasm merged commit 51d2c61 into apache:master Mar 20, 2021
@maytasm maytasm deleted the IMPLY-6411 branch March 20, 2021 00:38
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants