Optimize unused segment query for segment allocation #16623

Merged
cryptoe merged 4 commits into apache:master from AmatyaAvadhanula:optimize_unused_segment_query on Jun 18, 2024

@AmatyaAvadhanula
Contributor

#16380 used an existing metadata query to fetch unused segments for a given datasource, interval, and version. That query turned out to be slow despite the indexes, which could affect Overlord stability.

Segment allocation is a special case that only needs an exact interval match, so this PR optimizes the query by using equality checks on the interval start and end instead of the OVERLAPS or CONTAINS match modes.

On a cluster with 1.8M unused segments for a given datasource, the query which relied on the existing method took over 30s on average, while the new query takes less than a second.
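To make the difference concrete, here is a minimal sketch of the two query shapes. It is not the SQL from this PR: the table name `druid_segments`, the selected column, the named parameters, and the quoting of the `end` column are all assumptions chosen for illustration, and the real code may build its conditions differently.

```java
final class UnusedSegmentQuerySketch
{
  // Hypothetical table name; the actual name depends on cluster configuration.
  static final String TABLE = "druid_segments";

  // Conditions shared by both query shapes (column names are assumptions).
  private static final String COMMON =
      "SELECT payload FROM " + TABLE
      + " WHERE used = false AND dataSource = :dataSource AND version = :version";

  /** Overlap-style match: range comparisons on both interval bounds. */
  static String overlapsQuery()
  {
    return COMMON + " AND start < :end AND \"end\" > :start";
  }

  /** Exact match: equality on both interval bounds. */
  static String exactIntervalQuery()
  {
    return COMMON + " AND start = :start AND \"end\" = :end";
  }

  public static void main(String[] args)
  {
    System.out.println(overlapsQuery());
    System.out.println(exactIntervalQuery());
  }
}
```

The only change between the two is the pair of interval conditions: range comparisons in the first, equality in the second, which is the substitution the description above refers to.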

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

AmatyaAvadhanula requested a review from kfaraz on June 18, 2024 09:56
AmatyaAvadhanula requested a review from kfaraz on June 18, 2024 11:07

@kfaraz kfaraz left a comment

Changes look good. I left a few non-blocking suggestions.

Comment on lines +285 to +286
log.debug("Found [%,d] unused segments for datasource[%s] for interval[%s] and version[%s].",
matchingSegments.size(), dataSource, interval, version);

Nit:

Suggested change
- log.debug("Found [%,d] unused segments for datasource[%s] for interval[%s] and version[%s].",
-     matchingSegments.size(), dataSource, interval, version);
+ log.debug(
+     "Found [%,d] unused segments for datasource[%s], interval[%s] and version[%s].",
+     matchingSegments.size(), dataSource, interval, version
+ );

new NumberedShardSpec(0, 0)
);
DataSegment unusedSegmentForDifferentInterval = createSegment(
Intervals.of("2023/2024"),

Rather than a disjoint interval, a better test would be to verify that a segment in an overlapping (but not identical) interval is not returned.
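To illustrate why an overlapping interval makes the stronger test case, here is a small self-contained sketch. It uses a hypothetical `DateInterval` class built on `java.time.LocalDate` as a stand-in for the Joda-Time intervals in the actual test; none of these names come from the PR.

```java
import java.time.LocalDate;

final class IntervalMatchSketch
{
  /** Half-open interval [start, end); a hypothetical stand-in for a Joda-Time Interval. */
  static final class DateInterval
  {
    final LocalDate start;
    final LocalDate end;

    DateInterval(String startInclusive, String endExclusive)
    {
      this.start = LocalDate.parse(startInclusive);
      this.end = LocalDate.parse(endExclusive);
    }

    /** True if the two intervals share at least one day. */
    boolean overlaps(DateInterval other)
    {
      return start.isBefore(other.end) && other.start.isBefore(end);
    }

    /** True only if both bounds are identical. */
    boolean isExactly(DateInterval other)
    {
      return start.equals(other.start) && end.equals(other.end);
    }
  }

  public static void main(String[] args)
  {
    DateInterval may2024 = new DateInterval("2024-05-01", "2024-06-01");
    DateInterval year2023 = new DateInterval("2023-01-01", "2024-01-01");
    DateInterval year2024 = new DateInterval("2024-01-01", "2025-01-01");

    // A disjoint interval is rejected by both match styles, so it cannot tell them apart.
    System.out.println(may2024.overlaps(year2023) + " " + may2024.isExactly(year2023));
    // An overlapping, non-identical interval is accepted by overlap but rejected by equality.
    System.out.println(may2024.overlaps(year2024) + " " + may2024.isExactly(year2024));
  }
}
```

A test that only uses the disjoint 2023 interval would pass even if the query still used an overlap-style match; the year-long 2024 interval would catch that regression.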

@Test
public void testRetrieveUnusedSegmentsForExactIntervalAndVersion() throws Exception
{
DataSegment unusedForDifferentVersion = createSegment(

Suggested change
- DataSegment unusedForDifferentVersion = createSegment(
+ final DataSegment unusedSegmentMay2024V0 = createSegment(

public void testRetrieveUnusedSegmentsForExactIntervalAndVersion() throws Exception
{
DataSegment unusedForDifferentVersion = createSegment(
Intervals.of("2024/2025"),

Nit: use an interval that is easier to refer to in a variable name. You could even assign this interval to a field, such as Interval may2024, so that you can reuse it in multiple places.

Suggested change
- Intervals.of("2024/2025"),
+ Intervals.of("2024-05/P1M"),

"v0",
new NumberedShardSpec(0, 0)
);
DataSegment unusedSegmentForExactIntervalAndVersion = createSegment(

Suggested change
- DataSegment unusedSegmentForExactIntervalAndVersion = createSegment(
+ final DataSegment unusedSegmentMay2024V1 = createSegment(

new NumberedShardSpec(0, 0)
);
DataSegment unusedSegmentForExactIntervalAndVersion = createSegment(
Intervals.of("2024/2025"),

Suggested change
- Intervals.of("2024/2025"),
+ Intervals.of("2024-05/P1M"),

"v1",
new NumberedShardSpec(0, 0)
);
DataSegment unusedSegmentForDifferentInterval = createSegment(

Suggested change
- DataSegment unusedSegmentForDifferentInterval = createSegment(
+ final DataSegment unusedSegmentYear2024V1 = createSegment(

new NumberedShardSpec(0, 0)
);
DataSegment unusedSegmentForDifferentInterval = createSegment(
Intervals.of("2023/2024"),

Suggested change
- Intervals.of("2023/2024"),
+ Intervals.of("2024/P1Y"),

);
coordinator.markSegmentsAsUnusedWithinInterval(DS.WIKI, Intervals.ETERNITY);

DataSegment usedSegmentForExactIntervalAndVersion = createSegment(

Suggested change
- DataSegment usedSegmentForExactIntervalAndVersion = createSegment(
+ final DataSegment usedSegmentMay2024V1 = createSegment(

coordinator.markSegmentsAsUnusedWithinInterval(DS.WIKI, Intervals.ETERNITY);

DataSegment usedSegmentForExactIntervalAndVersion = createSegment(
Intervals.of("2024/2025"),

Suggested change
- Intervals.of("2024/2025"),
+ Intervals.of("2024-05/P1M"),

cryptoe merged commit be3593f into apache:master on Jun 18, 2024
kfaraz deleted the optimize_unused_segment_query branch on June 18, 2024 15:46
kfaraz added this to the 31.0.0 milestone on Oct 4, 2024