Optimize used segment fetching in Kill tasks #15107

Merged
AmatyaAvadhanula merged 6 commits into apache:master from AmatyaAvadhanula:optimize_killtask_segmentfetch
Oct 9, 2023

@AmatyaAvadhanula AmatyaAvadhanula commented Oct 7, 2023

#14407 changed kill tasks to fetch the load specs of used segments, so that a load spec still referenced by a used segment is not killed.

However, doing this for every batch can add significant overhead. This PR aims to reduce it.

The preliminary approach taken in this PR is to fetch only the used segments in the intervals covered by the unused segments of a given batch.
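
The approach described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `Interval`, `Segment`, `intervalsOf`, and `usedLoadSpecsIn` are hypothetical stand-ins for Druid's classes and for the metadata lookup, and the sketch assumes the lookup returns used segments that overlap any of the given intervals.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class KillBatchSketch {
    // Hypothetical stand-in for an interval: half-open range [start, end).
    record Interval(long start, long end) {
        boolean overlaps(Interval other) {
            return start < other.end && other.start < end;
        }
    }

    // Hypothetical stand-in for a segment: its interval plus its load spec.
    record Segment(Interval interval, Map<String, Object> loadSpec) {}

    // Collect the distinct intervals covered by one batch of unused segments.
    static Set<Interval> intervalsOf(List<Segment> unusedBatch) {
        Set<Interval> intervals = new HashSet<>();
        for (Segment segment : unusedBatch) {
            intervals.add(segment.interval());
        }
        return intervals;
    }

    // Assumed lookup: load specs of used segments overlapping any given interval.
    // Used segments outside the batch's intervals are never inspected.
    static Set<Map<String, Object>> usedLoadSpecsIn(List<Segment> usedSegments, Set<Interval> intervals) {
        Set<Map<String, Object>> loadSpecs = new HashSet<>();
        for (Segment used : usedSegments) {
            for (Interval interval : intervals) {
                if (used.interval().overlaps(interval)) {
                    loadSpecs.add(used.loadSpec());
                    break;
                }
            }
        }
        return loadSpecs;
    }

    public static void main(String[] args) {
        Interval first = new Interval(0, 10);
        Interval second = new Interval(10, 20);
        List<Segment> unusedBatch = List.of(new Segment(first, Map.of("path", "a")));
        List<Segment> used = List.of(
            new Segment(first, Map.of("path", "a")),
            new Segment(second, Map.of("path", "b"))
        );
        System.out.println(usedLoadSpecsIn(used, intervalsOf(unusedBatch)));
    }
}
```

In this sketch the used segment in the second interval is skipped because no unused segment in the batch covers that interval, which is the saving the PR description is after.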

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

final Set<Map<String, Object>> usedSegmentLoadSpecs = toolbox
.getTaskActionClient()
.submit(new RetrieveUsedSegmentsAction(getDataSource(), getInterval(), null, Segments.INCLUDING_OVERSHADOWED))
.submit(new RetrieveUsedSegmentsAction(
Contributor

Format: would be easier to read if the unused intervals were pre-constructed and the constructor was moved to a new line.

Suggested change
.submit(new RetrieveUsedSegmentsAction(
.submit(
new RetrieveUsedSegmentsAction(getDataSource(), null, unusedIntervals, Segments.INCLUDING_OVERSHADOWED)
)

Contributor Author

Done

@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as ready for review October 8, 2023 05:12
Comment on lines +248 to +249
.filter(unusedSegment -> !usedSegmentLoadSpecs.contains(unusedSegment.getLoadSpec())
|| unusedSegment.getLoadSpec() == null)
Contributor

unusedSegment.getLoadSpec() == null should be checked first IMO.

Contributor Author

Done
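
For readers following the thread, the reordered filter might look like the sketch below. It is an illustration under assumptions, not the merged code: `Segment` and `safeToKill` are hypothetical names. Checking for `null` first lets the `||` short-circuit, so the set is never asked about a `null` load spec; some `Set` implementations (for example those returned by `Set.of`) throw on `contains(null)`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class LoadSpecFilterSketch {
    // Hypothetical stand-in for a segment; only the id and load spec matter here.
    record Segment(String id, Map<String, Object> loadSpec) {}

    // Keep unused segments whose load spec is null or is not shared with a used segment.
    static List<Segment> safeToKill(List<Segment> unused, Set<Map<String, Object>> usedLoadSpecs) {
        return unused.stream()
            // Null check first: contains() is only reached for non-null load specs.
            .filter(segment -> segment.loadSpec() == null
                               || !usedLoadSpecs.contains(segment.loadSpec()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> shared = Map.of("path", "a");
        List<Segment> unused = List.of(
            new Segment("s1", shared),
            new Segment("s2", Map.of("path", "b")),
            new Segment("s3", null)
        );
        System.out.println(safeToKill(unused, Set.of(shared)));
    }
}
```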

@JsonCreator
public Stats(
@JsonProperty("numSegmentsKilled") int numSegmentsKilled,
@JsonProperty("numSegmentsKilledInDeepStorage") int numSegmentsKilledInDeepStorage,
Contributor

Why add this property too? This is a user-facing property.

Contributor Author

I had added it to test the changes. I can remove it if it is unnecessary in the report.

Contributor

Let's remove it, please. We will add it later if it's needed.

Contributor Author

Done

@AmatyaAvadhanula AmatyaAvadhanula merged commit 40a6dc4 into apache:master Oct 9, 2023
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
* Optimize used segment fetching in Kill tasks
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
* Optimize used segment fetching in Kill tasks