Skip to content

Make batched segment sampling the default, minor cleanup of coordinator config#13391

Merged
kfaraz merged 6 commits intoapache:masterfrom
kfaraz:cleanup_coord_config
Nov 21, 2022
Merged

Make batched segment sampling the default, minor cleanup of coordinator config#13391
kfaraz merged 6 commits intoapache:masterfrom
kfaraz:cleanup_coord_config

Conversation

@kfaraz
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz commented Nov 20, 2022

Description

The batch segment sampling added in #11257 performs significantly better than the older method of sampling if there are a large number of used segments. It also avoid duplicate results in sampling.

Changes

  • Make batch segment sampling the default
  • Deprecate the property useBatchedSegmentSampler
  • Remove unused coordinator config druid.coordinator.loadqueuepeon.repeatDelay
  • Cleanup KillUnusedSegments
  • Simplify KillUnusedSegmentsTest, add better tests, remove redundant tests

Release note

Batch sampling has been made the default method for sampling segments during balancing as it performs significantly better than the alternative when there is a large number of used segments in the cluster.

The following have been deprecated and will be removed in future releases:

  • coordinator dynamic config useBatchedSegmentSampler
  • coordinator dynamic config percentOfSegmentsToConsiderPerMove
  • non-batch method of sampling segments used during coordinator duty BalanceSegments

The unused coordinator property druid.coordinator.loadqueuepeon.repeatDelay has been removed.
Use only druid.coordinator.loadqueuepeon.http.repeatDelay to configure repeat delay for the HTTP-based segment loading queue.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@JsonProperty("maxSegmentsToMove") int maxSegmentsToMove,
@Deprecated @JsonProperty("percentOfSegmentsToConsiderPerMove") @Nullable Double percentOfSegmentsToConsiderPerMove,
@JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
@Deprecated @JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is setting the default value of useBatchedSegmentSampler to true missing?

Copy link
Copy Markdown
Contributor Author

@kfaraz kfaraz Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, @rohangarg ! It was in a different patch.
Updated to use the default of true both when

  • deserializing using the constructor. This would mean that configs already stored in the DB with no explicit value of useBatchedSegmentSampler would now start using true.
  • creating/deserializing using the Builder. So newly created/updated configs would now use true.

Also updated tests to verify this.

public void testRunWithNoIntervalShouldNotKillAnySegments()
{
@Test
public void testFindIntervalForKill()
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test essentially just verifies the JodaUtils.umbrellaInterval() method and these cases are already being verified in JodaUtilsTest.
The required test cases from here have been merged into the other tests in this class.

@JsonProperty("maxSegmentsToMove") int maxSegmentsToMove,
@Deprecated @JsonProperty("percentOfSegmentsToConsiderPerMove") @Nullable Double percentOfSegmentsToConsiderPerMove,
@JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
@Deprecated @JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
Copy link
Copy Markdown
Contributor Author

@kfaraz kfaraz Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, @rohangarg ! It was in a different patch.
Updated to use the default of true both when

  • deserializing using the constructor. This would mean that configs already stored in the DB with no explicit value of useBatchedSegmentSampler would now start using true.
  • creating/deserializing using the Builder. So newly created/updated configs would now use true.

Also updated tests to verify this.

@kfaraz kfaraz added this to the 25.0 milestone Nov 21, 2022
@kfaraz kfaraz requested a review from rohangarg November 21, 2022 04:00
@AmatyaAvadhanula
Copy link
Copy Markdown
Contributor

I think the default value for useBatchedSegmentSampler must also be changed in coordinator-dynamic-config.tsx

Copy link
Copy Markdown
Contributor

@AmatyaAvadhanula AmatyaAvadhanula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @kfaraz.

Copy link
Copy Markdown
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the changes

@kfaraz
Copy link
Copy Markdown
Contributor Author

kfaraz commented Nov 21, 2022

Thanks for the reviews, @rohangarg , @AmatyaAvadhanula !

@kfaraz kfaraz changed the title Make reservoir segment sampling the default, minor cleanup of coordinator config Make batched segment sampling the default, minor cleanup of coordinator config Nov 21, 2022
@kfaraz kfaraz merged commit 133054b into apache:master Nov 21, 2022
@kfaraz kfaraz deleted the cleanup_coord_config branch May 3, 2023 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants