Make batched segment sampling the default, minor cleanup of coordinator config #13391
kfaraz merged 6 commits into apache:master
  @JsonProperty("maxSegmentsToMove") int maxSegmentsToMove,
  @Deprecated @JsonProperty("percentOfSegmentsToConsiderPerMove") @Nullable Double percentOfSegmentsToConsiderPerMove,
- @JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
+ @Deprecated @JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
is setting the default value of useBatchedSegmentSampler to true missing?
Thanks for catching this, @rohangarg! It was in a different patch.
Updated to use the default of true both when
- deserializing using the constructor. This would mean that configs already stored in the DB with no explicit value of useBatchedSegmentSampler would now start using true.
- creating/deserializing using the Builder. So newly created/updated configs would now use true.
Also updated tests to verify this.
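For readers following along, the defaulting behavior described above could look roughly like the sketch below. This is only an illustration: `SamplerConfigSketch` and its Builder are hypothetical stand-ins, not the actual Druid classes, and the sketch assumes the constructor receives a nullable `Boolean` so that a missing value in a stored config can be told apart from an explicit `false`.

```java
// Hypothetical, simplified stand-in for a coordinator config class (not the real Druid code).
class SamplerConfigSketch
{
  // Assumed default, matching the behavior described in the comment above.
  private static final boolean DEFAULT_USE_BATCHED_SEGMENT_SAMPLER = true;

  private final boolean useBatchedSegmentSampler;

  // A null argument models a stored config that has no explicit value for the field.
  SamplerConfigSketch(Boolean useBatchedSegmentSampler)
  {
    this.useBatchedSegmentSampler = useBatchedSegmentSampler == null
                                    ? DEFAULT_USE_BATCHED_SEGMENT_SAMPLER
                                    : useBatchedSegmentSampler;
  }

  boolean isUseBatchedSegmentSampler()
  {
    return useBatchedSegmentSampler;
  }

  static Builder builder()
  {
    return new Builder();
  }

  static class Builder
  {
    // Stays null until explicitly set, so build() falls back to the same default.
    private Boolean useBatchedSegmentSampler;

    Builder withUseBatchedSegmentSampler(boolean value)
    {
      this.useBatchedSegmentSampler = value;
      return this;
    }

    SamplerConfigSketch build()
    {
      return new SamplerConfigSketch(useBatchedSegmentSampler);
    }
  }
}
```

With this shape, both `new SamplerConfigSketch(null)` and `SamplerConfigSketch.builder().build()` resolve to the default, while an explicit `false` is still honored on either path.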
public void testRunWithNoIntervalShouldNotKillAnySegments()
{
@Test
public void testFindIntervalForKill()
This test essentially just verifies the JodaUtils.umbrellaInterval() method and these cases are already being verified in JodaUtilsTest.
The required test cases from here have been merged into the other tests in this class.
I think the default value for
AmatyaAvadhanula
left a comment
Thanks for the changes @kfaraz.
rohangarg
left a comment
LGTM! Thanks for the changes
Thanks for the reviews, @rohangarg, @AmatyaAvadhanula!
Description
The batch segment sampling added in #11257 performs significantly better than the older method of sampling if there are a large number of used segments. It also avoids duplicate results in sampling.
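To illustrate why picking a whole batch at once can be cheaper and free of duplicates, here is a minimal sketch of reservoir sampling, which selects k items in a single pass over the input. Whether #11257 uses exactly this scheme is an assumption here; `ReservoirSketch` and `sample` are hypothetical names, not Druid APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of single-pass batch sampling (not the Druid implementation).
class ReservoirSketch
{
  // Returns up to k items chosen uniformly at random from a single pass over `items`.
  static <T> List<T> sample(Iterable<T> items, int k, Random random)
  {
    List<T> reservoir = new ArrayList<>(Math.max(k, 0));
    if (k <= 0) {
      return reservoir;
    }
    int seen = 0;
    for (T item : items) {
      seen++;
      if (reservoir.size() < k) {
        // Fill the reservoir with the first k items.
        reservoir.add(item);
      } else {
        // Keep the new item with probability k / seen by overwriting a random slot.
        int slot = random.nextInt(seen);
        if (slot < k) {
          reservoir.set(slot, item);
        }
      }
    }
    return reservoir;
  }
}
```

Because every input item is visited exactly once and occupies at most one slot, the result never contains the same item twice, and the cost is one traversal regardless of k. Drawing k items one at a time with a separate traversal per draw would cost k traversals and could return the same item more than once.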
Changes
- useBatchedSegmentSampler
- druid.coordinator.loadqueuepeon.repeatDelay
- KillUnusedSegments
- KillUnusedSegmentsTest, add better tests, remove redundant tests
Release note
Batch sampling has been made the default method for sampling segments during balancing as it performs significantly better than the alternative when there is a large number of used segments in the cluster.
The following have been deprecated and will be removed in future releases:
- useBatchedSegmentSampler
- percentOfSegmentsToConsiderPerMove
- BalanceSegments
The unused coordinator property druid.coordinator.loadqueuepeon.repeatDelay has been removed. Use only druid.coordinator.loadqueuepeon.http.repeatDelay to configure repeat delay for the HTTP-based segment loading queue.
This PR has: