Add dynamic coordinator config that allows control over how many segments are considered when picking a segment to move. (#10284)
Conversation
Adds a new dynamic coordinator config, `maxSegmentsToConsiderPerMove`. This config caps the number of segments that are iterated over when selecting a segment to move. The default value combined with the current balancing strategies will still iterate over all provided segments. However, setting this value to something > 0 will cap the number of segments visited. This can make sense when a cluster has a very large number of segments and the admins prefer fewer iterations over a thorough consideration of all provided segments.
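The capping described above is essentially reservoir sampling with an early exit. Below is a minimal sketch of the idea; the class and parameter names are hypothetical, not Druid's actual `ReservoirSegmentSampler` API:

```java
import java.util.List;
import java.util.Random;

public class CappedReservoirSample
{
  // Reservoir sampling of a single element: each visited element replaces the
  // current pick with probability 1/visited, so every visited element is
  // equally likely to be the final pick. maxToConsider <= 0 means "no cap",
  // matching the described default behavior of iterating everything.
  public static <T> T pick(List<T> items, int maxToConsider, Random rng)
  {
    T chosen = null;
    int visited = 0;
    for (T item : items) {
      if (maxToConsider > 0 && visited >= maxToConsider) {
        break; // cap reached: short-circuit instead of scanning every segment
      }
      visited++;
      if (rng.nextInt(visited) == 0) {
        chosen = item;
      }
    }
    return chosen;
  }
}
```

With a cap of 2, only the first two elements are ever candidates, which is the "less iterations versus thorough consideration" trade-off the config offers.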
I recently became a committer, but haven't gotten repo access set up yet (if another committer reads this and is willing to guide me on this process, I'm all ears!), so I cannot complete all of the committer steps. For labels, I would add:
Would it make sense to perform the segment limiting for each `ServerHolder` instead?

Perhaps what we really want here is a way to configure the balancer to use one of many(?) implementations of "pick a segment from a list of `ServerHolder`s".
I had not considered this initially, but it is a good idea. I'd be okay with going this route if it is trivial to calculate the number of segments in the cluster including replication. I'll have to look at that.
Thanks for clarifying. My suggestion was basically ignoring the fact that the `ServerHolder` list is sorted by available space. In case we don't proceed to go the percentage route in this PR, it may be helpful to provide some recommendation in the docs on what might be a good value to set this to percentage-wise.
Hey @capistrant, have you finished the setup for GitBox? You can check it here: https://gitbox.apache.org/setup/.
Sorry about accidentally editing your previous comment. I was not paying attention to what I was doing, and I reverted the changes I made. On to what I was trying to say: finally circling back to this now that I have time... I decided to just update the doc for now. I am still trying to determine if the percentages approach would require compute overhead that defeats the purpose of this. Looking at the code as I type this update...

tl;dr I am going to implement the percentages approach instead of the raw number. The compute overhead to calculate the number of segments to iterate seems negligible compared to what we will save in the cases where this will be used (large clusters).
This pull request introduces 1 alert when merging b3f680d into 3fc8bc0 - view the new alerts on LGTM.com.
This pull request introduces 1 alert when merging a8a7c42 into 176b715 - view the new alerts on LGTM.com.
@a2l007 checking in to see if you will have some time to review this PR now that I have changed the implementation to use a percentage approach versus a raw number.
Yeah sure, should be able to review it soon. |
**a2l007** left a comment:
Overall changes LGTM
Can we please add this property to the Coordinator Dynamic Config dialog box in the web console as well?
|`mergeBytesLimit`|The maximum total uncompressed size in bytes of segments to merge.|524288000L|
|`mergeSegmentsLimit`|The maximum number of segments that can be in a single [append task](../ingestion/tasks.md).|100|
|`maxSegmentsToMove`|The maximum number of segments that can be moved at any given time.|5|
|`percentOfSegmentsToConsiderPerMove`|The percentage of the total number of segments in the cluster that are considered every time a segment needs to be selected for a move. Druid orders servers by available capacity ascending (the least available capacity first) and then iterates over the servers. For each server, Druid iterates over the segments on the server, considering them for moving. The default config of 100% means that every segment on every server is a candidate to be moved. This should make sense for most small to medium-sized clusters. However, an admin may find it preferable to drop this value lower if they don't think that it is worthwhile to consider every single segment in the cluster each time it is looking for a segment to move.| |
Could you please add the default value for this property into the Default column?
Also it would be helpful to provide some sort of recommended percentage value that operators with large clusters can use as a starting point for this property. We could either add it here or in the Basic Cluster Tuning doc.
     * @param percentOfSegmentsToConsider The % of total cluster segments to consider before short-circuiting and
     *                                    returning immediately.
     * @return
     */
Good call out. I will add that. I added some more docs to help guide when and how to set this value.
Could you please fix the spellcheck and console test errors as well?
@capistrant Can we write integration tests for this dynamic config? Since it's not the default and used infrequently, I would hate for a refactoring to break this functionality and us not know about it |
…bad inputs for % of segments to consider
Good point, @suneet-s. I like the idea of protecting this against regressions, but I'm wondering if there is the proper plumbing in place to make this easy to test as is. Wondering if you have any inputs/ideas on how it may be accomplished. We want to test the integration between the Coordinator's dynamic config and the act of balancing the cluster. Specifically, we want to ensure that our new config is being honored by Coordinator, meaning that balancing considers the expected % of segments when looking for segments to move.
The above section with strikethrough was me thinking of the wrong metric: we don't care about the number of segments moved; we care about the number of segments considered per move. Back to the drawing board on ideas for this. Do you have any ideas that I may be overlooking?
@suneet-s This ended up being a hard one to come up with an integration test for, since it is hard to analyze what the code is doing and whether or not it is honoring this config. However, my latest commit adds a new test to `BalanceSegmentsTest` that validates that `pickSegmentToMove` is called using the non-default value that the test uses for the new dynamic config. This provides some protection against regressions. What do you think about it?
@capistrant Sorry about not getting back to you on your last comment. If we set the percentage to a low enough value (1%), could we verify that Druid never moves a segment because it never has any to consider? Not sure if this is possible, but I figured I'd throw this out there. The test you added to `BalanceSegmentsTest` helps in the meantime.
**suneet-s** left a comment:
Just skimmed through the change and had some comments on the type of the parameter. Let me know what you think
    this.maxSegmentsToMove = maxSegmentsToMove;
    // This helps with ease of migration to the new config, but could confuse users. Docs explicitly state this value must be > 0
    if (percentOfSegmentsToConsiderPerMove <= 0 || percentOfSegmentsToConsiderPerMove > 100) {
      this.percentOfSegmentsToConsiderPerMove = 100;
Since this is a json configured property, I think we should throw an exception here so the user knows they've done something incorrect.
EDIT: Something like
    Preconditions.checkArgument(
        percentOfSegmentsToConsiderPerMove > 0 && percentOfSegmentsToConsiderPerMove <= 100,
        "percentOfSegmentsToConsiderPerMove should be between 0 and 100!"
    );
I agree, it's confusing to mask an error like I was. Fixed.
    @JsonProperty("mergeBytesLimit") long mergeBytesLimit,
    @JsonProperty("mergeSegmentsLimit") int mergeSegmentsLimit,
    @JsonProperty("maxSegmentsToMove") int maxSegmentsToMove,
    @JsonProperty("percentOfSegmentsToConsiderPerMove") int percentOfSegmentsToConsiderPerMove,
Should this be a double since we're dealing with percentages? I think it would make it less likely for someone in the future to make some math mistake by accidentally dividing by an integer.
Ya, this is a good point. Using a double up front makes things a lot more straightforward.
    this.mergeBytesLimit = mergeBytesLimit;
    this.mergeSegmentsLimit = mergeSegmentsLimit;
    this.maxSegmentsToMove = maxSegmentsToMove;
    this.percentOfSegmentsToConsiderPerMove = percentOfSegmentsToConsiderPerMove;
Should this have a PreCondition check that it falls between 0 - 100 ?
Hmm, since this is a Builder class that builds a `CoordinatorDynamicConfig` object, I think this precondition is covered by the actual constructor for the `CoordinatorDynamicConfig` class. IMO, having another precondition here is redundant.
    // Reset a bad value of percentOfSegmentsToConsider to 100. We don't allow consideration less than or equal to
    // 0% of segments or greater than 100% of segments.
    if (percentOfSegmentsToConsider <= 0 || percentOfSegmentsToConsider > 100) {
      log.debug("Resetting percentOfSegmentsToConsider to 100 because only values from 1 to 100 are allowed."
IMO this should be at least a WARN, since this case should be impossible: percentOfSegmentsToConsider should have already been checked earlier in the system.
I'm cool with this being warn
…ents are considered when picking a segment to move. (apache#10284)

* dynamic coord config adding more balancing control: add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This config caps the number of segments that are iterated over when selecting a segment to move. The default value combined with current balancing strategies will still iterate over all provided segments. However, setting this value to something > 0 will cap the number of segments visited. This could make sense in cases where a cluster has a very large number of segments and the admins prefer less iterations vs a thorough consideration of all segments provided.
* fix checkstyle failure
* Make doc more detailed for admin to understand when/why to use new config
* refactor PR to use a % of segments instead of raw number
* update the docs
* remove bad doc line
* fix typo in name of new dynamic config
* update ReservoirSegmentSampler to gracefully deal with values > 100%
* add handler for <= 0 in ReservoirSegmentSampler
* fixup CoordinatorDynamicConfigTest naming and argument ordering
* fix items in docs after spellcheck flags
* Fix lgtm flag on missing space in string literal
* improve documentation for new config
* Add default value to config docs and add advice in cluster tuning doc
* Add percentOfSegmentsToConsiderPerMove to web console coord config dialog
* update jest snapshot after console change
* fix spell checker errors
* Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider
* add new config back to web console module after merge with master
* fix ReservoirSegmentSamplerTest
* fix line breaks in coordinator console dialog
* Add a test that helps ensure no regressions for percentOfSegmentsToConsiderPerMove
* Make improvements based off of feedback in review
* additional cleanup coming from review
* Add a warning log if limit on segments to consider for move can't be calculated
* remove unused import
* fix tests for CoordinatorDynamicConfig
* remove precondition test that is redundant in CoordinatorDynamicConfig Builder class
Release Notes
A new Coordinator Dynamic Config, `percentOfSegmentsToConsiderPerMove`, has been added. This configuration specifies the percent of served segments that will be considered when picking a segment to potentially move. The coordinator uses this value and the number of currently served segments to calculate the number of segments that will be candidates. It will not consider any segments beyond that calculated number when picking a segment to move. The default value is 100, meaning the coordinator will consider all segments every time it is picking a segment to move. This default leads to the same behavior that existed prior to this configuration being added.

Description
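The candidate-count calculation described in the release note could be sketched as follows; `SegmentsToConsider` is a hypothetical helper, and the `Math.ceil` rounding is an assumption, not necessarily what Druid does:

```java
public class SegmentsToConsider
{
  // Hypothetical helper mirroring the described calculation. The validation
  // range and message follow the review discussion; the ceil rounding is an
  // assumption made so that small clusters never round down to zero candidates.
  public static int compute(int totalServedSegments, double percentOfSegmentsToConsiderPerMove)
  {
    if (percentOfSegmentsToConsiderPerMove <= 0 || percentOfSegmentsToConsiderPerMove > 100) {
      throw new IllegalArgumentException("percentOfSegmentsToConsiderPerMove should be between 0 and 100!");
    }
    return (int) Math.ceil(totalServedSegments * (percentOfSegmentsToConsiderPerMove / 100.0));
  }
}
```

For example, 10,000 served segments at 25% yields at most 2,500 candidate segments per move.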
A large cluster with many segments results in a lot of work being done by the Coordinator in order to complete its duties. I believe that any optimization to coordinator duties can help in a large cluster. This patch gives an experienced admin a knob to turn in order to try and shave some time off of the balance segments duty. As of now, existing Balancer Strategies iterate over all of the segments in the cluster when choosing a segment to move. The first segment candidate is the most likely to be moved and the last segment candidate is the least likely to move. This patch gives an admin the ability to put a limit on the number of segments that will be candidates to be moved. For most cases, I don't think this knob will be needed, but in some large enterprise cases I feel that it could be beneficial.
I updated the `BalancerStrategy` interface. The `pickSegmentToMove` method gained a third parameter that specifies the number of segments that should be considered when picking a segment to move.
`CostBalancerStrategy` (and its inheriting classes) and `RandomBalancerStrategy` both leverage `ReservoirSegmentSampler` to choose a segment "at random" from a list of candidate servers. I updated the required method in `ReservoirSegmentSampler` to adhere to the limiting parameter described above. If the limit is reached, the method picking a segment will break out of its iteration and return immediately.

Currently all code paths use a new dynamic coordinator config that an admin can tune if they'd like to put a limit on this action of picking a segment to move. The default value for the config is such that all segments will be iterated and be candidates to pick. I thought a dynamic config was good because it is flexible: for example, it could be leveraged when temporarily boosting the number of segments to move to rebalance onto new servers faster, while also making each move cycle quicker by not considering so many potential segments.
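The early-return behavior described above, with a single limit shared across all candidate servers, might look roughly like this sketch (the class, method, and data-structure names are hypothetical, not the actual `ReservoirSegmentSampler` code):

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ClusterSegmentPicker
{
  // Sketch of picking one segment across servers with a shared limit: one
  // counter spans all servers' segment lists, and iteration stops the moment
  // the limit is hit, returning whatever the reservoir currently holds.
  public static String pickSegment(Map<String, List<String>> segmentsPerServer, int limit, Random rng)
  {
    String chosen = null;
    int considered = 0;
    outer:
    for (List<String> serverSegments : segmentsPerServer.values()) {
      for (String segment : serverSegments) {
        if (considered >= limit) {
          break outer; // limit reached: break out of iteration and return immediately
        }
        considered++;
        if (rng.nextInt(considered) == 0) {
          chosen = segment;
        }
      }
    }
    return chosen;
  }
}
```

Because the counter is global rather than per-server, servers late in the iteration order may never have their segments considered when the limit is low, which is why the docs recommend this only for experienced admins of large clusters.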
The new dynamic config is `maxSegmentsToConsiderPerMove` with a default of `Integer.MAX_VALUE`. I call out in the documentation that an admin should be experienced when considering altering this config. I say that because in many cases, the default is fine.
An alternative to this approach would be to restrict what is sent to `pickSegmentToMove` in the first place. I chose not to take that approach at this time because I didn't like the idea of either choosing the number of `ServerHolder`s to send to `pickSegmentsToMove`, or analyzing the `ServerHolder`s before picking how many to send in order to ensure only a certain number of segments are sent. I'd be open to re-assessing whether or not this would be a better approach if someone suggests it may be the proper one.

This PR has:
Key changed/added classes in this PR

* `BalancerStrategy` interface
* `CostBalancerStrategy`
* `RandomBalancerStrategy`
* `ReservoirSegmentSampler`
* `CoordinatorDynamicConfig`