Coordinator Dynamic Config changes to ease upgrading with new config value #10724
suneet-s merged 7 commits into apache:master
@suneet-s I have added you as reviewer since you were reviewer on the initial PR.
suneet-s left a comment:
@capistrant good catch! Let me know if you think it should remain a warn log instead of debug, otherwise LGTM
this.maxSegmentsToMove = maxSegmentsToMove;

if (percentOfSegmentsToConsiderPerMove == null) {
  log.warn("percentOfSegmentsToConsiderPerMove was null! This is likely because your metastore does not "
I don't think this should be a warn log message. A user should not be expected to set an optional config. If the config is not set, it seems reasonable to me that the default is set to preserve the old behavior (i.e., all segments are considered per move).
For users who want to use the new behavior, I suspect they will hear about it in the release notes or the docs, and update the config to get the new functionality.
- log.warn("percentOfSegmentsToConsiderPerMove was null! This is likely because your metastore does not "
+ log.debug("percentOfSegmentsToConsiderPerMove was null! This is likely because your metastore does not "
ya, I guess this is fine as a debug. It will actually get set automatically to the default (or the user-specified value) the next time a user submits a new dynamic config, since that code path uses the Builder, which populates missing values. Will update now.
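To illustrate the point about the Builder populating missing values, here is a minimal sketch. It is not Druid's actual code: `BuilderConfigSketch`, its two fields, and the default values are hypothetical stand-ins chosen for illustration. The idea is only that any field the caller leaves unset is filled with a default when `build()` runs.

```java
// Hypothetical, simplified stand-in for a dynamic config with a Builder.
// Class name, field set, and default values are assumptions for illustration.
final class BuilderConfigSketch {
    static final int DEFAULT_MAX_SEGMENTS_TO_MOVE = 5;   // illustrative value
    static final double DEFAULT_PERCENT_TO_CONSIDER = 100.0;

    final int maxSegmentsToMove;
    final double percentOfSegmentsToConsiderPerMove;

    private BuilderConfigSketch(int maxSegmentsToMove, double percentOfSegmentsToConsiderPerMove) {
        this.maxSegmentsToMove = maxSegmentsToMove;
        this.percentOfSegmentsToConsiderPerMove = percentOfSegmentsToConsiderPerMove;
    }

    static final class Builder {
        // Boxed types so that "never set" can be told apart from any real value.
        private Integer maxSegmentsToMove;
        private Double percentOfSegmentsToConsiderPerMove;

        Builder withMaxSegmentsToMove(int value) {
            this.maxSegmentsToMove = value;
            return this;
        }

        Builder withPercentOfSegmentsToConsiderPerMove(double value) {
            this.percentOfSegmentsToConsiderPerMove = value;
            return this;
        }

        // Any field left unset is replaced by its default here.
        BuilderConfigSketch build() {
            return new BuilderConfigSketch(
                maxSegmentsToMove == null ? DEFAULT_MAX_SEGMENTS_TO_MOVE : maxSegmentsToMove,
                percentOfSegmentsToConsiderPerMove == null
                    ? DEFAULT_PERCENT_TO_CONSIDER
                    : percentOfSegmentsToConsiderPerMove
            );
        }
    }
}
```

Under these assumptions, a config submitted without the new field still ends up with the default once it passes through the builder, which is why the missing value would only be observed until the next submission.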
@capistrant It looks like CI is complaining about a missing file for a newly added (I think) integration test 🤔 I'm not sure how this PR could have caused that failure - https://travis-ci.com/github/apache/druid/jobs/468930258 (I tried re-triggering this job a couple of times just in case). Could you maybe try merging master back into this PR to see if that will make Travis happy?
private final int maxSegmentsInNodeLoadingQueue;
private final boolean pauseCoordination;

private static final EmittingLogger log = new EmittingLogger(CoordinatorDynamicConfig.class);
nit: I don't think this needs to be EmittingLogger since it isn't doing an alert anywhere
  @JsonProperty("mergeSegmentsLimit") int mergeSegmentsLimit,
  @JsonProperty("maxSegmentsToMove") int maxSegmentsToMove,
- @JsonProperty("percentOfSegmentsToConsiderPerMove") double percentOfSegmentsToConsiderPerMove,
+ @JsonProperty("percentOfSegmentsToConsiderPerMove") Double percentOfSegmentsToConsiderPerMove,
I guess the alternative way to fix this instead of switching out the primitive would be to treat 0 as not configured and so use the default of 100, though this way seems fine too, I think it just sticks out a bit because the rest of the numbery parameters are primitives.
also nit: should annotate this parameter with @Nullable
i missed the first PR, but any reason this is a double instead of integer similar to other percent based config decommissioningMaxPercentOfMaxSegmentsToMove? I guess there can be millions of segments so the numbers being dealt with here are a bit different (at least hopefully no one is trying to actually move on the scale of that many segments per coordinator run..) so maybe those decimal points really do make a difference, but otoh I can't really imagine operators configuring this much more granular than integer numbers, which I think was the reason we used integer for the other one iirc.
This was because of my comment in the initial PR - #10284 (comment)
I was thinking about developer error in calculating the percentage
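The boxed-parameter approach discussed in this thread can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: `NullablePercentSketch` is a hypothetical class, and the 1 to 100 range is taken from the PR description rather than from the actual precondition in Druid. A `null` argument stands for a field that was absent from the persisted JSON.

```java
// Hypothetical sketch of handling a nullable boxed parameter in a constructor.
// The class name and the exact range check are assumptions for illustration.
final class NullablePercentSketch {
    static final double DEFAULT_PERCENT_TO_CONSIDER = 100.0;

    private final double percentOfSegmentsToConsiderPerMove;

    // A null argument models a config persisted before the field existed.
    NullablePercentSketch(Double percentOfSegmentsToConsiderPerMove) {
        double effective = percentOfSegmentsToConsiderPerMove == null
            ? DEFAULT_PERCENT_TO_CONSIDER          // fall back instead of failing
            : percentOfSegmentsToConsiderPerMove;

        // Range check applies only after the default has been substituted.
        if (effective < 1 || effective > 100) {
            throw new IllegalArgumentException(
                "percentOfSegmentsToConsiderPerMove must be between 1 and 100, got " + effective);
        }
        this.percentOfSegmentsToConsiderPerMove = effective;
    }

    double getPercentOfSegmentsToConsiderPerMove() {
        return percentOfSegmentsToConsiderPerMove;
    }
}
```

With a primitive `double` the absent field and an explicit 0 would be indistinguishable, which is the trade-off raised above; the boxed type keeps 0 as an invalid explicit value while still letting a missing field fall back to 100.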
@capistrant the code changes LGTM, but could you add a unit test that verifies JSON deserialization of |
…value (apache#10724)
* Coordinator Dynamic Config changes to ease upgrading with new config value
* change a log to debug level following review
* changes based on review feedback
* fix checkstyle
Fixes #10723
Description
Fixed the bug described in #10723, which was introduced by the initial PR (#10284).
The mentioned change introduced undesirable behavior during upgrade. The new config value percentOfSegmentsToConsiderPerMove requires a value from 1 to 100; the default is 100. However, on upgrade the coordinator loads the dynamic config from the metastore, using CoordinatorDynamicConfig for deserialization. Since the config did not exist pre-upgrade, it is null in the constructor, which violates the precondition check on the value. This causes the code to load the Builder default of CoordinatorDynamicConfig at startup. I believe this is undesirable. Instead, we want Druid to load the existing persisted config with the default for the new config, while logging a warning that tells the operator their persisted config is out of date.
This PR has:
Key changed/added classes in this PR