Improve rolling Supervisor restarts at taskDuration#15859
Merged
suneet-s merged 1 commit intoapache:masterfrom Feb 14, 2024
Merged
Improve rolling Supervisor restarts at taskDuration#15859suneet-s merged 1 commit intoapache:masterfrom
suneet-s merged 1 commit intoapache:masterfrom
Conversation
|
|
||
| EasyMock.reset(spec); | ||
| EasyMock.expect(spec.isSuspended()).andReturn(false).anyTimes(); | ||
| EasyMock.expect(spec.getDataSchema()).andReturn(getDataSchema()).anyTimes(); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
YongGang
commented
Feb 14, 2024
| AtomicInteger stoppedTasks = new AtomicInteger(); | ||
| // Sort task groups by start time to prioritize early termination of earlier groups, then iterate for processing | ||
| activelyReadingTaskGroups | ||
| .entrySet().stream().sorted( |
Contributor
Author
There was a problem hiding this comment.
For the Concurrency check: the stream change provide the same level of Concurrency guaranteed as the previous for loop.
suneet-s
reviewed
Feb 14, 2024
| @Nullable | ||
| @JsonProperty | ||
| public int getStopTaskCount() | ||
| public Integer getStopTaskCount() |
Contributor
There was a problem hiding this comment.
nit: Would be nice to add a javadoc here telling devs to use #getMaxAllowedStops instead.
suneet-s
approved these changes
Feb 14, 2024
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR builds on the foundation set by #14396 which introduced rolling supervisor restarts as a strategy to potentially reduce the number of task slots required for task handover in Druid. The primary goal of this PR is to refine the behavior and utilization of this feature by addressing two specific areas:
stopTaskCountdefaulted to the value oftaskCountif not explicitly specified in the Supervisor spec. This implicit behavior led to inconsistency whentaskCountwas manually increased, asstopTaskCountdid not automatically adjust to reflect the new taskCount. This PR eliminates this implicit defaulting. Now, ifstopTaskCountis not set, it remains null and is preserved as such, requiring explicit configuration to activate rolling restarts.stopTaskCountis configured to a low value (e.g., 1), it could lead to prolonged cycling and stopping of active tasks, especially for tasks that started earlier. To address this, the PR introduces logic to sort active running tasks by their start date, prioritizing the stopping of older tasks first. This approach aims to prevent early-started tasks from running excessively long, improving the overall efficiency of task cycling.Key changed/added classes in this PR
SeekableStreamSupervisorupdated to sort task groups by start time to prioritize early termination of earlier groupsSeekableStreamSupervisorIOConfigremoved the automatic defaulting ofstopTaskCounttotaskCount, ensuring thatstopTaskCountmust be explicitly defined to influence the rolling restart behavior.This PR has: