KAFKA-13764: Improve balancing algorithm for Connect incremental rebalancing#12019
C0urante wants to merge 5 commits into apache:trunk
…gnment to eliminate mocking and simplify parameters
…nments to simplify parameters
… around tests for detecting unexpected rebalances, introduce failing test due to lack of consecutive revocations
Apologies @showuon, this does not actually (fully) address KAFKA-12495. Several of the test cases here relied on rebalances being triggered in circumstances that they would not normally be triggered under, which caused the issue with not performing consecutive revocations to be masked. I've added a new failing test case that's very similar to the one in #10367 but which fails with all functional and testing framework changes I've made so far in this pull request. At this point I don't see too many alternatives to permitting consecutive revocations, but here are a few that come to mind when the cluster is imbalanced but a revocation took place during the last round:
These (and the strategy of permitting consecutive revocations) all fall under the umbrella of https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-ChangestoConnect'sRebalancingProcess, which outlines this strategy:
(Emphasis mine) Overall I think permitting consecutive revocations is the safest and most intuitive option here, and with the new testing logic in this PR (especially the new |
…/distributed/IncrementalCooperativeAssignor.java Co-authored-by: YEONCHEOL JANG <65603611+YeonCheolGit@users.noreply.github.com>
@C0urante |
Hi @YeonCheolGit! There's always more PRs to review than there are reviewers (especially with Connect), so feel free to give this (or probably #11983, which this PR depends on) a review if you'd like to help.
| } | ||
| return performTaskAssignment(leaderId, leaderOffset, memberConfigs, coordinator, protocolVersion); | ||
| Map<String, ByteBuffer> result = serializeAssignments(assignments); | ||
| log.debug("Finished assignment"); |
This method works with a Map<String, ExtendedAssignment>, so it handles multiple assignments (plural).
So maybe this?
| log.debug("Finished assignment"); | |
| log.debug("Finished assignments"); |
| log.debug("Complete (ignoring deletions) connector assignments: {}", connectorAssignments); | ||
| // The connectors and tasks that should already be running on the cluster, but which are not included | ||
| // in the assignment reported by any workers in the cluster | ||
| final ConnectorsAndTasks lostAssignments = ConnectorsAndTasks.diff(previousAssignment, activeAssignments, deleted); |
Could you explain what "lost assignments" means here?
As far as I know, ConnectorsAndTasks.diff returns what remains after the given assignments are subtracted.
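One way to read the lostAssignments line, going only by the code comment in the diff above: the result is whatever was in the previous assignment but is neither reported by any worker nor deleted. The sketch below illustrates that set-difference reading with plain string sets. The class name, the `diff` helper, and the connector names are hypothetical and are not the PR's `ConnectorsAndTasks.diff` implementation.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class LostAssignmentsSketch {

    // Hypothetical helper: returns the elements of `base` that appear in none of `toSubtract`.
    // This only mirrors the "remainder after subtraction" reading; it is not Kafka code.
    @SafeVarargs
    static Set<String> diff(Set<String> base, Set<String>... toSubtract) {
        Set<String> result = new LinkedHashSet<>(base);
        for (Set<String> subtracted : toSubtract) {
            result.removeAll(subtracted);
        }
        // Return a read-only view so callers cannot mutate the result
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        Set<String> previousAssignment = Set.of("conn-a", "conn-b", "conn-c");
        Set<String> activeAssignments = Set.of("conn-a"); // still reported by some worker
        Set<String> deleted = Set.of("conn-c");           // removed from the config

        // conn-b should be running but no worker reports it, so it counts as "lost"
        System.out.println(diff(previousAssignment, activeAssignments, deleted)); // prints [conn-b]
    }
}
```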
| while (load.hasNext()) { | ||
| int firstLoad = first.tasksSize(); | ||
| int upTo = IntStream.range(0, workerAssignment.size()) | ||
| .filter(i -> workerAssignment.get(i).tasksSize() > firstLoad) | ||
| int firstLoad = allocationSize.apply(first); | ||
| int upTo = IntStream.range(0, workers.size()) |
This is a minor suggestion and can be ignored.
If workers.size() is called inside the while loop, it is evaluated on every iteration for as long as the condition holds.
What about computing the size once and reusing it?
final int workersSize = workers.size();
IntStream.range(0, workersSize)
| * <li>The allocation of connectors and tasks across the cluster were as balanced as possible (i.e., the difference in allocation size between any two workers is at most one)</li> | ||
| * <li>Any workers that left the group within the scheduled rebalance delay permanently left the group</li> | ||
| * <li>All currently-configured connectors and tasks were allocated (including instances that may be revoked in this round because they are duplicated across workers)</li> | ||
| * </ul> |
Maybe this is a silly question, since I don't know much about this.
Is there a reason to use HTML tags in comments?
| } | ||
| return revoking; | ||
| private int calculateDelay(long now) { |
I see many final parameters in the Kafka code base, so this is just for code convention and safety.
| private int calculateDelay(long now) { | |
| private int calculateDelay(final long now) { |
Converting this to a draft since I haven't had time to prioritize it (sorry @YeonCheolGit!) and the changes here are not safe to merge as-is.
No worries @C0urante! All good, and thanks for letting me know :)
Jira
Depends on #11983
The primary goal of this PR is to address several outstanding issues with incremental rebalancing that lead to stable-but-unbalanced clusters. However, other small bug fixes are also applied, and some liberty is taken with refactoring to improve readability and flexibility in the code base.
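To make "unbalanced" concrete: the Javadoc touched in this PR describes a balanced allocation as one where the difference in allocation size between any two workers is at most one. A minimal sketch of that check follows, assuming each worker's allocation is summarized as a single integer count. The class and method names are hypothetical and do not come from `IncrementalCooperativeAssignor`.

```java
import java.util.Collection;
import java.util.List;

public class BalanceCheckSketch {

    // Hypothetical check: true when the largest and smallest per-worker
    // allocation sizes differ by at most one (an empty cluster counts as balanced).
    static boolean isBalanced(Collection<Integer> allocationSizes) {
        if (allocationSizes.isEmpty()) {
            return true;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int size : allocationSizes) {
            min = Math.min(min, size);
            max = Math.max(max, size);
        }
        return max - min <= 1;
    }

    public static void main(String[] args) {
        System.out.println(isBalanced(List.of(3, 3, 2))); // prints true
        System.out.println(isBalanced(List.of(4, 2, 2))); // prints false
    }
}
```

A cluster in the second state that no longer triggers rebalances is what the description calls stable-but-unbalanced.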
This should also address KAFKA-12495, and includes an adapted test case from #10367, which addresses that issue but with a different approach.
High-level changes:
- … ConnectorsAndTasks class into its own file, enrich it and its builder class with developer-friendly methods, make its contents completely immutable, and use Set instead of generic Collection instances to store connectors and tasks
- … IncrementalCooperativeAssignor::assignConnectors and IncrementalCooperativeAssignor::assignTasks, for example) and abstract it into a single reusable method
- … final keyword for base and derived sets in IncrementalCooperativeAssignor::performTaskAssignment (tracking mutations across a 100+ line method is difficult)
- … IncrementalCooperativeAssignor to place static utility methods together at the bottom of the class
- … protected to package-private (protected implies that the field/method is intended for use by subclasses, which is not the case for any of these)
- … DEBUG-level log messages to TRACE
Committer Checklist (excluded from commit message)