KAFKA-13764: Improve balancing algorithm for Connect incremental rebalancing by C0urante · Pull Request #12019 · apache/kafka

C0urante · 2022-04-08T17:18:23Z

Depends on #11983

The primary goal of this PR is to address several outstanding issues with incremental rebalancing that lead to stable-but-unbalanced clusters. However, other small bug fixes are also applied, and some liberty is taken with refactoring to improve readability and flexibility in the code base.

~~This should also address KAFKA-12495, and includes an adapted test case from #10367, which addresses that issue but with a different approach.~~

High-level changes:

Refine the logic for load-balancing revocations:
- Perform load-balancing revocations any time the cluster appears imbalanced and there are still connectors and tasks that can be revoked from workers, instead of only when the number of workers in the cluster changes
- Remove the "rough estimation" logic and replace it with a precise calculation of exactly how to allocate all currently-configured connectors and tasks as evenly as possible across a cluster
- Account for load-balancing revocations when assigning new and lost connectors and tasks across the cluster
Improve code quality:
- Extract the ConnectorsAndTasks class into its own file, enrich it and its builder class with developer-friendly methods, make its contents completely immutable, and use Set instead of generic Collection instances to store connectors and tasks
- Where possible, identify logic that is shared for connectors and tasks (IncrementalCooperativeAssignor::assignConnectors and IncrementalCooperativeAssignor::assignTasks, for example) and abstract it into a single reusable method
- Use the final keyword for base and derived sets in IncrementalCooperativeAssignor::performTaskAssignment (tracking mutations across a 100+ line method is difficult)
- Reword unnecessary and confusing comments ("... is a derived set from the set difference of ..." is not very informative)
- Reorganize the grouping of methods in IncrementalCooperativeAssignor to place static utility methods together at the bottom of the class
- Demote visibility of testing-only methods and fields from protected to package-private (protected implies that the field/method is intended for use by subclasses, which is not the case for any of these)
Testing:
- Add several new tests to cover a variety of new cases, many of which result in imbalanced allocation with the current rebalancing logic, but which are all correctly handled with the improvements in this PR
- Add a few testing utility methods to help "hand wave" test cases without having to specify fine-grained expectations like how many rounds of rebalance are required to reach stability after some changes have been applied to the cluster
- Add coverage to all tests that ensures that no connectors or tasks are both revoked and assigned from the same worker, and that the leader's view of the complete assignment of connectors and tasks across the cluster appears to be correct after each rebalance
Miscellaneous:
- Demote a ton of noisy DEBUG-level log messages to TRACE
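
For readers who want a concrete picture of the "precise calculation" mentioned above, here is a minimal sketch of one way to compute an even allocation and the resulting load-balancing revocations. It is not the code from this PR: the class name `BalancedAllocationSketch`, the methods `targetSizes` and `revocationCounts`, and the worker names are all hypothetical, and it assumes every connector or task is already assigned to some worker.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the implementation in this PR.
final class BalancedAllocationSketch {

    // Every worker gets floor(total / workers) items; the first (total % workers) slots get one extra.
    static int[] targetSizes(int totalItems, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        int base = totalItems / workerCount;
        int remainder = totalItems % workerCount;
        int[] targets = new int[workerCount];
        for (int i = 0; i < workerCount; i++) {
            targets[i] = base + (i < remainder ? 1 : 0);
        }
        return targets;
    }

    // Returns, for each overloaded worker, how many items it would have to give up.
    static Map<String, Integer> revocationCounts(Map<String, Integer> currentLoad) {
        Map<String, Integer> revocations = new LinkedHashMap<>();
        if (currentLoad.isEmpty()) {
            return revocations;
        }
        // Give the larger targets to the most heavily loaded workers to minimize revocations.
        List<Map.Entry<String, Integer>> byLoad = new ArrayList<>(currentLoad.entrySet());
        byLoad.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        int total = 0;
        for (int load : currentLoad.values()) {
            total += load;
        }
        int[] targets = targetSizes(total, byLoad.size());
        for (int i = 0; i < byLoad.size(); i++) {
            int excess = byLoad.get(i).getValue() - targets[i];
            if (excess > 0) {
                revocations.put(byLoad.get(i).getKey(), excess);
            }
        }
        return revocations;
    }
}
```

With loads of 4, 3 and 0 across three workers, the targets are 3, 2 and 2, so the first two workers would each give up one item and the third would pick up both in a later round.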

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

C0urante · 2022-04-11T02:45:14Z

Apologies @showuon, this does not actually (fully) address KAFKA-12495. Several of the test cases here relied on rebalances being triggered in circumstances that they would not normally be triggered under, which caused the issue with not performing consecutive revocations to be masked.

I've added a new failing test case that's very similar to the one in #10367 but which fails with all functional and testing framework changes I've made so far in this pull request.

At this point I don't see too many alternatives to permitting consecutive revocations, but here are a few that come to mind when the cluster is imbalanced but a revocation took place during the last round:

- Have the leader send out an assignment with no revocations but also a delay of 1ms to trigger an immediate follow-up rebalance for revocations
- Have every worker automatically rejoin the cluster whenever they receive an assignment that contains revoked connectors/tasks (like now) or newly-assigned connectors/tasks, until a rebalance takes place that doesn't change any worker's assigned connectors/tasks
- Have the leader immediately trigger a new rebalance after completing this one by rejoining the group

These (and the strategy of permitting consecutive revocations) all fall under the umbrella of https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-ChangestoConnect'sRebalancingProcess, which outlines this strategy:

When a Worker is elected as Leader, it computes a new assignment, describing both assigned and revoked connectors and tasks (previously the Leader computed an assignment from scratch without defining revoked resources).

When a Worker receives its assignment, if this assignment includes any revoked connectors or tasks, it stops (releases) these resources and then immediately rejoins the group with an assignment that excludes revoked resources (previously, upon receipt of assignment, the Worker started the connectors and tasks and operated in the new generation of the group protocol until the next rebalance some time in the future).

Normally in the next assignment round, the Leader will assign resources according to its policy and there will be no revoked resources in any of the Workers. If that's not the case, the previous steps will be repeated until the group converges into an assignment without revocations.

(Emphasis mine)

Overall I think permitting consecutive revocations is the safest and most intuitive option here, and with the new testing logic in this PR (especially the new assertNoFurtherAssignments method), we get pretty extensive coverage to help ensure we don't get trapped in an infinite revocation loop.
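
As a rough illustration of the kind of guard described above, the sketch below repeats a rebalance round until the state stops changing and fails if that does not happen within a bounded number of rounds. It is not the `assertNoFurtherAssignments` method from this PR: `ConvergenceGuardSketch`, `roundsToConverge` and `toyRound` are hypothetical names, and the toy round (move one item from the most loaded worker to the least loaded one, for a non-empty list of loads) is only a stand-in for a real revocation-then-assignment cycle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; not the testing utilities added in this PR.
final class ConvergenceGuardSketch {

    // Applies rounds until one leaves the state unchanged; returns how many rounds changed something.
    static <S> int roundsToConverge(S initial, UnaryOperator<S> round, int maxRounds) {
        S state = initial;
        for (int i = 0; i < maxRounds; i++) {
            S next = round.apply(state);
            if (next.equals(state)) {
                return i;
            }
            state = next;
        }
        throw new IllegalStateException("No stable assignment after " + maxRounds + " rounds");
    }

    // Toy round: if the gap between the most and least loaded workers exceeds one, move a single item.
    static List<Integer> toyRound(List<Integer> loads) {
        List<Integer> next = new ArrayList<>(loads);
        int max = next.indexOf(Collections.max(next));
        int min = next.indexOf(Collections.min(next));
        if (next.get(max) - next.get(min) > 1) {
            next.set(max, next.get(max) - 1);
            next.set(min, next.get(min) + 1);
        }
        return next;
    }
}
```

Calling `roundsToConverge(List.of(5, 1, 0), ConvergenceGuardSketch::toyRound, 10)` walks through several consecutive changing rounds before settling, while a too-small `maxRounds` surfaces as an exception rather than an endless loop.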

yeoncheol-jang · 2022-04-23T10:09:38Z

@C0urante
Hello! I was just wondering how this PR is going. Is there anything I can help with?

C0urante · 2022-04-24T15:59:40Z

Hi @YeonCheolGit! There are always more PRs to review than there are reviewers (especially with Connect), so feel free to give this (or probably #11983, which this PR depends on) a review if you'd like to help.

yeoncheol-jang · 2022-04-30T19:16:24Z

        }
-        return performTaskAssignment(leaderId, leaderOffset, memberConfigs, coordinator, protocolVersion);
+        Map<String, ByteBuffer> result = serializeAssignments(assignments);
+        log.debug("Finished assignment");


@C0urante

This operates on a Map<String, ExtendedAssignment> named assignments (plural).
So maybe the log message should be plural too?

Suggested change

-        log.debug("Finished assignment");
+        log.debug("Finished assignments");

yeoncheol-jang · 2022-05-01T10:34:13Z

-        log.debug("Complete (ignoring deletions) connector assignments: {}", connectorAssignments);
+        // The connectors and tasks that should already be running on the cluster, but which are not included
+        // in the assignment reported by any workers in the cluster
+        final ConnectorsAndTasks lostAssignments = ConnectorsAndTasks.diff(previousAssignment, activeAssignments, deleted);


Could you explain what "lost assignments" means here?
As far as I know, ConnectorsAndTasks.diff returns what remains after the given assignments are subtracted.
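
One way to read the code comment in the diff above: the "lost" assignments are whatever was previously assigned, minus what workers currently report, minus what has been deleted. The sketch below shows that set difference on plain sets of names. It is only an illustration under that reading; `LostAssignmentsSketch` and its `diff` helper are hypothetical and are not the `ConnectorsAndTasks.diff` method from this PR.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; operates on plain sets rather than ConnectorsAndTasks.
final class LostAssignmentsSketch {

    // Returns the elements of base that appear in none of the sets to subtract.
    @SafeVarargs
    static <T> Set<T> diff(Set<T> base, Set<T>... toSubtract) {
        Set<T> result = new LinkedHashSet<>(base);
        for (Set<T> subtracted : toSubtract) {
            result.removeAll(subtracted);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> previous = Set.of("connector-a", "connector-b", "connector-c");
        Set<String> active = Set.of("connector-a");   // still reported by some worker
        Set<String> deleted = Set.of("connector-c");  // removed from the configuration
        // connector-b should be running but no worker reports it, so it counts as lost.
        System.out.println(diff(previous, active, deleted));
    }
}
```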

yeoncheol-jang · 2022-05-01T15:45:57Z

        while (load.hasNext()) {
-            int firstLoad = first.tasksSize();
-            int upTo = IntStream.range(0, workerAssignment.size())
-                    .filter(i -> workerAssignment.get(i).tasksSize() > firstLoad)
+            int firstLoad = allocationSize.apply(first);
+            int upTo = IntStream.range(0, workers.size())


This is a minor suggestion and can be ignored.
Because workers.size() is called inside the while loop, it is re-evaluated on every iteration for as long as the loop condition holds.
What about computing the size once and reusing it?

final int workersSize = workers.size();
IntStream.range(0, workersSize)

yeoncheol-jang · 2022-05-01T15:57:50Z

+     *     <li>The allocation of connectors and tasks across the cluster were as balanced as possible (i.e., the difference in allocation size between any two workers is at most one)</li>
+     *     <li>Any workers that left the group within the scheduled rebalance delay permanently left the group</li>
+     *     <li>All currently-configured connectors and tasks were allocated (including instances that may be revoked in this round because they are duplicated across workers)</li>
+     * </ul>


This may be a silly question, as I'm not very familiar with this area.
Is there a reason for using HTML tags in the comments?
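
For context on the question above: the javadoc tool renders documentation comments as HTML, so tags such as `<ul>` and `<li>` show up as a bulleted list in the generated API docs. A minimal sketch follows; the class `JavadocListSketch` and the method `isBalanced` are hypothetical and are not part of this PR.

```java
// Hypothetical illustration only; shows HTML list tags inside a Javadoc comment.
final class JavadocListSketch {

    /**
     * Returns whether the given per-worker loads are balanced.
     * <p>
     * The loads are considered balanced when:
     * <ul>
     *     <li>there is at least one worker, and</li>
     *     <li>the difference in load between any two workers is at most one</li>
     * </ul>
     *
     * @param loads the number of connectors and tasks on each worker
     * @return true if the loads are balanced
     */
    static boolean isBalanced(int... loads) {
        if (loads.length == 0) {
            return false;
        }
        int min = loads[0];
        int max = loads[0];
        for (int load : loads) {
            min = Math.min(min, load);
            max = Math.max(max, load);
        }
        return max - min <= 1;
    }
}
```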

yeoncheol-jang · 2022-05-01T16:07:18Z

+    }

-        return revoking;
+    private int calculateDelay(long now) {


I see final parameters used in many places in the Kafka code, so this suggestion is just for consistency with that convention and for safety.

Suggested change

-    private int calculateDelay(long now) {
+    private int calculateDelay(final long now) {

C0urante · 2022-07-26T01:55:43Z

Converting this to a draft since I haven't had time to prioritize it (sorry @YeonCheolGit!) and the changes here are not safe to merge as-are.

yeoncheol-jang · 2022-07-30T14:23:15Z

> Converting this to a draft since I haven't had time to prioritize it (sorry @YeonCheolGit!) and the changes here are not safe to merge as-are.

No worries @C0urante! All good and thanks for letting me know this:)

C0urante added 3 commits April 8, 2022 12:07

KAFKA-13763: Refactor IncrementalCooperativeAssignor::performTaskAssi…

508bfe1

…gnment to eliminate mocking and simplify parameters

KAFKA-13763: Refactor IncrementalCooperativeAssignor::handleLostAssig…

b45f73f

…nments to simplify parameters

KAFKA-13764: Improve balancing algorithm for incremental rebalancing …

7315d09

…in Connect

C0urante mentioned this pull request Apr 8, 2022

KAFKA-12495: allow consecutive revoke in incremental cooperative assignor in connector #10367

Closed

3 tasks

yeoncheol-jang reviewed Apr 10, 2022

Comment thread ...c/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java Outdated

KAFKA-13764: Improve testing coverage and flexibility, add guardrails…

ade18da

… around tests for detecting unexpected rebalances, introduce failing test due to lack of consecutive revocations

Update connect/runtime/src/main/java/org/apache/kafka/connect/runtime…

eaca9de

…/distributed/IncrementalCooperativeAssignor.java Co-authored-by: YEONCHEOL JANG <65603611+YeonCheolGit@users.noreply.github.com>

yeoncheol-jang reviewed Apr 30, 2022

yeoncheol-jang reviewed May 1, 2022

C0urante mentioned this pull request May 3, 2022

KAFKA-13763 (2): Refactor IncrementalCooperativeAssignor for improved unit testing #11983

Merged

3 tasks

C0urante added the connect label Jul 26, 2022

C0urante marked this pull request as draft July 26, 2022 01:55

C0urante closed this Aug 29, 2022

C0urante mentioned this pull request Sep 22, 2022

KAFKA-12495: Exponential backoff retry to prevent rebalance storms when worker joins after revoking rebalance #12561

Merged

C0urante mentioned this pull request Oct 23, 2022

KAFKA-12495: Improve rebalance allocation algorithm vamossagar12/kafka#1

Merged

3 tasks

C0urante wants to merge 5 commits into apache:trunk from C0urante:kafka-13764
