Skip to content

KAFKA-7126: Reduce number of rebalance for large consumer group after a topic is created#5408

Closed
jonlee2 wants to merge 1 commit intoapache:trunkfrom
jonlee2:KAFKA-7126
Closed

KAFKA-7126: Reduce number of rebalance for large consumer group after a topic is created#5408
jonlee2 wants to merge 1 commit intoapache:trunkfrom
jonlee2:KAFKA-7126

Conversation

@jonlee2
Copy link
Copy Markdown
Contributor

@jonlee2 jonlee2 commented Jul 20, 2018

This patch forces metadata update for consumers with pattern subscription at the beginning of rebalance (retry.backoff.ms is respected). This is to prevent such consumers from detecting subscription changes (e.g., new topic creation) independently and triggering multiple unnecessary rebalances. KAFKA-7126 contains detailed scenarios and rationale.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@jonlee2
Copy link
Copy Markdown
Contributor Author

jonlee2 commented Jul 20, 2018

@lindong28 @hachikuji @tedyu Can you take a look? Thanks.

@tedyu
Copy link
Copy Markdown
Contributor

tedyu commented Jul 20, 2018

Raised INFRA-16796 for Jenkins

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc for nowMs parameter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tedyu Done

Copy link
Copy Markdown
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Left one minor comment.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can improve the comment a bit more to clarify why we need to request metadata here. The important point is that, when a topic is created, the metadata update in any consumer can trigger the rebalance. In order to the topic creation to trigger one rebalance in the entire consumer group, we need each consumer to refresh metadata before it re-joins the group.

Currently the comment suggests, if we can refresh metadata before consumer re-joins the group, then consumer can avoid rebalance due to metadata refresh later. It seems a bit confusing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like this:

For consumer group that uses pattern-based subscription, after a topic is created, any consumer that discovers the topic after metadata refresh can trigger rebalance across the entire consumer group. Multiple rebalance can be triggered after one topic creation if consumers refresh metadata at vastly different times. We can significantly reduce the number of rebalance caused by single topic creation by asking consumer to refresh metadata before re-joining the group as long as the refresh backoff time has passed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment as suggested.

@lindong28 lindong28 self-assigned this Jul 21, 2018
@lindong28
Copy link
Copy Markdown
Member

LGTM. @junrao @ijuma @hachikuji Would you have time to review this patch if you have time?

Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. I think the approach makes sense. Can you write a test case so that we do not regress this behavior in the future?

@lindong28
Copy link
Copy Markdown
Member

Thanks much for your review @hachikuji. Yeah it is better to have a test.

@jonlee2
Copy link
Copy Markdown
Contributor Author

jonlee2 commented Jul 26, 2018

@hachikuji @lindong28 Thanks for the review. I added a unit test. Can you take a look?

Copy link
Copy Markdown
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left one minor comment. I will just make the change and merge to trunk.

* @return remaining time in ms till updating the cluster info
*/
public synchronized long timeToNextUpdate(long nowMs) {
public synchronized long timeToNextUpdate(long nowMs) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you remove the extra space here?

@lindong28 lindong28 changed the title KAFKA-7126: Reduce number of rebalance period for large consumer group after a topic is created KAFKA-7126: Reduce number of rebalance for large consumer group after a topic is created Jul 26, 2018
lindong28 pushed a commit that referenced this pull request Jul 26, 2018
… a topic is created

This patch forces metadata update for consumers with pattern subscription at the beginning of rebalance (retry.backoff.ms is respected). This is to prevent such consumers from detecting subscription changes (e.g., new topic creation) independently and triggering multiple unnecessary rebalances. KAFKA-7126 contains detailed scenarios and rationale.

Author: Jon Lee <jonlee@linkedin.com>

Reviewers: Jason Gustafson <jason@confluent.io>, Ted Yu <yuzhihong@gmail.com>, Dong Lin <lindong28@gmail.com>

Closes #5408 from jonlee2/KAFKA-7126

(cherry picked from commit a932520)
Signed-off-by: Dong Lin <lindong28@gmail.com>
@lindong28 lindong28 closed this in a932520 Jul 26, 2018
@lindong28
Copy link
Copy Markdown
Member

The PR has been merged to 2.0 branch as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants