KAFKA-8649: version probing with upgraded leader#7413

Closed
ableegoldman wants to merge 2 commits into apache:trunk from ableegoldman:version-probing-version

Conversation

@ableegoldman
Member

Version probing is currently broken when the leader is chosen from one of the new (already upgraded) instances. Older members will blindly upgrade to a version they don't support (then throw an exception), while newer members will receive an assignment with the older version and trigger a new rebalance, leading to a rebalance loop if the older members can't upgrade their subscription. All of this happens because we always send assignments encoded using the min version seen by any client; see the ticket for details.

Note that a real "version probing" rebalance is technically one where the leader is old and receives subscriptions it can't understand. In this case we send an assignment back with the old version, which the receiving consumer then knows to downgrade to and trigger another rebalance.

When you have a "new" leader, however, I propose we:

  • Always send assignments back using the same version as the corresponding subscription, EXCEPT if we notice that everyone now supports the latest version but some are still using the older version. This signals the rolling upgrade is complete, so send everyone back the latest version.
  • If you receive a version greater than the one you sent, it must mean the bounce is over and it is now safe to send new versions. Upgrade your subscription version and trigger a final rebalance. This will actually only happen when the leader is the last to be bounced.
  • If the leader is new, it can understand all subscriptions, so we just wait for everyone to be bounced and allow new members to keep sending new version subscriptions. Once the last member is upgraded, everyone will already be using the new subscription version and we don't need to trigger a second and final rebalance.
  • If you receive a version less than what you sent, this is version probing, so downgrade your subscription and trigger another rebalance. This will now only happen when you are actually on a higher version than the leader, so we know this is true version probing.
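
To make the rules above concrete, here is a minimal sketch of the decision logic, not the code in this PR. The class `VersionProbingSketch`, its `Subscription` holder, the `MemberAction` enum, and both method names are hypothetical and exist only for illustration. It assumes the leader is "new", i.e. it can decode every subscription it receives.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionProbingSketch {

    /** Hypothetical stand-in for the version fields carried by a subscription. */
    public static final class Subscription {
        final int usedVersion;            // version the subscription was encoded with
        final int latestSupportedVersion; // highest version this member can decode

        public Subscription(int usedVersion, int latestSupportedVersion) {
            this.usedVersion = usedVersion;
            this.latestSupportedVersion = latestSupportedVersion;
        }
    }

    /** What a member does after comparing the assignment version with what it sent. */
    public enum MemberAction { KEEP_VERSION, UPGRADE_AND_REBALANCE, DOWNGRADE_AND_REBALANCE }

    /**
     * Leader side (assumes a "new" leader): returns the assignment version for
     * each member, in the same order as the subscriptions.
     */
    public static List<Integer> assignmentVersions(List<Subscription> subscriptions,
                                                   int leaderLatestVersion) {
        // The rolling upgrade is complete once every member supports the latest version.
        boolean everyoneSupportsLatest = true;
        for (Subscription s : subscriptions) {
            if (s.latestSupportedVersion < leaderLatestVersion) {
                everyoneSupportsLatest = false;
                break;
            }
        }

        List<Integer> versions = new ArrayList<>();
        for (Subscription s : subscriptions) {
            // Otherwise mirror the version each member used in its own subscription.
            versions.add(everyoneSupportsLatest ? leaderLatestVersion : s.usedVersion);
        }
        return versions;
    }

    /** Member side: react to the version of the received assignment. */
    public static MemberAction onAssignment(int sentVersion, int receivedVersion) {
        if (receivedVersion > sentVersion) {
            // The bounce is over: upgrade the subscription, trigger a final rebalance.
            return MemberAction.UPGRADE_AND_REBALANCE;
        }
        if (receivedVersion < sentVersion) {
            // True version probing: the leader is older than this member.
            return MemberAction.DOWNGRADE_AND_REBALANCE;
        }
        return MemberAction.KEEP_VERSION;
    }
}
```

In this sketch, a group that is still mid-bounce gets each member's own version echoed back, so nobody is asked to decode something it doesn't support. Only when every subscription reports support for the latest version does the leader answer with the latest version, which prompts any member still sending the old version to upgrade and rebalance once more.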

@ableegoldman
Member Author

This is just my proposal, not the only possible fix. It has some advantages, including saving intermediate rebalances and the final rebalance (unless the leader is the last to be bounced). This does come at the cost of some potential awkwardness, as we will be handling a mix of subscription versions during the bounce.

Another proposal that was discussed was to have the leader "lie" about the leaderVersion encoded in the assignment, and instead encode it as the min supported version.

@mumrah
Member

mumrah commented Sep 30, 2019

retest this please

@ableegoldman
Member Author

Not going to close this as I think this approach has some advantages, but going with the quicker/simpler fix here for the bugfix release of older branches: #7423

Will potentially revisit this as a possible improvement once we at least fix the actual bugs.

@ableegoldman ableegoldman deleted the version-probing-version branch June 26, 2020 22:38