KAFKA-9261; Client should handle inconsistent leader metadata #7772
Merged
ijuma merged 1 commit into apache:2.4 on Dec 4, 2019
d267cd6 to ce60547
ijuma (Member) approved these changes on Dec 4, 2019 and left a comment:
LGTM, thanks. I verified that the test fails without the fix.
ijuma pushed a commit that referenced this pull request on Dec 4, 2019
This is a reduced scope fix for KAFKA-9261. The purpose of this patch is to ensure that partition leader state is kept in sync with broker metadata in MetadataCache and consequently in Cluster. Due to the possibility of metadata event reordering, it was possible for this state to be inconsistent which could lead to an NPE in some cases. The test case here provides a specific scenario where this could happen. Also see #7770 for additional detail.

Reviewers: Ismael Juma <ismael@juma.me.uk>
Member
Merged to 2.4 and cherry-picked to 2.3.
efeg pushed a commit to efeg/kafka that referenced this pull request on Dec 12, 2019
…data (apache#7772)

TICKET =
LI_DESCRIPTION =
This is a reduced scope fix for KAFKA-9261. The purpose of this patch is to ensure that partition leader state is kept in sync with broker metadata in MetadataCache and consequently in Cluster. Due to the possibility of metadata event reordering, it was possible for this state to be inconsistent which could lead to an NPE in some cases. The test case here provides a specific scenario where this could happen. Also see apache#7770 for additional detail.

Reviewers: Ismael Juma <ismael@juma.me.uk>

EXIT_CRITERIA = MANUAL [""]
xiowu0 pushed a commit to linkedin/kafka that referenced this pull request on Dec 12, 2019
…9261. (#63)

TICKET = [KAFKA-9212, KAFKA-9261]
LI_DESCRIPTION =

1. Rollback hotfix for "Rollback KAFKA-7440 as a workaround for KAFKA-9212"

2. KAFKA-9261; Client should handle inconsistent leader metadata (apache#7772)

This is a reduced scope fix for KAFKA-9261. The purpose of this patch is to ensure that partition leader state is kept in sync with broker metadata in MetadataCache and consequently in Cluster. Due to the possibility of metadata event reordering, it was possible for this state to be inconsistent which could lead to an NPE in some cases. The test case here provides a specific scenario where this could happen. Also see apache#7770 for additional detail.

Reviewers: Ismael Juma <ismael@juma.me.uk>

3. KAFKA-9212; Ensure LeaderAndIsr state updated in controller context during reassignment

KIP-320 improved fetch semantics by adding leader epoch validation. This relies on reliable propagation of leader epoch information from the controller. Unfortunately, we have encountered a bug during partition reassignment in which the leader epoch in the controller context does not get properly updated. This causes UpdateMetadata requests to be sent with stale epoch information, which results in the metadata caches on the brokers falling out of sync.

This bug has existed for a long time, but it is only a problem due to the new epoch validation done by the client. Because the client includes the stale leader epoch in its requests, the leader rejects them, yet the stale metadata cache on the brokers prevents the consumer from getting the latest epoch. Hence the consumer cannot make progress while a reassignment is ongoing.

Although it is straightforward to fix this problem in the controller for the new releases (which this patch does), it is not so easy to fix older brokers, which means new clients could still encounter brokers with this bug. To address this problem, this patch also modifies the client to treat the leader epoch returned from the Metadata response as "unreliable" if it comes from an older version of the protocol. The client in this case will discard the returned epoch and it won't be included in any requests. Also, note that the correct epoch is still forwarded to replicas in the LeaderAndIsr request, so this bug does not affect replication.

Reviewers: Jun Rao <junrao@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
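The client-side rule described in the KAFKA-9212 message (discard the leader epoch when the Metadata response comes from an older protocol version) can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class `LeaderEpochSketch`, the method `usableLeaderEpoch`, and the threshold value passed as `minReliableVersion` are all hypothetical names and values chosen for the example.

```java
import java.util.OptionalInt;

// Hypothetical sketch; none of these names come from the Kafka code base.
final class LeaderEpochSketch {

    /**
     * Returns the leader epoch only when the response version is at least
     * minReliableVersion; otherwise the epoch is treated as unreliable and
     * dropped, so it would not be sent in later requests.
     */
    static OptionalInt usableLeaderEpoch(int responseVersion,
                                         int minReliableVersion,
                                         int reportedEpoch) {
        if (responseVersion < minReliableVersion) {
            return OptionalInt.empty(); // older protocol: discard the epoch
        }
        return OptionalInt.of(reportedEpoch);
    }

    public static void main(String[] args) {
        int minReliableVersion = 9; // assumed threshold, for illustration only
        System.out.println(usableLeaderEpoch(8, minReliableVersion, 5));
        System.out.println(usableLeaderEpoch(9, minReliableVersion, 5));
    }
}
```

Returning an empty `OptionalInt` rather than a sentinel value makes the "no trusted epoch" case explicit to callers, which then skip epoch validation instead of sending a possibly stale value.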
This is a reduced scope fix for KAFKA-9261. The purpose of this patch is to ensure that partition leader state is kept in sync with broker metadata in MetadataCache and consequently in Cluster. Due to the possibility of metadata event reordering, it was possible for this state to be inconsistent which could lead to an NPE in some cases. The test case here provides a specific scenario where this could happen.

Also see #7770 for additional detail.
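To make the failure mode concrete, here is a minimal sketch of the kind of inconsistency described above: a partition names a leader id that is absent from the broker list in the same snapshot, so a plain lookup yields null and a later dereference can throw an NPE. The class `LeaderConsistencySketch` and the method `resolveLeader` are hypothetical stand-ins, not Kafka's `MetadataCache` or `Cluster`, and the approach shown (treating the leader as unknown when its broker is missing) is an assumption about one way to keep the two views consistent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; these are not Kafka's MetadataCache or Cluster classes.
final class LeaderConsistencySketch {

    /**
     * Resolves a partition leader against the broker map from the same
     * metadata snapshot. A leader id with no matching broker is reported
     * as absent instead of being returned as null.
     */
    static Optional<String> resolveLeader(Map<Integer, String> brokerHostsById,
                                          int leaderId) {
        return Optional.ofNullable(brokerHostsById.get(leaderId));
    }

    public static void main(String[] args) {
        Map<Integer, String> brokers = new HashMap<>();
        brokers.put(1, "broker-1");

        // Reordered metadata: the partition says the leader is broker 2,
        // but broker 2 is not in the broker list.
        Optional<String> leader = resolveLeader(brokers, 2);
        System.out.println(leader.orElse("leader unavailable"));
    }
}
```

A caller that sees an empty result can request a metadata refresh and retry, rather than dereferencing a missing node.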