KAFKA-9261; Client should handle inconsistent leader metadata by hachikuji · Pull Request #7772 · apache/kafka

hachikuji · 2019-12-03T21:54:30Z

This is a reduced scope fix for KAFKA-9261. The purpose of this patch is to ensure that partition leader state is kept in sync with broker metadata in MetadataCache and consequently in Cluster. Because metadata events can be reordered, this state could become inconsistent, which in some cases could lead to a NullPointerException (NPE). The test case here provides a specific scenario where this could happen.
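To make the failure mode concrete, here is a minimal sketch, assuming a hypothetical, simplified metadata model. `LeaderConsistencySketch`, `Node`, `PartitionState` and `resolveLeaders` are illustrative names, not Kafka's `MetadataCache` or `Cluster` API, and this is not the code in this patch. It shows one defensive pattern: when a partition's leader id has no matching broker entry (for example because metadata events arrived out of order), the leader is recorded as absent rather than as a null `Node` that a later caller could dereference. The records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of a client-side metadata snapshot.
// These types are NOT Kafka's MetadataCache or Cluster classes.
final class LeaderConsistencySketch {

    // A broker as known to the client (illustrative fields only).
    record Node(int id, String host, int port) {}

    // Leader assignment for a single topic partition (illustrative).
    record PartitionState(String topic, int partition, int leaderId) {}

    /**
     * Resolves the leader node for each partition. A leader id that has no
     * entry in the broker map is treated as "unknown" (an empty Optional)
     * instead of being stored as null.
     */
    static Map<String, Optional<Node>> resolveLeaders(Map<Integer, Node> brokers,
                                                      Iterable<PartitionState> partitions) {
        Map<String, Optional<Node>> leaders = new HashMap<>();
        for (PartitionState p : partitions) {
            String key = p.topic() + "-" + p.partition();
            // brokers.get returns null when the leader's broker entry is missing;
            // wrapping it forces callers to handle the absence explicitly.
            leaders.put(key, Optional.ofNullable(brokers.get(p.leaderId())));
        }
        return leaders;
    }
}
```

For example, with a single known broker `1` and partitions `t-0` (leader 1) and `t-1` (leader 2), `t-0` resolves to broker 1 while `t-1` resolves to an empty `Optional`, so the inconsistency surfaces as "leader unknown" rather than as an NPE.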

Also see #7770 for additional detail.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

ijuma

LGTM, thanks. I verified that the test fails without the fix.

Reviewers: Ismael Juma <ismael@juma.me.uk>

ijuma · 2019-12-04T04:52:27Z

Merged to 2.4 and cherry-picked to 2.3.

…data (apache#7772)
TICKET =
LI_DESCRIPTION = This is a reduced scope fix for KAFKA-9261. The purpose of this patch is to ensure that partition leader state is kept in sync with broker metadata in MetadataCache and consequently in Cluster. Due to the possibility of metadata event reordering, it was possible for this state to be inconsistent, which could lead to an NPE in some cases. The test case here provides a specific scenario where this could happen. Also see apache#7770 for additional detail.
Reviewers: Ismael Juma <ismael@juma.me.uk>
EXIT_CRITERIA = MANUAL [""]

…9261. (#63)
TICKET = [KAFKA-9212, KAFKA-9261]
LI_DESCRIPTION =
1. Rollback hotfix for "Rollback KAFKA-7440 as a workaround for KAFKA-9212"
2. KAFKA-9261; Client should handle inconsistent leader metadata (apache#7772)
   This is a reduced scope fix for KAFKA-9261. The purpose of this patch is to ensure that partition leader state is kept in sync with broker metadata in MetadataCache and consequently in Cluster. Due to the possibility of metadata event reordering, it was possible for this state to be inconsistent, which could lead to an NPE in some cases. The test case here provides a specific scenario where this could happen. Also see apache#7770 for additional detail.
   Reviewers: Ismael Juma <ismael@juma.me.uk>
3. KAFKA-9212; Ensure LeaderAndIsr state updated in controller context during reassignment
   KIP-320 improved fetch semantics by adding leader epoch validation. This relies on reliable propagation of leader epoch information from the controller. Unfortunately, we have encountered a bug during partition reassignment in which the leader epoch in the controller context does not get properly updated. This causes UpdateMetadata requests to be sent with stale epoch information, which results in the metadata caches on the brokers falling out of sync. This bug has existed for a long time, but it only became a problem because of the new epoch validation done by the client. Because the client includes the stale leader epoch in its requests, the leader rejects them, yet the stale metadata cache on the brokers prevents the consumer from getting the latest epoch. Hence the consumer cannot make progress while a reassignment is ongoing. Although it is straightforward to fix this problem in the controller for new releases (which this patch does), older brokers cannot easily be fixed, which means new clients could still encounter brokers with this bug.
   To address this problem, this patch also modifies the client to treat the leader epoch returned in the Metadata response as "unreliable" if it comes from an older version of the protocol. In this case the client discards the returned epoch and does not include it in any requests. Note also that the correct epoch is still forwarded to replicas in the LeaderAndIsr request, so this bug does not affect replication.
   Reviewers: Jun Rao <junrao@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
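The client-side half of the KAFKA-9212 change can be sketched as below. This is a minimal illustration under stated assumptions, not Kafka's code: the helper `usableLeaderEpoch`, the class name, and the cutoff `MIN_RELIABLE_METADATA_VERSION` are all hypothetical, and the cutoff value is only illustrative (check the Kafka source for the real one). It also assumes that a negative epoch means "unknown". The logic follows the description: an epoch from an older Metadata response version is discarded, so it is never included in later requests.

```java
import java.util.Optional;

// Hypothetical sketch of treating leader epochs from older Metadata
// response versions as unreliable. NOT Kafka's MetadataResponse API.
final class LeaderEpochSketch {

    // ASSUMPTION: illustrative cutoff; responses below this version are
    // treated as carrying unreliable leader epochs.
    static final short MIN_RELIABLE_METADATA_VERSION = 9;

    /**
     * Returns the epoch to remember and send in later requests, or an empty
     * Optional if the epoch should be discarded. In this sketch a negative
     * epoch means "unknown".
     */
    static Optional<Integer> usableLeaderEpoch(short responseVersion, int leaderEpoch) {
        if (responseVersion < MIN_RELIABLE_METADATA_VERSION || leaderEpoch < 0) {
            // Discard: the epoch may be stale (older protocol) or unknown.
            return Optional.empty();
        }
        return Optional.of(leaderEpoch);
    }
}
```

Returning an empty `Optional` rather than a sentinel makes "no epoch" explicit at every call site, which mirrors the description's point that the discarded epoch is simply omitted from requests.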

KAFKA-9261; Client should handle inconsistent leader metadata

ce60547

hachikuji force-pushed the KAFKA-9261-2.4 branch from d267cd6 to ce60547 on December 3, 2019 21:57

ijuma approved these changes Dec 4, 2019


ijuma merged commit 6e42532 into apache:2.4 Dec 4, 2019

ijuma mentioned this pull request Jan 30, 2020

KAFKA-9261; Client should handle unavailable leader metadata #7770

Merged
