KAFKA-9750; Fix race condition with log dir reassign completion #8412
hachikuji merged 4 commits into apache:trunk from
request.minBytes,
request.maxBytes,
request.version <= 2,
false,
We always build with the latest fetch version, so this check was unneeded.
try {
  // It is possible that the log dir fetcher completed just before this call, so we
  // filter only the partitions which still have a future log dir.
  val filteredFetchStates = initialFetchStates.filterKeys(replicaMgr.futureLogExists)
So there is an extra check for the future log dir here, and because the check happens while holding the lock, the race condition is resolved.
@hachikuji Your approach is better than mine :) Could you include the test from my PR? I would also like to test your PR locally later.
// filter only the partitions which still have a future log dir.
val filteredFetchStates = initialFetchStates.filterKeys(replicaMgr.futureLogExists)
super.addPartitions(filteredFetchStates)
filteredFetchStates.keySet
// It is possible that the log dir fetcher completed just before this call, so we
// filter only the partitions which still have a future log dir.
val filteredFetchStates = initialFetchStates.filterKeys(replicaMgr.futureLogExists)
super.addPartitions(filteredFetchStates)
Should we skip creating the thread if the collection of filtered states is empty?
I think this will be a no-op in addPartitions, so it's probably harmless
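To illustrate the point about an empty filtered collection, here is a minimal sketch. `FetcherState`, its `addPartitions`, and `filterByFutureLog` are hypothetical stand-ins using plain strings and offsets, not the actual Kafka classes, so this only shows why adding an empty map would leave the state unchanged in a model like this one.

```scala
object FilterSketch {
  // Hypothetical model of fetcher state: partition name -> fetch offset.
  final case class FetcherState(partitions: Map[String, Long]) {
    // Adds only partitions that are not already tracked.
    // With an empty input, the returned state equals the current one.
    def addPartitions(toAdd: Map[String, Long]): FetcherState =
      FetcherState(partitions ++ toAdd.filter { case (tp, _) => !partitions.contains(tp) })
  }

  // Keeps only the requested partitions for which a future log still exists.
  // `futureLogExists` is an assumed predicate supplied by the caller.
  def filterByFutureLog(
      requested: Map[String, Long],
      futureLogExists: String => Boolean): Map[String, Long] =
    requested.filter { case (tp, _) => futureLogExists(tp) }
}
```

In this model, if every requested partition has already lost its future log, the filtered map is empty and `addPartitions` returns an equal state.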
@chia7712 Thanks for reviewing. I added your test case improvements to the patch. Will consider additional improvements in the morning.
mumrah left a comment
Thanks for the patch @hachikuji and @chia7712! This looks good to me 👍
 * leader epoch. This is the offset the follower should truncate to ensure
 * accurate log replication.
 * - Finally truncate the logs for partitions in the truncating phase and mark them
 * - Finally truncate the logs for partitions in the truncating phase and mark the
I was tempted to fix this with the plural truncations.
The failure looks like a known flaky test: https://issues.apache.org/jira/browse/KAFKA-8460. I will merge to trunk, 2.5, and 2.4.
There is a race on receiving a LeaderAndIsr request for a replica with an active log dir reassignment. If the reassignment completes just before the LeaderAndIsr handler updates epoch information, it can lead to an illegal state error since no future log dir exists. This patch fixes the problem by ensuring that the future log dir exists when the fetcher is started. Removal cannot happen concurrently because it requires access to the same partition state lock.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>

Co-authored-by: Chia-Ping Tsai <chia7712@gmail.com>
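The locking argument in the commit message can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code from the patch: `FutureLogRegistry` and all of its members are hypothetical names, and partitions are plain strings. The point it shows is that the existence check and the removal both run under one lock, so a fetcher can never be started for a partition whose future log has already been removed.

```scala
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable

// Hypothetical stand-in for the broker's partition state; not Kafka's real classes.
class FutureLogRegistry {
  private val lock = new ReentrantLock()
  private val futureLogs = mutable.Set.empty[String]
  private val fetching = mutable.Set.empty[String]

  // Runs `body` while holding the lock, releasing it even if `body` throws.
  private def inLock[T](body: => T): T = {
    lock.lock()
    try body finally lock.unlock()
  }

  def createFutureLog(partition: String): Unit = inLock {
    futureLogs += partition
    ()
  }

  // Reassignment completion: drops the future log and stops fetching, under the lock.
  def completeReassignment(partition: String): Unit = inLock {
    futureLogs -= partition
    fetching -= partition
    ()
  }

  // Fetcher start: only partitions that still have a future log are added.
  // Because this holds the same lock as completeReassignment, the check cannot go stale
  // between filtering and adding.
  def addPartitions(requested: Set[String]): Set[String] = inLock {
    val filtered = requested.filter(futureLogs.contains)
    fetching ++= filtered
    filtered
  }

  def fetchingPartitions: Set[String] = inLock { fetching.toSet }
}
```

For example, if the reassignment of one partition completes before `addPartitions` is called, that partition is simply filtered out and only the remaining ones are returned and tracked.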
There is a race on receiving a LeaderAndIsr request for a replica with an active log dir reassignment. If the reassignment completes just before the LeaderAndIsr handler updates epoch information, it can lead to an illegal state error since no future log dir exists. This patch fixes the problem by ensuring that the future log dir exists when the fetcher is started. Removal cannot happen concurrently because it requires access to the same partition state lock.
Co-authored-by: Chia-Ping Tsai chia7712@gmail.com