Skip to content

KAFKA-9263 The new hw is added to incorrect log when ReplicaAlterLogD…#9423

Merged
chia7712 merged 1 commit intoapache:trunkfrom
chia7712:KAFKA-9263
Dec 2, 2020
Merged

KAFKA-9263 The new hw is added to incorrect log when ReplicaAlterLogD…#9423
chia7712 merged 1 commit intoapache:trunkfrom
chia7712:KAFKA-9263

Conversation

@chia7712
Copy link
Copy Markdown
Member

@chia7712 chia7712 commented Oct 13, 2020

issue: https://issues.apache.org/jira/browse/KAFKA-9263

The following actions results in this issue.

  1. handle_1 gets the current log
  2. ReplicaAlterLogDirsThread replaces current log by future log
  3. handle_1 adds the new hw to “current” log but the log is actually invalid

The solution is that the action 1 and 3 must be executed within same read lock of leaderIsrUpdateLock to avoid adding new hw to invalid log (which is replaced by ReplicaAlterLogDirsThread)

Test Plan

Relying on PlaintextAdminIntegrationTest.testAlterReplicaLogDirs. I have looped the test with this patch 100 times. all pass

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@chia7712
Copy link
Copy Markdown
Member Author

kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords 

pass on my local.

@chia7712
Copy link
Copy Markdown
Member Author

@hachikuji @lbradstreet Could you take a look? maybeIncrementLeaderHW was called with holding lock (f3ded39#diff-16590fdbe6661105c6faa87d68e39a553e66b34ee0a6d0b14a48d02e273bab4fL676) and f3ded39 changed it.

@chia7712 chia7712 force-pushed the KAFKA-9263 branch 4 times, most recently from 27ea5c2 to d3c9337 Compare October 26, 2020 06:25
@chia7712 chia7712 force-pushed the KAFKA-9263 branch 2 times, most recently from 743a2eb to 05217f5 Compare November 4, 2020 03:24
@chia7712
Copy link
Copy Markdown
Member Author

@junrao Could you take a look? There is no report about this bug in production but we can reproduce the bug by PlaintextAdminIntegrationTest.testAlterReplicaLogDirs. It happens infrequently but it is dangerous as it can misorder the HW.

Copy link
Copy Markdown
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chia7712 : Thanks for the PR. LGTM. Are test failures related to this PR?

@chia7712
Copy link
Copy Markdown
Member Author

chia7712 commented Dec 1, 2020

Are test failures related to this PR?

They are unrelated error. Will rebase PR to trigger QA again.

@chia7712
Copy link
Copy Markdown
Member Author

chia7712 commented Dec 1, 2020

kafka.api.TransactionsTest.testBumpTransactionalEpoch is traced by #9291

org.apache.kafka.streams.integration.EosBetaUpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosBeta and SourceConnectorsIntegrationTest are known flaky

…irsThread is replacing log (fix PlaintextAdminIntegrationTest.testAlterReplicaLogDirs)
@chia7712
Copy link
Copy Markdown
Member Author

chia7712 commented Dec 1, 2020

@junrao I will merge it to trunk tomorrow if there is no objection.

@chia7712 chia7712 merged commit aeeb7b2 into apache:trunk Dec 2, 2020
ijuma added a commit to ijuma/kafka that referenced this pull request Dec 2, 2020
…t-for-generated-requests

* apache-github/trunk:
KAFKA-9263 The new hw is added to incorrect log when
ReplicaAlterLogDirsThread is replacing log (fix
PlaintextAdminIntegrationTest.testAlterReplicaLogDirs) (apache#9423)
  KAFKA-10729; Bump remaining RPC's to use tagged fields. (apache#9601)

clients/src/main/java/org/apache/kafka/common/requests/AbstractResponse.java
clients/src/main/java/org/apache/kafka/common/requests/AlterReplicaLogDirsResponse.java
clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java
@chia7712 chia7712 deleted the KAFKA-9263 branch March 25, 2024 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants