[ML] Follow up on race condition fixes in ManagedCursorImpl #15031 #15067
@lhotari: Thanks for your contribution. For this PR, do we need to update docs?
@lhotari: Thanks for providing doc info!
## Motivation
To persist pending acks.
## Implementation
1. Add the transaction pending ack store, which handles the pending ack metadata.
2. When the subscription is unloaded, replay the pendingAckHandle.
3. Use one managed ledger per subscription to store the pending ack metadata, and replay it through a cursor opened on that managed ledger.
4. When a transaction is committed or aborted, append the marker to the pendingAckStore, then modify the in-memory state in pendingAckHandle.
5. Also modify the in-memory state when the append fails: the persisted state is unknown in that case, and replaying it would produce the wrong operation. After an append failure, wait for the TC timeout or for the client to abort the transaction.
6. When appending to the pending ack log, compare the largest topic position stored for the log position with the persistent topic's markDeletePosition. If it is smaller, delete the position.
### Verifying this change
Tests were added for it.
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
255ec63 to 9344d99
- backports apache#15067 to branch-2.8
The periodic flushing added in #8634 will cause problems unless this fix is applied.
@lhotari - are you able to describe the meaning of the field more?
(apache#15067) - follow up on apache#15031
* [ML] Fix race in persisting mark delete position
* [ML] Resetting should reset lastMarkDeleteEntry
* [ML] Reset fields in initializeCursorPosition method
(cherry picked from commit a19a30a)
congbobo184 left a comment
I don't think this PR can fix the race condition. The lastMarkDeleteEntry field can still be updated to a previous value as a result of races. Two threads can also update LAST_MARK_DELETE_ENTRY_UPDATER at the same time.
@lhotari Sorry for the late response. Looks like this PR will not fix the persistentMarkDeletePosition consistency issue? Assume 2 threads, thread 0 and thread 1, thread 0 reach pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java Line 1864 in f9cfc3e
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java Line 1852 in f9cfc3e
And |
@codelipenghui That's true. However, I don't think that the problem has become worse than before.
@lhotari got it. Thanks.
(apache#15067) - follow up on apache#15031
* [ML] Fix race in persisting mark delete position
* [ML] Resetting should reset lastMarkDeleteEntry
* [ML] Reset fields in initializeCursorPosition method
@lhotari Hi, do you have a plan to address my comment? If not, I will take over the task.
Yes, I'm planning to address it.
Could you explain why this patch has fixed this issue?
I don't agree with this. This will add complexity to the logic. We should find a way to fix this completely, not merge with other ideas.
This line was missing: pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java Line 1616 in 8c534db
Since #15031 added a solution to prevent race conditions in updating the pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Lines 935 to 942 in b083e9a
What ended up happening was that #15031 prevented updating the pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java Lines 3123 to 3129 in 8c534db
That caused the cursor to move the readPosition to the last entry when the flush call happened, which resulted in entries (messages) being skipped. I'm planning to add a test case to reproduce this, and I'll also use it for refactoring the solution to fix the remaining ordering issue / race condition.
@Technoboy- Please be more specific about your feedback. I don't really get the point of your sentence. Please rephrase.
Fixes #15151
Motivation and modifications
The #15031 changes fix a race condition where the lastMarkDeleteEntry field might get updated to a previous value as a result of races.
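The kind of race described here can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `Pos` type and a single `last` field, and it is not the actual `ManagedCursorImpl` code: an `AtomicReferenceFieldUpdater` applies a "keep the larger position" function atomically, so an update carrying an older position cannot overwrite a newer one regardless of thread interleaving.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Illustrative only: class, field, and method names are hypothetical.
final class MonotonicMarkDeleteSketch {

    /** Simplified stand-in for a (ledgerId, entryId) position. */
    static final class Pos implements Comparable<Pos> {
        final long ledgerId;
        final long entryId;

        Pos(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Pos other) {
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }
    }

    private static final AtomicReferenceFieldUpdater<MonotonicMarkDeleteSketch, Pos> LAST_UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(MonotonicMarkDeleteSketch.class, Pos.class, "last");

    // Must be volatile for AtomicReferenceFieldUpdater to accept it.
    private volatile Pos last;

    /** Atomically stores the candidate unless a larger position is already stored. */
    Pos advance(Pos candidate) {
        return LAST_UPDATER.updateAndGet(this, current ->
                current != null && current.compareTo(candidate) > 0 ? current : candidate);
    }

    Pos get() {
        return last;
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicMarkDeleteSketch cursor = new MonotonicMarkDeleteSketch();
        Thread newer = new Thread(() -> cursor.advance(new Pos(1, 10)));
        Thread older = new Thread(() -> cursor.advance(new Pos(1, 5)));
        newer.start();
        older.start();
        newer.join();
        older.join();
        // Whichever thread runs first, the larger position wins.
        System.out.println(cursor.get().entryId); // prints 10
    }
}
```

A guard like this only protects a single field. It does not by itself keep two related fields consistent with each other, and it must be bypassed when the cursor is deliberately reset to an earlier position.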
While looking into the details and attempting to add a unit test that reproduces the issue, I came up with additional changes to ensure the consistency of the `persistentMarkDeletePosition` field.
It seems that the current solution for `persistentMarkDeletePosition` has something to ensure that the value doesn't move backwards:
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Lines 1830 to 1833 in c2c05c4
This is wrong since the meaning of the field is not what one would expect it to be.
This PR fixes that issue.
`getPersistentMarkDeletedPosition()` is used in MLPendingAckStore:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/transaction/pendingack/impl/MLPendingAckStore.java
Lines 247 to 249 in 8a6ecd7
In addition, this PR fixes an issue where the `persistentMarkDeletePosition` and `lastMarkDeleteEntry` fields must be able to move backwards. When the cursor is reset, these fields should be cleared.
In the #15151 case, the relevant missing line was `lastMarkDeleteEntry = new MarkDeleteEntry(markDeletePosition, getProperties(), null, null);`, which was missing from the `initializeCursorPosition` method.
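To make the effect of the missing reset concrete, here is a toy model. It is not the real cursor or flush logic: positions are plain entry ids in a single ledger, and every name in it is hypothetical. It only shows why a reset that leaves the "last mark-delete entry" stale lets a later flush push the read position forward again, and why refreshing that field on reset avoids it.

```java
// Toy model only: all names and the flush behavior are simplifying assumptions.
final class CursorResetSketch {
    private long markDeletePosition = -1;           // last acknowledged entry id
    private long readPosition = 0;                  // next entry id to read
    private long lastMarkDeleteEntry = -1;          // what the next flush will persist
    private long persistentMarkDeletePosition = -1; // what has been persisted so far

    void markDelete(long entryId) {
        markDeletePosition = entryId;
        lastMarkDeleteEntry = entryId;
        readPosition = Math.max(readPosition, entryId + 1);
    }

    /** Resets the cursor; refreshLastEntry=false models the missing line. */
    void reset(long newMarkDelete, boolean refreshLastEntry) {
        markDeletePosition = newMarkDelete;
        readPosition = newMarkDelete + 1;
        if (refreshLastEntry) {
            lastMarkDeleteEntry = newMarkDelete;
        }
    }

    /** Assumed flush behavior: persist the last entry and move the cursor past it. */
    void flush() {
        persistentMarkDeletePosition = lastMarkDeleteEntry;
        markDeletePosition = lastMarkDeleteEntry;
        readPosition = Math.max(readPosition, lastMarkDeleteEntry + 1);
    }

    static long readPositionAfterResetAndFlush(boolean refreshLastEntry) {
        CursorResetSketch cursor = new CursorResetSketch();
        cursor.markDelete(9);                // entries 0..9 acknowledged
        cursor.flush();
        cursor.reset(-1, refreshLastEntry);  // rewind to the beginning
        cursor.flush();                      // periodic flush fires after the reset
        return cursor.readPosition;
    }

    public static void main(String[] args) {
        // Stale entry: the flush moves the cursor forward again, skipping entries 0..9.
        System.out.println(readPositionAfterResetAndFlush(false)); // prints 10
        // Refreshed entry: the cursor stays at the reset position.
        System.out.println(readPositionAfterResetAndFlush(true));  // prints 0
    }
}
```

In the refreshed case the persisted position also moves backwards with the cursor, which is the behavior the description asks for.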