[fix][broker] Fix cursor position persistence in ledger trimming #25087
The issue was in maybeUpdateCursorBeforeTrimmingConsumedLedger, which called onCursorMarkDeletePositionUpdated. That call only updates the in-memory cursor position without persisting it to the metadata store. If the broker crashes after this update, it recovers the old mark-delete position, which may still reference ledgers that have already been trimmed.

The fix calls asyncMarkDelete instead, which persists the cursor's mark-delete position. This ensures that ledger trimming is based only on the persisted mark-delete position. The fix also passes cursor.getProperties() instead of an empty map, so cursor properties are not lost during the cursor position update.

The test has been updated to verify that:
1. The cursor position is properly persisted after ledger rollover
2. Cursor properties are preserved during the cursor reset operation
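The difference between an in-memory update and a persisted mark-delete can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyMetadataStore, ToyCursor, updateInMemoryOnly, markDeleteAndPersist, and crashAndRecover are hypothetical names invented for illustration and are not Pulsar classes or methods. Positions are modelled as plain long values, and persistence is modelled as a synchronous write.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these types exist in Pulsar. */
class CursorPersistenceSketch {

    /** Stand-in for the metadata store; its contents survive a simulated crash. */
    static final class ToyMetadataStore {
        long persistedMarkDelete = 0;
        Map<String, Long> persistedProperties = new HashMap<>();
    }

    /** Stand-in for a cursor: in-memory state backed by the store. */
    static final class ToyCursor {
        final ToyMetadataStore store;
        long markDelete;
        Map<String, Long> properties;

        ToyCursor(ToyMetadataStore store) {
            // Recovery reads whatever was last persisted.
            this.store = store;
            this.markDelete = store.persistedMarkDelete;
            this.properties = new HashMap<>(store.persistedProperties);
        }

        /** Models the old behaviour: the position moves in memory only. */
        void updateInMemoryOnly(long position) {
            markDelete = position;
        }

        /** Models the new behaviour: the position and the given properties are persisted. */
        void markDeleteAndPersist(long position, Map<String, Long> props) {
            markDelete = position;
            properties = new HashMap<>(props);
            store.persistedMarkDelete = position;
            store.persistedProperties = new HashMap<>(props);
        }
    }

    /** Simulates a broker crash: in-memory state is dropped, the cursor is rebuilt from the store. */
    static ToyCursor crashAndRecover(ToyCursor cursor) {
        return new ToyCursor(cursor.store);
    }

    public static void main(String[] args) {
        ToyMetadataStore store = new ToyMetadataStore();
        store.persistedProperties.put("owner", 7L);
        ToyCursor cursor = new ToyCursor(store);

        // In-memory update only: the new position is lost on recovery.
        cursor.updateInMemoryOnly(5);
        System.out.println("in-memory only, recovered position: "
                + crashAndRecover(cursor).markDelete);

        // Persisted update that passes the cursor's own properties: both survive.
        cursor.markDeleteAndPersist(5, cursor.properties);
        ToyCursor recovered = crashAndRecover(cursor);
        System.out.println("persisted, recovered position: " + recovered.markDelete
                + ", properties: " + recovered.properties);
    }
}
```

In this model the first recovery returns to position 0, while the second keeps position 5 together with the stored properties. Passing an empty map to markDeleteAndPersist would persist the position but wipe the properties, which is the second problem the description mentions.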
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##             master   #25087      +/-   ##
============================================
+ Coverage     73.78%   74.50%   +0.72%
+ Complexity    33904    33680     -224
============================================
  Files          1921     1899      -22
  Lines        150566   149635     -931
  Branches      17498    17396     -102
============================================
+ Hits         111088   111491     +403
+ Misses        30552    29281    -1271
+ Partials       8926     8863      -63
) Co-authored-by: Jiwe Guo <technoboy@apache.org> (cherry picked from commit 26297ac)
I skipped cherry-picking to branch-3.0 since the test org.apache.bookkeeper.mledger.impl.ManagedCursorTest#asyncMarkDeleteBlocking breaks and I couldn't easily find out the reason.

This PR seems to have made ManagedCursorTest#asyncMarkDeleteBlocking more flaky. I created #25092 for that.
…che#25087) Co-authored-by: Jiwe Guo <technoboy@apache.org> (cherry picked from commit 26297ac) (cherry picked from commit 9e6f47e)
Motivation
The issue was in maybeUpdateCursorBeforeTrimmingConsumedLedger, where it called onCursorMarkDeletePositionUpdated, which only updates the in-memory cursor position without persisting it to the metadata store. If the broker crashes after this update, it will recover the old mark-delete position, which leads to confusion as the trimmed ledgers may still be referenced by the cursor.

Here is a real case in production:
Modifications
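- Change maybeUpdateCursorBeforeTrimmingConsumedLedger to call asyncMarkDelete instead of onCursorMarkDeletePositionUpdated
- Pass cursor.getProperties() instead of an empty map
- Update testTrimmerRaceCondition to verify:
  - The cursor position is properly persisted after ledger rollover
  - Cursor properties are preserved during the cursor reset operation

Verifying this change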
This change is already covered by an existing test:
- ManagedLedgerTest.testTrimmerRaceCondition: enhanced to verify cursor position persistence and property preservation

Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete