-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] Fix cannot cleanup expired ledger by trim ledgers #15474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[fix][broker] Fix cannot cleanup expired ledger by trim ledgers #15474
Conversation
|
Good catch. This patch solved the problem to some extent, I think the solution is that the newly created persistent subscription 'cursor position' is always greater than 'message-TTL' and the 'message-TTL' policy is enabled. I have two questions:
why not use nodurable Reader ?
ledgerClosed(lh);
createLedgerAfterClosed();This should ensure that a new-leadger is always created when the old-leadger is closed. If this is necessary, a comment should be added to remind others later, better have a unit test to ensure that. |
@poorbarcode yes, but only enable 'message-TTL' policy is not enough. As the case 2, if we have subscription cursor and not active consumer, the
|
|
@mattisonchao |
|
Hi @Nicklee007 |
@Technoboy- added unit test for the case 2, PTAL. |
6690d83 to
66eed74
Compare
@mattisonchao hi, I have revert the unit test modify in the PR 14672, PTAL. |
|
The pr had no activity for 30 days, mark with Stale label. |
07ebaee to
8921a7a
Compare
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Show resolved
Hide resolved
eae8c0d to
67d415b
Compare
|
I think we still need to revert #14672 first (#16806). Because this PR just checks whether any cursor has been updated in the the periodically |
Ok, waiting #14672 reverted, I'll change this PR. |
|
Hi, @Nicklee007 |
67d415b to
4158ca6
Compare
|
@mattisonchao Changed this PR, PTAL. |
|
@Nicklee007 Please provide a correct documentation label for your PR. |
4158ca6 to
326dad1
Compare
|
The pr had no activity for 30 days, mark with Stale label. |
Fixes #15473
Motivation
The pr #10087 has fixed some case expired data cannot cleanup, but in some other case the bug reappeared .
case 1 : the
slowestReaderPositionbe reset by other cursor operator before we can trim the expired ledger.The topic producer produced some data and stoped in a time, then no more message will be add. When the
maximumRolloverTimeMsreached, check the current ledger is full , closed the old ledger and created new ledger, after ledger create completemaybeUpdateCursorBeforeTrimmingConsumedLedger()will update cursorslowestReaderPositionto(newLedgerId,-1)and triggertrimConsumedLedgersInBackground()immediatelypulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Line 2284 in a1fb200
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Lines 2295 to 2296 in a1fb200
But almost the message
retentionTimeMsis longer thanmaximumRolloverTimeMs, then the ledger can not be clean bytrimConsumedLedgersInBackgroundat this time running.And then if we have a flink task consumer the topic( use reader to consumer and mark position by another durable cursor), the client will always reset the durable cursor to the position
(olderLedgerId, entryNum +1)internalResetCursorinvokeinternalAsyncMarkDeleteinvokeledger.updateCursorto resetslowestReaderPositionto(olderLedgerId, entryNum ), when reached the nextretentionCheckIntervalInSecondsto invoketrimConsumedLedgersInBackground,but theslowestReaderPositionhas been reset to the(olderLedgerId, entryNum ), then we can not clean the old ledger in any reach 'retentionCheckIntervalInSeconds' time.pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Line 2474 in ac6bd3c
Also the old ledger is not expired when after created ledger trigger
trimConsumedLedgersInBackground,another possibility cursor operator is themarkDeleteLimitercause the consumer last mark-delete operations is Dirty and need mark delete byflushCursorsTaskalso will invokeledger.updateCursor(ManagedCursorImpl.this, mdEntry.newPosition)to(olderLedgerId, entryNum ). After this time we also cloud not clean the old ledger bytrimConsumedLedgersInBackground.pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Lines 3132 to 3138 in ac6bd3c
case2: have not a condition to set slowestReaderPosition to the correct position.
In this case, the most common is the producer stoped in one time and we have durable cursor, but we have not active consumer, the cursor position move forward by
MessageExpiryMonitor,maybe the cursor mark delete position is before a few ledger , and the last time create ledger and invokemaybeUpdateCursorBeforeTrimmingConsumedLedgermaybe not setslowestReaderPositionto the newest ledger(newLedgerId,-1), after thenMessageExpiryMonitoronly can resetslowestReaderPositionto(olderLedgerId, entryNum ). After this time we also cloud not clean the old ledger bytrimConsumedLedgersInBackground.Modifications
checkConsumedLedgers, we usemaybeUpdateCursorBeforeTrimmingConsumedLedger()to check if ledger consumed completely and reset cursor to next ledger, then theConsumedLedgersMonitorcan trim the old ledger data.rollCurrentLedgerIfFulllogic to follow lazy creation of ledger, we need create new ledger to resetslowestReaderPositionto the new empty ledger. So we need add invokecreateLedgerAfterClosed()inrollCurrentLedgerIfFull. -- PR ChangerollCurrentLedgerIfFulllogic to follow lazy creation of ledger #14672 has revert.Documentation
Check the box below or label this PR directly.
Need to update docs?
doc-not-needed(Please explain why)