[fix][ml] There are two same-named managed ledgers in the one broker #18688
The pr had no activity for 30 days, mark with Stale label.
Codecov Report
Coverage Diff

| Metric | master | #18688 | +/- |
|---|---|---|---|
| Coverage | 63.30% | 63.90% | +0.60% |
| Complexity | 26123 | 3490 | -22633 |
| Files | 1836 | 1843 | +7 |
| Lines | 134416 | 135163 | +747 |
| Branches | 14772 | 14859 | +87 |
| Hits | 85087 | 86371 | +1284 |
| Misses | 41649 | 40949 | -700 |
| Partials | 7680 | 7843 | +163 |
@poorbarcode I assume this PR is still relevant? Please merge latest changes from master and check that tests still pass.
Yes.
Done. Could you please help review this PR?
@poorbarcode The change to production code itself looks good, but I don't like the test. Addressing that would require more changes to the ML factory so that it can be subclassed for tests with test hooks instead of relying on Mockito. One way would be to extract a protected method for the logic here: pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java Lines 376 to 379 in 152b4a2
and here, so that tests can use a test-specific ManagedLedgerFactoryImpl subclass that creates a custom ManagedLedgerImpl subclass with the required hooks: pulsar/managed-ledger/src/test/java/org/apache/bookkeeper/test/MockedBookKeeperTestCase.java Line 86 in 152b4a2
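A minimal sketch of the suggestion above, assuming simplified stand-in classes: `Ledger` and `LedgerFactory` are hypothetical and do not mirror the real `ManagedLedgerImpl` or `ManagedLedgerFactoryImpl` signatures. The point is only the shape of the hook: the factory creates instances through one protected method, and a test subclass overrides it to return an instrumented ledger.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FactoryHookSketch {
    // Hypothetical stand-in for a managed ledger; not the real ManagedLedgerImpl.
    static class Ledger {
        final String name;

        Ledger(String name) {
            this.name = name;
        }

        void close() {
            // Production close logic would go here.
        }
    }

    // Hypothetical stand-in for the factory; not the real ManagedLedgerFactoryImpl.
    static class LedgerFactory {
        private final ConcurrentHashMap<String, Ledger> ledgers = new ConcurrentHashMap<>();

        Ledger open(String name) {
            return ledgers.computeIfAbsent(name, this::newLedger);
        }

        // Extracted creation step: the only method a test needs to override.
        protected Ledger newLedger(String name) {
            return new Ledger(name);
        }
    }

    // Builds a test-specific factory whose ledgers count close() calls.
    static int demo() {
        AtomicInteger closeCalls = new AtomicInteger();
        LedgerFactory factory = new LedgerFactory() {
            @Override
            protected Ledger newLedger(String name) {
                return new Ledger(name) {
                    @Override
                    void close() {
                        closeCalls.incrementAndGet(); // test hook
                        super.close();
                    }
                };
            }
        };
        Ledger ledger = factory.open("my-topic");
        ledger.close();
        ledger.close(); // a second close, as in the scenario this PR targets
        return closeCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("close() calls observed: " + demo());
    }
}
```

Because the hook is an ordinary subclass, the test logic runs as plain Java on whichever thread calls `close()`, with no mocking framework involved.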
lhotari
left a comment
Good catch @poorbarcode. The production code change is good. I've suggested some changes to the way the test is implemented.
    private ManagedLedgerImpl makeManagedLedgerWorksWithStrictlySequentially(ManagedLedgerImpl originalManagedLedger,
                                                                             ProcessCoordinator processCoordinator)
            throws Exception {
        ManagedLedgerImpl sequentiallyManagedLedger = spy(originalManagedLedger);
        // step-1.
        doAnswer(invocation -> {
            synchronized (originalManagedLedger) {
                // step-3.
                // Wait for `managedLedger.close`, then do task: "asyncCreateLedger()".
                // Because the thread selector in "managedLedger.executor" is random logic, so it is possible to fail.
                // Adding 1000 tasks to stuck the executor gives a high chance of success.
                for (int i = 0; i < 1000; i++) {
                    originalManagedLedger.getExecutor().execute(() -> {
                        processCoordinator.waitPreviousAndSetStep(3);
                    });
                }
                LedgerHandle lh = (LedgerHandle) invocation.getArguments()[0];
                processCoordinator.waitPreviousAndSetStep(1);
                originalManagedLedger.ledgerClosed(lh);
            }
            return null;
        }).when(sequentiallyManagedLedger).ledgerClosed(any(LedgerHandle.class));
        // step-2.
        doAnswer(invocation -> {
            processCoordinator.waitPreviousAndSetStep(2);
            originalManagedLedger.close();
            return null;
        }).when(sequentiallyManagedLedger).close();
        return sequentiallyManagedLedger;
    }
This looks like a hack, especially the loop that adds 1000 tasks to the executor. The "ProcessCoordinator" implementation looks like something that could be handled with java.util.concurrent.Phaser.
It would be better to modify ManagedLedgerFactoryImpl so that the method that creates the ledger instance can be overridden. Tests could then override that method and inject test logic without relying on Mockito, which isn't thread safe and could itself cause issues.
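As a rough illustration of the `Phaser` idea, here is one way to force steps on different threads to run in a fixed order. This is a sketch under assumptions: `PhaserStepsSketch` and `runStep` are hypothetical names, and the PR's actual `ProcessCoordinator` may behave differently. A single registered party means each `arrive()` advances the phase by one, so step N waits until the phase equals N - 1.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Phaser;

public class PhaserStepsSketch {
    // One registered party: every arrive() call advances the phase by exactly one.
    private final Phaser phaser = new Phaser(1);

    // Blocks until all earlier steps have completed, runs the action, then unblocks the next step.
    void runStep(int step, Runnable action) {
        int target = step - 1; // phase at which this step is allowed to run
        int phase = phaser.getPhase();
        while (phase >= 0 && phase < target) {
            // Returns immediately if the phaser has already moved past `phase`.
            phase = phaser.awaitAdvance(phase);
        }
        action.run();
        phaser.arrive();
    }

    // Starts the steps in reverse order and records the order in which they actually ran.
    static List<Integer> demo() {
        PhaserStepsSketch coordinator = new PhaserStepsSketch();
        List<Integer> order = new CopyOnWriteArrayList<>();
        Thread[] threads = new Thread[3];
        for (int i = 0; i < threads.length; i++) {
            int step = threads.length - i; // 3, 2, 1
            threads[i] = new Thread(() -> coordinator.runStep(step, () -> order.add(step)));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Unlike flooding the executor with tasks, the ordering here does not depend on which thread the executor happens to pick.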
Thanks for your suggestion. I have rewritten the test to make it simpler. Could you take a look again?
And I added other changes[1]:
- fix the wrong state of the closed managed ledger.
- release the ledgerHandle, which is created after the ML is closed
lhotari
left a comment
LGTM. Good work @poorbarcode.
…pache#18688) (cherry picked from commit d7186a6)
As discussed on the mailing list https://lists.apache.org/thread/w4jzk27qhtosgsz7l9bmhf1t7o9mxjhp, there is no plan to release 2.9.6, so I am going to remove the release/2.9.6 label
Motivation
In PR #17526, we learned that a `topic` can be closed multiple times and that two same-named objects of class `PersistentTopic` can exist in the same `broker` instance.

Closing a `topic` triggers the closure of its `ManagedLedger`. Because the `topic` object can be closed multiple times, the `ManagedLedger` can also be closed multiple times. This PR demonstrates the following: if a `ManagedLedger` is closed more than once, and the switch-`ledgerHandle` operation of the `ManagedLedger` runs concurrently with `close`, two same-named `ManagedLedger` objects can exist in the same broker, possibly with different numbers of cursors.

If both `ManagedLedger` objects are usable and hold different numbers of cursors, the `trimLedgers` operation can delete too many ledgers from the metadata of the `ManagedLedger`.

Here is the process:
Columns: `managedLedger_1.close` | `switch ledgerHandle(managedLedger_1)` | `create managedLedger_2` | `create managedLedger_3`

Table cells, in reading order:
- … `LedgerHandle`
- … `LedgerHandleClosed`
- … `managedLedger_1` from `ManagedFactory.ledgers`
- … `LedgerHandle`
- … `LedgerOpened`
- … `managedLedger_2`
- … `managedLedger_2` to `ManagedFactory.ledgers`
- … `managedLedger_2` from `ManagedFactory.ledgers`
- … `cursor_1` into `managedLedger_2`
- … `managedLedger_3`
- … `managedLedger_3` to `ManagedFactory.ledgers`
- … `cursor_1` from meta
- … `cursor_2` into `managedLedger_3`
- … `cursor_1`
- … `cursor_1`, `cursor_2`

Modifications
- Use `remove(k,v)` instead of `remove(k)` when deleting a `ManagedLedger` from `ManagedLedgerFactory.ledgers`.
- Release the `ledgerHandle`, which is created after the ML is closed.

Documentation

- `doc`
- `doc-required`
- `doc-not-needed`
- `doc-complete`
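The first item under Modifications can be illustrated with a small, single-threaded replay of the stale-close sequence. This is a sketch, not the Pulsar code: `Ledger`, `scenario`, and the plain `ConcurrentHashMap<String, Ledger>` registry are hypothetical stand-ins for the factory's `ledgers` map. It shows why `remove(k)` lets a late close of an old instance evict a newer same-named entry, while `remove(k,v)` only removes the entry if it still maps to the instance being closed.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConditionalRemoveSketch {
    // Hypothetical stand-in for a managed ledger; equality is object identity.
    static class Ledger {
        final String id;

        Ledger(String id) {
            this.id = id;
        }
    }

    static final String NAME = "my-topic";

    // Replays: register ml-1, close ml-1, register ml-2, then a second (stale) close of ml-1.
    // Returns the id of the ledger still registered under NAME, or "none".
    static String scenario(boolean conditionalRemove) {
        ConcurrentHashMap<String, Ledger> ledgers = new ConcurrentHashMap<>();

        Ledger first = new Ledger("ml-1");
        ledgers.put(NAME, first);

        // First close of ml-1 removes its own entry.
        ledgers.remove(NAME, first);

        // A new same-named instance is created and registered.
        Ledger second = new Ledger("ml-2");
        ledgers.put(NAME, second);

        // Second close of the OLD instance arrives late.
        if (conditionalRemove) {
            ledgers.remove(NAME, first); // no-op: the entry now maps to ml-2
        } else {
            ledgers.remove(NAME);        // evicts ml-2 even though ml-2 is still in use
        }

        Ledger left = ledgers.get(NAME);
        return left == null ? "none" : left.id;
    }

    public static void main(String[] args) {
        System.out.println("remove(k):    " + scenario(false));
        System.out.println("remove(k, v): " + scenario(true));
    }
}
```

With `remove(k)`, the registry ends up empty while `ml-2` is still live, so the next open under the same name would create a third instance alongside it. With `remove(k,v)`, `ml-2` stays registered.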