[fix][test] Fix flaky ManagedLedgerTest.testNoRetention and testInvalidateReadHandleWhenDeleteLedger #25495

Merged
merlimat merged 2 commits into apache:master from lhotari:fix-flaky-ManagedLedgerTest
Apr 8, 2026

@lhotari (Member) commented Apr 8, 2026

Fixes #25109

Motivation

Fix two flaky tests in ManagedLedgerTest that fail intermittently on master due to race conditions with async ledger rolls.

  • testNoRetention: After addEntry with maxEntriesPerLedger=1, a ledger roll is triggered asynchronously. When trimConsumedLedgersInBackground runs while the state is CreatingLedger, the trim is deferred (100ms retry). The original code did a single trim+join, which could complete as a no-op if the ledger roll was still in progress.

  • testInvalidateReadHandleWhenDeleteLedger: With maxEntriesPerLedger=1 and 3 entries, 3 ledger rolls occur. The last roll is async, so when readEntries(3) was called immediately, the 3rd entry could be read directly from currentLedger (bypassing ledgerCache), resulting in ledgerCache.size() == 2 instead of 3.
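The second race can be illustrated with a small, deterministic toy model. This is only a sketch: `ToyManagedLedger`, `addEntry(boolean)`, `readAllEntries()` and `cacheSizeAfterRead` are hypothetical names invented for illustration and are not Pulsar's `ManagedLedgerImpl` API. It assumes one entry per ledger (entry i lives in ledger i) and that reads from the current ledger never touch the read-handle cache.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReadHandleCacheRaceSketch {

    /** Toy model only; this is NOT Pulsar's ManagedLedgerImpl. */
    static final class ToyManagedLedger {
        // Ledger ids in creation order; the last element is the current ledger.
        final List<Long> ledgers = new ArrayList<>(List.of(0L));
        // Ids of ledgers for which a cached read handle was opened.
        final Set<Long> ledgerCache = new HashSet<>();
        int entriesWritten = 0;

        // maxEntriesPerLedger=1: every entry fills the current ledger.
        // rollCompleted says whether the async roll has finished yet.
        void addEntry(boolean rollCompleted) {
            entriesWritten++;
            if (rollCompleted) {
                ledgers.add((long) ledgers.size());
            }
        }

        // Assumption: entry i lives in ledger i. Reads from the current
        // ledger bypass the cache; all other reads open a cached handle.
        void readAllEntries() {
            long currentLedgerId = ledgers.get(ledgers.size() - 1);
            for (long ledgerId = 0; ledgerId < entriesWritten; ledgerId++) {
                if (ledgerId != currentLedgerId) {
                    ledgerCache.add(ledgerId);
                }
            }
        }
    }

    static int cacheSizeAfterRead(boolean lastRollCompleted) {
        ToyManagedLedger ml = new ToyManagedLedger();
        ml.addEntry(true);
        ml.addEntry(true);
        ml.addEntry(lastRollCompleted);
        ml.readAllEntries();
        return ml.ledgerCache.size();
    }

    public static void main(String[] args) {
        System.out.println("last roll pending: " + cacheSizeAfterRead(false));   // prints 2
        System.out.println("all rolls completed: " + cacheSizeAfterRead(true));  // prints 3
    }
}
```

In this model the cache only reaches 3 entries once a fourth, empty ledger exists, which is why waiting for `ledgers.size() == 4` before reading removes the race.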

Modifications

  • testNoRetention: Wrapped the trim call and assertions in Awaitility.await().untilAsserted() so the test retries until trimming fully completes.
  • testInvalidateReadHandleWhenDeleteLedger: Added Awaitility.await().untilAsserted(() -> assertEquals(ledger.ledgers.size(), 4)) before reading, ensuring all ledger rolls are complete.
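The retry pattern used for testNoRetention can be sketched without any test dependencies. The `untilAsserted` helper below is a hand-written, JDK-only stand-in for `Awaitility.await().untilAsserted(...)`, not Awaitility itself, and `ToyLedger` with its `trimConsumedLedgers()` method is a hypothetical simplification: the real trim is deferred and retried after 100ms, whereas the toy version simply does nothing while the state is `CreatingLedger`.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class RetryTrimSketch {
    enum State { CreatingLedger, LedgerOpened }

    /** Toy stand-in for a managed ledger; NOT the Pulsar class. */
    static final class ToyLedger {
        volatile State state = State.CreatingLedger;
        volatile int consumedLedgers = 1;

        void trimConsumedLedgers() {
            if (state == State.CreatingLedger) {
                return; // simplified: modeled as a no-op while a roll is in progress
            }
            consumedLedgers = 0;
        }
    }

    // Re-runs the assertion until it stops throwing AssertionError or the timeout expires.
    static void untilAsserted(Duration timeout, Duration pollInterval, Runnable assertion) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                assertion.run();
                return;
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e;
                }
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
        }
    }

    // Returns {consumed ledgers after one trim, consumed ledgers after retrying}.
    static int[] run() {
        ToyLedger ledger = new ToyLedger();

        // Old test shape: a single trim while the roll is still in progress.
        ledger.trimConsumedLedgers();
        int afterSingleTrim = ledger.consumedLedgers;

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // The simulated async ledger roll completes 50 ms later.
            scheduler.schedule(() -> { ledger.state = State.LedgerOpened; }, 50, TimeUnit.MILLISECONDS);

            // New test shape: re-trigger the trim and re-check until it sticks.
            untilAsserted(Duration.ofSeconds(5), Duration.ofMillis(10), () -> {
                ledger.trimConsumedLedgers();
                if (ledger.consumedLedgers != 0) {
                    throw new AssertionError("ledger not trimmed yet");
                }
            });
        } finally {
            scheduler.shutdownNow();
        }
        return new int[] {afterSingleTrim, ledger.consumedLedgers};
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("after single trim: " + r[0] + ", after retrying: " + r[1]);
    }
}
```

Placing the trim call inside the retried block matters here: polling the assertion alone would never pass in this model, because nothing else would re-trigger the trim once the roll completes.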

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, namely ManagedLedgerTest.testNoRetention and ManagedLedgerTest.testInvalidateReadHandleWhenDeleteLedger. Both were verified locally with invocationCount=10 (10/10 passes).

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

…idateReadHandleWhenDeleteLedger

testNoRetention: The trimConsumedLedgersInBackground call may be deferred
when the managed ledger is in CreatingLedger state (due to an ongoing
ledger roll after addEntry). The single trim+join was not sufficient
because the trim could complete as a no-op if deferred trimming hadn't
finished yet. Use Awaitility to repeatedly trigger trimming and verify
the assertions, ensuring the test waits for the async trimming to fully
complete.

testInvalidateReadHandleWhenDeleteLedger: With maxEntriesPerLedger=1,
adding 3 entries triggers 3 ledger rolls. The last roll may not have
completed by the time readEntries is called, causing the last entry to
be read directly from currentLedger (not via ledgerCache). This results
in ledgerCache having only 2 entries instead of the expected 3. Wait for
all ledger rolls to complete (ledgers.size() == 4) before reading.

Fixes apache#25109
@merlimat merlimat merged commit 104ea57 into apache:master Apr 8, 2026
80 of 83 checks passed
merlimat added a commit to merlimat/pulsar that referenced this pull request Apr 10, 2026
The debug log line in testInvalidateReadHandleWhenDeleteLedger was
removed on master (apache#25495) while the slog branch had converted it
to structured logging. Resolved by taking master's version (removing
the line).
merlimat added a commit to merlimat/pulsar that referenced this pull request Apr 10, 2026
The debug log line in testInvalidateReadHandleWhenDeleteLedger was
removed on master (apache#25495) while the slog branch had converted it
to structured logging. Resolved by removing the line.

Development

Successfully merging this pull request may close these issues.

Flaky-test: ManagedLedgerTest.testNoRetention
