-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve][ml] Make lifetimes of read callback and context clearly the same #24748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
BewareMyPower
wants to merge
16
commits into
apache:master
Choose a base branch
from
BewareMyPower:bewaremypower/op-read-entry-lifetime
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
[improve][ml] Make lifetimes of read callback and context clearly the same #24748
BewareMyPower
wants to merge
16
commits into
apache:master
from
BewareMyPower:bewaremypower/op-read-entry-lifetime
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…d once" This reverts commit b9449e6.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/ML
doc-not-needed
Your PR changes do not impact docs
release/4.1.3
type/enhancement
The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Main Issue: #24697
Motivation
The current asynchronous read APIs of managed ledger are callback-based that a pair of callback and context are needed. For example,
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherSingleActiveConsumer.java
Lines 387 to 390 in 8e35e34
The callback is the dispatcher itself, the context is a customized recyclable object. The direct cause of #24697 is that an incorrect context was passed to the callback. When I'm investigating the issue, I first suspect that the context has been recycled twice unexpectedly. It's a bit impossible because
recycle()is only called in callback'sreadEntriesCompleteorreadEntriesFailedmethods whose implementations are both synchronized and will sethavePendingReadtofalse, whilehavePendingReadwastrue.The other possible reason is that an incorrect context was passed to the callback. When I'm revisiting the code, I found it very hard to read because the lifetimes of callback and context are very ambiguous.
Let's start from
ManagedLedgerImpl#internalReadFromLedger, which must be reachedfor any read operation.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Line 2398 in 8e35e34
Before the line above, the ownership of read context (
ctx) was passed toOpReadEntry. However, sincectxis not private, after the line,ctxis owned by bothOpReadEntryandctx.Then they're passed to
EntryCache#asyncReadEntry, whereOpReadEntryis treated as a callback, and eventuallydoAsyncReadEntriesWithAcquiredPermits.pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/RangeEntryCacheImpl.java
Lines 349 to 365 in 8e35e34
Now, the
OpReadEntrywas wrapped into another callback. On one hand, another extra heap memory allocation on theOpReadEntrymakes the complicated recycler design ridiculous. On the other hand, oncectx2is not the same withctx, the downstream caller will fail unexpectedly, just like #24697.pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/RangeEntryCacheImpl.java
Lines 427 to 428 in 8e35e34
Let's go deeper,
ctxis even passed toreadMissingEntriesAsyncand thenpendingReadsManager.readEntries, which means nowctxis shared by 3 owners. The code ofPendingReadsManageris much more complicated, there is even a cache that cachesctxindirectly inPendingRead#listeners.I'm not looking further into the code. But from the code reading above, the root cause of the complex code is that the
ctxis shared everywhere and hard to track.To solve this issue,
CompletableFutureis a best choice:completeorcompleteExceptionallycalls can succeed only oncewhenCompletecan only be called onceThat means, even though the API is callback-based, if the underlying implementations all call future-based methods, the lifetimes of read callback and context can be clearly the same:
In the code above,
callbackandctxare never exposed (i.e. they are uniquely owned byOpReadEntry) and can only be called byreadEntriesCompleteorreadEntriesFailed. Therefore, it guaranteescallbackandctxalways match.Modifications
EntryCachelayer to make all APIs future-basedctxandcallbackprivate inOpReadEntryand only allow them to be called byreadEntriesCompleteorreadEntriesFailedAdditionally, it also changes the behavior when an
OpReadEntryis cancelled and makerecycle()private. #16399 fixes the case that theOpReadEntryis not recycled when being cancelled. However, if the context is also a recyclable object and needs to be recycled by triggering the callback with it, thectxcould not be recycled as well. Cancelling the read is a commonly seen case that happens when a consumer is removed, the recycler's thread local queue could easily be full.The complicated
ReadEntryCallbackWrapperdesign is also replaced by aCompletableFuturesince we no longer need to use a unique id (readOpCount) to check if the wrapper has been recycled due to timeout. Because once a future is completed exceptionally due to timeout, completing the future again will not take effect.Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: BewareMyPower#53