[fix][broker] patch #5809: Fix the ledgerID not found cause NPE. #15098
@horizonzy: Thanks for your contribution. For this PR, do we need to update docs?
@horizonzy: Thanks for providing doc info!
mattisonchao left a comment
Overall LGTM, but I have a question:
Could we check for an OpReadEntry earlier (to avoid creating useless objects) or throw an exception (to achieve fail fast) when creating it?
Please let me know what you think, thanks~
Yes, I thought about that, but it would not be atomic: the caller would have to mark the op invalid from outside. Doing it this way is safer.
And the operation
Got it. I was just wondering if there is a race condition here.
Great work! LGTM +1
opReadEntry.readEntriesFailed(new ManagedLedgerException.NoMoreEntriesToReadException("The ceilingKey(K key"
        + ") method is used to return the least key greater than or equal to the given key, "
        + "or null if there is no such key"), null);
opReadEntry.makeInvalid();
This changes the behavior of the method, so we need to go through all the usages of this method.
As far as I can see,
- This line makes no sense any more:
  pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpReadEntry.java
  Line 142 in a242f03
  cursor.ledger.startReadOperationOnLedger(nextReadPosition, OpReadEntry.this);
- Also affects the OpReadEntry.create in pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
  Line 765 in 5cf3fa0
  OpReadEntry op = OpReadEntry.create(this, readPosition, numberOfEntriesToRead, callback,
Thanks for the reminder. I checked it and found another problem, which I will fix in another PR.
This patch is only for the NPE.
It's a waiting OpReadEntry: when an OpAddEntry completes, it notifies the waiting OpReadEntry to read again.
Can you add a unit test to cover this path?
It seems readPosition of this invalid OpReadEntry is used before asyncReadEntries in hasMoreEntries
Can you add a unit test to cover this path? It seems readPosition of this invalid OpReadEntry is used before asyncReadEntries in hasMoreEntries
Maybe not; it will still be used in asyncReadEntries.
At line 768, the waiting OpReadEntry is stored through WAITING_READ_OP_UPDATER.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Line 768 in c163158
if (!WAITING_READ_OP_UPDATER.compareAndSet(this, null, op)) {
In notifyEntriesAvailable, the waiting OpReadEntry is taken from WAITING_READ_OP_UPDATER and handed to asyncReadEntries.
There is also a bad behavior in notifyEntriesAvailable, which is fixed in #15102.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Lines 2749 to 2772 in c163158
void notifyEntriesAvailable() {
    if (log.isDebugEnabled()) {
        log.debug("[{}] [{}] Received ml notification", ledger.getName(), name);
    }
    OpReadEntry opReadEntry = WAITING_READ_OP_UPDATER.getAndSet(this, null);
    if (opReadEntry != null) {
        if (log.isDebugEnabled()) {
            log.debug("[{}] [{}] Received notification of new messages persisted, reading at {} -- last: {}",
                    ledger.getName(), name, opReadEntry.readPosition, ledger.lastConfirmedEntry);
            log.debug("[{}] Consumer {} cursor notification: other counters: consumed {} mdPos {} rdPos {}",
                    ledger.getName(), name, messagesConsumedCounter, markDeletePosition, readPosition);
        }
        PENDING_READ_OPS_UPDATER.incrementAndGet(this);
        opReadEntry.readPosition = (PositionImpl) getReadPosition();
        ledger.asyncReadEntries(opReadEntry);
    } else {
        // No one is waiting to be notified. Ignore
        if (log.isDebugEnabled()) {
            log.debug("[{}] [{}] Received notification but had no pending read operation", ledger.getName(), name);
        }
    }
}
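The hand-off between the compareAndSet at line 768 and the getAndSet in notifyEntriesAvailable can be sketched in isolation. This is only a minimal sketch of the pattern, not the Pulsar code: WaitingReadSlot, ReadOp, park, and takeWaiting are hypothetical names chosen for illustration. Only AtomicReferenceFieldUpdater and its compareAndSet/getAndSet methods are real JDK API.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical stand-in for a cursor that holds at most one waiting read op.
final class WaitingReadSlot {

    // Hypothetical stand-in for a read operation waiting for new entries.
    static final class ReadOp {
        long readPosition;

        ReadOp(long readPosition) {
            this.readPosition = readPosition;
        }
    }

    private static final AtomicReferenceFieldUpdater<WaitingReadSlot, ReadOp> WAITING =
            AtomicReferenceFieldUpdater.newUpdater(WaitingReadSlot.class, ReadOp.class, "waitingOp");

    // The updater requires a volatile field of exactly the declared type.
    private volatile ReadOp waitingOp;

    // Park the op only if no other op is already waiting.
    boolean park(ReadOp op) {
        return WAITING.compareAndSet(this, null, op);
    }

    // Atomically take the parked op (or null) and clear the slot, then
    // refresh its read position before the caller retries the read.
    ReadOp takeWaiting(long currentReadPosition) {
        ReadOp op = WAITING.getAndSet(this, null);
        if (op != null) {
            op.readPosition = currentReadPosition;
        }
        return op;
    }

    public static void main(String[] args) {
        WaitingReadSlot slot = new WaitingReadSlot();
        ReadOp first = new ReadOp(1L);
        System.out.println("first parked: " + slot.park(first));
        System.out.println("second parked: " + slot.park(new ReadOp(2L)));
        ReadOp taken = slot.takeWaiting(7L);
        System.out.println("taken position: " + taken.readPosition);
    }
}
```

Because the slot is cleared and read in one atomic step, only one notification can pick up a given waiting op, and the op's position is recalculated before it is reused, which is why a stale or invalid readPosition on the parked op does not reach the retry.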
Added a unit test to cover the waiting OpReadEntry case. Also closed #15102; its handling is pushed in this PR.
All done, please check it again when convenient.
There's also #12396 which fixes an NPE in OpReadEntry. That might be another problem that it's fixing.
Well, I will check it together.
    return null;
} else {
    // for wait opReadEntry, the readPosition will recalculate.
    opReadEntry.makeValid();
Here a waiting OpReadEntry may change from invalid to valid, so this logic is added.
The PR had no activity for 30 days, marked with Stale label.
Master Issue: #5669 #5809
Motivation
Related code:
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpReadEntry.java
Lines 48 to 63 in af35164
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Lines 2230 to 2245 in af35164
In #5809, there are 3 potential problems.
case 1:
In OpReadEntry construction, ManagedLedgerImpl#startReadOperationOnLedger is used to set up readPosition. startReadOperationOnLedger may call back into the OpReadEntry, but at that point some properties (such as cursor, callback, entries...) are not initialized yet, which causes an NPE.
case 2:
In startReadOperationOnLedger, the ctx is not passed on to the callback. The callback processing needs it, which also causes an NPE.
case 3:
startReadOperationOnLedger doesn't return when the ledger id is not found. The unboxing in the comparison then also causes an NPE.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Line 2238 in af35164
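Case 3 can be reproduced outside Pulsar. The sketch below is an illustration under assumptions, not the ManagedLedgerImpl code: CeilingKeyDemo and both method names are hypothetical, and the safe variant signals the missing ledger with an IllegalStateException where the patch instead fails the read through the callback and returns. What it relies on is standard JDK behavior: ConcurrentSkipListMap#ceilingKey returns null when no key is greater than or equal to the argument, and comparing that Long with a primitive long unboxes it.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo class; the map stands in for a ledger-id index.
final class CeilingKeyDemo {

    // Buggy shape: the Long returned by ceilingKey is auto-unboxed by the
    // comparison, so a null result throws NullPointerException.
    static boolean movedToLaterLedgerUnsafe(ConcurrentSkipListMap<Long, String> ledgers, long ledgerId) {
        Long ceiling = ledgers.ceilingKey(ledgerId);
        return ceiling != ledgerId;
    }

    // Safe shape: handle the null result and leave before any unboxing.
    static boolean movedToLaterLedgerSafe(ConcurrentSkipListMap<Long, String> ledgers, long ledgerId) {
        Long ceiling = ledgers.ceilingKey(ledgerId);
        if (ceiling == null) {
            // Assumption: an exception replaces the callback-based failure.
            throw new IllegalStateException("no ledger with id >= " + ledgerId);
        }
        return ceiling != ledgerId;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, String> ledgers = new ConcurrentSkipListMap<>();
        ledgers.put(5L, "ledger-5");
        System.out.println(movedToLaterLedgerSafe(ledgers, 5L)); // exact match
        System.out.println(movedToLaterLedgerSafe(ledgers, 3L)); // ceiling is 5
        try {
            movedToLaterLedgerUnsafe(ledgers, 9L); // no key >= 9
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing a null ceiling key");
        }
    }
}
```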
Modifications
patch #5809
Documentation
no-need-doc