-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] fix No such ledger exception #16420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@lordcheng10 Please provide a correct documentation label for your PR. |
|
@hangc0276 PTAL,thanks! |
|
@codelipenghui @eolivelli PTAL,thanks! |
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Show resolved
Hide resolved
hangc0276
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Outdated
Show resolved
Hide resolved
|
/pulsarbot run-failure-checks |
2 similar comments
|
/pulsarbot run-failure-checks |
|
/pulsarbot run-failure-checks |
| log.debug("[{}] Updating of ledgers list after create complete. version={}", name, stat); | ||
| } | ||
| ledgersStat = stat; | ||
| ledgers.put(lh.getId(), LedgerInfo.newBuilder().setLedgerId(lh.getId()).setTimestamp(0).build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lordcheng10 Adding to the ledgers map cannot be done here, because that map is used when writing to the z-node (eg: buildManagedLedgerInfo(Map<Long, LedgerInfo> ledgers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. PTAL,thanks! @merlimat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have concerns because in this method, if the ledger is not persisted to the metadata, it means the ledger cannot be used. But we put it in the ledger map; it is not persistent and may be used by other operations under multi-threaded conditions.
However, I'm not sure what it will affect. Just a minor concern. Please feel free to go on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your concern is right, I will try to update the zookeeper metadata with a temporary ledgersTmp, and update the ledgers after the zookeeper is successfully written. @mattisonchao @merlimat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @mattisonchao. This may lead to data loss.
Putting the newly created ledger into the NavigableMap<Long, LedgerInfo> ledgers, the new ledger can be seen immediately which means can write data into this ledger, and then if the meta-store operation failed, the ledger will be removed from both the ledgers and bookie, all data in this ledger are lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. @merlimat @hangc0276 @aloyszhang @mattisonchao PTAL,thanks!
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Outdated
Show resolved
Hide resolved
|
This pull request made a lot of change after your review. Could you have time to review it again? |
|
/pulsarbot run-failure-checks |
This is a problem, I made a modification, please review it again. |
mattisonchao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
|
/pulsarbot run-failure-checks |
4 similar comments
|
/pulsarbot run-failure-checks |
|
/pulsarbot run-failure-checks |
|
/pulsarbot run-failure-checks |
|
/pulsarbot run-failure-checks |
* fix No such ledger exception * move currentLedgerEntries and currentLedgerSize * move ledgers.put to operationComplete * fix ledgers.put * use ledgersTmp to write zookkeeper,after sucess,update ldgers * update updateLedgersListAfterRollover * put lh to ledgersTmp * extend buildManagedLedgerInfo * getManagedLedgerInfo(newLedger) after get metadataMutex (cherry picked from commit 2e5fbbc)
* fix No such ledger exception * move currentLedgerEntries and currentLedgerSize * move ledgers.put to operationComplete * fix ledgers.put * use ledgersTmp to write zookkeeper,after sucess,update ldgers * update updateLedgersListAfterRollover * put lh to ledgersTmp * extend buildManagedLedgerInfo * getManagedLedgerInfo(newLedger) after get metadataMutex
* fix No such ledger exception * move currentLedgerEntries and currentLedgerSize * move ledgers.put to operationComplete * fix ledgers.put * use ledgersTmp to write zookkeeper,after sucess,update ldgers * update updateLedgersListAfterRollover * put lh to ledgersTmp * extend buildManagedLedgerInfo * getManagedLedgerInfo(newLedger) after get metadataMutex (cherry picked from commit 2e5fbbc) (cherry picked from commit 41c0cf3)
* fix No such ledger exception * move currentLedgerEntries and currentLedgerSize * move ledgers.put to operationComplete * fix ledgers.put * use ledgersTmp to write zookkeeper,after sucess,update ldgers * update updateLedgersListAfterRollover * put lh to ledgersTmp * extend buildManagedLedgerInfo * getManagedLedgerInfo(newLedger) after get metadataMutex
* fix No such ledger exception * move currentLedgerEntries and currentLedgerSize * move ledgers.put to operationComplete * fix ledgers.put * use ledgersTmp to write zookkeeper,after sucess,update ldgers * update updateLedgersListAfterRollover * put lh to ledgersTmp * extend buildManagedLedgerInfo * getManagedLedgerInfo(newLedger) after get metadataMutex (cherry picked from commit 2e5fbbc)
* fix No such ledger exception * move currentLedgerEntries and currentLedgerSize * move ledgers.put to operationComplete * fix ledgers.put * use ledgersTmp to write zookkeeper,after sucess,update ldgers * update updateLedgersListAfterRollover * put lh to ledgersTmp * extend buildManagedLedgerInfo * getManagedLedgerInfo(newLedger) after get metadataMutex (cherry picked from commit 2e5fbbc)
Motivation
In the log we found No such ledger exception:

The reasons are as follows:

1. Frequent Full GC occurs on the broker, causing zk to time out:
22:29:06.119 [main-EventThread] ERROR org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client connection to the ZooKeeper server has expired!
2.When the create leadger is completed, the ledger will be put into ledgers and update currentLedger,the updateLedgersListAfterRollover method is executed, and the metadata to zookeeper fails to write:
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Lines 1484 to 1489 in 4c958a9
3.In the failure callback method operationFailed, the corresponding ledger will be removed from the ledgers, but the currentLedger still points to the ledger that failed to create, but the ledger will be deleted here through bookKeeper.asyncDeleteLedger. When reading data, it will be read through the currentLedger. Since the ledger has been deleted, the final error is reported: No such ledger
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Lines 1509 to 1522 in 4c958a9
Modifications
After the create ledger is created and the metadata is written to the zookeeper successfully, the currentLedger is updated.
Documentation
Check the box below or label this PR directly.
Need to update docs?
doc-required(Your PR needs to update docs and you will update later)
doc-not-needed(Please explain why)
doc(Your PR contains doc changes)
doc-complete(Docs have been already added)