[fix][broker] PulsarLedgerManager to pass correct error code to BK client #16664
…eletion of already deleted ledger
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java
…x.getCause().getCause() instanceof Blah
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerBkEnsemblesTests.java
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerBkEnsemblesTests.java
    public static class NotFoundException extends MetadataStoreException {
        public NotFoundException() {
    -       super((Throwable) null);
    +       super(makeBkFriendlyException(
My understanding is that MetadataStoreException is about the ZooKeeper/etcd/RocksDB metadata stores,
so NotFound means something like "znode does not exist".
Why do we need to always inject a BKException as the cause?
Can we do it only when we are using PulsarLedgerManager/ManagedLedger?
I changed this to be more lightweight (no new exception is created, a static final exception is set as the cause, and traversing the exception chain is fast).
There are several reasons why I decided to take this route:
For all of Java's love of checked exceptions, a CompletableFuture in the API can be completed with any exception, so the BK API as implemented in Pulsar returns exceptions that BK cannot handle properly. There is no way for the compiler to strictly enforce an API contract that suits BK.
As a result, removeLedgerMetadata() just returns whatever exception store.delete() produces, and so on.
While I could remap the exception there into a BK-specific one, that can break some Pulsar code (like the callbacks that rely on ex.getCause().getCause() being a MetadataStoreException). I'd very much prefer not to go through the whole code base tracking down all possible gotchas, as I cannot guarantee that the tests cover all the cases.
Alternatively, I'd have to inject the cause the same way I do now, but with more steps.
Plus, there is MetadataStoreException.unwrap, which recreates the exception with the message but without the original exception.
The current approach communicates the appropriate error to BK, so we no longer get UnexpectedConditionException in obvious cases and can properly handle basic errors. It does not add overhead (again, traversing the exception chain is fast), it is fool-proof enough that we don't have to worry about breaking it if LedgerManager is extended or changed, and it does not add mental overhead for Pulsar developers (no need to think about BK errors).
Let me know if I am missing something obvious, or if there is another way to make this work.
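A minimal sketch of the idea described in this reply, assuming simplified stand-in classes: `BkCode`, `BkCodedException`, `NO_SUCH_LEDGER_CAUSE`, and `toBkCode` are hypothetical names, not Pulsar's or BookKeeper's actual API, and the PR's real `makeBkFriendlyException` may look different. A preallocated exception carrying a BK-style code is set as the cause of `NotFoundException`, which keeps its MetadataStoreException type; a mapper walks the cause chain and falls back to an unexpected-condition code (reported in this PR as error code -999) when no code is found.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class BkFriendlyCauseSketch {

    // Hypothetical stand-in for BookKeeper error codes (not the real BKException.Code).
    enum BkCode { NO_SUCH_LEDGER, UNEXPECTED_CONDITION }

    // Hypothetical BK-style exception carrying a code the BK client understands.
    static class BkCodedException extends Exception {
        final BkCode code;

        BkCodedException(BkCode code) {
            // No message, no suppression, no stack trace: cheap and safe to share.
            super(null, null, false, false);
            this.code = code;
        }
    }

    // Hypothetical stand-in for Pulsar's MetadataStoreException.
    static class MetadataStoreException extends Exception {
        MetadataStoreException(Throwable cause) {
            super(cause);
        }
    }

    // Preallocated once; attaching it as a cause creates no extra exception.
    static final BkCodedException NO_SUCH_LEDGER_CAUSE = new BkCodedException(BkCode.NO_SUCH_LEDGER);

    // Bounds the chain walk so a cyclic cause chain cannot loop forever.
    static final int MAX_DEPTH = 32;

    // Keeps its Pulsar type (callers can still check for MetadataStoreException)
    // but carries a BK-friendly cause.
    static class NotFoundException extends MetadataStoreException {
        NotFoundException() {
            super(NO_SUCH_LEDGER_CAUSE);
        }
    }

    // Walks the cause chain; without a BK code, falls back to UNEXPECTED_CONDITION.
    static BkCode toBkCode(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth < MAX_DEPTH; depth++, cur = cur.getCause()) {
            if (cur instanceof BkCodedException) {
                return ((BkCodedException) cur).code;
            }
        }
        return BkCode.UNEXPECTED_CONDITION;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> delete = new CompletableFuture<>();
        delete.completeExceptionally(new NotFoundException());
        try {
            delete.join();
        } catch (CompletionException e) {
            // Chain: CompletionException -> NotFoundException -> BkCodedException
            System.out.println(toBkCode(e)); // NO_SUCH_LEDGER
        }
        // A metadata error without a BK-friendly cause still maps to the generic code.
        System.out.println(toBkCode(new MetadataStoreException(null))); // UNEXPECTED_CONDITION
    }
}
```

Because the shared cause is created once with suppression and stack-trace capture disabled, attaching it does not allocate a second exception, and the lookup is a short loop over `getCause()`.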
    cursorLedgerDeleteFuture = bkc.newDeleteLedgerOp().withLedgerId(cursor.cursorsLedgerId)
            .execute()
            .handle((result, ex) -> {
                if (ex != null) {
What about looking for NotFoundException in the exception chain?
We should have some utility to traverse the chain and look for a specific class.
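The utility the reviewer suggests could look roughly like this; `ExceptionChains`, `findCause`, and the `NotFoundException` stand-in are hypothetical names for illustration, not an existing Pulsar helper. It returns the first throwable in the chain that matches the requested type and stops if the chain loops back on itself. As the reply points out, this only helps while the original exception is still in the chain: once BookKeeper's client has remapped it, there is nothing left to find.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

final class ExceptionChains {

    private ExceptionChains() {}

    // Hypothetical stand-in for MetadataStoreException.NotFoundException.
    static class NotFoundException extends Exception {}

    // Returns the first throwable in the chain (starting with t itself) that is an
    // instance of type, or Optional.empty() if there is none.
    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return Optional.of(type.cast(cur));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Throwable wrapped = new CompletionException(new ExecutionException(new NotFoundException()));
        System.out.println(findCause(wrapped, NotFoundException.class).isPresent()); // true
        System.out.println(findCause(wrapped, IllegalStateException.class).isPresent()); // false
    }
}
```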
This does not work: in this case, PulsarLedgerManager returns NotFoundException to BookKeeper's code, which remaps it into UnexpectedConditionException before it reaches the Pulsar callback.
#16857 takes over.
Motivation
In some situations, deleting a ManagedLedger runs into bookie ledgers that have already been deleted.
Such cases are currently handled as errors, even though they are safe to ignore.
Currently, it is impossible to handle these cases because PulsarLedgerManager returns an error that cannot be mapped to a BK error code, and the end user ends up with an obscure UnexpectedConditionException (error code -999) that cannot be distinguished from the ledger-already-deleted case.

Modifications
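Once the BK client receives a meaningful error code, a deletion callback can tell an already-deleted ledger apart from a real failure. The following sketch shows that pattern with `CompletableFuture.handle`, in the style of the cursor-ledger deletion snippet in the review; `deleteLedger`, `deleteIgnoringMissing`, `BkCode`, and `BkCodedException` are hypothetical stand-ins, not BookKeeper's API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class IgnoreAlreadyDeletedSketch {

    // Hypothetical stand-ins for BookKeeper error codes and exceptions.
    enum BkCode { NO_SUCH_LEDGER, UNEXPECTED_CONDITION }

    static class BkCodedException extends Exception {
        final BkCode code;

        BkCodedException(BkCode code) {
            super(code.name());
            this.code = code;
        }
    }

    // Hypothetical async delete: succeeds when failure is null, otherwise fails with that code.
    static CompletableFuture<Void> deleteLedger(BkCode failure) {
        CompletableFuture<Void> f = new CompletableFuture<>();
        if (failure == null) {
            f.complete(null);
        } else {
            f.completeExceptionally(new BkCodedException(failure));
        }
        return f;
    }

    // Treats "ledger already deleted" as success and propagates every other error.
    static CompletableFuture<Void> deleteIgnoringMissing(BkCode failure) {
        return deleteLedger(failure).handle((result, ex) -> {
            if (ex == null) {
                return null;
            }
            // Dependent stages may see the error wrapped in a CompletionException.
            Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                    ? ex.getCause() : ex;
            if (cause instanceof BkCodedException
                    && ((BkCodedException) cause).code == BkCode.NO_SUCH_LEDGER) {
                return null; // safe to ignore: the ledger is already gone
            }
            throw new CompletionException(cause);
        });
    }

    public static void main(String[] args) {
        deleteIgnoringMissing(null).join();                  // normal delete
        deleteIgnoringMissing(BkCode.NO_SUCH_LEDGER).join(); // already deleted: still succeeds
        try {
            deleteIgnoringMissing(BkCode.UNEXPECTED_CONDITION).join();
        } catch (CompletionException e) {
            System.out.println("propagated: " + e.getCause().getMessage()); // propagated: UNEXPECTED_CONDITION
        }
    }
}
```

The check only works if the error reaching the callback still identifies the missing ledger; with the generic unexpected-condition code described above, the two cases cannot be told apart.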
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
If "yes" was chosen, please highlight the changes.
Nothing that I can think of.
BK error codes seen by internal components can change (on purpose) to become more specific, but the type of MetadataStoreException did not change.
Documentation
Check the box below or label this PR directly.
Need to update docs?
doc-required (Your PR needs to update docs and you will update later)
doc-not-needed (Please explain why)
doc (Your PR contains doc changes)
doc-complete (Docs have already been added)