Skip to content

Conversation

@dlg99
Copy link
Contributor

@dlg99 dlg99 commented Jul 19, 2022

Motivation

In some situations it is possible to encounter case when deletion of a ManagedLedger deals with cases of already deleted bookie ledgers.
Such cases currently handled as errors even though they are safe to ignore.
Currently, it is impossible to handle these cases because PulsarManagedLedger returns error that's not mappable into the BK error code end the end user ends up with obscure UnexpectedConditionException (error code -999) that cannot be distinguished from ledger already deleted case.

Modifications

  1. Made PulsarManagedLedger compatible with BK erros so BK client has a chance to return correct error. I didn't do this for all MetadataStoreException, just for the ones where it made sense.
  2. Ignored NoSuchLedgerExistsOnMetadataServerException on delete as it is safe there

Verifying this change

This change added tests and can be verified as follows:

  • Added unit tests

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

Nothing that I can think of.
BK Error codes can change (on purpose) for the internal components to become more specific but MetadataStoreException's type didn't change.

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jul 19, 2022
@dlg99 dlg99 marked this pull request as draft July 19, 2022 04:31
public static class NotFoundException extends MetadataStoreException {
public NotFoundException() {
super((Throwable) null);
super(makeBkFriendlyException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that MetadataStoreException is about ZooKeeper/Etdc/RocksDB metadata stores.
so NotFound is like "znode does not exist"

why do we need to always inject a BKException as cause ?

can we do it only when we are using PulsarLedgerManager/ManagedLedger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to be more lightweight (no new exception created, static final exception set as a cause, traversing exception chain is fast).

There are reasons why decided to take this route:
With all Java's love for checked exceptions, CompletableFuture in the API can be completed with any exception, hence BK's API implemented in Pulsar returns exceptions that BK cannot handle properly. So there is no way for compiler to strictly enforce API contract that suits BK.

As result, removeLedgerMetadata() just returns whatever exception store.delete() produces etc.
While I can remap the exception there into BK-specific, it can break some Pulsar code (like the callbacks that rely on ex.getCause().getCause() being MetadataStoreException). I'd very much prefer not to go through all code base tracking all possible gotchas as I cannot guarantee that tests will all the cases.
Alternatively I'd have to inject cause the same way I do now but with more steps.
Plus there is MetadataStoreException.unwrap which recreates exception with the message / without the original exception.

Current approach communicates appropriate error to BK so we are no longer getting UnexpectedConditionException in obvious cases and can properly handle basic errors, does not add overhead (again, traversing exception chain is fast), fool-proof enough so we don't have to worry about breaking it all in case LedgerManager is extended or changed, and it does not add mental overhead for Pulsar developers (no need to think about BK errors.

let me know if I am missing something obvious / other way to make this work.

@dlg99 dlg99 marked this pull request as ready for review July 20, 2022 15:25
cursorLedgerDeleteFuture = bkc.newDeleteLedgerOp().withLedgerId(cursor.cursorsLedgerId)
.execute()
.handle((result, ex) -> {
if (ex != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about looking for NotFoundException in the exception chain ?
we should have some utility to traverse the chain and look for a specific class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work, in this case PulsarLedgerManager returns NotFoundException to Bookkeeper's code which remaps it into UnexpectedConditionException for the pulsar callback.

@dlg99
Copy link
Contributor Author

dlg99 commented Jul 28, 2022

#16857 takes over

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants