@ivankelly
Contributor

Quorum coverage checks whether we have heard from enough nodes to know
that no entry can have been written to enough of the nodes we haven't
heard from to have formed an ack quorum.

The coverage algorithm was correct pre-5e399df.

5e399df (BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble
Change) broke this, but it still seemed to work because it had
a broken else statement at the end. Why a change which is 100% about
the write path changed something in the read path is a mystery.

dcdd1e (Small fix wrong nodesUninitialized count when checkCovered)
went on to fix the broken fix, so the whole thing ended up broken.

The change also modifies ReadLastConfirmedOp to make it testable.

@eolivelli
Contributor

@sijie PTAL

@eolivelli
Contributor

@ivankelly I see several tests failing on CI, and IMHO they are due to this patch.
Something like:
[ERROR] Errors:
[ERROR] org.apache.bookkeeper.test.BookieFailureTest.testLedgerNoRecoveryOpenAfterBKCrashed(org.apache.bookkeeper.test.BookieFailureTest)
[ERROR] Run 1: BookieFailureTest.testLedgerNoRecoveryOpenAfterBKCrashed » BKRead Error while ...
[ERROR] Run 2: BookieFailureTest.testLedgerNoRecoveryOpenAfterBKCrashed » BKRead Error while ...
[ERROR] Run 3: BookieFailureTest.testLedgerNoRecoveryOpenAfterBKCrashed » BKRead Error while ...
[ERROR] org.apache.bookkeeper.test.BookieFailureTest.testLedgerOpenAfterBKCrashed(org.apache.bookkeeper.test.BookieFailureTest)
[ERROR] Run 1: BookieFailureTest.testLedgerOpenAfterBKCrashed » BKLedgerRecovery Error while ...
[ERROR] Run 2: BookieFailureTest.testLedgerOpenAfterBKCrashed » BKLedgerRecovery Error while ...
[ERROR] Run 3: BookieFailureTest.testLedgerOpenAfterBKCrashed » BKLedgerRecovery Error while ...
[ERROR] Tests run: 517, Failures: 0, Errors: 2, Skipped: 7

}

if (numResponsesPending == 0 && !completed) {
int totalExepctedResponse = lh.getLedgerMetadata().getWriteQuorumSize()
Contributor

@rdhabalia PTAL

DigestManager digestManager = DigestManager.instantiate(ledgerId, ledgerKey,
DigestType.CRC32C,
UnpooledByteBufAllocator.DEFAULT,
true /* useV2 */);
Contributor

Is it necessary to use V2?

Contributor Author

We only use V2 internally, so that's what I tested with originally.

Contributor

Can we change it to V3?
There is no particular reason to use V2; it does not affect code coverage, but it is "legacy",
so in my opinion new code/tests should use the latest version.

}

/**
* Test for specific bug that was introduced with dcdd1e88
Contributor

Can we describe the scenario?

Referring to dcdd1e8 alone may not be useful.

covSet.addBookie(2, BKException.Code.NoSuchLedgerExistsException);
covSet.addBookie(3, BKException.Code.UNINITIALIZED);
covSet.addBookie(4, BKException.Code.UNINITIALIZED);
assertFalse(covSet.checkCovered());
Contributor

Why drop this test?
Can't we change this to assertTrue
and maybe add other cases?

Contributor Author

@ivankelly ivankelly Apr 8, 2020

Because it's completely incorrect.

Contributor

ok

@ivankelly
Contributor Author

@eolivelli I need to take a look at those test failures. They don't fail on our internal branch, so either the failing tests have been added since, or they're flaky tests which we've disabled internally.

I'll try to get to it before the end of the week, but I'm pretty low on cycles.

@eolivelli
Contributor

BTW, overall I am +1 on this change.

@eolivelli
Contributor

@fpj @sijie @jvrao @merlimat @jiazhai @reddycharan do you want to take a look and comment?

int nodesNotCovered = 0;
int nodesOkay = 0;
int nodesUninitialized = 0;
/* Nodes which have either responded with an error other than NoSuch{Entry,Ledger},
Contributor

@rdhabalia rdhabalia Apr 19, 2020

I think I am missing something here, so I have a question. While doing a ledger recovery, the BK client waits for responses from (Qw - Qa) + 1 bookies.
So, if we have a ledger with E=3, W=2, A=2 and the BK client receives a response from one of the bookies, then Qw - Qa + 1 = 2 - 2 + 1 = 1, and the one response received meets that threshold.
So, checkCovered() should return true.
However, with this change it fails on such a use case, e.g.:

RoundRobinDistributionSchedule schedule = new RoundRobinDistributionSchedule(
        2, 2, 3);
// A List (not a Set) keeps both UNINITIALIZED entries, so all three bookies are added.
List<Integer> results = Lists.newArrayList(BKException.Code.OK,
        BKException.Code.UNINITIALIZED, BKException.Code.UNINITIALIZED);
DistributionSchedule.QuorumCoverageSet covSet = schedule.getCoverageSet();
int index = 0;
for (Integer rc : results) {
    covSet.addBookie(index++, rc);
}
boolean covSetSays = covSet.checkCovered();
assertTrue(covSetSays);

So, can you please confirm whether the above assumption is correct, or am I missing anything here?
And can't we just check Qw - Qa + 1 in this method:

public synchronized boolean checkCovered() {
    int nodesUnknown = 0;
    for (int i = 0; i < covered.length; i++) {
        if (covered[i] != BKException.Code.OK
                && covered[i] != BKException.Code.NoSuchEntryException
                && covered[i] != BKException.Code.NoSuchLedgerExistsException) {
            nodesUnknown++;
        }
    }
    int expectedKnownNodes = (writeQuorumSize - ackQuorumSize) + 1;
    return (ensembleSize - nodesUnknown) >= expectedKnownNodes;
}

Contributor Author

Coverage is not Qw - Qa + 1. It's at least one bookie from every possible ack quorum.

Take your example above. You have b1=OK, b2=Unknown, b3=Unknown. But an entry could have been written to b2 and b3 and attained ack quorum. So we need to check all write sets, of which there are |ensemble|, since we are round-robining.

Within an individual write set, we only need to hear from (Qw - Qa) + 1 nodes, which is what the submitted change is doing.
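
To make the difference concrete, here is a minimal, self-contained sketch contrasting the two checks discussed above. It is not the BookKeeper implementation: the class name `CoverageSketch`, the method names, and the `boolean[] heard` representation (true when a bookie answered OK, NoSuchEntry or NoSuchLedger) are all assumptions made for illustration.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only; not the BookKeeper implementation.
 * heard[i] is true when bookie i answered OK, NoSuchEntry or NoSuchLedger.
 */
final class CoverageSketch {

    /** Global count only, mirroring the alternative proposed in the question. */
    static boolean globalCountCheck(boolean[] heard, int writeQuorum, int ackQuorum) {
        int known = 0;
        for (boolean h : heard) {
            if (h) {
                known++;
            }
        }
        return known >= (writeQuorum - ackQuorum) + 1;
    }

    /** Per-write-set check: every round-robin write set must be covered. */
    static boolean perWriteSetCheck(boolean[] heard, int writeQuorum, int ackQuorum) {
        int ensemble = heard.length;
        // Round-robin striping gives one write set per starting bookie index.
        for (int start = 0; start < ensemble; start++) {
            int unknown = 0;
            for (int j = 0; j < writeQuorum; j++) {
                if (!heard[(start + j) % ensemble]) {
                    unknown++;
                }
            }
            // If ackQuorum bookies of this write set are unknown, an entry
            // could have been acknowledged by those bookies alone.
            if (unknown >= ackQuorum) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] heard = {true, false, false}; // b1=OK, b2 and b3 unknown
        System.out.println(Arrays.toString(heard)
                + " global=" + globalCountCheck(heard, 2, 2)
                + " perWriteSet=" + perWriteSetCheck(heard, 2, 2));
    }
}
```

With E=3, W=2, A=2 and only b1 heard from, the global count passes, while the per-write-set check fails because the write set {b2, b3} contains two unknown bookies, enough to form an ack quorum.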

@eolivelli
Contributor

@ivankelly you got a lot of +1s.
Can you please check and fix the tests?
This way we can merge this precious patch.

It would be good to cherry-pick to 4.10.1 as well.

This is a qualified blocker for the 4.11 release, which I hope we can cut soon.

@eolivelli
Contributor

Please also rebase on top of current master; we fixed a bunch of things in the integration tests.

@eolivelli
Contributor

Closing and reopening in order to trigger CI again.

@eolivelli eolivelli closed this Apr 30, 2020
@eolivelli eolivelli reopened this Apr 30, 2020
@eolivelli
Contributor

You also have checkstyle errors:

Downloaded from central: https://repo.maven.apache.org/maven2/org/rocksdb/rocksdbjni/5.13.1/rocksdbjni-5.13.1.jar (14 MB at 18 MB/s)
[ERROR] src/test/java/org/apache/bookkeeper/client/ReadLastConfirmedOpTest.java:[24] (imports) ImportOrder: Import com.google.common.collect.Lists appears after other imports that it should precede
[ERROR] src/test/java/org/apache/bookkeeper/client/ReadLastConfirmedOpTest.java:[30] (imports) ImportOrder: Import org.apache.bookkeeper.proto.DataFormats.LedgerMetadataFormat.DigestType appears after other imports that it should precede
[ERROR] src/test/java/org/apache/bookkeeper/client/ReadLastConfirmedOpTest.java:[40] (javadoc) JavadocType: Missing a Javadoc comment.
[ERROR] src/test/java/org/apache/bookkeeper/client/ReadLastConfirmedOpTest.java:[62] (javadoc) JavadocStyle: First sentence should end with a period.
[ERROR] src/test/java/org/apache/bookkeeper/client/ReadLastConfirmedOpTest.java:[79,46] (naming) ParameterName: Name '_ledgerId' must match pattern '^[a-z][a-zA-Z0-9]*$'.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:3.0.0:check (default-cli) on project bookkeeper-server: You have 5 Checkstyle violations. -> [Help 1]

If you do not have much time, @ivankelly, I can pick up the patch and fix all of the boilerplate.
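
For the ParameterName violation in the log above, a small sketch of what the rule accepts may help. It only applies the regular expression quoted in the log; the class name `ParameterNameSketch` and the replacement name `ledgerId` are hypothetical and not taken from the patch.

```java
import java.util.regex.Pattern;

/** Illustrative sketch: applies the naming pattern quoted in the checkstyle log. */
final class ParameterNameSketch {

    // Pattern copied from the ParameterName violation message.
    static final Pattern PARAMETER_NAME = Pattern.compile("^[a-z][a-zA-Z0-9]*$");

    static boolean isValid(String name) {
        return PARAMETER_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "_ledgerId" is the name flagged in the log; "ledgerId" is a hypothetical rename.
        System.out.println("_ledgerId valid: " + isValid("_ledgerId"));
        System.out.println("ledgerId valid: " + isValid("ledgerId"));
    }
}
```

The leading underscore is what the pattern rejects, since the first character must be a lowercase letter.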

@eolivelli
Contributor

Failing tests:
```
[ERROR] Errors:
[ERROR] org.apache.bookkeeper.test.BookieFailureTest.testLedgerNoRecoveryOpenAfterBKCrashed(org.apache.bookkeeper.test.BookieFailureTest)
[ERROR] Run 1: BookieFailureTest.testLedgerNoRecoveryOpenAfterBKCrashed » BKRead Error while ...
[ERROR] Run 2: BookieFailureTest.testLedgerNoRecoveryOpenAfterBKCrashed » BKRead Error while ...
[ERROR] Run 3: BookieFailureTest.testLedgerNoRecoveryOpenAfterBKCrashed » BKRead Error while ...
[ERROR] org.apache.bookkeeper.test.BookieFailureTest.testLedgerOpenAfterBKCrashed(org.apache.bookkeeper.test.BookieFailureTest)
[ERROR] Run 1: BookieFailureTest.testLedgerOpenAfterBKCrashed » BKLedgerRecovery Error while ...
[ERROR] Run 2: BookieFailureTest.testLedgerOpenAfterBKCrashed » BKLedgerRecovery Error while ...
[ERROR] Run 3: BookieFailureTest.testLedgerOpenAfterBKCrashed » BKLedgerRecovery Error while ...
[ERROR] Run 1: ReadOnlyBookieTest.testBookieContinueWritingIfMultipleLedgersPresent » BKNotEnoughBookies
```

@eolivelli
Contributor

This change is related to #1381 from @reddycharan

@eolivelli
Contributor

eolivelli commented May 17, 2020

@ivankelly I have picked up this patch and cleaned it up, in order to make checkstyle and tests pass.
Please check #2333.

My only addition is
55607a8

sijie pushed a commit that referenced this pull request May 19, 2020
The original patch was contributed by ivankelly in PR #2303, I have only fixed checkstyle and removed two tests that were wrong.

Quorum coverage checks whether we have heard from enough nodes to know
that no entry can have been written to enough of the nodes we haven't
heard from to have formed an ack quorum.

The coverage algorithm was correct pre-5e399df.

5e399df (BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble
Change) broke this, but it still seemed to work because it had
a broken else statement at the end. Why a change which is 100% about
the write path changed something in the read path is a mystery.

dcdd1e (Small fix wrong nodesUninitialized count when checkCovered)
went on to fix the broken fix, so the whole thing ended up broken.

The change also modifies ReadLastConfirmedOp to make it testable.

Reviewers: Sijie Guo <None>, Rajan Dhabalia <rdhabalia@apache.org>

This closes #2333 from eolivelli/pr2303
@eolivelli eolivelli closed this May 19, 2020
@eolivelli
Contributor

@sijie merged #2333
thank you @ivankelly !

Ghatage pushed a commit to Ghatage/bookkeeper that referenced this pull request Jun 19, 2020