-
Notifications
You must be signed in to change notification settings - Fork 963
QuorumCoverage should only count unknown nodes #2303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Quorum coverage checks if we have heard from enough nodes to know that there is no entry that can have been written to enough nodes that we haven't heard from to have formed an ack quorum. The coverage algorithm was correct pre-5e399df. 5e399df(BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble Change) broke this, but it still seems to have worked because they had a broken else statement at the end. Why a change which is 100% about the write-path changed something in the read-path is a mystery. dcdd1e(Small fix wrong nodesUninitialized count when checkCovered) went on to fix the broken fix, so the whole thing ended up broke. The change also modifies ReadLastConfirmedOp to make it testable.
d6659b5 to
9dd48b0
Compare
|
@sijie PTAL |
|
@ivankelly I see several tests failing on CI and IMHO they are due to this patch |
| } | ||
|
|
||
| if (numResponsesPending == 0 && !completed) { | ||
| int totalExepctedResponse = lh.getLedgerMetadata().getWriteQuorumSize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdhabalia PTAL
| DigestManager digestManager = DigestManager.instantiate(ledgerId, ledgerKey, | ||
| DigestType.CRC32C, | ||
| UnpooledByteBufAllocator.DEFAULT, | ||
| true /* useV2 */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is is necessary to use V2 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only use V2 internally, so that's what I tested with originally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change it to v3 ?
there is no particular reason to use v2, it does not affect code coverage but it is "legacy"
so in my opinion new code/tests should use latest version
| } | ||
|
|
||
| /** | ||
| * Test for specific bug that was introduced with dcdd1e88 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we describe the scenario ?
Maybe referring dcdd1e8 is not useful
| covSet.addBookie(2, BKException.Code.NoSuchLedgerExistsException); | ||
| covSet.addBookie(3, BKException.Code.UNINITIALIZED); | ||
| covSet.addBookie(4, BKException.Code.UNINITIALIZED); | ||
| assertFalse(covSet.checkCovered()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why dropping this test ?
can't we move this to assertTrue
and maybe add other cases ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's completely incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
@eolivelli I need to take a look at those test failures. it doesn't fail on our internal branch, so either the failing tests have been added since, or they're flakes which we've disabled internally. I'll try to get to it before end of week, but pretty low on cycles. |
|
btw Overall I am +1 with this change |
| int nodesNotCovered = 0; | ||
| int nodesOkay = 0; | ||
| int nodesUninitialized = 0; | ||
| /* Nodes which have either responded with an error other than NoSuch{Entry,Ledger}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am missing something here So, I have a question. While doing a ledger recovery, bk client waits for response from (Qw - Qa) + 1 bookies
So, if we have ledger with E=3, W=2, A=2 and if bk-client receives ack from one of the bookie then: Qw-Qa+1 = (2-2+1) = 1 >= Response (1).
So, checkCovered() should return true.
However, with this change it fails on such useacase:
eg.
RoundRobinDistributionSchedule schedule = new RoundRobinDistributionSchedule(
2, 2, 3);
Set<Integer> resultSet = Sets.newHashSet(BKException.Code.OK,
BKException.Code.UNINITIALIZED, BKException.Code.UNINITIALIZED);
DistributionSchedule.QuorumCoverageSet covSet = schedule.getCoverageSet();
int index =0;
for (Integer i : resultSet) {
covSet.addBookie(index++, i);
}
boolean covSetSays = covSet.checkCovered();
assertTrue(covSetSays);
So, can you please confirm the above assumption is correct or am I missing anything here?
and can't we just check Qw-Qa+1 in this method:
public synchronized boolean checkCovered() {
int nodesUnknown = 0;
for (int i = 0; i < covered.length; i++) {
if (covered[i] != BKException.Code.OK
&& covered[i] != BKException.Code.NoSuchEntryException
&& covered[i] != BKException.Code.NoSuchLedgerExistsException) {
nodesUnknown++;
}
}
int expectedKnownNodes = (writeQuorumSize - ackQuorumSize) + 1;
return (ensembleSize - nodesUnknown) >= expectedKnownNodes;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coverage is not Qw - Qa + 1. It's at least one bookie from all possible ack quorums.
Take your example above. You have b1=OK, b2=Unknown, b3=Unknown. But an entry could possibly be written to b2 & b3 and have attained ack quorum. So we need to check all write sets, of which there are |ensemble|, since we are roundrobining.
Within an individual writeset, we only need to hear from (Qw - Qa) + 1 nodes, which is what the change submitted is doing.
|
@ivankelly you got a lot of +1s it would be good to cherry pick to 4.10.1 as well This is a qualified blocker for 4.11 release, that I hope we can cut soon |
|
please also rebase on top of current master, we fixed a bunch of things about integration tests |
|
closing and reopnening in order to trigger ci again |
|
You also have checkstyle errors: If you do not have much time @ivankelly I can pickup the patch and fix all of the boilerplate |
|
Failing tests: |
|
This change is related to #1381 from @reddycharan |
|
@ivankelly I have picked up this patch and cleaned in up, in order to make checkstyle and tests pass my addition is just |
The original patch was contributed by ivankelly in PR #2303, I have only fixed checkstyle and removed two tests that were wrong. Quorum coverage checks if we have heard from enough nodes to know that there is no entry that can have been written to enough nodes that we haven't heard from to have formed an ack quorum. The coverage algorithm was correct pre-5e399df. 5e399df(BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble Change) broke this, but it still seems to have worked because they had a broken else statement at the end. Why a change which is 100% about the write-path changed something in the read-path is a mystery. dcdd1e(Small fix wrong nodesUninitialized count when checkCovered) went on to fix the broken fix, so the whole thing ended up broke. The change also modifies ReadLastConfirmedOp to make it testable. Reviewers: Sijie Guo <None>, Rajan Dhabalia <rdhabalia@apache.org> This closes #2333 from eolivelli/pr2303
|
@sijie merged #2333 |
The original patch was contributed by ivankelly in PR apache#2303, I have only fixed checkstyle and removed two tests that were wrong. Quorum coverage checks if we have heard from enough nodes to know that there is no entry that can have been written to enough nodes that we haven't heard from to have formed an ack quorum. The coverage algorithm was correct pre-5e399df. 5e399df(BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble Change) broke this, but it still seems to have worked because they had a broken else statement at the end. Why a change which is 100% about the write-path changed something in the read-path is a mystery. dcdd1e(Small fix wrong nodesUninitialized count when checkCovered) went on to fix the broken fix, so the whole thing ended up broke. The change also modifies ReadLastConfirmedOp to make it testable. Reviewers: Sijie Guo <None>, Rajan Dhabalia <rdhabalia@apache.org> This closes apache#2333 from eolivelli/pr2303
Quorum coverage checks if we have heard from enough nodes to know that
there is no entry that can have been written to enough nodes that we
haven't heard from to have formed an ack quorum.
The coverage algorithm was correct pre-5e399df.
5e399df(BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble
Change) broke this, but it still seems to have worked because they had
a broken else statement at the end. Why a change which is 100% about
the write-path changed something in the read-path is a mystery.
dcdd1e(Small fix wrong nodesUninitialized count when checkCovered)
went on to fix the broken fix, so the whole thing ended up broke.
The change also modifies ReadLastConfirmedOp to make it testable.