QuorumCoverage should only count unknown nodes #2333
Quorum coverage checks whether we have heard from enough nodes to be sure that no entry could have been written to enough of the nodes we have not heard from to form an ack quorum.
The coverage algorithm was correct before 5e399df. 5e399df (BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble Change) broke it, but it still seemed to work because of a broken else statement at the end. Why a change that is 100% about the write path changed something in the read path is a mystery. dcdd1e (Small fix wrong nodesUninitialized count when checkCovered) went on to fix the broken fix, so the whole thing ended up broken.
This change also modifies ReadLastConfirmedOp to make it testable.
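To make the coverage rule above concrete, here is a minimal sketch of the check, not the code from this patch. It assumes round-robin striping, where the write set starting at bookie `i` is bookies `i, i+1, ..., i+writeQuorumSize-1` modulo the ensemble size. It also assumes that an "unknown" node is one that has not responded or has responded with an error that says nothing about whether the entry exists. The class name, the `Response` enum, and the method names are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of a quorum coverage check; names are illustrative
// and do not correspond to BookKeeper's actual classes.
final class QuorumCoverageSketch {
    // What we currently know about each bookie in the ensemble.
    enum Response { NOT_HEARD, OK, NO_SUCH_ENTRY, ERROR }

    private final int writeQuorumSize;
    private final int ackQuorumSize;
    private final Response[] responses;

    QuorumCoverageSketch(int ensembleSize, int writeQuorumSize, int ackQuorumSize) {
        this.writeQuorumSize = writeQuorumSize;
        this.ackQuorumSize = ackQuorumSize;
        this.responses = new Response[ensembleSize];
        Arrays.fill(this.responses, Response.NOT_HEARD);
    }

    void record(int bookieIndex, Response response) {
        responses[bookieIndex] = response;
    }

    // Returns true only if no possible write set contains enough unknown
    // bookies to have acknowledged an entry we have not seen.
    boolean isCovered() {
        int ensembleSize = responses.length;
        for (int start = 0; start < ensembleSize; start++) {
            int unknown = 0;
            for (int j = 0; j < writeQuorumSize; j++) {
                Response r = responses[(start + j) % ensembleSize];
                // Assumption: silence and uninformative errors both count as unknown.
                if (r == Response.NOT_HEARD || r == Response.ERROR) {
                    unknown++;
                }
            }
            if (unknown >= ackQuorumSize) {
                return false; // these bookies alone could have formed an ack quorum
            }
        }
        return true;
    }
}
```

With an ensemble of 5, a write quorum of 3 and an ack quorum of 2, hearing from bookies 0, 2 and 4 is not enough under these assumptions: the write set {1, 2, 3} still holds two unknown bookies, which is enough for an ack quorum.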
@sijie @rdhabalia @merlimat you already approved the original patch. Could any of you please take a look and confirm your approval?
rdhabalia
left a comment
LGTM..
It looks like I still have a test to fix:
@sijie did all CI tests pass this time?
@eolivelli all CI tests passed.
Hi @eolivelli, I have a question: why was the following code snippet removed? Thanks. After learning how BookKeeper works, I tried to fix this flaky test case.
That test was wrong.
right, 💯 |
The original patch was contributed by ivankelly in PR apache#2303; I have only fixed checkstyle and removed two tests that were wrong.
Reviewers: Sijie Guo <None>, Rajan Dhabalia <rdhabalia@apache.org>
This closes apache#2333 from eolivelli/pr2303
It would be great to update branch-2.6 to a stable version of BookKeeper so that some issues can be fixed, e.g. apache/bookkeeper#2333.
Just FYI, fix: #2333
