Skip to content

Conversation

@eolivelli
Copy link
Contributor

The original patch was contributed by @ivankelly in PR #2303, I have only fixed checkstyle and removed two tests that were wrong.

Quorum coverage checks if we have heard from enough nodes to know that
there is no entry that can have been written to enough nodes that we
haven't heard from to have formed an ack quorum.

The coverage algorithm was correct pre-5e399df.

5e399df(BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble
Change) broke this, but it still seems to have worked because they had
a broken else statement at the end. Why a change which is 100% about
the write-path changed something in the read-path is a mystery.

dcdd1e(Small fix wrong nodesUninitialized count when checkCovered)
went on to fix the broken fix, so the whole thing ended up broke.

The change also modifies ReadLastConfirmedOp to make it testable.

Ivan Kelly and others added 2 commits May 17, 2020 16:28
Quorum coverage checks if we have heard from enough nodes to know that
there is no entry that can have been written to enough nodes that we
haven't heard from to have formed an ack quorum.

The coverage algorithm was correct pre-5e399df.

5e399df(BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble
Change) broke this, but it still seems to have worked because they had
a broken else statement at the end. Why a change which is 100% about
the write-path changed something in the read-path is a mystery.

dcdd1e(Small fix wrong nodesUninitialized count when checkCovered)
went on to fix the broken fix, so the whole thing ended up broke.

The change also modifies ReadLastConfirmedOp to make it testable.
@eolivelli
Copy link
Contributor Author

@sijie @rdhabalia @merlimat you already approved the original patch

Please any of you take a look and confirm your approval
my addition is
55607a8

Copy link
Contributor

@rdhabalia rdhabalia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM..

@eolivelli
Copy link
Contributor Author

It looks like I still have a test to fix:

[ERROR] testZoneawarePlacementPolicyCheck(org.apache.bookkeeper.replication.AuditorPlacementPolicyCheckTest)  Time elapsed: 61.135 s  <<< FAILURE!
java.lang.AssertionError: placementPolicyCheck should have executed
	at org.apache.bookkeeper.replication.AuditorPlacementPolicyCheckTest.startAuditorAndWaitForPlacementPolicyCheck(AuditorPlacementPolicyCheckTest.java:684)
	at org.apache.bookkeeper.replication.AuditorPlacementPolicyCheckTest.testZoneawarePlacementPolicyCheck(AuditorPlacementPolicyCheckTest.java:628)

@sijie sijie added this to the 4.11.0 milestone May 19, 2020
@sijie sijie merged commit f373cb5 into apache:master May 19, 2020
@eolivelli eolivelli deleted the pr2303 branch May 19, 2020 06:18
@eolivelli
Copy link
Contributor Author

@sijie did all CI tests pass this time ?
last time I saw that testZoneawarePlacementPolicyCheck did not pass.
I wouldn't want that we have introduced a new annoying flaky test

@sijie
Copy link
Member

sijie commented May 19, 2020

@eolivelli all CI tests passed.

@lamberken
Copy link
Member

lamberken commented May 20, 2020

hi @eolivelli, I have a question that why remove following code snippet? thanks

After i learning the mechanism of bookkeeper, i tried to fix this flaky test case.
please go ahead with #2337 thanks : )

image

@eolivelli
Copy link
Contributor Author

That test was wrong.
It used ensemble size = 4 and write quorum size = 1.
Then it shutdown a bookie.
If it shuts down the only bookie that contains the entry the test will fail

@lamberken
Copy link
Member

That test was wrong.
It used ensemble size = 4 and write quorum size = 1.
Then it shutdown a bookie.
If it shuts down the only bookie that contains the entry the test will fail

right, 💯

Ghatage pushed a commit to Ghatage/bookkeeper that referenced this pull request Jun 19, 2020
The original patch was contributed by ivankelly in PR apache#2303, I have only fixed checkstyle and removed two tests that were wrong.

Quorum coverage checks if we have heard from enough nodes to know that
there is no entry that can have been written to enough nodes that we
haven't heard from to have formed an ack quorum.

The coverage algorithm was correct pre-5e399df.

5e399df(BOOKKEEPER-759: Delay Ensemble Change & Disable Ensemble
Change) broke this, but it still seems to have worked because they had
a broken else statement at the end. Why a change which is 100% about
the write-path changed something in the read-path is a mystery.

dcdd1e(Small fix wrong nodesUninitialized count when checkCovered)
went on to fix the broken fix, so the whole thing ended up broke.

The change also modifies ReadLastConfirmedOp to make it testable.

Reviewers: Sijie Guo <None>, Rajan Dhabalia <rdhabalia@apache.org>

This closes apache#2333 from eolivelli/pr2303
codelipenghui pushed a commit to apache/pulsar that referenced this pull request Nov 18, 2020
It would be great to update branch-2.6 to a stable version of bk. so some issues could be fixed.
e.g. apache/bookkeeper#2333
@rdhabalia
Copy link
Contributor

just fyi: fix: #2333

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants