-
Notifications
You must be signed in to change notification settings - Fork 7.3k
[ZOOKEEPER-3642] Fix potential data inconsistency due to DIFF sync after partial SNAP sync #1224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ter partial SNAP sync
|
@eolivelli @hanm interested to take a look at this, the potential issue is mentioned in the description of the Jira. |
hanm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with two nits.
| public synchronized void shutdown(boolean fullyShutDown) { | ||
| if (!canShutdown()) { | ||
| if (fullyShutDown && zkDb != null) { | ||
| zkDb.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me how we could end up in this code path - if we are here then the shutdown must have been called and that specific shut down must not be a full shutdown. Do we have a concrete example when we will hit this code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case the follower exited before finish syncing with leader, the ZK server will not start, canShutdown will return false, but we still need to clear the db here.
| new byte[1], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); | ||
|
|
||
| // wait some time to let this get written to disk | ||
| Thread.sleep(500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general we advocate not using sleep (at least, not using sleep alone) in unit tests as it's not reliable and proved to be a source of flaky-ness. Is there a better approach to wait for the log being flushed to disk (preferably also verify that fact in test)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a reasonable concern, I'll try to find ways to check that in the test here.
|
@eolivelli do you want to give a look at this given you were tagged? If no issues I will merge this once we get a green build on per-merge job (I think my nits are addressed - the sleeping in test is not ideal but I would not consider it's blocking). |
eolivelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@hanm please go ahead and merge
|
ok, I will merge after CI is back to good shape. |
|
thanks for quick fix on CI issue @eolivelli ; looks like the test needs update to work with latest master. I'll do a rebase if @lvfangmin didn't have a chance to rebase. |
|
thank you @hanm |
|
@eolivelli #1515 is rebased version, thanks. |
…er partial SNAP sync. Based on #1224 ; fixed unit test build issue. Author: Fangmin Lyu <fangmin@apache.org> Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Originally developed by Fangmin Lyu <fangmin@apache.org> Closes #1515 from hanm/ZOOKEEPER-3642
…er partial SNAP sync. Based on #1224 ; fixed unit test build issue. Author: Fangmin Lyu <fangmin@apache.org> Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Originally developed by Fangmin Lyu <fangmin@apache.org> Closes #1515 from hanm/ZOOKEEPER-3642 (cherry picked from commit a53cfeb)
…er partial SNAP sync. Based on apache#1224 ; fixed unit test build issue. Author: Fangmin Lyu <fangmin@apache.org> Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Originally developed by Fangmin Lyu <fangmin@apache.org> Closes apache#1515 from hanm/ZOOKEEPER-3642
…er partial SNAP sync. Based on apache#1224 ; fixed unit test build issue. Author: Fangmin Lyu <fangmin@apache.org> Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Originally developed by Fangmin Lyu <fangmin@apache.org> Closes apache#1515 from hanm/ZOOKEEPER-3642
…er partial SNAP sync. Based on apache#1224 ; fixed unit test build issue. Author: Fangmin Lyu <fangmin@apache.org> Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Originally developed by Fangmin Lyu <fangmin@apache.org> Closes apache#1515 from hanm/ZOOKEEPER-3642
…er partial SNAP sync. Based on apache#1224 ; fixed unit test build issue. Author: Fangmin Lyu <fangmin@apache.org> Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Originally developed by Fangmin Lyu <fangmin@apache.org> Closes apache#1515 from hanm/ZOOKEEPER-3642
|
Superceded by #1515. |
No description provided.