Example to show infinite election when performing membership change. #943
wojiaodoubao wants to merge 3 commits into apache:master
Conversation
Thanks @wojiaodoubao for this detailed example! This case indeed may lead to indefinite unavailability. It seems like a nightmare to me :( Maybe we can allow a whitelist for request vote, as proposed in a previous thread: https://lists.apache.org/thread/tt1j3jkogh71k2hvq5gtltwmphxfy736
@wojiaodoubao , thanks for testing Ratis!
We probably should not allow changing a majority of peers at the same time. It should replace the nodes one by one.
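To make the one-by-one approach concrete, here is a hedged sketch (not code from this PR; `client`, `oldPeers`, and `newPeers` are assumed names) of replacing the peers of a same-size conf one at a time, so each step keeps a majority of the previous conf inside the new one:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.ratis.client.RaftClient;
import org.apache.ratis.protocol.RaftPeer;

// Swap a single peer per setConfiguration call; a majority of each
// intermediate conf survives into the next, so elections stay possible.
static void replaceOneByOne(RaftClient client, List<RaftPeer> oldPeers,
    List<RaftPeer> newPeers) throws IOException {
  final List<RaftPeer> conf = new ArrayList<>(oldPeers);
  for (int i = 0; i < newPeers.size(); i++) {
    conf.set(i, newPeers.get(i));           // replace one peer
    client.admin().setConfiguration(conf);  // one joint-consensus transition
    // In practice, wait for the conf to become stable before the next swap.
  }
}
```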
Thanks @SzyWilliam @szetszwo for your great replies! IMHO, the root cause is 'We can't find any majority without new peers in C_new'.
Replacing/adding/removing one by one fits most cases, but it fails in case 1, changing from {n0} to {n0, n1}. The solutions might be:
Solution 3 seems nice. The reason for shutting down on NOT_IN_CONF comes from #560 -- otherwise it may be impossible to shut down a NOT_IN_CONF peer that still requests votes after being removed. According to the comment below, we only shut down a candidate whose id is not included in the conf when the conf is stable: ratis/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java (Lines 1333 to 1341 at 639f1bf). So maybe we should change the NOT_IN_CONF handling of the request-vote reply regardless of whether we adopt solution 3.
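For reference, the gist of that stable-conf condition is roughly the following (an illustrative sketch only, not the actual RaftServerImpl code):

```java
// Only shut down a NOT_IN_CONF candidate while the conf is stable. During a
// transitional conf (C_old,new), a peer absent from the conf may simply not
// have learned C_new yet, so shutting it down would be wrong.
static boolean shouldShutdownCandidate(boolean confIsStable, boolean candidateInConf) {
  return confIsStable && !candidateInConf;
}
```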
I found out that NOT_IN_CONF is only generated when the candidate finds that it is not in the conf when requesting votes, which is not related to what we discussed. A peer that receives a vote request replies with shutdown when
Hi @Brokenice0415 , thanks for your nice advice and explanation!
I made a test to simulate this case. The new peer did shut down. Please have a look at TestMembership#testShutdown.
@wojiaodoubao , this case should work well. How could it be bad?
@wojiaodoubao, thanks for your case explanation! However, it seems that the new peer 3 is not added to the new configuration by the client, meaning the test case only shows that a new peer which is not in C_old will shut down after asking the old peers for votes. I tried to add
So maybe the shutdown is reasonable. If a new peer has been started for a long time but the cluster does not receive a request to change the conf, it should shut down. And with solution 3, the crash may not happen if the client sets up the new conf in time.
    // 3. Start {node-3} with {C_old, peer}.
    servers[3] = startServer(RaftGroup.valueOf(GROUP_ID, peers[0], peers[1], peers[2], peers[3]), peers[3], RaftStorage.StartupOption.FORMAT, false);
`client.admin().setConfiguration(Arrays.copyOfRange(peers, 0, 4))` is needed to make peer 3 part of the real C_new.
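For clarity, a sketch of that addition (`client` and `peers` come from the test above; the JUnit assert is illustrative):

```java
import java.util.Arrays;
import org.apache.ratis.protocol.RaftClientReply;
import org.junit.Assert;

// Make {peer0..peer3} the real C_new, so peer 3 is expected to stay up.
final RaftClientReply reply =
    client.admin().setConfiguration(Arrays.copyOfRange(peers, 0, 4));
Assert.assertTrue(reply.isSuccess());
```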
Hi @szetszwo , sorry for my late response. I made a test to simulate this case. Please have a look at TestMembership#testAddOneToOne. The procedure is as follows.
I think the reason is: all the new peers' confs are empty while the peers in C_old have moved to the transitional conf (C_old_and_C_new). A peer with the transitional conf needs votes from both a C_old majority and a C_new majority. But 'We can't find any majority without new peers in C_new'. That's why we won't get a new leader.
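A tiny sketch of that vote-counting rule (illustrative only, not Ratis code):

```java
import java.util.Set;

// With the transitional conf (C_old,new), a candidate must win a majority of
// C_old AND a majority of C_new. For C_old = {n0} and C_new = {n0, n1}, a
// C_new majority needs 2 votes; if n1 has an empty conf and never votes,
// no candidate can ever win.
static boolean electionCanSucceed(Set<String> votes, Set<String> cOld, Set<String> cNew) {
  return hasMajority(votes, cOld) && hasMajority(votes, cNew);
}

static boolean hasMajority(Set<String> votes, Set<String> conf) {
  final long granted = conf.stream().filter(votes::contains).count();
  return granted > conf.size() / 2;
}
```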
Hi @Brokenice0415, thanks for your detailed explanation! If I understand correctly, 'setConfiguration before starting new peers' is right for most cases. There might be a small corner case, as I'll describe below. Please help review whether it exists, thanks. I'll try to simulate it later. Firstly, we know 3 RaftServer behaviors:
The corner case is below. There is a race condition between 'the leader sending appendEntries to peer 3' and 'peer 3 asking the leader for a vote'.
Hi @wojiaodoubao , I think that race may happen. I think it's difficult to distinguish a new peer from a removed old one when receiving a vote request from a peer not in the conf. How about just restarting the crashed new peer, since the network partition has recovered and there is little likelihood of the partition happening again in such a short period of time? If the partition does happen again and the restarted new peer crashes again, maybe we should give up adding this unstable peer to the new conf?
@wojiaodoubao , When there is NO PROGRESS, setConf will fail. However, if there is no leader, they may not be able to elect one. Good catch on the bug! |
If peer 3 starts with an empty conf, then it won't ask for votes.
This is an important case since it is for changing from non-HA to HA. We may:
For the other cases, we should somehow time out the in-progress setConf.
Hi @Brokenice0415 , thanks for your nice comments. I think this is a good solution. Firstly, it's very simple: we don't need to change any code; providing an example and docs should be enough. Secondly, it's flexible: it allows any conf update, as described in the Raft paper (joint consensus). I'm new to Ratis and I'm worried my view might be a bit narrow. Could @szetszwo @SzyWilliam kindly help give more advice on this solution (starting the new peer with conf {C_old, peer_itself})? A minimal sketch of that startup is shown below.
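The sketch mirrors the `startServer(...)` helper used by the test in this PR (the helper and the `peers`/`servers`/`GROUP_ID` names come from that test):

```java
// Start the new peer already knowing C_old plus itself, so it can take part
// in voting during the joint-consensus transition instead of shutting down.
final RaftGroup bootGroup = RaftGroup.valueOf(
    GROUP_ID, peers[0], peers[1], peers[2], peers[3]);  // {C_old, peer_itself}
servers[3] = startServer(bootGroup, peers[3], RaftStorage.StartupOption.FORMAT, false);
```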
I also considered @szetszwo's suggestion. Based on that idea, I think we can set up 2 rules:
By rule 1, we reject all cases that may lead to infinite leader election, except 'adding 1 new peer to a 1-peer cluster'. By rule 2, we fix the 'adding 1 new peer to a 1-peer cluster' case. We can prove the procedure won't cause split brain or infinite leader election. The combinations of new-peer and old-peer configurations are shown below.
The advantages are:
Restarting is a good workaround. However, it needs someone (a human or a program) to monitor it. Otherwise, the cluster becomes unavailable due to failing to elect a leader. If we time out the setConf, then the cluster can recover automatically.
More generally, we should allow changing a minority of the peers in one setConf command, except for a setConf from 1 peer to 2 peers. A sketch of such a check is below.
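To illustrate that rule, a hedged sketch of one possible reading of the proposal (this is not the check merged in #954):

```java
import java.util.HashSet;
import java.util.Set;

// Allow a setConf only if the newly added peers form a minority of C_new,
// with 'non-HA -> HA' (1 peer -> 2 peers) as the single special case.
// Removals are unrestricted: they add no new peers to C_new.
static boolean isAllowedChange(Set<String> cOld, Set<String> cNew) {
  if (cOld.size() == 1 && cNew.size() == 2 && cNew.containsAll(cOld)) {
    return true;  // special case: {n0} -> {n0, n1}
  }
  final Set<String> added = new HashSet<>(cNew);
  added.removeAll(cOld);                        // peers not in C_old
  return added.size() < (cNew.size() + 1) / 2;  // strictly fewer than a majority
}
```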
Thanks @szetszwo for your nice suggestion! Let me try to fix it.
@wojiaodoubao , thanks for working on the fix! Will review it.
For removing peers, it seems okay to allow removing any number of peers. The problem is only in adding or replacing peers.
Closing this since #954 is merged.
What changes were proposed in this pull request?
This is a patch showing a bug case. Suppose we want to replace 2 peers in a 3-peer cluster, and we start the 2 new peers with empty confs. This can cause an infinite election and make the cluster unavailable forever. A sketch of the sequence is shown below.
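In code, the failing sequence looks roughly like this (a sketch following the test helpers in this patch; `startServer`, `servers`, `peers`, and `client` come from the test):

```java
// C_old = {p0, p1, p2}; the target is C_new = {p0, p3, p4}.
// 1. Start the 2 new peers with EMPTY confs (no peers in the boot group).
servers[3] = startServer(RaftGroup.valueOf(GROUP_ID), peers[3],
    RaftStorage.StartupOption.FORMAT, false);
servers[4] = startServer(RaftGroup.valueOf(GROUP_ID), peers[4],
    RaftStorage.StartupOption.FORMAT, false);
// 2. Move to C_new; the C_old peers enter the transitional conf (C_old,new).
client.admin().setConfiguration(new RaftPeer[] {peers[0], peers[3], peers[4]});
// 3. If an election starts now, a candidate needs a C_new majority, but
//    peers 3 and 4 (empty confs) never vote -> the election loops forever.
```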
What is the link to the Apache JIRA
RATIS-1912: https://issues.apache.org/jira/browse/RATIS-1912
How was this patch tested?
It is not tested.