Closed
Labels
usercase (Important user case type label)
Description
Search before asking
- I have searched the issues and found no similar issues.
Version
1.0.0
What's Wrong?
Loading data fails with the following error:
Failed to commit txn xxx, Tablet [xxxx] success replica num 0 is less than quorum replica ...
The replica states are:
One replica is in state DECOMMISSION, but its version is correct, e.g., 21.
The other replica is in state NORMAL, but its version is stale, e.g., 20, and its last failed version is 21.
The table has replication_num = 1, and the cluster has 3 BEs.
What You Expected?
The replica should be recovered automatically so that subsequent loads can succeed.
How to Reproduce?
The following steps may lead to the error:
0. Tablet 10000 has 1 replica, 10001, on Backend A. Its version is 20.
1. Transaction 100 begins; it is about to write version 21.
2. A balance clone task begins, cloning from Backend A to Backend B.
3. The clone task finishes. There are now 2 replicas (10001 and 10002) with version 20, on Backend A and Backend B.
4. Tablet 10000 is scheduled again, and replica 10001's state is set to DECOMMISSION.
5. Transaction 100 finishes and sets replica 10001's version to 21. Replica 10002 failed to load, so its version remains 20.
6. There are now 2 replicas: one in state DECOMMISSION with version 21, and one in state NORMAL with version 20.
7. Subsequent load jobs cannot find a usable normal replica, so they fail.
In this situation, we can only restart FE to recover.
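The sequence above can be modeled with a small Python sketch. This is only an illustration of the reported state, not the FE's actual code: the `Replica` class, the `quorum` formula (`replication_num // 2 + 1`), and the rule that a replica counts as successful only if it is NORMAL and already holds the previous version are all simplifying assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaState(Enum):
    NORMAL = "NORMAL"
    DECOMMISSION = "DECOMMISSION"


@dataclass
class Replica:
    # Hypothetical, simplified stand-in for a tablet replica.
    replica_id: int
    backend: str
    version: int
    state: ReplicaState = ReplicaState.NORMAL
    last_failed_version: int = -1


def quorum(replication_num: int) -> int:
    # Assumed majority rule; with replication_num = 1 the quorum is 1.
    return replication_num // 2 + 1


def count_success_replicas(replicas, new_version):
    # Assumption: a replica can accept new_version only if it is NORMAL
    # and already holds new_version - 1 (no version gap).
    return sum(
        1
        for r in replicas
        if r.state is ReplicaState.NORMAL and r.version == new_version - 1
    )


def reproduce():
    # Step 0: one replica on Backend A at version 20.
    r1 = Replica(10001, "A", 20)
    # Steps 2-3: balance clone creates a second replica on Backend B.
    r2 = Replica(10002, "B", 20)
    # Step 4: the tablet is scheduled again; the source replica is decommissioned.
    r1.state = ReplicaState.DECOMMISSION
    # Step 5: transaction 100 finishes on 10001 only; 10002 misses version 21.
    r1.version = 21
    r2.last_failed_version = 21
    return [r1, r2]


replicas = reproduce()
success = count_success_replicas(replicas, new_version=22)
print(success, quorum(1))  # prints "0 1"
```

Under these assumptions, the next load (writing version 22) finds 0 successful replicas against a quorum of 1, which matches the shape of the reported error message.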
Anything Else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct