Closed
Labels
kind/fix: Categorizes issue or PR as related to a bug.
Description
Search before asking
- I had searched in the issues and found no similar issues.
Version
trunk 79fd117
What's Wrong?
Occasionally, a replica of a tablet gets stuck in the DECOMMISSION state and never recovers.
The tablet belongs to a colocation table. As a result, the colocation group that contains this table stays in an unstable state, and the colocation plan cannot be carried out.
What You Expected?
The replica should return to the NORMAL state after some time, and the colocation group should become STABLE.
How to Reproduce?
Hard to reproduce.
You need multiple colocation tables under high-frequency load.
It may happen as follows:
- A tablet of a colocation table is in the COLOCATION_REDUNDANT state.
- The tablet is scheduled, and one of its replicas is set to DECOMMISSION in TabletScheduler.deleteReplicaInternal().
- The tablet is then scheduled again.
- By that time, however, the BE node hosting the replica that was set to DECOMMISSION in step 2 has been returned to the colocation group, so the tablet's health status becomes VERSION_INCOMPLETE (a replica in the DECOMMISSION state does not accept load).
- Because the replica in the DECOMMISSION state never receives load tasks, the tablet's health status remains VERSION_INCOMPLETE indefinitely.
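The sequence above can be illustrated with a small, self-contained Java simulation. This is only a sketch of the reasoning, not Doris code: the classes `Replica`, `ReplicaState`, `TabletHealth` and the methods `load`, `health`, and `simulate` are hypothetical stand-ins, and the health check is reduced to a plain version comparison. It assumes that every load advances the visible version and that a DECOMMISSION replica is skipped by loads, as described in the steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model. None of these types are the real Doris classes.
class DecommissionStuckSketch {
    enum ReplicaState { NORMAL, DECOMMISSION }
    enum TabletHealth { HEALTHY, VERSION_INCOMPLETE }

    static final class Replica {
        final long backendId;
        ReplicaState state;
        long version;

        Replica(long backendId, ReplicaState state, long version) {
            this.backendId = backendId;
            this.state = state;
            this.version = version;
        }
    }

    // One load: advances the visible version, but only replicas that
    // accept load (i.e. not DECOMMISSION) catch up to it.
    static long load(List<Replica> replicas, long visibleVersion) {
        long next = visibleVersion + 1;
        for (Replica r : replicas) {
            if (r.state != ReplicaState.DECOMMISSION) {
                r.version = next;
            }
        }
        return next;
    }

    // Simplified health check: any replica behind the visible version
    // makes the tablet VERSION_INCOMPLETE.
    static TabletHealth health(List<Replica> replicas, long visibleVersion) {
        for (Replica r : replicas) {
            if (r.version < visibleVersion) {
                return TabletHealth.VERSION_INCOMPLETE;
            }
        }
        return TabletHealth.HEALTHY;
    }

    // Three replicas on backends that all belong to the colocation group.
    // The third one carries the given state (DECOMMISSION models step 2).
    static TabletHealth simulate(ReplicaState thirdReplicaState, int loads) {
        List<Replica> replicas = new ArrayList<>();
        replicas.add(new Replica(10001L, ReplicaState.NORMAL, 1L));
        replicas.add(new Replica(10002L, ReplicaState.NORMAL, 1L));
        replicas.add(new Replica(10003L, thirdReplicaState, 1L));

        long visibleVersion = 1L;
        for (int i = 0; i < loads; i++) {
            visibleVersion = load(replicas, visibleVersion);
        }
        return health(replicas, visibleVersion);
    }

    public static void main(String[] args) {
        // Stuck case: the DECOMMISSION replica never catches up.
        System.out.println(simulate(ReplicaState.DECOMMISSION, 5)); // VERSION_INCOMPLETE
        // Expected case: a NORMAL replica keeps up with every load.
        System.out.println(simulate(ReplicaState.NORMAL, 5));       // HEALTHY
    }
}
```

In this model, no number of additional loads changes the first result, which matches the observation that the tablet cannot leave VERSION_INCOMPLETE while the replica stays in DECOMMISSION.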
Anything Else?
Why:
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct