Current problem
The current tablet repair and balance mechanism is based on CloneChecker, a daemon thread that runs on a fixed schedule. It checks all tablets at a fixed period and tries to repair at most Config.clone_max_job_num tablets in one round.
- It is very slow because it only runs at a fixed period, even if no clone task is currently running.
- Users cannot specify the priority of tablet repair; which tablet gets repaired is effectively a random choice.
- No way to check the tablet repair progress.
- It is heavily coupled with other routine jobs such as ALTER.
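To make the first two limitations concrete, here is a minimal Java sketch of a fixed-period checker with a per-round cap. It is an illustrative model only, not the actual CloneChecker code: the class name, the method names, and the cap value of 2 are all assumptions, and the shuffle merely stands in for selection that ignores priority. It also assumes every task started in a round finishes before the next round begins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical model of a fixed-period checker; NOT the real CloneChecker.
public class FixedPeriodCheckerSketch {
    // Stand-in for Config.clone_max_job_num; the value 2 is an assumption.
    static final int CLONE_MAX_JOB_NUM = 2;

    // One round: look at every unhealthy tablet and pick at most
    // CLONE_MAX_JOB_NUM of them. There is no priority input, so the
    // choice is modeled as a shuffle.
    static List<Long> runOneRound(List<Long> unhealthyTabletIds, Random random) {
        List<Long> candidates = new ArrayList<>(unhealthyTabletIds);
        Collections.shuffle(candidates, random);
        int limit = Math.min(CLONE_MAX_JOB_NUM, candidates.size());
        return new ArrayList<>(candidates.subList(0, limit));
    }

    // Rounds needed to repair everything when each round handles at most
    // CLONE_MAX_JOB_NUM tablets (ceiling division).
    static int roundsNeeded(int unhealthyCount) {
        return (unhealthyCount + CLONE_MAX_JOB_NUM - 1) / CLONE_MAX_JOB_NUM;
    }

    public static void main(String[] args) {
        List<Long> unhealthy = Arrays.asList(101L, 102L, 103L, 104L, 105L);
        List<Long> picked = runOneRound(unhealthy, new Random());
        System.out.println("picked this round: " + picked);
        System.out.println("rounds needed: " + roundsNeeded(unhealthy.size()));
    }
}
```

Under these assumptions, five unhealthy tablets take three full check periods to repair, however quickly each clone finishes, and the caller has no control over which tablets go first.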
The new tablet repair framework will solve these problems.