[BUG] Fix Colocate table balance bug #4936
Merged
Fix bug #4935
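Current strategy:
Each group maintains backendSeq, the list of BEs that host each bucketId.
A thread runs every 20s and does the following:
1. Check whether any BE in backendSeq is unavailable. If so, pick an available BE and replace the unavailable one in backendSeq.
2. Check whether the tablets in the group match backendSeq. If they do not, mark the group as unstable and run the migration tasks.
3. Balance the groups that are in the stable state: use backendSeq to count the bucketIds held by each BE, then move bucketIds from the BEs holding the most to the BEs holding the fewest. This step only updates backendSeq; the actual migration is carried out in step 2.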
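Step 3 can be sketched as follows. This is a minimal illustration, not the actual Doris code: it assumes backendSeq is a list with one entry per bucket, each entry being the list of BE ids that hold a replica of that bucket, and the function name `balance_backend_seq` and its parameters are hypothetical. It also assumes that step 1 has already run, so every BE in the sequence is available.

```python
from collections import Counter


def balance_backend_seq(backend_seq, available_bes):
    """Return (new_seq, changed). Only the sequence is updated; no data moves here."""
    # Work on a copy so the caller's sequence is left untouched.
    seq = [list(replicas) for replicas in backend_seq]
    changed = False
    while True:
        # Count how many bucket slots each BE holds (0 for BEs holding none).
        counts = Counter({be: 0 for be in available_bes})
        for replicas in seq:
            for be in replicas:
                counts[be] += 1
        ordered = sorted(counts, key=lambda be: (counts[be], be))
        low, high = ordered[0], ordered[-1]
        if counts[high] - counts[low] <= 1:
            break  # already balanced
        for replicas in seq:
            # A bucket must not end up with two replicas on the same BE.
            if high in replicas and low not in replicas:
                replicas[replicas.index(high)] = low
                changed = True
                break
        else:
            break  # no legal move from high to low; give up in this sketch
    return seq, changed
```

For example, `balance_backend_seq([[1, 2], [1, 2], [1, 2], [1, 3]], [1, 2, 3, 4])` spreads the eight slots so that each of the four BEs holds two. The tablets themselves would be moved later, when step 2 notices that they no longer match the sequence.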
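Existing problem:
If many BEs go down at the same time, step 1 removes them from backendSeq, and step 2 then detects that the tablets no longer match backendSeq and marks the group as unstable. If the disks of the remaining BEs cannot hold all the tablets from the downed BEs, the group stays unstable indefinitely. Adding new BEs does not help, because step 3 is never triggered: it runs only for groups in the stable state.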
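Strategy change:
1. Merge steps 1 and 3 of the current strategy into a single process: first check whether backendSeq contains any unavailable BE. When balancing, first move the bucketIds of unavailable BEs to the BEs holding the fewest bucketIds, then move bucketIds from the BEs holding the most to the BEs holding the fewest.
2. Same as step 2 of the current strategy.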
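The merged process might look roughly like the sketch below. It is an illustration under assumptions rather than the code in this PR: backendSeq is modeled as a list of per-bucket BE id lists, and `relocate_and_balance` and its parameters are hypothetical names. Because relocation and balancing happen in the same pass and do not depend on the group being stable, newly added BEs can take slots even while the group is unstable.

```python
def relocate_and_balance(backend_seq, available_bes):
    """Return a new sequence with unavailable BEs replaced first, then balanced."""
    seq = [list(replicas) for replicas in backend_seq]
    available = set(available_bes)

    def slot_counts():
        # Slots held by each available BE; unavailable BEs are ignored.
        counts = {be: 0 for be in available}
        for replicas in seq:
            for be in replicas:
                if be in available:
                    counts[be] += 1
        return counts

    # Phase 1: move slots off unavailable BEs, preferring the least loaded BE.
    for replicas in seq:
        for i, be in enumerate(replicas):
            if be in available:
                continue
            counts = slot_counts()
            candidates = [c for c in available if c not in replicas]
            if not candidates:
                continue  # nowhere to go; leave the slot as it is
            replicas[i] = min(candidates, key=lambda c: (counts[c], c))

    # Phase 2: ordinary balancing, from the most loaded BE to the least loaded.
    while True:
        counts = slot_counts()
        if not counts:
            break
        ordered = sorted(counts, key=lambda be: (counts[be], be))
        low, high = ordered[0], ordered[-1]
        if counts[high] - counts[low] <= 1:
            break
        for replicas in seq:
            if high in replicas and low not in replicas:
                replicas[replicas.index(high)] = low
                break
        else:
            break  # no legal move left in this sketch
    return seq
```

With BE 1 down and BEs 4 and 5 newly added, `relocate_and_balance([[1, 2], [1, 3], [2, 3], [1, 2]], [2, 3, 4, 5])` returns a sequence that no longer references BE 1 and gives each remaining BE two slots.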