@gengjun-git
Contributor

Fix bug #4935

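Current strategy:

Each group maintains backendSeq, the list of BEs on which each bucketId is placed.

Every 20s, a thread:

  1. Checks whether any BE in backendSeq is unavailable; if so, it picks an available BE and replaces the unavailable one in backendSeq.

  2. Checks whether the tablets in the group match backendSeq; if they do not, it marks the group as unstable and runs migration tasks.

  3. Balances groups that are in the stable state: based on backendSeq, it counts the bucketIds held by every BE and moves bucketIds from BEs holding many to BEs holding few. This step only updates backendSeq; the actual migration is carried out in step 2.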
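A minimal sketch of the step 3 counting and rebalancing, under simplifying assumptions: backendSeq is modeled as a flat list with one BE id per bucket slot, every BE in it is available (the group is stable), and the class and method names are hypothetical rather than taken from the PR. It only rewrites the sequence, mirroring the note that the real migration happens in step 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names are illustrative, not from the PR.
final class BackendSeqBalanceSketch {

    // Count bucket slots per BE; BEs that hold nothing still get an entry of 0.
    static Map<Long, Integer> countBuckets(List<Long> backendSeq, List<Long> availableBes) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Long be : availableBes) {
            counts.put(be, 0);
        }
        for (Long be : backendSeq) {
            counts.merge(be, 1, Integer::sum);
        }
        return counts;
    }

    // Returns a balanced copy of backendSeq; no tablet is moved here.
    static List<Long> balance(List<Long> backendSeq, List<Long> availableBes) {
        List<Long> seq = new ArrayList<>(backendSeq);
        while (true) {
            Map<Long, Integer> counts = countBuckets(seq, availableBes);
            if (counts.isEmpty()) {
                return seq;
            }
            Long high = null;
            Long low = null;
            for (Map.Entry<Long, Integer> e : counts.entrySet()) {
                if (high == null || e.getValue() > counts.get(high)) {
                    high = e.getKey();
                }
                if (low == null || e.getValue() < counts.get(low)) {
                    low = e.getKey();
                }
            }
            // Stop once no BE holds more than one slot above the emptiest BE.
            if (counts.get(high) - counts.get(low) <= 1) {
                return seq;
            }
            // Hand one slot from the fullest BE to the emptiest BE.
            seq.set(seq.indexOf(high), low);
        }
    }
}
```

For example, with backendSeq `[1, 1, 1, 1, 2, 2]` and available BEs 1, 2 and 3, `balance` leaves each BE holding two slots.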

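Problem:

If many BEs go down at the same time, step 1 removes them from backendSeq, and step 2 detects that backendSeq no longer matches and marks the group as unstable. However, if the disks of the remaining BEs cannot hold all the tablets from the downed BEs, the group stays unstable indefinitely. Even adding new BEs does not trigger step 3, because step 3 only runs while the group is stable.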

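Strategy change:

Merge steps 1 and 3 of the current strategy into a single process:

  1. First check whether backendSeq contains any unavailable BEs. When balancing, migrate the bucketIds of unavailable BEs to the BEs holding few bucketIds first, and only then migrate from BEs holding many bucketIds to BEs holding few.

  2. Same as in the current strategy.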
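The merged process might be sketched as follows. This is an illustration under assumptions, not the PR's code: backendSeq is again a flat list with one BE id per bucket slot, disk capacity is not modeled, and all names are hypothetical. Slots on unavailable BEs are reassigned first, then the ordinary high-to-low balancing runs, with no dependence on the group being stable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names are illustrative, not from the PR.
final class MergedBalanceSketch {

    // Count slots per available BE; slots on unavailable BEs are ignored.
    static Map<Long, Integer> countBuckets(List<Long> seq, List<Long> availableBes) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Long be : availableBes) {
            counts.put(be, 0);
        }
        for (Long be : seq) {
            if (counts.containsKey(be)) {
                counts.merge(be, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Pick the BE with the highest (wantMax) or lowest slot count.
    static Long pick(Map<Long, Integer> counts, boolean wantMax) {
        Long best = null;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (best == null
                    || (wantMax && e.getValue() > counts.get(best))
                    || (!wantMax && e.getValue() < counts.get(best))) {
                best = e.getKey();
            }
        }
        return best;
    }

    static List<Long> relocateAndBalance(List<Long> backendSeq, List<Long> availableBes) {
        List<Long> seq = new ArrayList<>(backendSeq);
        if (availableBes.isEmpty()) {
            return seq; // nowhere to move anything
        }
        // Priority 1: move slots off unavailable BEs to the emptiest available BE.
        for (int i = 0; i < seq.size(); i++) {
            if (!availableBes.contains(seq.get(i))) {
                seq.set(i, pick(countBuckets(seq, availableBes), false));
            }
        }
        // Priority 2: move slots from the fullest BE to the emptiest BE.
        while (true) {
            Map<Long, Integer> counts = countBuckets(seq, availableBes);
            Long high = pick(counts, true);
            Long low = pick(counts, false);
            if (counts.get(high) - counts.get(low) <= 1) {
                return seq;
            }
            seq.set(seq.indexOf(high), low);
        }
    }
}
```

With backendSeq `[1, 1, 2, 2, 3, 3]`, BEs 2 and 3 down, and a newly added BE 4, the call `relocateAndBalance(seq, [1, 4])` leaves BEs 1 and 4 holding three slots each, so the new BE receives buckets even though the group was never stable.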

@kangkaisen kangkaisen added kind/fix Categorizes issue or PR as related to a bug. area/balance Issues or PRs related to data balance labels Nov 21, 2020
Contributor

@kangkaisen kangkaisen left a comment

+1

@kangkaisen kangkaisen added the approved Indicates a PR has been approved by one committer. label Nov 21, 2020
@morningman morningman merged commit 37a6731 into apache:master Nov 22, 2020
@yangzhg yangzhg mentioned this pull request Feb 9, 2021