-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Rebalancer] support partition rebalancer #5010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fe/fe-core/src/main/java/org/apache/doris/clone/PartitionRebalancer.java
Show resolved
Hide resolved
docs/zh-CN/administrator-guide/operation/tablet-repair-and-balance.md
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/clone/MovesInProgressCache.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/clone/MovesInProgressCache.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/clone/PartitionRebalancer.java
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/clone/TwoDimensionalGreedyAlgo.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/clone/PartitionRebalancer.java
Outdated
Show resolved
Hide resolved
kangkaisen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Will there be any balance-related modifications in the future? The new balance strategy also needs to be considered when create new partition or restore data, but I don't see the relevant changes. |
It's only a rebalacer strategy, not include the creation or some else. |
If partition create / restore does not consider the balance strategy of the cluster, then there will be such a situation: Tablets have just been created, and then balancer finds the cluster is imbalance, and then it will do a balance on the newly created tablets. |
Another question is has the new balance strategy been verified by the production environment? We also have similar requirements and we have implemented a similar balance strategy, but now we want to switch to this community implementation. |
Haven't yet, only in test env. And the production envs may have some differences. If you like, plz help me to test it. BTW, our team plans to use it in production env next month. |
fe/fe-core/src/main/java/org/apache/doris/catalog/TabletInvertedIndex.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/catalog/TabletInvertedIndex.java
Outdated
Show resolved
Hide resolved
| // 1. Every partition is balanced(maxPartitionSkew<=1) and any move will unbalance a partition, so there | ||
| // is no potential for the greedy algorithm to balance the cluster. | ||
| // 2. Every partition is balanced(maxPartitionSkew<=1) and the cluster as a whole is balanced(maxBeSkew<=1). | ||
| if (maxPartitionSkew == 0L || (maxPartitionSkew <= 1L && maxBeSkew <= 1L)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 1 too strict here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algo should get accurate calculation results. And I think there is no reason yet to relax the condition.
fe/fe-core/src/main/java/org/apache/doris/clone/PartitionRebalancer.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| // Move completed: fromBe doesn't have a replica and toBe has a replica | ||
| private boolean checkMoveCompleted(TabletMove move) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new replica will be added to the tablet at the beginning of clone task with state CLONE.
And the "fromBE" replica may be dropped for other reason such as "data broken on BE".
So are you sure !bes.contains(move.fromBe) && bes.contains(move.toBe) can be treated as move complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to define, but when !bes.contains(move.fromBe) && bes.contains(move.toBe), the move is useless. So why not to treat it as completed.
fe/fe-core/src/main/java/org/apache/doris/clone/TwoDimensionalGreedyRebalanceAlgo.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/clone/TwoDimensionalGreedyRebalanceAlgo.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| LOG.debug("min_loaded_be: {}, max_loaded_be: {}", minLoadedBe, maxLoadedBe); | ||
| if (minLoadedBe.equals(maxLoadedBe)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use PICK_FIRST, in the case of fewer candidate sets, will there be a high probability of the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, we only use PICK_FIRST in UTs, to get the results that can be expected.
morningman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
RebalancerType could be configured via Config.rebalancer_type(BeLoad, Partition). PartitionRebalancer is based on TwoDimensionalGreedyAlgo. Two dims of Doris should be cluster & partition. And we only consider about the replica count, do not consider replica size. apache#4845 for further details.
Proposed changes
RebalancerType could be configured via
Config.rebalancer_type(BeLoad, Partition).PartitionRebalancer is based on TwoDimensionalGreedyAlgo.
Two dims of Doris should be cluster & partition. And we only consider about the replica count, do not consider replica size.
#4845 for further details.
Types of changes
Checklist