[improvement](cloud rebalance) make multiple fe consistent of tablet's backend #40771
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
864a5db to 69901d8

run buildall
TPC-H: Total hot run time: 41702 ms
TPC-DS: Total hot run time: 199762 ms

b530068 to 813d3ea

run buildall
TPC-H: Total hot run time: 42157 ms

run buildall
TPC-H: Total hot run time: 42736 ms

run performance
TPC-H: Total hot run time: 41904 ms

3c6112e to 6b000ab

run buildall
TPC-H: Total hot run time: 41642 ms
TPC-DS: Total hot run time: 196430 ms
ClickBench: Total hot run time: 33.73 s
-    public static boolean enable_immediate_be_assign = true;
+    @ConfField(mutable = true, masterOnly = true,
+            description = {"In compute-storage decoupled mode, when the BE that a tablet is distributed on becomes abnormal, whether to immediately map the tablet to a new BE. Default: false"})
+    public static boolean enable_immediate_be_assign = false;
this seems a behavior change
The new algorithm should be enabled by default. If this switch is turned off, the old algorithm is still used.
Add more background, including why we chose this solution and what the trade-offs are.
0d86f33 to 58b3daa

run buildall
TPC-H: Total hot run time: 41393 ms
TPC-DS: Total hot run time: 192726 ms

run buildall

3c96447 to 7a077a8

run buildall

7a077a8 to 080f139

run buildall

run feut
TPC-H: Total hot run time: 40927 ms
TPC-DS: Total hot run time: 191242 ms
ClickBench: Total hot run time: 32.75 s

PR approved by at least one committer and no changes requested.
…s backend (apache#40771)

For a given tablet, different frontends may map it to different backends. When a tablet's backend shuts down, each frontend immediately reassigns the tablet to a live backend using a hash algorithm. Because this reassignment is not written to the edit log, it is not synchronized across frontends, so the same tablet can be relocated to different backends on different frontends.

To fix this, each tablet now has two BE IDs: a primary BE ID and a secondary BE ID. Only the master FE can actively change the primary BE ID, and it writes an edit log entry when it does, which ensures that all frontends see the same primary BE. If a tablet's primary BE is alive, it is used as the tablet's backend; if the primary BE is dead, a secondary BE can be chosen instead.

More details:
1. When a BE has been dead for more than 1 hour (Config.rehash_tablet_after_be_dead_seconds), change the primary BE ID of non-colocate tablets to their secondary BE ID;
2. Do not balance colocate tables, and do not save a primary BE ID for colocate tables;
3. Colocate tables rehash over the set "alive BEs + BEs dead for less than 1 hour (Config.rehash_tablet_after_be_dead_seconds)", but if the selected BE is dead, switch to an alive BE. This both ensures availability and avoids rehashing all tablets when a BE is dead only briefly;
4. Colocate tablets rehash on group ID + bucket index;
5. Rehashing prefers to skip decommissioned BEs;
6. 'show backends' prints the correct tablet count for each backend;
7. Cloud tablets show PrimaryBackendId.
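A minimal sketch of details 1 and 3-5 above, assuming a simplified model. The class `CloudRehashSketch`, the `Backend` record (with `lastAliveMs` and `decommissioned` fields), and the constant `REHASH_AFTER_DEAD_SECONDS` (standing in for Config.rehash_tablet_after_be_dead_seconds) are all hypothetical. Hashing group ID + bucket index with `Objects.hash` modulo a sorted candidate list, and falling back to the next alive candidate, are illustrative choices rather than the hash Doris actually uses; writing the promotion to the edit log is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch of primary promotion and colocate rehashing. */
public class CloudRehashSketch {
    /** Hypothetical backend model: ID, liveness, last time seen alive (ms), decommission flag. */
    record Backend(long id, boolean alive, long lastAliveMs, boolean decommissioned) {}

    /** Stands in for Config.rehash_tablet_after_be_dead_seconds (1 hour per the PR description). */
    static final long REHASH_AFTER_DEAD_SECONDS = 3600;

    static boolean deadTooLong(Backend be, long nowMs) {
        return !be.alive() && nowMs - be.lastAliveMs() > REHASH_AFTER_DEAD_SECONDS * 1000;
    }

    /**
     * Detail 1, master FE only: once the primary BE has been dead longer than the threshold,
     * the secondary BE becomes the new primary. Persisting this via the edit log is not modeled.
     */
    static long maybePromote(Backend primary, long secondaryBeId, long nowMs) {
        return deadTooLong(primary, nowMs) ? secondaryBeId : primary.id();
    }

    /**
     * Details 3-5: hash group ID + bucket index over "alive BEs + BEs dead for less than the
     * threshold", preferring non-decommissioned BEs. If the hashed BE is (briefly) dead, fall
     * back to an alive BE. Returns -1 when nothing is available.
     */
    static long colocateBackend(long groupId, int bucketIndex, List<Backend> backends, long nowMs) {
        List<Backend> candidates = new ArrayList<>();
        for (Backend be : backends) {
            if (!deadTooLong(be, nowMs) && !be.decommissioned()) {
                candidates.add(be);
            }
        }
        if (candidates.isEmpty()) {
            // Skipping decommissioned BEs is only a preference, so retry with them included.
            for (Backend be : backends) {
                if (!deadTooLong(be, nowMs)) {
                    candidates.add(be);
                }
            }
        }
        if (candidates.isEmpty()) {
            return -1;
        }
        candidates.sort(Comparator.comparingLong(Backend::id));
        int idx = Math.floorMod(Objects.hash(groupId, bucketIndex), candidates.size());
        // Keeping briefly-dead BEs in the candidate set means a short outage does not
        // reshuffle every bucket; only buckets hashed to the dead BE are redirected.
        for (int i = 0; i < candidates.size(); i++) {
            Backend be = candidates.get((idx + i) % candidates.size());
            if (be.alive()) {
                return be.id();
            }
        }
        return -1;
    }
}
```

In this sketch, a BE that has been dead for only a minute stays in the candidate set, so buckets hashed to it are temporarily served by a neighbouring alive BE while every other bucket keeps its mapping; only after the threshold does the candidate set itself change.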
For a given tablet, different frontends may map it to different backends. When a tablet's backend shuts down, each frontend immediately reassigns the tablet to a live backend using a hash algorithm. Because this reassignment is not written to the edit log, it is not synchronized across frontends, so the same tablet can be relocated to different backends on different frontends. To fix this, each tablet now has two BE IDs: a primary BE ID and a secondary BE ID. Only the master FE can actively change the primary BE ID, and it writes an edit log entry when it does, which ensures that all frontends see the same primary BE. If a tablet's primary BE is alive, it is used as the tablet's backend; if the primary BE is dead, a secondary BE can be chosen instead.
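The primary/secondary rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Doris implementation: `TabletBackendSelector`, the `Backend` record, and `selectSecondary` are hypothetical names, and choosing the secondary BE by hashing the tablet ID over the alive backends (sorted by ID) is an assumed strategy, since the description does not say how the secondary is picked.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-tablet backend selection with a primary and a secondary BE. */
public class TabletBackendSelector {
    /** Hypothetical minimal backend model: an ID and a liveness flag. */
    record Backend(long id, boolean alive) {}

    /**
     * Returns the primary BE if it is alive; otherwise falls back to a secondary BE.
     * Returns -1 when no backend is alive.
     */
    static long selectBackend(long tabletId, long primaryBeId, Map<Long, Backend> backends) {
        Backend primary = backends.get(primaryBeId);
        if (primary != null && primary.alive()) {
            // Every FE agrees on primaryBeId because only the master FE changes it (and logs the change).
            return primaryBeId;
        }
        return selectSecondary(tabletId, backends);
    }

    /** Assumed secondary choice: hash the tablet ID over the alive backends, sorted by ID. */
    static long selectSecondary(long tabletId, Map<Long, Backend> backends) {
        List<Backend> alive = new ArrayList<>();
        for (Backend be : backends.values()) {
            if (be.alive()) {
                alive.add(be);
            }
        }
        if (alive.isEmpty()) {
            return -1;
        }
        // Sorting makes the result independent of map iteration order.
        alive.sort(Comparator.comparingLong(Backend::id));
        int idx = (int) Math.floorMod(tabletId, (long) alive.size());
        return alive.get(idx).id();
    }

    public static void main(String[] args) {
        Map<Long, Backend> bes = Map.of(
                10001L, new Backend(10001L, false),
                10002L, new Backend(10002L, true),
                10003L, new Backend(10003L, true));
        System.out.println(selectBackend(7L, 10002L, bes)); // 10002: primary is alive
        System.out.println(selectBackend(7L, 10001L, bes)); // 10003: primary dead, 7 % 2 == 1
    }
}
```

The key property is that the primary BE ID is shared state replicated through the edit log, so frontends only fall back to a locally computed secondary while the primary is down.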
More details: