@deardeng
Contributor

What problem does this PR solve?

Exposes cloud balance metrics that show whether a compute group is currently performing balance scheduling. When all `*_balance_num` metrics are 0, the compute group is considered to be in a balanced state.
Note: These metrics are valid only when requested from the FE master, because balance scheduling is performed on the FE master.

curl "http://175.42.1.1:8030/metrics" | rg '_balance_num'
# HELP doris_fe_cloud_table_balance_num current cluster cloud table balance sync edit log number
# TYPE doris_fe_cloud_table_balance_num counter
doris_fe_cloud_table_balance_num{cluster_id="compute_cluster_id", cluster_name="compute_cluster"} 5
doris_fe_cloud_table_balance_num{cluster_id="other_cluster_id", cluster_name="other_cluster"} 0
# HELP doris_fe_cloud_partition_balance_num current cluster cloud partition balance sync edit log number
# TYPE doris_fe_cloud_partition_balance_num counter
doris_fe_cloud_partition_balance_num{cluster_id="compute_cluster_id", cluster_name="compute_cluster"} 0
doris_fe_cloud_partition_balance_num{cluster_id="other_cluster_id", cluster_name="other_cluster"} 0
# HELP doris_fe_cloud_smooth_upgrade_balance_num current cluster cloud smooth upgrade sync edit log number
# TYPE doris_fe_cloud_smooth_upgrade_balance_num counter
doris_fe_cloud_smooth_upgrade_balance_num{cluster_id="compute_cluster_id", cluster_name="compute_cluster"} 0
doris_fe_cloud_smooth_upgrade_balance_num{cluster_id="other_cluster_id", cluster_name="other_cluster"} 0
# HELP doris_fe_cloud_global_balance_num current cluster cloud be balance sync edit log number
# TYPE doris_fe_cloud_global_balance_num counter
doris_fe_cloud_global_balance_num{cluster_id="compute_cluster_id", cluster_name="compute_cluster"} 0
doris_fe_cloud_global_balance_num{cluster_id="other_cluster_id", cluster_name="other_cluster"} 0
# HELP doris_fe_cloud_warm_up_balance_num current cluster cloud warm up cache sync edit log number
# TYPE doris_fe_cloud_warm_up_balance_num counter
doris_fe_cloud_warm_up_balance_num{cluster_id="compute_cluster_id", cluster_name="compute_cluster"} 0
doris_fe_cloud_warm_up_balance_num{cluster_id="other_cluster_id", cluster_name="other_cluster"} 0
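
One way to act on these metrics is to parse the text returned by the FE master and list, per compute group, which `*_balance_num` values are non-zero. The following is a minimal sketch and not part of this PR: the function name `pending_balance_by_cluster`, the regular expressions, and the `SAMPLE` text are assumptions made for illustration, and the parser only handles sample lines of the shape shown above.

```python
import re

# Matches sample lines such as:
#   doris_fe_cloud_table_balance_num{cluster_id="a", cluster_name="b"} 5
# HELP/TYPE comment lines start with '#' and are therefore skipped.
SAMPLE_RE = re.compile(
    r'^(doris_fe_cloud_\w+_balance_num)\{([^}]*)\}\s+([0-9.eE+-]+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def pending_balance_by_cluster(metrics_text):
    """Map each cluster_name to its non-zero *_balance_num metrics.

    A cluster that maps to an empty dict has all balance metrics at 0,
    which the PR description treats as a balanced state.
    """
    result = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        cluster = labels.get("cluster_name", "unknown")
        pending = result.setdefault(cluster, {})
        value = float(raw_value)
        if value != 0:
            pending[name] = value
    return result


# Hypothetical excerpt in the same shape as the output above.
SAMPLE = """\
# HELP doris_fe_cloud_table_balance_num current cluster cloud table balance sync edit log number
# TYPE doris_fe_cloud_table_balance_num counter
doris_fe_cloud_table_balance_num{cluster_id="compute_cluster_id", cluster_name="compute_cluster"} 5
doris_fe_cloud_table_balance_num{cluster_id="other_cluster_id", cluster_name="other_cluster"} 0
doris_fe_cloud_warm_up_balance_num{cluster_id="compute_cluster_id", cluster_name="compute_cluster"} 0
doris_fe_cloud_warm_up_balance_num{cluster_id="other_cluster_id", cluster_name="other_cluster"} 0
"""

for cluster, pending in pending_balance_by_cluster(SAMPLE).items():
    state = "balanced" if not pending else "balancing"
    print(cluster, state, pending)
```

In practice the text would come from the FE master's `/metrics` endpoint rather than a literal string; results from a non-master FE should not be trusted, per the note above.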

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@deardeng
Contributor Author

run buildall

@doris-robot

ClickBench: Total hot run time: 28.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 95d801f077426cf08d45c4e4b2a9a64b3a508416, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.06	0.05	0.05
query2	0.09	0.05	0.06
query3	0.25	0.09	0.09
query4	1.61	0.12	0.12
query5	0.27	0.26	0.24
query6	1.20	0.63	0.64
query7	0.02	0.03	0.03
query8	0.06	0.04	0.04
query9	0.63	0.53	0.53
query10	0.59	0.58	0.57
query11	0.17	0.12	0.14
query12	0.15	0.14	0.12
query13	0.63	0.62	0.60
query14	1.02	1.05	1.01
query15	0.88	0.85	0.88
query16	0.39	0.39	0.38
query17	1.06	1.12	1.06
query18	0.22	0.20	0.20
query19	1.95	1.84	1.83
query20	0.01	0.01	0.02
query21	15.42	0.18	0.12
query22	5.11	0.07	0.05
query23	15.65	0.26	0.10
query24	2.82	1.42	0.38
query25	0.10	0.06	0.07
query26	0.14	0.13	0.13
query27	0.08	0.05	0.05
query28	4.78	1.13	0.92
query29	12.61	3.96	3.38
query30	0.29	0.14	0.11
query31	2.83	0.58	0.38
query32	3.25	0.55	0.47
query33	3.13	3.09	3.10
query34	15.98	5.48	4.87
query35	4.92	4.91	4.87
query36	0.71	0.52	0.50
query37	0.11	0.07	0.07
query38	0.07	0.04	0.04
query39	0.04	0.02	0.03
query40	0.17	0.16	0.15
query41	0.09	0.03	0.03
query42	0.04	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 99.64 s
Total hot run time: 28.37 s
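
The two totals appear to follow from the table: the cold total matches the sum of the first timing column, and the hot total matches the sum of the per-query minimum of the second and third columns. This is inferred from the numbers, not from the ClickBench scripts themselves. The sketch below recomputes both totals; the function name `clickbench_totals` is hypothetical, and the rows are copied from the table above.

```python
# Each row: query name, first (cold) run, then two further (hot) runs, in seconds.
RESULTS = """
query1 0.06 0.05 0.05
query2 0.09 0.05 0.06
query3 0.25 0.09 0.09
query4 1.61 0.12 0.12
query5 0.27 0.26 0.24
query6 1.20 0.63 0.64
query7 0.02 0.03 0.03
query8 0.06 0.04 0.04
query9 0.63 0.53 0.53
query10 0.59 0.58 0.57
query11 0.17 0.12 0.14
query12 0.15 0.14 0.12
query13 0.63 0.62 0.60
query14 1.02 1.05 1.01
query15 0.88 0.85 0.88
query16 0.39 0.39 0.38
query17 1.06 1.12 1.06
query18 0.22 0.20 0.20
query19 1.95 1.84 1.83
query20 0.01 0.01 0.02
query21 15.42 0.18 0.12
query22 5.11 0.07 0.05
query23 15.65 0.26 0.10
query24 2.82 1.42 0.38
query25 0.10 0.06 0.07
query26 0.14 0.13 0.13
query27 0.08 0.05 0.05
query28 4.78 1.13 0.92
query29 12.61 3.96 3.38
query30 0.29 0.14 0.11
query31 2.83 0.58 0.38
query32 3.25 0.55 0.47
query33 3.13 3.09 3.10
query34 15.98 5.48 4.87
query35 4.92 4.91 4.87
query36 0.71 0.52 0.50
query37 0.11 0.07 0.07
query38 0.07 0.04 0.04
query39 0.04 0.02 0.03
query40 0.17 0.16 0.15
query41 0.09 0.03 0.03
query42 0.04 0.03 0.02
query43 0.04 0.03 0.03
"""


def clickbench_totals(table_text):
    """Return (cold_total, hot_total) in seconds, rounded to 2 decimals."""
    cold_total = 0.0
    hot_total = 0.0
    for line in table_text.splitlines():
        fields = line.split()  # works for tab- or space-separated rows
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        cold, hot_a, hot_b = (float(x) for x in fields[1:])
        cold_total += cold
        hot_total += min(hot_a, hot_b)  # assumed: best of the two hot runs
    return round(cold_total, 2), round(hot_total, 2)


print(clickbench_totals(RESULTS))
```

On these rows the function returns 99.64 and 28.37, which agrees with the totals reported by the bot.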

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/71) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 29.58% (21/71) 🎉
Increment coverage report
Complete coverage report

@deardeng
Contributor Author

run p0

@deardeng
Contributor Author

run nonConcurrent

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved and reviewed labels Oct 23, 2025

@github-actions
Contributor

PR approved by anyone and no changes requested.

@gavinchou merged commit 5b7acb0 into apache:master Oct 27, 2025
30 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 27, 2025
morrySnow pushed a commit that referenced this pull request Oct 28, 2025
Cherry-picked from #57200

Co-authored-by: deardeng <dengxin@selectdb.com>
dwdwqfwe pushed a commit to dwdwqfwe/doris that referenced this pull request Oct 31, 2025
Labels

approved, dev/3.1.3-merged, dev/4.0.x, dev/4.0.x-conflict, reviewed

6 participants