@deardeng
Contributor

@deardeng deardeng commented Jul 4, 2025

…ize invalid

  1. Fixed auto bucket calculating a wrong bucket number when the partition size is inaccurate
  • If replica.size == 0, filter out that replica. The tablet.getDataSize function computes the size as the average over the replicas, so a replica whose size is 0 skews the average heavily. Replicas with size = 0 are therefore excluded.
  • If a partition's size equals 0, do not include it in the estimation of the partition size.
  • If every partition that has data (according to its version) reports a size of 0, the newly calculated bucket number for the partition equals the bucket number of the previous partition with a size greater than 0. Because the sizes of these partitions are unknown (the stats thread has not collected them yet), the new partition is assumed to have the same size as the previous partition with a size greater than 0, so its bucket number naturally equals that partition's bucket number.
  2. Added an alarm log when the bucket num calculated by auto bucket exceeds the threshold
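
For readers who want the rules above in code form, here is a minimal Java sketch. It is not the code changed in this PR: the class name AutoBucketSketch, the method names, the constants TARGET_BYTES_PER_BUCKET and WARN_BUCKET_THRESHOLD, the plain average used as the estimator, and the use of System.err in place of the FE logger are all assumptions made for illustration.

```java
import java.util.List;
import java.util.OptionalLong;

// Illustrative only; every name and constant here is hypothetical.
final class AutoBucketSketch {
    // Assumed target size per bucket (1 GiB) and assumed alarm threshold.
    static final long TARGET_BYTES_PER_BUCKET = 1L << 30;
    static final int WARN_BUCKET_THRESHOLD = 128;

    /** Average replica size, ignoring replicas that report a size of 0. */
    static long averageNonZeroReplicaSize(List<Long> replicaSizes) {
        long sum = 0;
        int count = 0;
        for (long size : replicaSizes) {
            if (size > 0) {
                sum += size;
                count++;
            }
        }
        return count == 0 ? 0 : sum / count;
    }

    /**
     * Estimate the new partition's size from historical partition sizes,
     * skipping partitions whose size is 0. Returns empty when no partition
     * has a usable size. A plain average stands in for the real estimator.
     */
    static OptionalLong estimatePartitionSize(List<Long> historySizes) {
        long sum = 0;
        int count = 0;
        for (long size : historySizes) {
            if (size > 0) {
                sum += size;
                count++;
            }
        }
        return count == 0 ? OptionalLong.empty() : OptionalLong.of(sum / count);
    }

    /** Bucket number for a new partition, falling back to the previous one. */
    static int calcBucketNum(List<Long> historySizes, int previousBucketNum) {
        OptionalLong estimate = estimatePartitionSize(historySizes);
        if (!estimate.isPresent()) {
            // No size information yet: reuse the previous bucket number.
            return previousBucketNum;
        }
        // Ceiling division, with at least one bucket.
        long buckets = (estimate.getAsLong() + TARGET_BYTES_PER_BUCKET - 1)
                / TARGET_BYTES_PER_BUCKET;
        int bucketNum = (int) Math.max(1L, buckets);
        if (bucketNum > WARN_BUCKET_THRESHOLD) {
            // Stand-in for the alarm log mentioned in item 2.
            System.err.println("auto bucket num " + bucketNum
                    + " exceeds threshold " + WARN_BUCKET_THRESHOLD);
        }
        return bucketNum;
    }
}
```

With these assumed constants, replica sizes of 300, 0 and 100 bytes average to 200 rather than 133, and a history made up only of zero-sized partitions leaves the bucket number at whatever the previous partition used.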

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jul 4, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@deardeng
Contributor Author

deardeng commented Jul 4, 2025

wait case

@deardeng
Contributor Author

deardeng commented Jul 7, 2025

run buildall

Contributor

@yujun777 yujun777 left a comment

LGTM

@github-actions
Contributor

github-actions bot commented Jul 7, 2025

PR approved by anyone and no changes requested.

@deardeng
Contributor Author

deardeng commented Jul 7, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33590 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 600ed133ab25ee1d980548f0cc923bdd3e5d78b1, data reload: false

------ Round 1 ----------------------------------
q1	17584	5254	5114	5114
q2	1937	280	199	199
q3	10367	1324	726	726
q4	10266	1042	519	519
q5	8113	2500	2323	2323
q6	183	157	129	129
q7	922	743	602	602
q8	9319	1309	1115	1115
q9	6998	5219	5170	5170
q10	6899	2402	1975	1975
q11	491	283	276	276
q12	340	345	214	214
q13	17780	3676	3116	3116
q14	226	229	206	206
q15	561	485	477	477
q16	432	422	386	386
q17	594	888	350	350
q18	7407	7154	7217	7154
q19	2676	1373	612	612
q20	311	333	225	225
q21	3699	3181	2391	2391
q22	352	324	311	311
Total cold run time: 107457 ms
Total hot run time: 33590 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5364	5110	5131	5110
q2	243	323	213	213
q3	2171	2697	2287	2287
q4	1393	1828	1357	1357
q5	4227	4570	4463	4463
q6	234	171	128	128
q7	2105	1946	1798	1798
q8	2713	2486	2594	2486
q9	7311	7393	7445	7393
q10	3163	3431	2919	2919
q11	585	510	492	492
q12	697	779	577	577
q13	3702	3924	3649	3649
q14	296	292	280	280
q15	519	470	481	470
q16	454	500	448	448
q17	1222	1608	1366	1366
q18	7830	7734	7626	7626
q19	842	824	872	824
q20	1887	1964	1830	1830
q21	4736	4346	4316	4316
q22	632	591	536	536
Total cold run time: 52326 ms
Total hot run time: 50568 ms

@doris-robot

TPC-DS: Total hot run time: 185158 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 600ed133ab25ee1d980548f0cc923bdd3e5d78b1, data reload: false

query1	1011	386	380	380
query2	6564	1659	1659	1659
query3	6743	211	209	209
query4	27310	23276	23465	23276
query5	4371	605	436	436
query6	301	230	203	203
query7	4619	504	296	296
query8	269	232	231	231
query9	8632	2640	2642	2640
query10	472	344	283	283
query11	15683	15078	15081	15078
query12	155	106	102	102
query13	1635	533	394	394
query14	9398	5693	5605	5605
query15	203	192	169	169
query16	7497	631	492	492
query17	1179	705	574	574
query18	2006	403	290	290
query19	196	197	173	173
query20	122	120	113	113
query21	214	128	108	108
query22	3993	4316	3960	3960
query23	33944	32929	33128	32929
query24	8474	2397	2407	2397
query25	562	491	422	422
query26	1238	269	158	158
query27	2756	507	344	344
query28	4329	2132	2128	2128
query29	781	576	490	490
query30	285	230	191	191
query31	944	838	773	773
query32	67	67	61	61
query33	555	371	313	313
query34	827	896	504	504
query35	785	822	735	735
query36	966	964	890	890
query37	109	95	72	72
query38	4130	4128	4123	4123
query39	1483	1443	1389	1389
query40	212	115	104	104
query41	57	57	52	52
query42	127	107	167	107
query43	483	518	477	477
query44	1333	826	817	817
query45	175	168	161	161
query46	850	1027	633	633
query47	1747	1814	1774	1774
query48	398	426	298	298
query49	723	482	404	404
query50	659	694	416	416
query51	4203	4176	4198	4176
query52	121	106	100	100
query53	246	257	184	184
query54	580	570	504	504
query55	83	82	81	81
query56	312	309	308	308
query57	1183	1191	1105	1105
query58	274	251	266	251
query59	2512	2699	2491	2491
query60	315	306	294	294
query61	122	123	120	120
query62	785	725	666	666
query63	216	189	190	189
query64	4303	990	649	649
query65	4303	4207	4216	4207
query66	1082	412	321	321
query67	15693	15673	15286	15286
query68	7947	892	530	530
query69	468	292	270	270
query70	1190	1110	1129	1110
query71	471	334	311	311
query72	5626	4738	4709	4709
query73	662	590	356	356
query74	8972	8853	8723	8723
query75	3785	3198	2730	2730
query76	3608	1149	738	738
query77	813	402	294	294
query78	10108	10333	9465	9465
query79	2007	783	585	585
query80	589	516	527	516
query81	466	254	228	228
query82	413	123	102	102
query83	257	261	238	238
query84	240	96	85	85
query85	788	357	301	301
query86	403	293	307	293
query87	4419	4434	4293	4293
query88	3275	2278	2265	2265
query89	374	312	284	284
query90	1949	219	219	219
query91	139	146	109	109
query92	73	61	59	59
query93	1136	949	595	595
query94	672	418	307	307
query95	377	297	284	284
query96	504	569	287	287
query97	2723	2729	2624	2624
query98	233	228	204	204
query99	1313	1388	1279	1279
Total cold run time: 273905 ms
Total hot run time: 185158 ms

@doris-robot

ClickBench: Total hot run time: 29.58 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 600ed133ab25ee1d980548f0cc923bdd3e5d78b1, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.24	0.07	0.08
query4	1.62	0.11	0.10
query5	0.46	0.43	0.41
query6	1.17	0.70	0.67
query7	0.03	0.02	0.02
query8	0.04	0.04	0.04
query9	0.61	0.52	0.51
query10	0.58	0.57	0.57
query11	0.16	0.12	0.11
query12	0.15	0.11	0.12
query13	0.63	0.60	0.61
query14	0.79	0.81	0.82
query15	0.92	0.88	0.88
query16	0.40	0.38	0.40
query17	1.05	1.05	1.04
query18	0.23	0.23	0.21
query19	1.94	1.86	1.88
query20	0.01	0.01	0.02
query21	15.45	0.88	0.53
query22	0.76	1.18	0.67
query23	14.90	1.39	0.66
query24	7.23	1.29	0.82
query25	0.49	0.19	0.14
query26	0.55	0.17	0.15
query27	0.07	0.06	0.05
query28	9.02	0.91	0.44
query29	12.56	3.88	3.20
query30	0.26	0.09	0.07
query31	2.83	0.58	0.38
query32	3.25	0.56	0.47
query33	3.05	3.07	3.07
query34	16.05	5.41	4.77
query35	4.88	4.89	4.86
query36	0.70	0.50	0.48
query37	0.09	0.07	0.07
query38	0.05	0.04	0.03
query39	0.03	0.03	0.02
query40	0.18	0.15	0.15
query41	0.09	0.03	0.02
query42	0.03	0.03	0.02
query43	0.04	0.04	0.04
Total cold run time: 103.71 s
Total hot run time: 29.58 s

@gavinchou gavinchou changed the title [fix](auto bucket)Fix auto bucket calc bucketnum err when partition s… [fix](auto bucket)Fix auto bucket calc bucketnum err when partition size is invalid Jul 8, 2025
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 14, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit 234f13e into apache:master Jul 14, 2025
31 of 32 checks passed
deardeng added a commit to deardeng/incubator-doris that referenced this pull request Jul 15, 2025
…ize is invalid (apache#52801)

…ize invalid

1. Fixed auto bucket calculating a wrong bucket number when the
partition size is inaccurate
- If `replica.size == 0`, filter out that replica. The
tablet.getDataSize function computes the size as the average over the
replicas, so a replica whose size is 0 skews the average heavily.
Replicas with size = 0 are therefore excluded.
- If a partition's size equals 0, do not include it in the estimation of
the partition size.
- If every partition that has data (according to its version) reports a
size of 0, the newly calculated bucket number for the partition equals
the bucket number of the previous partition with a size greater than 0.
Because the sizes of these partitions are unknown (the stats thread has
not collected them yet), the new partition is assumed to have the same
size as the previous partition with a size greater than 0, so its
bucket number naturally equals that partition's bucket number.

2. Added an alarm log when the bucket num calculated by auto bucket
exceeds the threshold
deardeng added a commit to deardeng/incubator-doris that referenced this pull request Jul 15, 2025
…ize is invalid (apache#52801)
morrySnow pushed a commit that referenced this pull request Jul 16, 2025
dataroaring pushed a commit that referenced this pull request Jul 16, 2025

Labels

  • approved (Indicates a PR has been approved by one committer.)
  • dev/3.0.7-merged
  • dev/3.1.0-merged
  • reviewed

7 participants