@github-actions

Cherry-picked from #46534

…ey column. (#46534)

### What problem does this PR solve?

When running a sample analyze on a partition column or a key column, the
BE may run out of memory (OOM). For a partition column, the sample must
include at least one tablet from every partition to calculate the NDV,
and the sample SQL cannot use a limit. When the table has a large number
of partitions and the tablets in each partition are large, the sample SQL
may read too much data and cause a BE OOM.
Similarly, the sample SQL for a key column cannot use a limit, so a
single very large tablet can also cause an OOM.

This PR tries to solve this problem.
For partition columns, when the selected tablets contain more than
1000000000 (one billion) rows, we use the ndv() function to read up to 5
partitions and get the NDV of those partitions, say n.
If the row count of those partitions is r and the row count of the
table is R, the table NDV is estimated as n * R / r.
The ndv() function uses HLL, so it needs only a small amount of memory.
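
The linear scaling above can be sketched in a few lines of Python. This is only an illustration of the formula n * R / r, not the code from this PR; the function name `extrapolate_ndv` and the handling of an empty sample are assumptions made for the example, and the numbers are hypothetical.

```python
def extrapolate_ndv(sample_ndv: float, sample_rows: int, table_rows: int) -> float:
    """Scale the NDV of the sampled partitions (n) to the whole table: n * R / r."""
    if sample_rows <= 0:
        # Assumption for this sketch: an empty sample gives an NDV of 0.
        return 0.0
    return sample_ndv * table_rows / sample_rows


# Hypothetical numbers: the sampled partitions hold 2 billion rows with
# 5 distinct values, and the whole table holds 40 billion rows.
estimate = extrapolate_ndv(sample_ndv=5, sample_rows=2_000_000_000,
                           table_rows=40_000_000_000)
print(estimate)  # 100.0
```

The estimate assumes distinct values are spread evenly across partitions, which is the same assumption the formula in the description makes.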

For key columns, when the selected tablets contain more than 1000000000
rows, we use limit 1000000000 to cap the number of rows read.
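
The two rules can be read as one decision on the number of rows in the selected tablets. The sketch below is an assumption-laden illustration, not the PR's implementation: `Strategy`, `choose_strategy`, and the constant names are hypothetical, and giving the partition-column rule priority when a column is both a partition column and a key column is a choice made only for this example.

```python
from enum import Enum

ROW_THRESHOLD = 1_000_000_000  # one billion rows, as stated in the description
MAX_NDV_PARTITIONS = 5         # ndv() reads up to 5 partitions


class Strategy(Enum):
    DEFAULT_SAMPLE = "default_sample"        # behavior below the threshold
    NDV_ON_PARTITIONS = "ndv_on_partitions"  # ndv() over up to 5 partitions
    LIMIT_ROWS = "limit_rows"                # add a limit of one billion rows


def choose_strategy(selected_rows: int, is_partition_column: bool,
                    is_key_column: bool) -> Strategy:
    """Pick how to sample a column, given the row count of the selected tablets."""
    if selected_rows <= ROW_THRESHOLD:
        return Strategy.DEFAULT_SAMPLE
    if is_partition_column:
        # Assumption: the partition rule wins if both flags are set.
        return Strategy.NDV_ON_PARTITIONS
    if is_key_column:
        return Strategy.LIMIT_ROWS
    return Strategy.DEFAULT_SAMPLE


print(choose_strategy(2_000_000_000, is_partition_column=False, is_key_column=True))
```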

Reading 1000000000 rows uses at most 8GB of memory in the BE, which is
acceptable.
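
As a quick check of that bound: 8GB for one billion rows works out to about 8 bytes per row read. The per-row figure is inferred here from the two numbers in the description and is not stated by the PR; the helper name `implied_bytes_per_row` is hypothetical, and 8GB is taken as decimal gigabytes.

```python
ROW_LIMIT = 1_000_000_000        # rows read under the limit
MEMORY_BOUND_BYTES = 8 * 10**9   # 8 GB, taken as decimal gigabytes (assumption)


def implied_bytes_per_row(memory_bytes: int, rows: int) -> float:
    """Return the per-row memory budget implied by a total bound and a row count."""
    return memory_bytes / rows


print(implied_bytes_per_row(MEMORY_BOUND_BYTES, ROW_LIMIT))  # 8.0
```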

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None
@github-actions github-actions bot requested a review from dataroaring as a code owner March 11, 2025 08:10
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Mar 11, 2025
@hello-stephen

run buildall

@Jibing-Li

run feut

@doris-robot

TPC-H: Total hot run time: 40102 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit afbab263fe8cd8c9c3508901e2e1f9fe8015478e, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17581	6843	6564	6564
q2	2050	170	186	170
q3	10623	1093	1194	1093
q4	10574	782	731	731
q5	7740	2824	2880	2824
q6	219	132	131	131
q7	952	633	606	606
q8	9363	1913	2029	1913
q9	6542	6403	6364	6364
q10	7041	2285	2259	2259
q11	459	259	261	259
q12	394	211	216	211
q13	17811	3019	3040	3019
q14	229	215	222	215
q15	511	456	487	456
q16	698	609	588	588
q17	977	573	566	566
q18	7510	7017	6709	6709
q19	1390	1066	1001	1001
q20	495	203	205	203
q21	4021	3342	3243	3243
q22	1121	994	977	977
Total cold run time: 108301 ms
Total hot run time: 40102 ms
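
The totals can be reproduced from the rows. In this table the first number on each row is the cold run, the next two are hot runs, and the last is the smaller of the two hot runs; the cold total is the sum of the first numbers and the hot total is the sum of the last ones. A minimal Python sketch using three rows copied from Round 1 (the function name `summarize` is hypothetical):

```python
def summarize(rows: str) -> tuple[int, int]:
    """Return (cold_total_ms, hot_total_ms) for whitespace-separated result rows."""
    cold_total = 0
    hot_total = 0
    for line in rows.strip().splitlines():
        # Fifth field (the reported best hot run) is recomputed below, so skip it.
        _name, cold, hot1, hot2, _best = line.split()
        cold_total += int(cold)
        # The best hot run is the minimum of the two hot runs.
        hot_total += min(int(hot1), int(hot2))
    return cold_total, hot_total


sample = """
q6   219  132  131  131
q11  459  259  261  259
q12  394  211  216  211
"""
print(summarize(sample))  # (1072, 601)
```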

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6966	6557	6531	6531
q2	336	251	225	225
q3	2905	2756	3006	2756
q4	2058	1796	1770	1770
q5	5729	5724	5722	5722
q6	223	130	130	130
q7	2198	1836	1849	1836
q8	3415	3589	3533	3533
q9	8799	8915	8864	8864
q10	3559	3545	3495	3495
q11	598	502	489	489
q12	820	573	597	573
q13	8561	3294	3160	3160
q14	302	281	286	281
q15	522	466	458	458
q16	686	660	644	644
q17	1814	1617	1578	1578
q18	8253	7788	7682	7682
q19	1657	1544	1567	1544
q20	2108	1810	1871	1810
q21	5461	5353	5375	5353
q22	1153	1095	1029	1029
Total cold run time: 68123 ms
Total hot run time: 59463 ms

@doris-robot

TPC-DS: Total hot run time: 196136 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit afbab263fe8cd8c9c3508901e2e1f9fe8015478e, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1288	909	929	909
query2	6235	2033	2009	2009
query3	10905	4489	4202	4202
query4	66002	28158	23179	23179
query5	4964	458	470	458
query6	421	185	178	178
query7	5596	307	311	307
query8	315	232	222	222
query9	8999	2626	2616	2616
query10	451	299	283	283
query11	17398	15264	15571	15264
query12	157	104	107	104
query13	1515	453	430	430
query14	10231	7686	6668	6668
query15	221	182	187	182
query16	6859	490	482	482
query17	1111	598	577	577
query18	1959	317	324	317
query19	222	159	159	159
query20	125	112	121	112
query21	207	106	118	106
query22	4723	4516	4625	4516
query23	34196	33845	34046	33845
query24	6144	2969	2865	2865
query25	561	431	418	418
query26	656	169	168	168
query27	1873	352	359	352
query28	4123	2461	2450	2450
query29	680	440	434	434
query30	242	166	159	159
query31	988	839	840	839
query32	66	58	53	53
query33	454	291	301	291
query34	911	499	509	499
query35	840	731	732	731
query36	1086	954	943	943
query37	114	66	76	66
query38	4065	3974	4046	3974
query39	1594	1473	1458	1458
query40	211	93	94	93
query41	49	47	47	47
query42	113	102	98	98
query43	525	492	475	475
query44	1172	816	819	816
query45	182	171	169	169
query46	1148	741	729	729
query47	2039	1893	1935	1893
query48	470	386	391	386
query49	712	394	397	394
query50	849	435	433	433
query51	7310	7184	7104	7104
query52	97	87	96	87
query53	254	179	180	179
query54	562	459	445	445
query55	80	79	73	73
query56	259	235	244	235
query57	1209	1143	1109	1109
query58	211	202	210	202
query59	3160	2968	3027	2968
query60	285	252	259	252
query61	108	106	107	106
query62	771	647	672	647
query63	217	188	190	188
query64	1379	683	626	626
query65	3251	3148	3169	3148
query66	712	297	298	297
query67	16044	15931	15680	15680
query68	3901	599	583	583
query69	426	302	270	270
query70	1181	1104	1121	1104
query71	344	259	277	259
query72	6285	4030	4002	4002
query73	751	351	347	347
query74	10032	9027	8877	8877
query75	3351	2607	2678	2607
query76	1872	1072	1149	1072
query77	479	269	272	269
query78	10650	9577	9589	9577
query79	1113	593	583	583
query80	773	434	431	431
query81	523	247	235	235
query82	186	91	93	91
query83	158	149	138	138
query84	286	87	74	74
query85	847	292	285	285
query86	341	297	302	297
query87	4508	4306	4276	4276
query88	4000	2408	2373	2373
query89	417	290	289	289
query90	2042	182	184	182
query91	183	149	149	149
query92	65	51	50	50
query93	1264	555	555	555
query94	754	279	283	279
query95	350	247	258	247
query96	607	283	281	281
query97	3276	3179	3141	3141
query98	220	198	198	198
query99	1538	1311	1327	1311
Total cold run time: 314599 ms
Total hot run time: 196136 ms

@doris-robot

ClickBench: Total hot run time: 32.79 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit afbab263fe8cd8c9c3508901e2e1f9fe8015478e, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.64	0.11	0.11
query5	0.54	0.52	0.50
query6	1.13	0.73	0.72
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.56	0.49	0.50
query10	0.54	0.54	0.54
query11	0.14	0.11	0.12
query12	0.14	0.12	0.11
query13	0.61	0.59	0.60
query14	2.87	2.76	2.84
query15	0.90	0.82	0.83
query16	0.38	0.37	0.37
query17	1.06	0.97	0.99
query18	0.24	0.21	0.22
query19	1.86	1.91	2.01
query20	0.01	0.01	0.02
query21	15.37	0.59	0.57
query22	2.71	2.52	1.73
query23	16.95	1.06	0.91
query24	3.06	2.17	1.60
query25	0.24	0.15	0.13
query26	0.44	0.14	0.13
query27	0.04	0.04	0.05
query28	8.95	0.50	0.43
query29	12.56	3.20	3.18
query30	0.25	0.07	0.07
query31	2.86	0.39	0.39
query32	3.23	0.45	0.46
query33	2.94	2.99	3.13
query34	17.25	4.46	4.49
query35	4.56	4.53	4.54
query36	0.67	0.48	0.49
query37	0.09	0.06	0.07
query38	0.05	0.03	0.03
query39	0.04	0.02	0.03
query40	0.17	0.13	0.13
query41	0.08	0.02	0.03
query42	0.03	0.03	0.02
query43	0.03	0.02	0.03
Total cold run time: 105.59 s
Total hot run time: 32.79 s

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 5f03c90 into branch-3.0 Mar 14, 2025
23 of 24 checks passed
@github-actions github-actions bot deleted the auto-pick-46534-branch-3.0 branch March 14, 2025 09:51