@Jibing-Li
Contributor

@Jibing-Li Jibing-Li commented Feb 24, 2025

What problem does this PR solve?

The previous PR (#46534) controls memory use when sample-analyzing a large partitioned table.
This PR makes the maximum number of rows and the maximum number of partitions to sample configurable. Users can set larger values if the NDV (number of distinct values) is not accurate enough.
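
To illustrate the idea of configurable sampling caps, here is a minimal Python sketch. It is not the code in this PR: the names `SampleLimits`, `max_sample_partitions`, `max_sample_rows`, and `pick_partitions`, as well as the default values, are hypothetical and chosen only for illustration. It shows how two user-adjustable limits could bound the number of partitions and rows that a sampling pass touches.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SampleLimits:
    # Hypothetical setting names and defaults; NOT the actual Doris options.
    max_sample_partitions: int = 30
    max_sample_rows: int = 1_000_000


def pick_partitions(
    partition_rows: List[Tuple[str, int]], limits: SampleLimits
) -> Tuple[List[str], int]:
    """Pick partitions in order until either cap is reached.

    Returns the chosen partition names and the total row count they cover.
    The last chosen partition may push the total past max_sample_rows,
    because the row cap is checked before a partition is added.
    """
    picked: List[str] = []
    total_rows = 0
    for name, rows in partition_rows:
        if len(picked) >= limits.max_sample_partitions:
            break  # partition-count cap reached
        if total_rows >= limits.max_sample_rows:
            break  # row cap reached
        picked.append(name)
        total_rows += rows
    return picked, total_rows


if __name__ == "__main__":
    parts = [("p1", 400), ("p2", 400), ("p3", 400), ("p4", 400)]
    # A tight partition cap stops the scan after two partitions.
    print(pick_partitions(parts, SampleLimits(max_sample_partitions=2)))
    # Raising the caps lets the sample cover more data, at a higher memory cost.
    print(pick_partitions(parts, SampleLimits(max_sample_partitions=10)))
```

Under these assumptions, raising either cap widens the sample, which is the trade-off described above: a more accurate NDV estimate in exchange for more memory and scan work.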

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Feb 24, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Jibing-Li Jibing-Li marked this pull request as ready for review February 24, 2025 03:05
@Jibing-Li
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31445 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit df43f6513b6940a11b2563a32a31d8b5111e6180, data reload: false

------ Round 1 ----------------------------------
q1	17610	5367	5118	5118
q2	2047	280	180	180
q3	10415	1260	698	698
q4	10220	983	539	539
q5	7497	2481	2288	2288
q6	184	171	136	136
q7	916	729	594	594
q8	9302	1324	1128	1128
q9	4929	4612	4600	4600
q10	6819	2335	1870	1870
q11	500	275	248	248
q12	359	361	221	221
q13	17776	3650	3072	3072
q14	232	227	204	204
q15	517	469	482	469
q16	616	611	587	587
q17	589	870	334	334
q18	6514	6333	6190	6190
q19	1066	948	543	543
q20	302	322	196	196
q21	2753	2120	1932	1932
q22	355	324	298	298
Total cold run time: 101518 ms
Total hot run time: 31445 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5115	5373	5125	5125
q2	238	335	239	239
q3	2160	2686	2326	2326
q4	1418	1842	1371	1371
q5	4248	4146	4177	4146
q6	209	164	124	124
q7	1853	1833	1679	1679
q8	2647	2511	2501	2501
q9	7357	7109	7193	7109
q10	2988	3209	2729	2729
q11	581	516	483	483
q12	703	738	609	609
q13	3526	3864	3233	3233
q14	291	310	304	304
q15	511	463	453	453
q16	624	672	623	623
q17	1147	1619	1292	1292
q18	7730	7487	7278	7278
q19	813	809	971	809
q20	1979	2030	1839	1839
q21	5353	4978	4837	4837
q22	611	585	533	533
Total cold run time: 52102 ms
Total hot run time: 49642 ms

@doris-robot

TPC-DS: Total hot run time: 184021 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit df43f6513b6940a11b2563a32a31d8b5111e6180, data reload: false

query1	968	369	389	369
query2	6530	1904	1892	1892
query3	6787	239	216	216
query4	26748	24089	23336	23336
query5	4718	678	487	487
query6	304	200	184	184
query7	4603	522	298	298
query8	301	248	253	248
query9	8650	2573	2569	2569
query10	479	345	248	248
query11	15563	15083	14938	14938
query12	155	105	107	105
query13	1655	535	397	397
query14	10317	6284	6729	6284
query15	213	199	180	180
query16	7655	631	447	447
query17	1296	721	567	567
query18	1988	407	304	304
query19	190	185	163	163
query20	128	118	121	118
query21	211	128	104	104
query22	4554	4549	4734	4549
query23	34312	33233	33025	33025
query24	7814	2362	2379	2362
query25	517	452	374	374
query26	1214	278	153	153
query27	2072	494	323	323
query28	3880	2400	2378	2378
query29	701	540	414	414
query30	229	180	154	154
query31	938	875	800	800
query32	68	61	58	58
query33	558	342	293	293
query34	771	855	492	492
query35	795	815	731	731
query36	971	986	881	881
query37	118	121	76	76
query38	4154	4150	4124	4124
query39	1449	1407	1379	1379
query40	204	108	99	99
query41	52	51	54	51
query42	124	104	104	104
query43	488	508	497	497
query44	1291	776	771	771
query45	175	171	160	160
query46	872	1017	633	633
query47	1789	1799	1782	1782
query48	383	438	311	311
query49	770	481	429	429
query50	711	734	411	411
query51	4180	4167	4105	4105
query52	106	104	92	92
query53	227	260	180	180
query54	491	506	423	423
query55	81	77	81	77
query56	292	274	237	237
query57	1128	1135	1081	1081
query58	239	235	228	228
query59	2698	2629	2539	2539
query60	282	274	243	243
query61	120	124	116	116
query62	799	753	669	669
query63	226	196	185	185
query64	4201	1002	655	655
query65	3240	3140	3142	3140
query66	1051	398	322	322
query67	15915	15428	15512	15428
query68	7036	764	532	532
query69	467	300	273	273
query70	1187	1114	1156	1114
query71	400	290	265	265
query72	5676	3554	3685	3554
query73	707	732	348	348
query74	9082	9520	8912	8912
query75	3185	3210	2705	2705
query76	3253	1159	727	727
query77	450	383	280	280
query78	10021	10111	9322	9322
query79	2244	813	597	597
query80	643	513	461	461
query81	532	287	232	232
query82	197	126	95	95
query83	177	164	160	160
query84	253	98	79	79
query85	742	340	300	300
query86	366	311	294	294
query87	4449	4451	4376	4376
query88	3741	2203	2200	2200
query89	391	318	279	279
query90	1969	197	187	187
query91	138	129	107	107
query92	77	63	57	57
query93	2388	1010	556	556
query94	672	399	280	280
query95	352	265	262	262
query96	503	555	282	282
query97	2735	2915	2731	2731
query98	232	218	195	195
query99	1342	1409	1271	1271
Total cold run time: 272316 ms
Total hot run time: 184021 ms

@doris-robot

ClickBench: Total hot run time: 30.09 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit df43f6513b6940a11b2563a32a31d8b5111e6180, data reload: false

query1	0.04	0.04	0.04
query2	0.09	0.03	0.04
query3	0.23	0.07	0.06
query4	1.62	0.10	0.11
query5	0.44	0.41	0.40
query6	1.18	0.67	0.66
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.60	0.51	0.52
query10	0.57	0.57	0.56
query11	0.15	0.11	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.61
query14	2.72	2.69	2.69
query15	0.91	0.84	0.84
query16	0.38	0.37	0.37
query17	1.02	1.07	1.02
query18	0.22	0.19	0.20
query19	1.92	1.79	1.96
query20	0.01	0.02	0.01
query21	15.35	0.91	0.54
query22	0.74	1.06	0.61
query23	15.14	1.36	0.60
query24	7.00	1.58	0.46
query25	0.48	0.26	0.08
query26	0.63	0.16	0.15
query27	0.05	0.05	0.05
query28	9.39	0.84	0.42
query29	12.53	3.94	3.29
query30	0.24	0.08	0.05
query31	2.84	0.57	0.37
query32	3.23	0.56	0.47
query33	2.99	3.09	3.07
query34	15.86	5.11	4.50
query35	4.56	4.49	4.49
query36	0.68	0.49	0.49
query37	0.09	0.07	0.06
query38	0.06	0.04	0.04
query39	0.03	0.02	0.02
query40	0.18	0.14	0.13
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.04	0.02
Total cold run time: 105.14 s
Total hot run time: 30.09 s

@Jibing-Li
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31782 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit afc7f631e4b408c754c7cd3e5baf3ee00d0172b0, data reload: false

------ Round 1 ----------------------------------
q1	17583	5197	5063	5063
q2	2054	321	187	187
q3	10371	1298	696	696
q4	10219	1020	539	539
q5	7477	2392	2393	2392
q6	188	168	133	133
q7	912	742	601	601
q8	9303	1300	1059	1059
q9	4991	4918	4872	4872
q10	6831	2337	1891	1891
q11	474	289	250	250
q12	344	370	220	220
q13	17761	3715	3051	3051
q14	228	234	205	205
q15	504	467	471	467
q16	619	617	588	588
q17	575	875	347	347
q18	7093	6248	6305	6248
q19	1210	952	547	547
q20	329	325	187	187
q21	2756	2155	1929	1929
q22	366	337	310	310
Total cold run time: 102188 ms
Total hot run time: 31782 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5121	5060	5082	5060
q2	234	334	233	233
q3	2148	2683	2375	2375
q4	1406	1817	1375	1375
q5	4263	4134	4182	4134
q6	207	166	123	123
q7	1898	1819	1669	1669
q8	2612	2573	2622	2573
q9	7327	7261	7195	7195
q10	3001	3257	2744	2744
q11	565	529	479	479
q12	700	785	613	613
q13	3555	3917	3192	3192
q14	289	291	283	283
q15	514	459	464	459
q16	646	691	637	637
q17	1151	1603	1324	1324
q18	7722	7544	7164	7164
q19	810	776	938	776
q20	1967	2073	1897	1897
q21	5459	5114	4788	4788
q22	654	572	595	572
Total cold run time: 52249 ms
Total hot run time: 49665 ms

@doris-robot

TPC-DS: Total hot run time: 184408 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit afc7f631e4b408c754c7cd3e5baf3ee00d0172b0, data reload: false

query1	1009	396	395	395
query2	6523	1877	1886	1877
query3	6795	217	216	216
query4	26854	23439	23577	23439
query5	4370	699	513	513
query6	302	195	188	188
query7	4617	504	303	303
query8	304	246	229	229
query9	8612	2522	2529	2522
query10	460	332	250	250
query11	15879	15230	14948	14948
query12	159	110	112	110
query13	1664	542	402	402
query14	10444	6206	6270	6206
query15	210	196	181	181
query16	7665	674	472	472
query17	1197	719	572	572
query18	1990	399	312	312
query19	198	193	154	154
query20	120	115	117	115
query21	209	128	111	111
query22	4230	4276	4186	4186
query23	33989	33022	32951	32951
query24	7780	2370	2402	2370
query25	545	450	388	388
query26	1230	263	164	164
query27	2105	497	321	321
query28	3891	2401	2417	2401
query29	701	546	411	411
query30	235	188	165	165
query31	944	846	765	765
query32	77	65	67	65
query33	569	361	321	321
query34	799	853	498	498
query35	805	817	725	725
query36	977	978	889	889
query37	112	95	71	71
query38	4199	4129	4102	4102
query39	1461	1383	1381	1381
query40	202	123	107	107
query41	54	53	52	52
query42	125	106	108	106
query43	497	487	482	482
query44	1293	791	790	790
query45	176	166	159	159
query46	875	1046	633	633
query47	1771	1786	1686	1686
query48	377	419	296	296
query49	818	499	423	423
query50	676	727	409	409
query51	4157	4209	4133	4133
query52	113	108	99	99
query53	228	262	185	185
query54	491	493	410	410
query55	79	85	77	77
query56	259	263	252	252
query57	1156	1148	1067	1067
query58	245	251	239	239
query59	2698	2734	2589	2589
query60	273	287	298	287
query61	140	116	135	116
query62	819	736	651	651
query63	240	189	193	189
query64	4269	1063	653	653
query65	3211	3110	3132	3110
query66	1056	408	307	307
query67	15762	15529	15400	15400
query68	8375	880	515	515
query69	449	300	266	266
query70	1129	1143	1125	1125
query71	455	290	278	278
query72	5552	3587	3847	3587
query73	798	732	349	349
query74	8998	9103	9073	9073
query75	3754	3174	2701	2701
query76	3684	1178	753	753
query77	786	365	271	271
query78	9878	10167	9323	9323
query79	2412	814	594	594
query80	609	585	431	431
query81	500	285	242	242
query82	674	131	95	95
query83	175	171	153	153
query84	243	102	80	80
query85	778	361	316	316
query86	332	292	270	270
query87	4441	4495	4313	4313
query88	3387	2229	2196	2196
query89	405	337	286	286
query90	1942	205	194	194
query91	142	137	113	113
query92	72	62	57	57
query93	1282	1054	590	590
query94	667	413	287	287
query95	355	275	268	268
query96	476	558	266	266
query97	3338	3382	3314	3314
query98	228	205	201	201
query99	1637	1438	1305	1305
Total cold run time: 274243 ms
Total hot run time: 184408 ms

@doris-robot

ClickBench: Total hot run time: 31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit afc7f631e4b408c754c7cd3e5baf3ee00d0172b0, data reload: false

query1	0.04	0.03	0.04
query2	0.07	0.03	0.03
query3	0.23	0.07	0.06
query4	1.62	0.10	0.11
query5	0.57	0.56	0.55
query6	1.22	0.73	0.74
query7	0.03	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.56	0.51
query10	0.58	0.59	0.57
query11	0.16	0.10	0.10
query12	0.15	0.11	0.11
query13	0.63	0.60	0.60
query14	2.68	2.84	2.74
query15	0.92	0.85	0.85
query16	0.39	0.38	0.39
query17	1.04	1.05	1.05
query18	0.22	0.20	0.19
query19	1.88	1.95	1.85
query20	0.01	0.01	0.01
query21	15.35	0.91	0.55
query22	0.77	1.19	0.69
query23	14.90	1.41	0.68
query24	7.56	1.08	0.80
query25	0.52	0.37	0.09
query26	0.62	0.16	0.14
query27	0.06	0.05	0.06
query28	9.00	0.94	0.44
query29	12.62	3.97	3.26
query30	0.25	0.09	0.07
query31	2.82	0.60	0.38
query32	3.22	0.56	0.47
query33	2.95	3.02	3.08
query34	15.68	5.25	4.52
query35	4.57	4.53	4.58
query36	0.68	0.49	0.48
query37	0.10	0.07	0.07
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.14	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 105.14 s
Total hot run time: 31 s

@Jibing-Li
Contributor Author

run p0

@Jibing-Li
Contributor Author

run cloud_p0

@Jibing-Li
Contributor Author

run p0

@Jibing-Li
Contributor Author

run cloud_p0

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Mar 5, 2025
@github-actions
Contributor

github-actions bot commented Mar 5, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Mar 5, 2025

PR approved by anyone and no changes requested.

Contributor

@zfr9527 zfr9527 left a comment

LGTM

@Jibing-Li Jibing-Li merged commit 0f1aca0 into apache:master Mar 5, 2025
30 of 31 checks passed
@Jibing-Li Jibing-Li deleted the partitioncount branch March 5, 2025 07:34
github-actions bot pushed a commit that referenced this pull request Mar 14, 2025
…unt. (#48218)

### What problem does this PR solve?
The previous PR (#46534) controls
memory use when sample-analyzing a large partitioned table.
This PR makes the maximum number of rows and partitions to sample
configurable. Users can set larger values if the NDV is not accurate
enough.
github-actions bot pushed a commit that referenced this pull request Mar 14, 2025
…unt. (#48218)

### What problem does this PR solve?
The previous PR (#46534) controls
memory use when sample-analyzing a large partitioned table.
This PR makes the maximum number of rows and partitions to sample
configurable. Users can set larger values if the NDV is not accurate
enough.
dataroaring pushed a commit that referenced this pull request Mar 14, 2025
…on sample count. #48218 (#49091)

Cherry-picked from #48218

Co-authored-by: James <lijibing@selectdb.com>
yiguolei pushed a commit that referenced this pull request Mar 14, 2025
…on sample count. #48218 (#49092)

Cherry-picked from #48218

Co-authored-by: James <lijibing@selectdb.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…unt. (apache#48218)

### What problem does this PR solve?
The previous PR (apache#46534) controls
memory use when sample-analyzing a large partitioned table.
This PR makes the maximum number of rows and partitions to sample
configurable. Users can set larger values if the NDV is not accurate
enough.
Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.9-merged, dev/3.0.5-merged, reviewed

7 participants