@Jibing-Li
Contributor

For string-type columns, use xxhash_64 to convert each column value to a 64-bit integer, then calculate the NDV (number of distinct values) from the integer hash values. This reduces the memory cost of sample analyze and improves its performance.
For example, for the l_comment column of the TPC-H 100G lineitem table, the memory needed to calculate the NDV drops from 22 GB to 8 GB.
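
A minimal sketch of the idea, not the Doris implementation: count distinct 64-bit hashes instead of distinct strings. It assumes a truncated BLAKE2b digest from Python's standard library as a stand-in for xxhash_64, and the function names (`hash64`, `ndv_by_strings`, `ndv_by_hashes`) and the sample data are hypothetical.

```python
import hashlib


def hash64(value: str) -> int:
    """Map a string to a 64-bit integer.

    Stand-in only: BLAKE2b truncated to 8 bytes, NOT xxhash_64.
    """
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def ndv_by_strings(values):
    """Exact NDV; the set has to retain every distinct string."""
    return len(set(values))


def ndv_by_hashes(values):
    """NDV over hashes; the set retains one fixed-size integer per distinct value."""
    return len({hash64(v) for v in values})


# Hypothetical sample: 10,000 rows holding 1,000 distinct comment strings.
comments = [f"comment text number {i % 1000}" for i in range(10_000)]

print(ndv_by_strings(comments), ndv_by_hashes(comments))
```

The hash-based count can undercount if two different strings collide, but with a 64-bit hash that is very unlikely at typical sample sizes, and the state kept per distinct value no longer grows with string length.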

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Jibing-Li Jibing-Li marked this pull request as ready for review September 24, 2024 07:07
@Jibing-Li
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41028 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c8af049d5d486c99bfcf315700e57608df1a0ce3, data reload: false

------ Round 1 ----------------------------------
q1	17677	7387	7228	7228
q2	2014	289	285	285
q3	12144	1057	1167	1057
q4	10558	748	715	715
q5	7777	2892	2768	2768
q6	246	153	153	153
q7	961	644	609	609
q8	9496	1953	2008	1953
q9	7611	6427	6384	6384
q10	6986	2298	2324	2298
q11	436	245	250	245
q12	410	220	215	215
q13	17773	2949	2952	2949
q14	244	223	218	218
q15	594	530	530	530
q16	684	614	606	606
q17	969	601	552	552
q18	7221	6736	6753	6736
q19	1390	1037	1064	1037
q20	585	290	297	290
q21	3937	3222	3235	3222
q22	1105	997	978	978
Total cold run time: 110818 ms
Total hot run time: 41028 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7213	7220	7205	7205
q2	324	231	229	229
q3	3034	3002	2935	2935
q4	2101	1841	1812	1812
q5	5756	5752	5754	5752
q6	230	143	142	142
q7	2235	1810	1812	1810
q8	3380	3537	3437	3437
q9	8960	8872	8813	8813
q10	3597	3575	3520	3520
q11	600	487	491	487
q12	837	642	625	625
q13	8280	3161	3172	3161
q14	299	272	287	272
q15	573	525	529	525
q16	713	678	657	657
q17	1866	1588	1594	1588
q18	8223	7780	7696	7696
q19	1703	1545	1391	1391
q20	2158	1872	1870	1870
q21	5563	5396	5559	5396
q22	1138	1053	1050	1050
Total cold run time: 68783 ms
Total hot run time: 60373 ms
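
The columns in these tables are unlabeled. Reading them as query, cold run, hot run 1, hot run 2, and best hot run is an inference from the numbers rather than something the bot states. Under that assumption, the following sketch (the `summarize` helper is hypothetical) recomputes the Round 2 totals reported above: the cold total is the sum of the first timing column, and the hot total is the sum of the faster of the two hot runs.

```python
# Round 2 rows copied from the table above: query, cold, hot1, hot2, best (ms).
ROUND_2 = """\
q1 7213 7220 7205 7205
q2 324 231 229 229
q3 3034 3002 2935 2935
q4 2101 1841 1812 1812
q5 5756 5752 5754 5752
q6 230 143 142 142
q7 2235 1810 1812 1810
q8 3380 3537 3437 3437
q9 8960 8872 8813 8813
q10 3597 3575 3520 3520
q11 600 487 491 487
q12 837 642 625 625
q13 8280 3161 3172 3161
q14 299 272 287 272
q15 573 525 529 525
q16 713 678 657 657
q17 1866 1588 1594 1588
q18 8223 7780 7696 7696
q19 1703 1545 1391 1391
q20 2158 1872 1870 1870
q21 5563 5396 5559 5396
q22 1138 1053 1050 1050
"""


def summarize(table: str):
    """Return (cold_total, hot_total) in ms for a whitespace-separated table."""
    cold_total = 0
    hot_total = 0
    for line in table.splitlines():
        _query, cold, hot1, hot2, _best = line.split()
        cold_total += int(cold)
        # Assumed rule: a query's hot time is the faster of its two hot runs.
        hot_total += min(int(hot1), int(hot2))
    return cold_total, hot_total


cold_total, hot_total = summarize(ROUND_2)
print(cold_total, hot_total)  # 68783 60373
```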

@doris-robot

TPC-DS: Total hot run time: 191641 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c8af049d5d486c99bfcf315700e57608df1a0ce3, data reload: false

query1	857	397	411	397
query2	6255	2079	2110	2079
query3	8683	192	201	192
query4	33610	23542	23502	23502
query5	3460	469	463	463
query6	280	174	166	166
query7	4195	306	317	306
query8	287	222	227	222
query9	9387	2702	2705	2702
query10	463	282	291	282
query11	17837	15159	15320	15159
query12	157	101	96	96
query13	1529	434	421	421
query14	10030	7147	7404	7147
query15	261	169	178	169
query16	8087	458	417	417
query17	1610	595	581	581
query18	2184	307	323	307
query19	358	150	155	150
query20	128	113	111	111
query21	208	101	108	101
query22	4722	4539	4507	4507
query23	34914	33961	34022	33961
query24	10948	2855	2823	2823
query25	629	411	430	411
query26	1151	162	167	162
query27	2255	302	311	302
query28	7659	2447	2464	2447
query29	839	439	441	439
query30	257	154	153	153
query31	1023	820	805	805
query32	104	58	53	53
query33	760	297	308	297
query34	932	503	502	502
query35	884	729	735	729
query36	1109	957	970	957
query37	158	90	86	86
query38	4117	3830	3974	3830
query39	1469	1416	1458	1416
query40	206	98	106	98
query41	50	49	48	48
query42	123	99	98	98
query43	535	482	481	481
query44	1271	818	825	818
query45	195	166	166	166
query46	1165	709	732	709
query47	1940	1878	1843	1843
query48	469	362	364	362
query49	880	419	409	409
query50	841	412	413	412
query51	7197	7002	6875	6875
query52	99	93	86	86
query53	252	180	179	179
query54	1213	479	475	475
query55	81	78	77	77
query56	296	277	272	272
query57	1223	1092	1108	1092
query58	248	229	257	229
query59	3293	2842	3021	2842
query60	321	286	290	286
query61	127	121	140	121
query62	847	693	689	689
query63	225	189	190	189
query64	4031	760	621	621
query65	3290	3175	3179	3175
query66	768	321	301	301
query67	15845	15653	15489	15489
query68	4885	586	560	560
query69	536	295	311	295
query70	1193	1182	1165	1165
query71	402	271	271	271
query72	7505	4022	4151	4022
query73	782	357	344	344
query74	10037	9119	9026	9026
query75	3587	2714	2665	2665
query76	3412	960	896	896
query77	439	289	294	289
query78	10013	9368	9260	9260
query79	2211	598	601	598
query80	1038	443	449	443
query81	581	245	238	238
query82	614	147	143	143
query83	244	136	133	133
query84	254	78	79	78
query85	1470	289	288	288
query86	462	304	303	303
query87	4527	4316	4312	4312
query88	3619	2415	2375	2375
query89	411	286	298	286
query90	2007	190	187	187
query91	177	140	141	140
query92	78	48	49	48
query93	2167	562	561	561
query94	1022	290	290	290
query95	356	256	260	256
query96	614	280	279	279
query97	3274	3161	3084	3084
query98	219	208	192	192
query99	1541	1329	1322	1322
Total cold run time: 300504 ms
Total hot run time: 191641 ms

@doris-robot

ClickBench: Total hot run time: 32.88 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c8af049d5d486c99bfcf315700e57608df1a0ce3, data reload: false

query1	0.05	0.04	0.04
query2	0.07	0.03	0.02
query3	0.23	0.07	0.06
query4	1.64	0.10	0.10
query5	0.52	0.50	0.51
query6	1.13	0.73	0.71
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.57	0.51	0.50
query10	0.55	0.57	0.55
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.61	0.60	0.60
query14	3.01	2.96	3.08
query15	0.89	0.82	0.82
query16	0.38	0.38	0.39
query17	1.08	0.97	1.00
query18	0.20	0.20	0.20
query19	2.01	1.88	1.97
query20	0.01	0.01	0.00
query21	15.36	0.59	0.58
query22	2.56	3.50	1.75
query23	17.16	0.87	0.74
query24	2.52	1.20	1.15
query25	0.18	0.13	0.07
query26	0.54	0.14	0.13
query27	0.04	0.04	0.04
query28	10.69	1.09	1.06
query29	12.58	3.24	3.23
query30	0.24	0.06	0.05
query31	2.88	0.38	0.37
query32	3.28	0.47	0.46
query33	2.97	3.01	3.00
query34	16.96	4.44	4.51
query35	4.48	4.52	4.49
query36	0.69	0.50	0.49
query37	0.08	0.07	0.06
query38	0.04	0.03	0.03
query39	0.04	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.02	0.03
Total cold run time: 106.89 s
Total hot run time: 32.88 s

@Jibing-Li
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41254 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f67239687a67f154270cd1c684913956bfcdf51c, data reload: false

------ Round 1 ----------------------------------
q1	18091	7577	7367	7367
q2	2817	171	183	171
q3	11626	1208	1173	1173
q4	10361	754	731	731
q5	8183	2939	2880	2880
q6	245	153	151	151
q7	981	628	621	621
q8	9342	1955	1969	1955
q9	6661	6431	6419	6419
q10	6969	2300	2347	2300
q11	442	248	253	248
q12	407	221	224	221
q13	17778	2986	3021	2986
q14	248	215	214	214
q15	571	550	521	521
q16	678	627	631	627
q17	985	526	597	526
q18	7199	6702	6677	6677
q19	1388	1005	1075	1005
q20	603	285	287	285
q21	3971	3372	3207	3207
q22	1087	969	1019	969
Total cold run time: 110633 ms
Total hot run time: 41254 ms

----- Round 2, with runtime_filter_mode=off -----
q1	8044	7281	7201	7201
q2	335	243	229	229
q3	2952	2727	2766	2727
q4	1947	1707	1712	1707
q5	5414	5494	5493	5493
q6	226	140	142	140
q7	2091	1721	1719	1719
q8	3202	3405	3398	3398
q9	8534	8518	8521	8518
q10	3492	3420	3441	3420
q11	581	482	470	470
q12	776	588	588	588
q13	5786	2962	2997	2962
q14	312	265	265	265
q15	560	513	507	507
q16	702	662	663	662
q17	1783	1569	1583	1569
q18	7951	7498	7387	7387
q19	1654	1508	1497	1497
q20	2057	1799	1816	1799
q21	5338	5104	5244	5104
q22	1122	1005	1034	1005
Total cold run time: 64859 ms
Total hot run time: 58367 ms

@doris-robot

TPC-DS: Total hot run time: 191748 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f67239687a67f154270cd1c684913956bfcdf51c, data reload: false

query1	989	379	386	379
query2	6513	2106	2105	2105
query3	6699	213	227	213
query4	34673	23440	23501	23440
query5	4382	474	486	474
query6	251	164	166	164
query7	4612	312	312	312
query8	294	231	227	227
query9	9830	2731	2727	2727
query10	499	314	303	303
query11	18346	15121	15192	15121
query12	161	99	103	99
query13	1633	436	425	425
query14	10347	7153	7502	7153
query15	307	175	187	175
query16	8107	484	484	484
query17	1777	585	562	562
query18	2136	315	330	315
query19	358	157	155	155
query20	121	109	106	106
query21	220	109	106	106
query22	4734	4203	4024	4024
query23	34863	33928	34104	33928
query24	10975	2922	2854	2854
query25	700	413	422	413
query26	1372	166	164	164
query27	2855	303	299	299
query28	8233	2472	2465	2465
query29	907	451	439	439
query30	329	165	157	157
query31	1060	813	837	813
query32	98	62	68	62
query33	782	325	321	321
query34	953	499	510	499
query35	886	781	724	724
query36	1132	949	946	946
query37	162	97	88	88
query38	4114	3869	3813	3813
query39	1503	1421	1430	1421
query40	280	100	100	100
query41	54	50	53	50
query42	121	98	99	98
query43	541	489	479	479
query44	1271	819	828	819
query45	197	169	171	169
query46	1161	745	758	745
query47	1948	1843	1849	1843
query48	487	376	372	372
query49	1136	417	432	417
query50	813	423	431	423
query51	7156	6930	6985	6930
query52	99	93	93	93
query53	264	215	190	190
query54	1247	471	482	471
query55	78	78	87	78
query56	276	271	260	260
query57	1231	1097	1096	1096
query58	248	224	228	224
query59	3332	3034	3156	3034
query60	294	278	280	278
query61	106	107	99	99
query62	875	672	680	672
query63	223	199	187	187
query64	5305	652	678	652
query65	3288	3197	3201	3197
query66	1451	307	290	290
query67	15882	15599	15598	15598
query68	4360	581	586	581
query69	446	308	307	307
query70	1192	1144	1126	1126
query71	336	273	278	273
query72	6359	4003	3964	3964
query73	785	357	350	350
query74	9999	9042	8966	8966
query75	3368	2685	2680	2680
query76	2744	968	990	968
query77	428	307	295	295
query78	9991	9271	9298	9271
query79	1577	602	618	602
query80	1431	465	445	445
query81	554	245	244	244
query82	895	144	151	144
query83	217	140	141	140
query84	247	75	76	75
query85	1318	297	284	284
query86	411	305	305	305
query87	4461	4385	4262	4262
query88	3312	2465	2424	2424
query89	407	293	293	293
query90	2117	211	197	197
query91	176	143	144	143
query92	68	51	56	51
query93	1546	579	561	561
query94	1153	301	294	294
query95	371	260	262	260
query96	623	285	299	285
query97	3242	3150	3132	3132
query98	217	210	197	197
query99	1654	1290	1318	1290
Total cold run time: 303282 ms
Total hot run time: 191748 ms

@doris-robot

ClickBench: Total hot run time: 33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f67239687a67f154270cd1c684913956bfcdf51c, data reload: false

query1	0.04	0.05	0.04
query2	0.06	0.03	0.03
query3	0.23	0.06	0.07
query4	1.64	0.10	0.10
query5	0.49	0.51	0.51
query6	1.15	0.75	0.74
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.57	0.50	0.51
query10	0.55	0.58	0.54
query11	0.14	0.10	0.10
query12	0.14	0.12	0.11
query13	0.61	0.60	0.60
query14	3.07	2.96	3.04
query15	0.90	0.82	0.81
query16	0.40	0.39	0.37
query17	1.05	1.06	1.05
query18	0.24	0.22	0.22
query19	1.85	1.78	2.00
query20	0.01	0.01	0.01
query21	15.36	0.58	0.57
query22	2.46	1.78	2.63
query23	17.20	0.98	0.85
query24	2.67	1.05	1.14
query25	0.23	0.24	0.05
query26	0.47	0.14	0.15
query27	0.04	0.04	0.04
query28	10.75	1.09	1.06
query29	12.53	3.32	3.25
query30	0.25	0.06	0.06
query31	2.88	0.38	0.38
query32	3.26	0.45	0.46
query33	2.98	3.03	3.07
query34	16.88	4.50	4.45
query35	4.52	4.50	4.50
query36	0.69	0.47	0.49
query37	0.08	0.07	0.06
query38	0.04	0.04	0.03
query39	0.03	0.03	0.02
query40	0.16	0.13	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.84 s
Total hot run time: 33 s

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Sep 26, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@zfr9527 zfr9527 left a comment

LGTM

@Jibing-Li Jibing-Li merged commit 8e33cda into apache:master Sep 26, 2024
@Jibing-Li Jibing-Li deleted the reduceMem branch September 26, 2024 08:47
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Sep 26, 2024
…sumption. (apache#41203)

For string type columns, use xxhash_64 to transfer column value to an
integer, and then calculate the NDV based on the integer hash value. In
this case, we can reduce the memory cost of sample analyze and improve
the performance.
For example, l_comment column of TPCH 100G lineitem table. The memory
cost to calculate its NDV is reduced to 8GB from 22GB
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Sep 27, 2024
…sumption. (apache#41203)
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Sep 27, 2024
…sumption. (apache#41203)
Jibing-Li added a commit that referenced this pull request Sep 27, 2024
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Oct 8, 2024
…sumption. (apache#41203)
Jibing-Li added a commit that referenced this pull request Oct 8, 2024
dataroaring pushed a commit that referenced this pull request Oct 9, 2024
…sumption. (#41203)
cjj2010 pushed a commit to cjj2010/doris that referenced this pull request Oct 12, 2024
…sumption. (apache#41203)