
@eldenmoon
Member

…tself

Previously, each tree node owned its own strings_pool. Because the Arena rounds every chunk it allocates up to at least 4K, each pool reserved far more memory than a single tree node needs. When a SubcolumnTree contains many nodes, a significant portion of that memory is wasted. Moving strings_pool to the tree itself lets all nodes share one pool, which reduces the waste.
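
To make the effect concrete, here is a minimal sketch comparing the two layouts. It is an illustration under stated assumptions, not the actual patch: `ToyArena`, `NodeOwningPool`, `TreeOfOwningNodes`, `SharedNode` and `TreeWithSharedPool` are hypothetical names, and `ToyArena` only mimics the "every chunk is at least 4K" behaviour described above rather than the real Arena API. It assumes C++17 for `std::string_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for an arena (NOT the Doris Arena API).
// Every chunk it reserves is at least kMinChunk bytes.
class ToyArena {
public:
    static constexpr std::size_t kMinChunk = 4096;

    // Copies `s` into arena-owned memory and returns a view of the copy.
    std::string_view insert(std::string_view s) {
        if (s.empty()) return {};
        if (chunks_.empty() || used_ + s.size() > capacity_) {
            capacity_ = s.size() > kMinChunk ? s.size() : kMinChunk;
            chunks_.push_back(std::make_unique<char[]>(capacity_));
            used_ = 0;
            reserved_ += capacity_;
        }
        assert(used_ + s.size() <= capacity_);
        char* dst = chunks_.back().get() + used_;
        std::memcpy(dst, s.data(), s.size());
        used_ += s.size();
        return {dst, s.size()};
    }

    // Total bytes reserved from the system, used or not.
    std::size_t reserved_bytes() const { return reserved_; }

private:
    std::vector<std::unique_ptr<char[]>> chunks_;
    std::size_t capacity_ = 0;  // size of the newest chunk
    std::size_t used_ = 0;      // bytes used in the newest chunk
    std::size_t reserved_ = 0;
};

// Layout before the change: every node carries its own pool.
struct NodeOwningPool {
    ToyArena strings_pool;
    std::string_view key;
};

struct TreeOfOwningNodes {
    std::vector<std::unique_ptr<NodeOwningPool>> nodes;

    void add(std::string_view k) {
        auto node = std::make_unique<NodeOwningPool>();
        node->key = node->strings_pool.insert(k);
        nodes.push_back(std::move(node));
    }

    std::size_t reserved_bytes() const {
        std::size_t total = 0;
        for (const auto& n : nodes) total += n->strings_pool.reserved_bytes();
        return total;
    }
};

// Layout after the change: one pool on the tree, shared by all nodes.
struct SharedNode {
    std::string_view key;
};

struct TreeWithSharedPool {
    ToyArena strings_pool;
    std::vector<SharedNode> nodes;

    void add(std::string_view k) { nodes.push_back({strings_pool.insert(k)}); }

    std::size_t reserved_bytes() const { return strings_pool.reserved_bytes(); }
};
```

In this toy model, adding 1000 nodes whose key is the three-byte string "col" reserves 1000 × 4096 = 4,096,000 bytes with per-node pools, but a single 4096-byte chunk with the shared pool, since all 3000 bytes of key data fit in one chunk.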


@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@eldenmoon eldenmoon requested a review from xinyiZzz April 1, 2024 04:16
@eldenmoon
Member Author

run buildall

@eldenmoon eldenmoon changed the title [Optimize] Move strings_pool from individual tree nodes to the tree i… [Optimize](Variant) Move strings_pool from individual tree nodes to the tree i… Apr 1, 2024
@github-actions
Contributor

github-actions bot commented Apr 1, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39072 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a1f12d94bbc5acd95da8b584999fe88fdf7e8166, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17932	4278	4195	4195
q2	2696	202	195	195
q3	11843	1251	1474	1251
q4	10518	893	1034	893
q5	8008	3030	3004	3004
q6	220	136	134	134
q7	1143	646	664	646
q8	9643	2115	2075	2075
q9	6719	6239	6208	6208
q10	8441	3553	3506	3506
q11	424	256	230	230
q12	393	216	210	210
q13	17796	2902	2922	2902
q14	274	238	235	235
q15	535	480	470	470
q16	489	387	385	385
q17	960	944	913	913
q18	7466	6570	6405	6405
q19	1612	1527	1550	1527
q20	636	303	313	303
q21	3546	3077	3125	3077
q22	371	308	318	308
Total cold run time: 111665 ms
Total hot run time: 39072 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4065	4045	4053	4045
q2	330	214	219	214
q3	2960	2955	2965	2955
q4	1896	1846	1891	1846
q5	5240	5234	5201	5201
q6	209	121	124	121
q7	2257	1800	1781	1781
q8	3225	3285	3293	3285
q9	8475	8518	8479	8479
q10	3762	3823	3830	3823
q11	535	447	435	435
q12	717	563	596	563
q13	11944	2960	2915	2915
q14	283	264	260	260
q15	508	480	475	475
q16	470	410	421	410
q17	1713	1672	1660	1660
q18	7706	7353	7212	7212
q19	1639	1643	1643	1643
q20	1950	1715	1714	1714
q21	5011	4783	4717	4717
q22	511	427	435	427
Total cold run time: 65406 ms
Total hot run time: 54181 ms

@doris-robot

TPC-DS: Total hot run time: 181420 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a1f12d94bbc5acd95da8b584999fe88fdf7e8166, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	922	1116	1122	1116
query2	6476	1805	1827	1805
query3	6662	213	215	213
query4	24583	21510	21568	21510
query5	4222	396	404	396
query6	276	179	182	179
query7	4604	313	303	303
query8	236	181	187	181
query9	8490	2244	2240	2240
query10	561	252	266	252
query11	15235	14542	14514	14514
query12	153	101	96	96
query13	1655	413	392	392
query14	8636	7051	6862	6862
query15	213	179	187	179
query16	7151	278	282	278
query17	994	613	571	571
query18	2168	298	302	298
query19	219	167	187	167
query20	98	97	98	97
query21	199	138	139	138
query22	4943	4813	4747	4747
query23	33479	32804	33008	32804
query24	12587	3146	3095	3095
query25	692	406	408	406
query26	1907	165	168	165
query27	3062	330	341	330
query28	6799	1848	1837	1837
query29	1329	609	607	607
query30	300	162	152	152
query31	993	740	751	740
query32	103	64	64	64
query33	738	274	261	261
query34	1045	486	512	486
query35	828	702	707	702
query36	996	877	861	861
query37	282	76	82	76
query38	3582	3358	3423	3358
query39	1585	1580	1538	1538
query40	302	140	136	136
query41	50	47	47	47
query42	108	112	103	103
query43	431	394	388	388
query44	1107	722	714	714
query45	291	268	268	268
query46	1087	783	764	764
query47	1900	1775	1777	1775
query48	381	307	307	307
query49	1174	377	383	377
query50	808	392	408	392
query51	6845	6681	6742	6681
query52	117	93	106	93
query53	361	303	300	300
query54	334	242	251	242
query55	91	92	85	85
query56	251	227	234	227
query57	1215	1128	1121	1121
query58	254	233	232	232
query59	2513	2251	2368	2251
query60	269	247	246	246
query61	122	116	110	110
query62	701	471	456	456
query63	314	289	292	289
query64	6532	3415	3199	3199
query65	3094	3036	3016	3016
query66	1460	360	339	339
query67	15674	14779	14683	14683
query68	9044	570	583	570
query69	567	343	335	335
query70	1402	1102	1081	1081
query71	496	274	271	271
query72	6483	2573	2449	2449
query73	1525	326	339	326
query74	6818	6292	6313	6292
query75	3639	2266	2304	2266
query76	5324	1131	1210	1131
query77	618	260	263	260
query78	10784	10174	10100	10100
query79	8488	542	555	542
query80	1180	428	422	422
query81	490	231	224	224
query82	421	101	100	100
query83	218	166	185	166
query84	269	92	93	92
query85	1024	302	293	293
query86	362	298	280	280
query87	3696	3516	3491	3491
query88	3156	2310	2298	2298
query89	558	398	376	376
query90	1997	179	185	179
query91	139	111	105	105
query92	65	53	52	52
query93	4286	545	521	521
query94	1327	192	192	192
query95	442	335	329	329
query96	595	275	273	273
query97	2641	2499	2476	2476
query98	237	217	211	211
query99	1204	860	874	860
Total cold run time: 296284 ms
Total hot run time: 181420 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.56% (8843/24868)
Line Coverage: 27.29% (72521/265784)
Region Coverage: 26.49% (37540/141689)
Branch Coverage: 23.31% (19145/82148)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a1f12d94bbc5acd95da8b584999fe88fdf7e8166_a1f12d94bbc5acd95da8b584999fe88fdf7e8166/report/index.html

@doris-robot

ClickBench: Total hot run time: 30.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a1f12d94bbc5acd95da8b584999fe88fdf7e8166, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.05
query3	0.24	0.04	0.05
query4	1.69	0.07	0.07
query5	0.50	0.48	0.50
query6	1.13	0.66	0.66
query7	0.02	0.01	0.02
query8	0.06	0.04	0.04
query9	0.56	0.51	0.52
query10	0.55	0.57	0.56
query11	0.14	0.11	0.11
query12	0.13	0.11	0.11
query13	0.60	0.60	0.59
query14	0.78	0.76	0.80
query15	0.86	0.83	0.83
query16	0.36	0.35	0.36
query17	0.98	0.98	0.96
query18	0.24	0.27	0.26
query19	1.83	1.70	1.73
query20	0.02	0.01	0.01
query21	15.54	0.76	0.67
query22	3.01	4.87	2.13
query23	17.71	1.38	1.06
query24	1.45	0.22	0.39
query25	0.11	0.10	0.09
query26	0.27	0.17	0.17
query27	0.08	0.09	0.09
query28	13.61	0.94	0.96
query29	12.61	3.30	3.25
query30	0.28	0.10	0.09
query31	2.79	0.40	0.40
query32	3.27	0.46	0.47
query33	2.84	2.87	2.91
query34	15.51	4.39	4.38
query35	4.40	4.39	4.37
query36	0.67	0.48	0.47
query37	0.19	0.18	0.16
query38	0.18	0.15	0.17
query39	0.05	0.04	0.04
query40	0.18	0.16	0.16
query41	0.09	0.06	0.04
query42	0.06	0.06	0.05
query43	0.04	0.04	0.04
Total cold run time: 105.75 s
Total hot run time: 30.42 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit a1f12d94bbc5acd95da8b584999fe88fdf7e8166 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       15.5 seconds inserted 10000000 Rows, about 645K ops/s

Contributor

@xinyiZzz xinyiZzz left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 1, 2024
@github-actions
Contributor

github-actions bot commented Apr 1, 2024

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Apr 1, 2024

PR approved by anyone and no changes requested.

@xiaokang xiaokang changed the title [Optimize](Variant) Move strings_pool from individual tree nodes to the tree i… [Optimize](Variant) Move strings_pool from individual tree nodes to the tree itself Apr 2, 2024
Contributor

@xiaokang xiaokang left a comment

LGTM

@xiaokang xiaokang merged commit 19e0bbf into apache:master Apr 2, 2024
yiguolei pushed a commit that referenced this pull request Apr 10, 2024
…tself (#33089)
yiguolei pushed a commit that referenced this pull request Apr 10, 2024
…tself (#33089)

Labels

approved Indicates a PR has been approved by one committer. reviewed variant


4 participants