Conversation

@freemandealer
Contributor

pick #51776

This PR does the following:

  • Makes the thread count of the file cache downloader worker pool configurable.
  • Makes the split batch size of warm-up jobs configurable.
  • Splits large file download tasks into smaller ones to balance load across threads and improve concurrency.
  • Uses meta info to deduce the size of inverted index files, which reduces S3 HEAD operations.
  • Optimizes some log output.

In our tests, these optimizations speed up file cache warm-up by more than 3x.
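
For readers unfamiliar with the task-splitting idea in the list above, here is a minimal Python sketch of it. It is not the Doris implementation: `DOWNLOAD_WORKER_THREADS`, `CHUNK_SIZE_BYTES`, `DownloadTask`, and `fake_download` are hypothetical names invented for illustration, and the real config names and chunk size are not stated in this PR description.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical knobs; the actual Doris config names are not given in this PR text.
DOWNLOAD_WORKER_THREADS = 4
CHUNK_SIZE_BYTES = 8 * 1024 * 1024


@dataclass(frozen=True)
class DownloadTask:
    path: str
    offset: int
    size: int


def split_into_tasks(path, file_size, chunk_size=CHUNK_SIZE_BYTES):
    """Split one file download into contiguous tasks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tasks = []
    offset = 0
    while offset < file_size:
        size = min(chunk_size, file_size - offset)
        tasks.append(DownloadTask(path, offset, size))
        offset += size
    return tasks


def fake_download(task):
    """Stand-in for a ranged read from object storage; returns the bytes fetched."""
    return task.size


def warm_up(files, threads=DOWNLOAD_WORKER_THREADS, chunk_size=CHUNK_SIZE_BYTES):
    """Queue all chunks on one shared pool so a large file is spread over many threads."""
    tasks = [t for path, size in files for t in split_into_tasks(path, size, chunk_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(fake_download, tasks))
```

With whole-file tasks, one very large file keeps a single worker busy while the others go idle. With chunked tasks, the same file becomes many similar-sized units that any idle worker can pick up.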

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

pick apache#51776

this pr does the following:

make file cache downloader worker pool thread num configurable
make warm up job split batch size configurable
split large file downloading task to smaller ones to maintain load balance between threads, thus improve concurrency
use meta info to deduce size of inverted idx file size to reduce S3 HEAD ops
some log print optimization
in our test, this opt can improve more than 3x file cache warm up performance

Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
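
The point about deducing inverted index file sizes from meta info can be illustrated with a small Python sketch. It assumes the size is already recorded in some metadata map. How Doris actually derives it is not described in this conversation, and `CountingStore`, `resolve_size`, and the file names are hypothetical.

```python
class CountingStore:
    """Stand-in for an object store that counts HEAD requests."""

    def __init__(self, sizes):
        self._sizes = dict(sizes)
        self.head_calls = 0

    def head(self, path):
        # Each call models one remote HEAD request.
        self.head_calls += 1
        return self._sizes[path]


def resolve_size(path, meta_sizes, store):
    """Prefer the size known from metadata; issue a HEAD only when it is missing."""
    size = meta_sizes.get(path)
    if size is None:
        size = store.head(path)
    return size
```

Every size served from metadata is one remote round trip saved, which adds up when a warm-up job touches many index files.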
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@freemandealer
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40035 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b6061bada22f5a78377d84a08af09032cbed9c7a, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17576	6937	6656	6656
q2	2089	167	168	167
q3	10652	1098	1169	1098
q4	10579	775	713	713
q5	7749	2885	2888	2885
q6	218	138	135	135
q7	985	627	625	625
q8	9359	2028	2042	2028
q9	6568	6410	6388	6388
q10	6988	2249	2321	2249
q11	476	271	264	264
q12	397	223	212	212
q13	17779	3007	2987	2987
q14	239	219	209	209
q15	508	455	464	455
q16	474	382	378	378
q17	999	593	551	551
q18	7559	6631	6654	6631
q19	1409	1144	1053	1053
q20	484	207	202	202
q21	3964	3205	3192	3192
q22	1123	987	957	957
Total cold run time: 108174 ms
Total hot run time: 40035 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	6634	6652	6638	6638
q2	329	225	226	225
q3	2888	2675	2864	2675
q4	2073	1836	1799	1799
q5	5770	5802	5755	5755
q6	211	132	136	132
q7	2240	1837	1808	1808
q8	3360	3512	3861	3512
q9	8936	8758	8939	8758
q10	3562	3525	3520	3520
q11	616	495	505	495
q12	811	594	602	594
q13	8488	3134	3205	3134
q14	300	267	264	264
q15	521	465	462	462
q16	484	419	448	419
q17	1837	1656	1627	1627
q18	8229	7752	7634	7634
q19	1707	1631	1522	1522
q20	2104	1881	1843	1843
q21	5169	5050	4887	4887
q22	1097	1053	1014	1014
Total cold run time: 67366 ms
Total hot run time: 58717 ms
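
The totals in the report above are consistent with this arithmetic: the cold total is the sum of the first numeric column, and the hot total is the sum of the faster of the two hot runs per query (the last column). This reading is inferred from the numbers themselves, not taken from the benchmark scripts. A short Python check on the first three Round 1 rows:

```python
def totals(rows):
    """rows are (query, cold_ms, hot1_ms, hot2_ms); returns (total_cold, total_hot)."""
    total_cold = sum(cold for _, cold, _, _ in rows)
    # The hot total takes the faster of the two hot runs for each query.
    total_hot = sum(min(hot1, hot2) for _, _, hot1, hot2 in rows)
    return total_cold, total_hot


# First three rows of Round 1 in the TPC-H report.
sample = [
    ("q1", 17576, 6937, 6656),
    ("q2", 2089, 167, 168),
    ("q3", 10652, 1098, 1169),
]
print(totals(sample))  # (30317, 7921)
```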

@doris-robot

TPC-DS: Total hot run time: 198281 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b6061bada22f5a78377d84a08af09032cbed9c7a, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1281	917	932	917
query2	6370	1955	1999	1955
query3	10835	4605	4408	4408
query4	61242	28823	23903	23903
query5	5193	475	455	455
query6	441	195	192	192
query7	5493	308	313	308
query8	319	230	222	222
query9	8731	2571	2564	2564
query10	471	287	273	273
query11	17571	15249	15789	15249
query12	154	107	114	107
query13	1458	458	456	456
query14	9790	7451	7355	7355
query15	200	189	172	172
query16	6842	501	479	479
query17	1169	602	577	577
query18	1788	364	321	321
query19	215	161	160	160
query20	121	112	115	112
query21	217	109	112	109
query22	4696	4569	4748	4569
query23	34398	34181	34371	34181
query24	6135	2939	2894	2894
query25	548	426	434	426
query26	663	182	205	182
query27	2194	360	343	343
query28	3755	2195	2129	2129
query29	676	456	431	431
query30	246	157	158	157
query31	972	802	822	802
query32	65	64	60	60
query33	413	315	309	309
query34	917	527	518	518
query35	845	749	747	747
query36	1095	916	951	916
query37	110	70	67	67
query38	4069	4020	3998	3998
query39	1498	1467	1520	1467
query40	213	103	102	102
query41	50	62	47	47
query42	117	99	102	99
query43	551	504	495	495
query44	1167	827	821	821
query45	185	171	175	171
query46	1139	710	718	710
query47	1988	1873	1905	1873
query48	481	375	395	375
query49	715	417	411	411
query50	833	439	430	430
query51	7460	7275	7237	7237
query52	105	96	95	95
query53	274	194	197	194
query54	588	469	465	465
query55	81	76	77	76
query56	267	301	256	256
query57	1317	1258	1195	1195
query58	222	211	213	211
query59	3366	3121	3067	3067
query60	293	280	274	274
query61	123	117	153	117
query62	793	696	675	675
query63	233	190	199	190
query64	1403	661	666	661
query65	3260	3181	3203	3181
query66	720	298	294	294
query67	15767	15639	15515	15515
query68	4351	569	573	569
query69	442	277	275	275
query70	1127	1045	1096	1045
query71	345	255	255	255
query72	6331	4101	4015	4015
query73	745	353	363	353
query74	10364	8975	9025	8975
query75	3315	2666	2658	2658
query76	1975	1185	1098	1098
query77	479	278	274	274
query78	10532	9642	9566	9566
query79	2157	603	601	601
query80	1399	425	437	425
query81	513	218	214	214
query82	1261	88	91	88
query83	265	143	148	143
query84	282	77	79	77
query85	1040	348	300	300
query86	398	296	286	286
query87	4448	4265	4227	4227
query88	3719	2428	2425	2425
query89	415	290	294	290
query90	1964	188	189	188
query91	179	147	156	147
query92	61	51	60	51
query93	2849	547	547	547
query94	791	293	308	293
query95	367	270	262	262
query96	615	289	284	284
query97	3315	3132	3135	3132
query98	224	197	198	197
query99	1591	1330	1294	1294
Total cold run time: 314766 ms
Total hot run time: 198281 ms

@doris-robot

ClickBench: Total hot run time: 29.85 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b6061bada22f5a78377d84a08af09032cbed9c7a, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.02
query2	0.07	0.03	0.03
query3	0.23	0.06	0.07
query4	1.63	0.10	0.10
query5	0.53	0.51	0.51
query6	1.13	0.73	0.72
query7	0.02	0.02	0.04
query8	0.04	0.03	0.04
query9	0.57	0.50	0.50
query10	0.55	0.54	0.56
query11	0.14	0.10	0.09
query12	0.14	0.12	0.11
query13	0.61	0.61	0.61
query14	0.78	0.80	0.80
query15	0.84	0.83	0.83
query16	0.38	0.38	0.38
query17	1.06	1.06	1.06
query18	0.22	0.22	0.23
query19	1.92	1.89	1.87
query20	0.02	0.01	0.01
query21	15.41	0.58	0.60
query22	2.42	2.61	1.56
query23	16.93	0.92	0.78
query24	2.75	0.87	1.80
query25	0.18	0.16	0.09
query26	0.52	0.14	0.13
query27	0.04	0.04	0.05
query28	10.18	0.50	0.48
query29	12.62	3.24	3.27
query30	0.25	0.06	0.06
query31	2.85	0.41	0.39
query32	3.23	0.47	0.47
query33	2.99	2.95	3.01
query34	17.27	4.43	4.46
query35	4.52	4.49	4.56
query36	0.67	0.49	0.48
query37	0.09	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.12	0.12
query41	0.07	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 104.21 s
Total hot run time: 29.85 s

@freemandealer
Contributor Author

run beut

@gavinchou changed the title from "[optimization](filecache) speed up filecache warm up" to "[optimization](filecache) speed up filecache warm up #51776" on Jul 1, 2025
@github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Jul 1, 2025
@github-actions
Contributor

github-actions bot commented Jul 1, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Jul 1, 2025

PR approved by anyone and no changes requested.

Contributor

@dataroaring left a comment

LGTM

@dataroaring merged commit 8c6809b into apache:branch-3.0 on Jul 1, 2025
23 of 26 checks passed
koarz pushed a commit to koarz/doris that referenced this pull request Jul 3, 2025
…ache#52556)


Labels

approved (indicates a PR has been approved by one committer), reviewed

5 participants