@freemandealer
Contributor

File cache initialization consists of an asynchronous loading step and a synchronous directory-upgrade step. The upgrade step mainly converts the version 1 layout to version 2 and applies fallback logic for problematic directories. It requires a large number of directory traversals and can be very slow.

Previously, in PR #44429, we changed the initialization of multiple cache directories from parallel to serial to avoid the disorder caused by concurrent initialization. That change lengthened cache initialization and slowed BE startup.

The directory upgrade is only meaningful during an upgrade and does not need to run on every restart. Therefore, if we detect that the version file has been written successfully, we treat the cache directory as already upgraded and skip these redundant directory traversals.
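
A minimal sketch of this skip, not the actual Doris code. It assumes the version is stored as a single line in a file named `version` under the cache base path, and that a missing or empty file means the legacy version 1 layout. The helper names `read_cache_version`, `write_cache_version`, and `needs_upgrade` are hypothetical and used only for illustration.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the version recorded in <cache_dir>/version.
// Assumption: a missing or empty file means the legacy "1.0" layout.
std::string read_cache_version(const fs::path& cache_dir) {
    std::ifstream in(cache_dir / "version");
    std::string version;
    if (in && std::getline(in, version) && !version.empty()) {
        return version;
    }
    return "1.0";
}

// Hypothetical helper: records the version once the upgrade has finished.
void write_cache_version(const fs::path& cache_dir, const std::string& version) {
    std::ofstream out(cache_dir / "version", std::ios::trunc);
    out << version << '\n';
}

// The expensive directory traversal is only needed when the recorded
// version differs from the current one.
bool needs_upgrade(const fs::path& cache_dir, const std::string& current = "2.0") {
    return read_cache_version(cache_dir) != current;
}
```

With helpers like these, startup would call `needs_upgrade(dir)` first and run the traversal (followed by `write_cache_version(dir, "2.0")`) only when it returns true, so a restart of an already upgraded cache directory costs one small file read.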

We could go further and make the directory traversal asynchronous so that it does not block BE startup. However, that would leave three operations touching the file system concurrently: asynchronous loading, asynchronous upgrading, and lazy loading on query. This would increase code complexity, the likelihood of errors, and the difficulty of troubleshooting. Old clusters are uncommon, and a cluster goes through this upgrade only once in its lifetime, so we judged the optimization not worth its cost and decided not to pursue it.

Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@freemandealer
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32557 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit db85f6018e9ad70d06bab584353b118e113732f6, data reload: false

------ Round 1 ----------------------------------
q1	17619	5201	5130	5130
q2	2044	301	168	168
q3	10455	1281	702	702
q4	10213	1016	543	543
q5	7559	2378	2370	2370
q6	193	169	136	136
q7	911	752	619	619
q8	9325	1309	1103	1103
q9	4927	4785	4868	4785
q10	6868	2303	1903	1903
q11	483	276	255	255
q12	353	353	214	214
q13	17749	3645	3070	3070
q14	239	222	209	209
q15	544	492	483	483
q16	634	610	566	566
q17	589	853	335	335
q18	6750	6431	6363	6363
q19	2122	948	550	550
q20	314	321	192	192
q21	2803	2169	1912	1912
q22	1083	1022	949	949
Total cold run time: 103777 ms
Total hot run time: 32557 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5283	5143	5133	5133
q2	238	343	238	238
q3	2142	2662	2288	2288
q4	1409	1833	1383	1383
q5	4224	4100	4162	4100
q6	206	162	123	123
q7	1886	1957	1813	1813
q8	2595	2579	2571	2571
q9	7160	7236	7168	7168
q10	2981	3197	2761	2761
q11	576	530	491	491
q12	675	757	580	580
q13	3511	3879	3193	3193
q14	275	302	270	270
q15	518	470	484	470
q16	653	674	629	629
q17	1141	1563	1358	1358
q18	7762	7505	7375	7375
q19	834	844	893	844
q20	2004	1992	1853	1853
q21	5394	4909	4944	4909
q22	1138	1123	1038	1038
Total cold run time: 52605 ms
Total hot run time: 50588 ms

@doris-robot

TPC-DS: Total hot run time: 184843 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit db85f6018e9ad70d06bab584353b118e113732f6, data reload: false

query1	974	389	379	379
query2	6517	1962	1946	1946
query3	6784	213	207	207
query4	26373	23157	23157	23157
query5	4318	666	487	487
query6	311	200	205	200
query7	4607	493	292	292
query8	296	266	229	229
query9	8606	2505	2512	2505
query10	458	344	258	258
query11	15300	14971	14916	14916
query12	163	111	105	105
query13	1661	525	407	407
query14	9057	7095	6265	6265
query15	202	191	176	176
query16	7338	640	507	507
query17	1213	712	583	583
query18	1949	416	301	301
query19	197	204	165	165
query20	123	116	118	116
query21	212	122	100	100
query22	4122	4428	4216	4216
query23	33839	32942	32891	32891
query24	7791	2405	2446	2405
query25	578	455	393	393
query26	1219	274	154	154
query27	2133	502	330	330
query28	3903	2396	2376	2376
query29	748	553	411	411
query30	281	232	191	191
query31	943	833	754	754
query32	73	65	65	65
query33	578	412	312	312
query34	802	880	506	506
query35	790	802	732	732
query36	967	996	900	900
query37	112	97	74	74
query38	4169	4101	4120	4101
query39	1432	1388	1393	1388
query40	204	116	105	105
query41	54	49	50	49
query42	131	107	103	103
query43	510	504	465	465
query44	1318	799	784	784
query45	174	176	163	163
query46	844	1026	635	635
query47	1724	1772	1696	1696
query48	379	412	307	307
query49	790	490	428	428
query50	731	757	427	427
query51	4179	4167	4148	4148
query52	103	105	100	100
query53	239	278	186	186
query54	498	487	392	392
query55	82	88	85	85
query56	259	263	243	243
query57	1136	1135	1067	1067
query58	242	241	235	235
query59	2698	2649	2584	2584
query60	293	267	263	263
query61	125	140	116	116
query62	795	730	702	702
query63	250	193	198	193
query64	4355	1008	652	652
query65	4374	4306	4295	4295
query66	1133	411	314	314
query67	15481	15666	15212	15212
query68	8473	878	503	503
query69	476	299	275	275
query70	1228	1137	1141	1137
query71	447	294	276	276
query72	5275	3442	3713	3442
query73	781	715	348	348
query74	8920	9091	8944	8944
query75	3800	3168	2683	2683
query76	3666	1166	756	756
query77	794	393	282	282
query78	9887	10193	9279	9279
query79	2526	818	585	585
query80	789	502	450	450
query81	478	269	215	215
query82	700	129	95	95
query83	212	174	162	162
query84	285	104	81	81
query85	805	351	308	308
query86	388	305	296	296
query87	4369	4533	4285	4285
query88	3463	2180	2215	2180
query89	385	324	288	288
query90	1896	196	190	190
query91	144	139	112	112
query92	77	63	54	54
query93	1795	1060	575	575
query94	659	392	305	305
query95	358	267	262	262
query96	479	607	267	267
query97	3269	3367	3312	3312
query98	218	217	198	198
query99	1452	1389	1249	1249
Total cold run time: 272734 ms
Total hot run time: 184843 ms

@doris-robot

ClickBench: Total hot run time: 31.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit db85f6018e9ad70d06bab584353b118e113732f6, data reload: false

query1	0.04	0.03	0.04
query2	0.07	0.04	0.04
query3	0.24	0.06	0.06
query4	1.62	0.10	0.10
query5	0.56	0.56	0.54
query6	1.20	0.71	0.71
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.55	0.52
query10	0.57	0.57	0.57
query11	0.15	0.11	0.11
query12	0.14	0.11	0.11
query13	0.61	0.59	0.59
query14	2.66	2.72	2.71
query15	0.92	0.84	0.85
query16	0.37	0.38	0.38
query17	1.02	1.00	1.00
query18	0.20	0.20	0.20
query19	1.89	1.80	1.93
query20	0.02	0.01	0.01
query21	15.38	0.90	0.54
query22	0.76	1.14	0.66
query23	14.98	1.34	0.64
query24	7.01	1.80	1.33
query25	0.49	0.27	0.12
query26	0.59	0.16	0.15
query27	0.04	0.05	0.04
query28	9.67	0.78	0.44
query29	12.59	3.99	3.26
query30	0.26	0.09	0.06
query31	2.82	0.58	0.38
query32	3.22	0.55	0.47
query33	2.98	3.06	2.98
query34	15.74	5.12	4.51
query35	4.57	4.54	4.51
query36	0.67	0.50	0.48
query37	0.10	0.06	0.07
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.17	0.14	0.13
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.02
Total cold run time: 105.19 s
Total hot run time: 31.24 s

@doris-robot

BE UT Coverage Report

Increment line coverage 13.46% (7/52) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 45.93% (12277/26727)
Line Coverage 35.39% (103655/292897)
Region Coverage 34.55% (53094/153667)
Branch Coverage 30.23% (26880/88912)

Status FSFileCacheStorage::upgrade_cache_dir_if_necessary() const {
/// version 1.0: cache_base_path / key / offset
/// version 2.0: cache_base_path / key_prefix / key / offset
/*
Contributor

Add more logging to show whether an "upgrade" is needed: print the previous and current versions right after "read_file_cache_version".
Also add some stats during the upgrade, such as file count and time consumption.

Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
@freemandealer
Contributor Author

run buildall

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Mar 6, 2025
@github-actions
Contributor

github-actions bot commented Mar 6, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Mar 6, 2025

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 32687 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0b45bab3c9bcaada3645b731ce83ea0217a3525a, data reload: false

------ Round 1 ----------------------------------
q1	17586	5202	5126	5126
q2	2044	296	164	164
q3	10422	1243	751	751
q4	10223	1024	543	543
q5	7570	2412	2356	2356
q6	186	173	133	133
q7	918	775	625	625
q8	9316	1289	1145	1145
q9	4975	4826	4705	4705
q10	6878	2319	1896	1896
q11	485	286	251	251
q12	349	353	227	227
q13	17768	3632	3073	3073
q14	222	216	207	207
q15	538	498	480	480
q16	627	626	593	593
q17	584	860	356	356
q18	6899	6369	6346	6346
q19	1524	949	559	559
q20	318	322	190	190
q21	2825	2196	1981	1981
q22	1036	1024	980	980
Total cold run time: 103293 ms
Total hot run time: 32687 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5208	5108	5090	5090
q2	235	332	229	229
q3	2169	2643	2298	2298
q4	1470	1865	1405	1405
q5	4250	4147	4161	4147
q6	217	163	126	126
q7	1886	1942	1773	1773
q8	2641	2618	2687	2618
q9	7274	7114	7212	7114
q10	3025	3232	2807	2807
q11	589	508	481	481
q12	697	777	585	585
q13	3446	3933	3252	3252
q14	279	296	279	279
q15	528	477	476	476
q16	641	674	638	638
q17	1150	1626	1333	1333
q18	7729	7644	7471	7471
q19	835	803	852	803
q20	1963	2040	1846	1846
q21	5480	4989	4755	4755
q22	1094	1073	1079	1073
Total cold run time: 52806 ms
Total hot run time: 50599 ms

@doris-robot

TPC-DS: Total hot run time: 191997 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0b45bab3c9bcaada3645b731ce83ea0217a3525a, data reload: false

query1	1385	1011	1030	1011
query2	6281	1894	1825	1825
query3	11032	4570	4399	4399
query4	25190	23797	23464	23464
query5	4678	666	485	485
query6	295	200	188	188
query7	3998	504	304	304
query8	292	243	232	232
query9	8510	2538	2556	2538
query10	509	323	247	247
query11	15442	15212	14899	14899
query12	167	112	104	104
query13	1568	516	409	409
query14	9783	6471	6281	6281
query15	207	197	173	173
query16	7660	634	471	471
query17	1228	785	605	605
query18	2034	440	351	351
query19	208	204	171	171
query20	125	124	121	121
query21	211	129	112	112
query22	4597	4932	4548	4548
query23	34315	33590	33378	33378
query24	7768	2426	2485	2426
query25	500	474	412	412
query26	1227	274	155	155
query27	2174	522	327	327
query28	4180	2464	2415	2415
query29	688	566	458	458
query30	282	230	218	218
query31	935	872	793	793
query32	79	67	63	63
query33	559	359	300	300
query34	792	869	507	507
query35	857	855	758	758
query36	966	1001	900	900
query37	118	105	74	74
query38	4134	4188	4125	4125
query39	1524	1453	1457	1453
query40	213	115	116	115
query41	54	53	55	53
query42	130	107	110	107
query43	516	512	479	479
query44	1293	802	812	802
query45	177	168	171	168
query46	866	1021	645	645
query47	1819	1866	1791	1791
query48	390	427	313	313
query49	751	519	432	432
query50	706	741	414	414
query51	4339	4307	4276	4276
query52	107	111	92	92
query53	235	261	203	203
query54	499	498	418	418
query55	95	84	85	84
query56	274	266	253	253
query57	1188	1185	1141	1141
query58	244	249	243	243
query59	2751	2868	2665	2665
query60	285	282	261	261
query61	119	117	118	117
query62	768	730	672	672
query63	223	220	196	196
query64	4128	999	698	698
query65	4518	4425	4451	4425
query66	990	400	305	305
query67	16161	15609	15297	15297
query68	9734	908	499	499
query69	470	342	269	269
query70	1213	1091	1090	1090
query71	463	298	261	261
query72	5141	3506	3842	3506
query73	797	737	347	347
query74	9333	9126	8946	8946
query75	4238	3179	2745	2745
query76	4051	1195	763	763
query77	999	382	307	307
query78	9897	10093	9257	9257
query79	1967	865	584	584
query80	689	520	451	451
query81	463	266	221	221
query82	248	125	91	91
query83	191	180	152	152
query84	281	95	71	71
query85	746	346	308	308
query86	324	294	287	287
query87	4526	4460	4449	4449
query88	2812	2206	2198	2198
query89	413	324	291	291
query90	2033	209	208	208
query91	138	137	112	112
query92	73	60	54	54
query93	1865	1068	578	578
query94	657	385	286	286
query95	347	263	250	250
query96	496	559	276	276
query97	3312	3402	3319	3319
query98	225	212	201	201
query99	1414	1397	1255	1255
Total cold run time: 279893 ms
Total hot run time: 191997 ms

@doris-robot

ClickBench: Total hot run time: 30.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0b45bab3c9bcaada3645b731ce83ea0217a3525a, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.25	0.06	0.06
query4	1.61	0.10	0.10
query5	0.55	0.56	0.54
query6	1.19	0.72	0.73
query7	0.03	0.01	0.02
query8	0.05	0.04	0.04
query9	0.58	0.54	0.53
query10	0.58	0.59	0.58
query11	0.16	0.10	0.11
query12	0.14	0.11	0.12
query13	0.61	0.60	0.60
query14	2.70	2.73	2.72
query15	0.90	0.84	0.85
query16	0.39	0.39	0.38
query17	1.02	1.07	1.05
query18	0.21	0.20	0.20
query19	1.88	1.85	1.97
query20	0.01	0.01	0.01
query21	15.35	0.90	0.54
query22	0.75	1.14	0.65
query23	15.00	1.38	0.57
query24	7.17	2.19	0.52
query25	0.49	0.19	0.08
query26	0.55	0.17	0.14
query27	0.05	0.05	0.04
query28	9.43	0.86	0.43
query29	12.53	4.03	3.35
query30	0.25	0.10	0.06
query31	2.82	0.59	0.38
query32	3.23	0.55	0.46
query33	2.97	3.10	2.97
query34	15.84	5.08	4.47
query35	4.53	4.51	4.52
query36	0.66	0.50	0.48
query37	0.09	0.06	0.06
query38	0.06	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.13	0.12
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 105.1 s
Total hot run time: 30.46 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 23.88% (16/67) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 46.74% (12500/26745)
Line Coverage 36.34% (106507/293068)
Region Coverage 35.41% (54434/153725)
Branch Coverage 30.78% (27381/88944)

Contributor

@dataroaring left a comment

LGTM

@dataroaring added the "usercase" label (Important user case type label) and removed the "p0_c" label on Mar 7, 2025
@dataroaring merged commit dc626d1 into apache:master on Mar 7, 2025
28 of 29 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 7, 2025
dataroaring pushed a commit that referenced this pull request Mar 10, 2025
…8798)

Cherry-picked from #48687

Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
Co-authored-by: zhengyu <zhangzhengyu@selectdb.com>
@gavinchou mentioned this pull request on Apr 23, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
Labels

  • approved: Indicates a PR has been approved by one committer.
  • dev/3.0.5-merged
  • reviewed
  • usercase: Important user case type label

5 participants