@platoneko
Contributor

platoneko commented Apr 15, 2024

Proposed changes

Fix the HDFS file writer:

  • Sync the HDFS file by default when the file writer is closed
  • Fix the HDFS file handle leak that occurs when a file writer is destructed without having been closed
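
For readers skimming the thread, the two changes above can be sketched roughly as follows. This is not the code from this PR: `FakeHdfs` and `FileWriter` are hypothetical stand-ins that only count calls, and whether the real destructor also syncs is an assumption (here it only releases the handle).

```cpp
#include <cassert>

// Hypothetical stand-in for an HDFS client. It only counts calls so that
// the behavior of the writer below can be observed without a real cluster.
struct FakeHdfs {
    int open_handles = 0;
    int sync_calls = 0;

    void open() { ++open_handles; }
    void sync() { ++sync_calls; }
    void close() {
        assert(open_handles > 0); // closing without an open handle is a bug
        --open_handles;
    }
};

// Hypothetical writer illustrating the two fixes described above.
class FileWriter {
public:
    explicit FileWriter(FakeHdfs& fs, bool sync_on_close = true)
            : _fs(fs), _sync_on_close(sync_on_close) {
        _fs.open();
        _opened = true;
    }

    // The writer owns the handle, so copying is disabled.
    FileWriter(const FileWriter&) = delete;
    FileWriter& operator=(const FileWriter&) = delete;

    // Fix 2: release the handle even if the caller never called close().
    // Assumption: no sync here, the destructor only avoids the leak.
    ~FileWriter() {
        if (_opened) {
            _fs.close();
            _opened = false;
        }
    }

    // Fix 1: sync by default before closing. Calling close() twice is a no-op.
    void close() {
        if (!_opened) {
            return;
        }
        if (_sync_on_close) {
            _fs.sync();
        }
        _fs.close();
        _opened = false;
    }

private:
    FakeHdfs& _fs;
    bool _sync_on_close;
    bool _opened = false;
};
```

With this shape, a writer that goes out of scope without `close()` leaves `open_handles` at 0, and an explicit `close()` issues exactly one sync.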

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@dataroaring
Contributor

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@dataroaring
Contributor

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.55% (8904/25046)
Line Coverage: 27.28% (73135/268108)
Region Coverage: 26.40% (37824/143255)
Branch Coverage: 23.18% (19271/83138)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d03f811cf4901205940abc660e156c68167ccb74_d03f811cf4901205940abc660e156c68167ccb74/report/index.html

@doris-robot

TPC-H: Total hot run time: 38366 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d03f811cf4901205940abc660e156c68167ccb74, data reload: false

------ Round 1 ----------------------------------
q1	17625	4333	4277	4277
q2	2019	183	183	183
q3	10473	1143	1139	1139
q4	10200	815	745	745
q5	7510	2675	2628	2628
q6	217	129	131	129
q7	997	609	579	579
q8	9231	2074	2024	2024
q9	7338	6573	6522	6522
q10	8520	3527	3517	3517
q11	446	227	227	227
q12	462	214	216	214
q13	18579	2922	2906	2906
q14	272	227	237	227
q15	523	482	480	480
q16	527	377	372	372
q17	956	622	731	622
q18	7332	6719	6732	6719
q19	6839	1545	1501	1501
q20	638	308	304	304
q21	3482	2761	2780	2761
q22	361	290	299	290
Total cold run time: 114547 ms
Total hot run time: 38366 ms
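
The table has no header, so the column meanings below are inferred from the numbers rather than documented in the report: column 1 looks like the cold run, columns 2 and 3 like two hot runs, and column 4 like the faster of the two. Under that assumption, the two totals for Round 1 can be reproduced with a short sketch (the type and function names are made up for illustration):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// One row of the Round 1 table: a cold run followed by two hot runs, in ms.
// Assumption: this column interpretation is inferred from the data.
struct QueryTimes {
    std::int64_t cold_ms;
    std::int64_t hot1_ms;
    std::int64_t hot2_ms;
};

// Values copied from the Round 1 rows above (q1..q22).
const std::array<QueryTimes, 22> kRound1 = {{
    {17625, 4333, 4277}, {2019, 183, 183},    {10473, 1143, 1139},
    {10200, 815, 745},   {7510, 2675, 2628},  {217, 129, 131},
    {997, 609, 579},     {9231, 2074, 2024},  {7338, 6573, 6522},
    {8520, 3527, 3517},  {446, 227, 227},     {462, 214, 216},
    {18579, 2922, 2906}, {272, 227, 237},     {523, 482, 480},
    {527, 377, 372},     {956, 622, 731},     {7332, 6719, 6732},
    {6839, 1545, 1501},  {638, 308, 304},     {3482, 2761, 2780},
    {361, 290, 299},
}};

// Sum of the cold-run column.
std::int64_t total_cold_ms(const std::array<QueryTimes, 22>& rows) {
    std::int64_t total = 0;
    for (const QueryTimes& r : rows) {
        total += r.cold_ms;
    }
    return total;
}

// Sum of the faster of the two hot runs for each query.
std::int64_t total_hot_ms(const std::array<QueryTimes, 22>& rows) {
    std::int64_t total = 0;
    for (const QueryTimes& r : rows) {
        assert(r.hot1_ms >= 0 && r.hot2_ms >= 0);
        total += std::min(r.hot1_ms, r.hot2_ms);
    }
    return total;
}
```

`total_cold_ms(kRound1)` gives 114547 and `total_hot_ms(kRound1)` gives 38366, matching the two totals reported for Round 1.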

----- Round 2, with runtime_filter_mode=off -----
q1	4334	4188	4193	4188
q2	371	281	268	268
q3	2950	2727	2690	2690
q4	1866	1567	1569	1567
q5	5318	5292	5346	5292
q6	209	122	124	122
q7	2227	1873	1867	1867
q8	3190	3338	3348	3338
q9	8570	8548	8666	8548
q10	4064	3813	4019	3813
q11	610	498	496	496
q12	797	639	632	632
q13	16165	3198	3115	3115
q14	309	289	288	288
q15	513	475	484	475
q16	514	429	436	429
q17	1819	1559	1518	1518
q18	8023	7923	7908	7908
q19	1663	1550	1544	1544
q20	2108	1863	1826	1826
q21	9570	4906	4941	4906
q22	541	468	464	464
Total cold run time: 75731 ms
Total hot run time: 55294 ms

@doris-robot

TPC-DS: Total hot run time: 184193 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d03f811cf4901205940abc660e156c68167ccb74, data reload: false

query1	892	368	368	368
query2	6381	2622	2302	2302
query3	6656	203	203	203
query4	23483	21418	21288	21288
query5	4201	408	413	408
query6	289	183	183	183
query7	4596	291	287	287
query8	223	175	180	175
query9	8502	2334	2299	2299
query10	403	234	253	234
query11	14802	14386	14135	14135
query12	134	86	84	84
query13	1622	351	358	351
query14	9195	7642	8078	7642
query15	264	181	193	181
query16	8240	289	249	249
query17	1973	580	544	544
query18	2090	273	266	266
query19	323	148	155	148
query20	89	82	82	82
query21	197	127	126	126
query22	4978	4837	4818	4818
query23	33739	33195	33484	33195
query24	10966	3087	3066	3066
query25	622	405	390	390
query26	703	157	166	157
query27	2250	358	357	357
query28	6080	2100	2066	2066
query29	872	620	642	620
query30	290	183	188	183
query31	969	777	742	742
query32	93	51	53	51
query33	666	257	248	248
query34	903	476	499	476
query35	843	713	712	712
query36	1053	949	917	917
query37	114	69	67	67
query38	3523	3318	3308	3308
query39	1613	1562	1588	1562
query40	173	128	128	128
query41	46	49	49	49
query42	109	97	101	97
query43	593	549	551	549
query44	1113	755	768	755
query45	309	270	274	270
query46	1099	766	729	729
query47	2022	1940	1940	1940
query48	366	304	307	304
query49	815	398	382	382
query50	791	421	388	388
query51	6870	6860	6728	6728
query52	108	89	97	89
query53	362	272	275	272
query54	298	233	233	233
query55	79	72	76	72
query56	274	226	239	226
query57	1211	1145	1168	1145
query58	225	204	210	204
query59	3641	3071	3138	3071
query60	285	232	234	232
query61	88	85	86	85
query62	624	443	444	443
query63	303	280	275	275
query64	4922	3922	3854	3854
query65	3081	3031	3031	3031
query66	743	330	331	330
query67	15438	15053	14864	14864
query68	5081	524	524	524
query69	497	297	296	296
query70	1213	1174	1158	1158
query71	414	275	269	269
query72	6338	2612	2438	2438
query73	708	310	311	310
query74	6858	6405	6273	6273
query75	3371	2713	2683	2683
query76	2996	959	1033	959
query77	572	264	265	264
query78	10856	10206	10115	10115
query79	5999	526	512	512
query80	1683	432	439	432
query81	513	242	246	242
query82	1260	99	99	99
query83	318	172	162	162
query84	268	80	83	80
query85	1709	273	300	273
query86	451	292	302	292
query87	3475	3306	3331	3306
query88	4802	2333	2338	2333
query89	480	377	381	377
query90	1836	180	180	180
query91	125	96	93	93
query92	56	47	48	47
query93	5591	492	494	492
query94	1082	177	180	177
query95	392	291	303	291
query96	606	261	261	261
query97	3142	2954	2948	2948
query98	227	222	222	222
query99	1253	850	853	850
Total cold run time: 284992 ms
Total hot run time: 184193 ms

@doris-robot

ClickBench: Total hot run time: 31.27 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d03f811cf4901205940abc660e156c68167ccb74, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.03
query3	0.22	0.05	0.05
query4	1.63	0.08	0.07
query5	0.49	0.49	0.51
query6	1.44	0.72	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.53	0.49	0.51
query10	0.56	0.55	0.56
query11	0.15	0.12	0.11
query12	0.13	0.12	0.12
query13	0.59	0.58	0.58
query14	0.76	0.78	0.78
query15	0.82	0.80	0.82
query16	0.36	0.37	0.37
query17	1.01	1.04	1.00
query18	0.23	0.24	0.22
query19	1.78	1.65	1.67
query20	0.01	0.01	0.01
query21	15.40	0.64	0.65
query22	4.22	6.53	2.87
query23	18.28	1.40	1.26
query24	2.06	0.22	0.20
query25	0.14	0.09	0.08
query26	0.27	0.16	0.16
query27	0.08	0.08	0.08
query28	13.27	1.00	0.99
query29	12.68	3.26	3.26
query30	0.28	0.07	0.07
query31	2.84	0.38	0.38
query32	3.27	0.46	0.46
query33	2.83	2.82	2.91
query34	17.09	4.37	4.38
query35	4.45	4.49	4.54
query36	0.65	0.46	0.45
query37	0.19	0.17	0.16
query38	0.15	0.13	0.14
query39	0.04	0.03	0.04
query40	0.19	0.14	0.17
query41	0.09	0.05	0.04
query42	0.06	0.05	0.04
query43	0.04	0.04	0.03
Total cold run time: 109.47 s
Total hot run time: 31.27 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit d03f811cf4901205940abc660e156c68167ccb74 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      33 seconds loaded 861443392 Bytes, about 24 MB/s
Insert into select:       13.4 seconds inserted 10000000 Rows, about 746K ops/s
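
The throughput figures above appear to be derived from the byte and row counts. The report does not say how it rounds, so the sketch below assumes "MB" means 1024 * 1024 bytes and that values are rounded down; both assumptions happen to reproduce the reported numbers. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the report's "MB" is 1024 * 1024 bytes.
constexpr std::uint64_t kBytesPerMiB = 1024ULL * 1024ULL;

// Whole MiB per second, rounded down.
std::uint64_t mib_per_second(std::uint64_t bytes, std::uint64_t seconds) {
    assert(seconds > 0);
    return bytes / (seconds * kBytesPerMiB);
}

// Thousands of rows per second, rounded down.
std::uint64_t kilo_rows_per_second(std::uint64_t rows, double seconds) {
    assert(seconds > 0.0);
    return static_cast<std::uint64_t>(static_cast<double>(rows) / seconds / 1000.0);
}
```

For example, `mib_per_second(2358488459ULL, 18)` yields 124 for the JSON stream load, and `kilo_rows_per_second(10000000ULL, 13.4)` yields 746 for the insert-into-select run. With 10^6-byte megabytes the JSON figure would come out near 131 instead, which is why the MiB reading is assumed here.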

@platoneko
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.50% (8905/25083)
Line Coverage: 27.23% (73143/268650)
Region Coverage: 26.35% (37816/143518)
Branch Coverage: 23.13% (19270/83320)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6a5f24492e27d0eb6cd4621b11ee0e48a8fd57c9_6a5f24492e27d0eb6cd4621b11ee0e48a8fd57c9/report/index.html

@platoneko
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.50% (8908/25092)
Line Coverage: 27.22% (73165/268745)
Region Coverage: 26.35% (37826/143555)
Branch Coverage: 23.13% (19280/83338)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6db0397b6c75e27d43e6c10657ad64ebf500f867_6db0397b6c75e27d43e6c10657ad64ebf500f867/report/index.html

@doris-robot

TPC-H: Total hot run time: 38488 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6db0397b6c75e27d43e6c10657ad64ebf500f867, data reload: false

------ Round 1 ----------------------------------
q1	17588	4544	4453	4453
q2	1999	184	175	175
q3	10469	1149	1171	1149
q4	10184	779	807	779
q5	7528	2665	2628	2628
q6	217	129	129	129
q7	1006	605	599	599
q8	9229	2032	2035	2032
q9	7322	6984	6486	6486
q10	8576	3507	3481	3481
q11	447	232	226	226
q12	422	222	211	211
q13	18743	2906	2952	2906
q14	279	238	233	233
q15	506	467	473	467
q16	519	399	380	380
q17	957	659	737	659
q18	7251	6770	6684	6684
q19	6475	1496	1509	1496
q20	647	328	308	308
q21	3439	2709	2829	2709
q22	368	298	304	298
Total cold run time: 114171 ms
Total hot run time: 38488 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4341	4201	4170	4170
q2	378	279	263	263
q3	2993	2703	2722	2703
q4	1834	1565	1558	1558
q5	5357	5312	5304	5304
q6	209	152	123	123
q7	2182	1869	1852	1852
q8	3190	3360	3289	3289
q9	8555	8513	8528	8513
q10	3934	3715	3689	3689
q11	567	488	484	484
q12	763	588	590	588
q13	17469	2998	2954	2954
q14	311	280	270	270
q15	501	464	468	464
q16	469	418	432	418
q17	1766	1459	1440	1440
q18	8007	7880	8166	7880
q19	1693	1542	1559	1542
q20	2002	1855	1854	1854
q21	10585	5056	4993	4993
q22	573	485	441	441
Total cold run time: 77679 ms
Total hot run time: 54792 ms

Contributor

@dataroaring left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Apr 17, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

@dataroaring merged commit 7408df2 into apache:master Apr 17, 2024
@caizj

caizj commented Apr 18, 2024

This PR makes BE UT (macOS) fail:

/Users/runner/work/doris/doris/be/src/io/fs/hdfs_file_writer.cpp:63:19: error: use of undeclared identifier 'hdfsHSync'; did you mean 'hdfsSync'?
        int ret = hdfsHSync(_hdfs_handler->hdfs_fs, _hdfs_file);
                  ^~~~~~~~~
                  hdfsSync

morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Apr 18, 2024
1. macOS uses libhdfs3, so a different function must be called.
    This compile error was introduced by PR apache#33680.
2. size_t is not UInt64 on macOS.
    This compile error was introduced by PR apache#33265.
morrySnow added a commit that referenced this pull request Apr 18, 2024
1. macOS uses libhdfs3, so a different function must be called.
    This compile error was introduced by PR #33680.
2. size_t is not UInt64 on macOS.
    This compile error was introduced by PR #33265.
dataroaring pushed a commit to dataroaring/incubator-doris that referenced this pull request Apr 20, 2024
cambyzju added a commit to cambyzju/incubator-doris that referenced this pull request Apr 26, 2024
Labels

approved (Indicates a PR has been approved by one committer.), dev/3.0.0-merged, reviewed

5 participants