@zxealous zxealous commented Feb 1, 2024

Proposed changes

Issue Number: close #30600
Fix the inability to export empty data to HDFS/S3. This behavior is inconsistent with version 1.2.7, which can export empty data to HDFS/S3 and leaves an exported file on S3/HDFS.
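
To illustrate the intended behavior, here is a minimal, self-contained sketch. It is not the Doris implementation: `FakeObjectStore` and `SketchFileWriter` are hypothetical names invented for this example, and an in-memory map stands in for S3/HDFS. It only shows the idea that closing a writer with no pending data should still create a zero-byte file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a remote store (S3/HDFS); not a Doris class.
class FakeObjectStore {
public:
    void put(const std::string& key, std::string data) { _objects[key] = std::move(data); }
    bool exists(const std::string& key) const { return _objects.count(key) > 0; }
    std::size_t size(const std::string& key) const { return _objects.at(key).size(); }

private:
    std::map<std::string, std::string> _objects;
};

// Hypothetical writer: buffers appended bytes and uploads them on close().
class SketchFileWriter {
public:
    SketchFileWriter(FakeObjectStore& store, std::string key)
            : _store(store), _key(std::move(key)) {}

    void append(const std::string& data) {
        _pending += data;
        _has_pending = true;
    }

    void close() {
        if (_closed) {
            return; // closing twice is a no-op
        }
        if (_has_pending) {
            // Normal path: upload whatever was buffered.
            _store.put(_key, _pending);
        } else {
            // No pending buffer: still create an empty file so the
            // export leaves an object behind.
            _store.put(_key, std::string());
        }
        _closed = true;
    }

private:
    FakeObjectStore& _store;
    std::string _key;
    std::string _pending;
    bool _has_pending = false;
    bool _closed = false;
};
```

In this sketch, calling `close()` on a writer that never received `append()` leaves a zero-byte entry under its key, while a writer that received data uploads exactly those bytes.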

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

github-actions bot commented Feb 1, 2024

clang-tidy review says "All clean, LGTM! 👍"

@wm1581066 added the dev/2.0.5 and usercase labels on Feb 1, 2024
@hello-stephen
Contributor

run performance

@doris-robot

TPC-H: Total hot run time: 37078 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 19545d02a7be8046eb78f1ed6a6484cec888f63d, data reload: false

------ Round 1 ----------------------------------
q1	17751	4959	4545	4545
q2	2313	144	134	134
q3	10670	944	953	944
q4	4657	781	730	730
q5	7681	3203	2751	2751
q6	184	120	119	119
q7	1137	728	717	717
q8	9300	2022	2021	2021
q9	7185	6352	6325	6325
q10	8080	2412	2402	2402
q11	421	198	212	198
q12	770	298	282	282
q13	17999	3355	3310	3310
q14	270	240	239	239
q15	531	487	484	484
q16	459	412	424	412
q17	941	581	533	533
q18	6883	5975	5861	5861
q19	1572	1370	1424	1370
q20	621	348	346	346
q21	6683	3171	3064	3064
q22	782	322	291	291
Total cold run time: 106890 ms
Total hot run time: 37078 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4515	4472	4472	4472
q2	318	248	232	232
q3	2998	2877	2829	2829
q4	1884	1633	1661	1633
q5	5217	5188	5242	5188
q6	191	114	114	114
q7	2143	1777	1791	1777
q8	3097	3243	3319	3243
q9	8314	8340	8312	8312
q10	5794	3553	3589	3553
q11	539	452	479	452
q12	736	575	605	575
q13	13742	3087	3082	3082
q14	278	259	268	259
q15	528	492	481	481
q16	494	479	480	479
q17	1846	1686	1746	1686
q18	8045	7731	7520	7520
q19	8975	1539	1549	1539
q20	2125	1905	1916	1905
q21	4931	4922	4539	4539
q22	600	469	497	469
Total cold run time: 77310 ms
Total hot run time: 54339 ms

@doris-robot

TPC-DS: Total hot run time: 174837 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 19545d02a7be8046eb78f1ed6a6484cec888f63d, data reload: false

query1	934	345	329	329
query2	6554	2079	1878	1878
query3	6697	201	199	199
query4	31429	22116	22121	22116
query5	4452	406	394	394
query6	244	156	154	154
query7	4597	259	258	258
query8	251	169	175	169
query9	8973	2260	2246	2246
query10	414	209	193	193
query11	18829	15317	15431	15317
query12	123	67	66	66
query13	1638	379	365	365
query14	9531	7365	7402	7365
query15	257	178	185	178
query16	7435	271	251	251
query17	1867	497	473	473
query18	1911	252	248	248
query19	343	133	131	131
query20	77	69	66	66
query21	205	133	126	126
query22	4931	4763	4656	4656
query23	31211	30368	30218	30218
query24	7635	2805	2802	2802
query25	507	305	303	303
query26	710	140	139	139
query27	2132	282	277	277
query28	5270	1846	1816	1816
query29	902	615	607	607
query30	278	133	136	133
query31	902	699	733	699
query32	94	56	51	51
query33	477	208	207	207
query34	835	445	460	445
query35	870	793	748	748
query36	1220	1155	1216	1155
query37	93	61	57	57
query38	3370	3173	3146	3146
query39	1291	1264	1254	1254
query40	192	90	82	82
query41	36	38	35	35
query42	82	81	84	81
query43	538	509	496	496
query44	1038	678	694	678
query45	195	179	175	175
query46	1049	639	629	629
query47	1636	1512	1512	1512
query48	389	302	303	302
query49	1087	289	272	272
query50	691	303	310	303
query51	5229	5176	5154	5154
query52	99	90	77	77
query53	318	259	253	253
query54	220	177	188	177
query55	78	75	78	75
query56	181	166	163	163
query57	958	889	899	889
query58	178	152	158	152
query59	2585	2350	2336	2336
query60	190	172	179	172
query61	81	82	87	82
query62	598	362	354	354
query63	279	268	259	259
query64	4718	3773	3288	3288
query65	3270	3244	3215	3215
query66	941	313	308	308
query67	14426	14188	13949	13949
query68	4362	483	485	483
query69	436	296	312	296
query70	1525	1551	1508	1508
query71	291	217	212	212
query72	5802	3117	2837	2837
query73	679	307	313	307
query74	6648	6246	6264	6246
query75	2966	2287	2247	2247
query76	2521	1004	919	919
query77	363	233	222	222
query78	9096	8741	8475	8475
query79	2631	499	499	499
query80	2114	315	325	315
query81	524	195	198	195
query82	836	86	77	77
query83	254	117	116	116
query84	287	73	69	69
query85	1975	354	329	329
query86	515	385	411	385
query87	3451	3232	3251	3232
query88	3750	2173	2174	2173
query89	424	346	351	346
query90	1930	185	184	184
query91	172	114	117	114
query92	55	45	42	42
query93	3887	416	411	411
query94	1272	163	156	156
query95	500	449	447	447
query96	617	316	322	316
query97	4239	4103	4119	4103
query98	216	188	189	188
query99	1198	685	717	685
Total cold run time: 276307 ms
Total hot run time: 174837 ms

@doris-robot

ClickBench: Total hot run time: 31.41 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 19545d02a7be8046eb78f1ed6a6484cec888f63d, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.02
query3	0.24	0.07	0.06
query4	1.68	0.10	0.09
query5	0.52	0.51	0.51
query6	1.19	0.65	0.65
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.55	0.51	0.48
query10	0.55	0.53	0.55
query11	0.12	0.08	0.09
query12	0.10	0.08	0.09
query13	0.60	0.60	0.61
query14	0.80	0.79	0.79
query15	0.80	0.79	0.80
query16	0.39	0.37	0.37
query17	1.01	0.99	1.01
query18	0.25	0.27	0.23
query19	1.86	1.80	1.73
query20	0.01	0.02	0.02
query21	15.40	0.58	0.58
query22	2.66	2.35	2.45
query23	17.24	0.70	0.71
query24	3.17	1.02	0.90
query25	0.34	0.18	0.04
query26	0.54	0.14	0.13
query27	0.06	0.06	0.05
query28	11.41	0.85	0.83
query29	12.48	3.22	3.28
query30	0.65	0.55	0.56
query31	2.78	0.36	0.36
query32	3.36	0.48	0.49
query33	3.21	3.21	3.21
query34	15.85	4.31	4.36
query35	4.37	4.31	4.32
query36	1.09	1.05	1.06
query37	0.06	0.04	0.05
query38	0.04	0.02	0.02
query39	0.02	0.02	0.02
query40	0.17	0.13	0.13
query41	0.06	0.01	0.02
query42	0.02	0.01	0.02
query43	0.03	0.02	0.02
Total cold run time: 105.83 s
Total hot run time: 31.41 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 19545d02a7be8046eb78f1ed6a6484cec888f63d with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       14.0 seconds inserted 10000000 Rows, about 714K ops/s

@zxealous zxealous force-pushed the fix-outfile-empty-data branch from 19545d0 to 5e0e28f Compare February 2, 2024 09:31
github-actions bot commented Feb 2, 2024

clang-tidy review says "All clean, LGTM! 👍"

zxealous commented Feb 4, 2024

run buildall

@zxealous changed the title from "[fix-wip](outfile) Fix unable to export empty data" to "[fix](outfile) Fix unable to export empty data" on Feb 4, 2024
@doris-robot

TPC-H: Total hot run time: 37046 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5e0e28f37868fea0d4bceb71280f9ce8d64a4cac, data reload: false

------ Round 1 ----------------------------------
q1	17637	4549	4529	4529
q2	2035	140	136	136
q3	10597	905	949	905
q4	4648	718	725	718
q5	7720	2849	2804	2804
q6	185	120	121	120
q7	1161	727	727	727
q8	9340	2027	2064	2027
q9	7324	6337	6359	6337
q10	8116	2425	2458	2425
q11	415	209	209	209
q12	814	273	266	266
q13	18036	3294	3253	3253
q14	291	243	251	243
q15	531	493	478	478
q16	490	403	404	403
q17	966	534	518	518
q18	6880	6051	5935	5935
q19	1570	1362	1396	1362
q20	625	342	331	331
q21	7203	3094	3022	3022
q22	813	317	298	298
Total cold run time: 107397 ms
Total hot run time: 37046 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4446	4400	4469	4400
q2	332	229	240	229
q3	2975	2905	2814	2814
q4	1904	1720	1645	1645
q5	5175	5225	5274	5225
q6	187	113	115	113
q7	2171	1759	1788	1759
q8	3103	3222	3245	3222
q9	8340	8307	8297	8297
q10	5841	3525	3516	3516
q11	537	469	464	464
q12	735	600	588	588
q13	14572	3143	3078	3078
q14	281	243	261	243
q15	534	485	491	485
q16	503	475	467	467
q17	1839	1655	1650	1650
q18	8032	7630	7629	7629
q19	9713	1521	1488	1488
q20	2139	1891	1908	1891
q21	4950	4637	4552	4552
q22	558	474	455	455
Total cold run time: 78867 ms
Total hot run time: 54210 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.15% (8627/23864)
Line Coverage: 28.22% (70553/250051)
Region Coverage: 27.23% (36404/133703)
Branch Coverage: 24.02% (18647/77626)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5e0e28f37868fea0d4bceb71280f9ce8d64a4cac_5e0e28f37868fea0d4bceb71280f9ce8d64a4cac/report/index.html

@doris-robot

TPC-DS: Total hot run time: 181119 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5e0e28f37868fea0d4bceb71280f9ce8d64a4cac, data reload: false

query1	935	342	338	338
query2	6550	2058	1916	1916
query3	6703	206	200	200
query4	31760	22014	22068	22014
query5	4243	355	356	355
query6	253	165	164	164
query7	4612	282	288	282
query8	249	172	173	172
query9	8976	2257	2242	2242
query10	412	245	213	213
query11	17855	15495	15288	15288
query12	129	77	73	73
query13	1654	419	409	409
query14	9077	7134	6596	6596
query15	255	189	182	182
query16	8152	274	251	251
query17	1922	535	499	499
query18	2100	280	267	267
query19	360	146	142	142
query20	83	78	83	78
query21	210	123	124	123
query22	4692	4593	4579	4579
query23	30787	29935	30022	29935
query24	9915	2803	2763	2763
query25	546	340	334	334
query26	702	145	147	145
query27	2179	302	297	297
query28	5762	1811	1833	1811
query29	889	608	605	605
query30	277	132	141	132
query31	917	718	711	711
query32	93	59	51	51
query33	577	222	213	213
query34	821	461	476	461
query35	830	772	736	736
query36	943	972	940	940
query37	97	60	59	59
query38	3231	3096	3085	3085
query39	1307	1257	1243	1243
query40	179	94	91	91
query41	38	35	35	35
query42	101	88	90	88
query43	518	468	485	468
query44	1040	690	693	690
query45	194	182	174	174
query46	1031	649	644	644
query47	1588	1467	1531	1467
query48	427	350	362	350
query49	1044	290	283	283
query50	777	368	380	368
query51	5222	5132	5197	5132
query52	118	85	87	85
query53	336	263	273	263
query54	255	211	220	211
query55	78	72	78	72
query56	221	196	202	196
query57	998	922	908	908
query58	205	176	178	176
query59	2501	2299	2444	2299
query60	234	211	210	210
query61	82	81	87	81
query62	615	361	363	361
query63	310	281	261	261
query64	4674	3782	3540	3540
query65	3263	3235	3222	3222
query66	828	312	311	311
query67	14276	14228	13829	13829
query68	5412	534	547	534
query69	486	318	319	318
query70	1277	1238	1228	1228
query71	337	251	244	244
query72	6360	2813	2668	2668
query73	713	323	325	323
query74	6755	6319	6157	6157
query75	3046	2320	2296	2296
query76	2986	936	985	936
query77	350	249	239	239
query78	9425	8866	8532	8532
query79	3060	493	490	490
query80	2101	354	345	345
query81	537	197	199	197
query82	860	82	81	81
query83	240	127	126	126
query84	280	85	83	83
query85	2193	333	327	327
query86	488	308	300	300
query87	3404	3190	3216	3190
query88	4289	2359	2359	2359
query89	477	357	368	357
query90	1989	164	166	164
query91	150	119	120	119
query92	53	39	41	39
query93	5232	496	510	496
query94	1277	184	181	181
query95	8068	7805	7850	7805
query96	618	279	273	273
query97	4189	4106	4129	4106
query98	209	200	192	192
query99	1167	701	695	695
Total cold run time: 290236 ms
Total hot run time: 181119 ms

@doris-robot

ClickBench: Total hot run time: 31.03 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5e0e28f37868fea0d4bceb71280f9ce8d64a4cac, data reload: false

query1	0.04	0.03	0.03
query2	0.06	0.02	0.02
query3	0.24	0.07	0.06
query4	1.66	0.10	0.10
query5	0.52	0.52	0.51
query6	1.19	0.65	0.64
query7	0.01	0.01	0.01
query8	0.04	0.03	0.03
query9	0.57	0.49	0.51
query10	0.54	0.54	0.54
query11	0.12	0.09	0.08
query12	0.11	0.09	0.09
query13	0.61	0.61	0.60
query14	0.79	0.81	0.79
query15	0.79	0.78	0.76
query16	0.38	0.38	0.39
query17	1.03	0.99	0.99
query18	0.24	0.25	0.23
query19	1.80	1.77	1.80
query20	0.02	0.01	0.01
query21	15.41	0.56	0.56
query22	2.81	2.10	1.79
query23	17.26	0.80	0.82
query24	2.29	1.43	1.17
query25	0.37	0.07	0.19
query26	0.65	0.13	0.13
query27	0.05	0.05	0.05
query28	10.68	0.84	0.84
query29	12.53	3.10	3.17
query30	0.71	0.56	0.53
query31	2.79	0.34	0.36
query32	3.35	0.47	0.49
query33	3.19	3.22	3.25
query34	16.25	4.24	4.22
query35	4.33	4.31	4.29
query36	1.12	1.06	1.06
query37	0.07	0.05	0.04
query38	0.04	0.03	0.03
query39	0.02	0.01	0.02
query40	0.17	0.13	0.13
query41	0.07	0.02	0.02
query42	0.02	0.02	0.01
query43	0.03	0.02	0.02
Total cold run time: 104.97 s
Total hot run time: 31.03 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 5e0e28f37868fea0d4bceb71280f9ce8d64a4cac with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       14.5 seconds inserted 10000000 Rows, about 689K ops/s

@morningman morningman left a comment

LGTM, please add a regression test case.

DCHECK(buf != nullptr);
buf->set_upload_to_remote([this](UploadFileBuffer& b) { _put_object(b); });
} else {
// if there is no pending buffer, we need to create a empty file
Suggested change
- // if there is no pending buffer, we need to create a empty file
+ // if there is no pending buffer, we need to create an empty file

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@zxealous

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.77% (8561/23935)
Line Coverage: 27.73% (69441/250434)
Region Coverage: 26.86% (36036/134185)
Branch Coverage: 23.66% (18428/77882)
Coverage Report: http://coverage.selectdb-in.cc/coverage/eeb49199380956dc3627e38326923cc84f61815a_eeb49199380956dc3627e38326923cc84f61815a/report/index.html

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@zxealous zxealous force-pushed the fix-outfile-empty-data branch from 918fc88 to bc04c05 Compare February 18, 2024 11:31
@zxealous

run buildall

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.77% (8561/23935)
Line Coverage: 27.74% (69461/250434)
Region Coverage: 26.86% (36047/134185)
Branch Coverage: 23.67% (18433/77882)
Coverage Report: http://coverage.selectdb-in.cc/coverage/bc04c055ddbcf59cdda5d4a61108d6523c438700_bc04c055ddbcf59cdda5d4a61108d6523c438700/report/index.html

@zxealous

run buildall

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.77% (8561/23935)
Line Coverage: 27.73% (69450/250434)
Region Coverage: 26.85% (36031/134185)
Branch Coverage: 23.66% (18428/77882)
Coverage Report: http://coverage.selectdb-in.cc/coverage/40cbdd920c4a1ef07ab3f25b1a5771da3b79057b_40cbdd920c4a1ef07ab3f25b1a5771da3b79057b/report/index.html

@morningman morningman left a comment

LGTM

@github-actions

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label on Feb 19, 2024
@github-actions

PR approved by anyone and no changes requested.

@BePPPower BePPPower left a comment

LGTM

zxealous added a commit to zxealous/doris that referenced this pull request Feb 19, 2024
@morningman morningman merged commit 99c3b5f into apache:master Feb 19, 2024
morningman pushed a commit to zxealous/doris that referenced this pull request Feb 19, 2024
kaijchen added a commit to kaijchen/doris that referenced this pull request Feb 20, 2024
@kaijchen

Hi @zxealous, thank you for contributing to Apache Doris.

Unfortunately, this PR caused a problem in the cloud pipeline, see #31148.
I haven't found the exact cause yet.
Please take a look if you have time.

morningman pushed a commit to morningman/doris that referenced this pull request Feb 21, 2024
Issue Number: close apache#30600
Fix unable to export empty data to hdfs / S3, this behavior is inconsistent with version 1.2.7,
version 1.2.7 can export empty data to hdfs/ S3, and there will be exported files on S3/HDFS.
morningman pushed a commit to morningman/doris that referenced this pull request Feb 21, 2024
Issue Number: close apache#30600
Fix unable to export empty data to hdfs / S3, this behavior is inconsistent with version 1.2.7,
version 1.2.7 can export empty data to hdfs/ S3, and there will be exported files on S3/HDFS.
yiguolei pushed a commit that referenced this pull request Feb 21, 2024
…#28983 #30703 #31169 (#31213)

* (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983)

* [fix](outfile) Fix unable to export empty data (#30703)

Issue Number: close #30600
Fix unable to export empty data to hdfs / S3, this behavior is inconsistent with version 1.2.7,
version 1.2.7 can export empty data to hdfs/ S3, and there will be exported files on S3/HDFS.

* [fix](file-writer) avoid empty file for segment writer (#31169)

---------

Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: zxealous <zhouchangyue@baidu.com>
@xiaokang xiaokang mentioned this pull request Feb 23, 2024
@zxealous zxealous deleted the fix-outfile-empty-data branch April 28, 2024 09:30
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
Labels

approved, dev/2.0.5-merged, dev/2.1.0-merged, kind/behavior-changed, reviewed, usercase

Development

Successfully merging this pull request may close these issues.

[Bug] Version 2.0.4 Unable to export empty data to hdfs / S3

9 participants