@hubgeter (Contributor) commented Jul 11, 2024

Proposed changes

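Adds support for reading CSV data that uses LF (\n), CRLF (\r\n), or a mix of both as line separators.

Example CSV file, where \r marks a literal carriage return at the end of a line: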

1,abc
2,def\r
3,qwe
4,hello\r

If you set keep_carriage_return = false, you will get:

1   abc
2   def
3   qwe
4   hello

Here, both \r\n and \n are treated as line delimiters, so the trailing \r is not part of the data.

If you set keep_carriage_return = true, you will get:

1   abc
2   def\r
3   qwe
4   hello\r

Here, only \n is treated as a line delimiter, so the \r is kept as part of the last column's value.
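
To make the two modes concrete, here is a minimal C++ sketch of the splitting behavior described above. It is not the Doris reader from this PR: split_lines and its parameters are hypothetical names chosen for illustration, and the sketch assumes a \r is only dropped when it sits directly before a \n.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not the Doris implementation: splits `data` into lines on '\n'.
// When keep_carriage_return is false, a '\r' directly before the '\n' is treated as
// part of the delimiter and dropped; when true, it stays in the line.
std::vector<std::string> split_lines(std::string_view data, bool keep_carriage_return) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (start < data.size()) {
        std::size_t end = data.find('\n', start);
        const bool has_newline = (end != std::string_view::npos);
        if (!has_newline) {
            end = data.size();  // last line without a trailing '\n'
        }
        std::string_view line = data.substr(start, end - start);
        if (!keep_carriage_return && has_newline && !line.empty() && line.back() == '\r') {
            line.remove_suffix(1);  // CRLF case: drop the '\r'
        }
        lines.emplace_back(line);
        start = end + 1;
    }
    return lines;
}
```

Called on the sample file with keep_carriage_return = false, this yields the lines 1,abc / 2,def / 3,qwe / 4,hello; with true, the second and fourth lines keep their trailing \r.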

Warning

Setting keep_carriage_return = true takes effect for table-valued functions (TVF), but not for stream load or MySQL load. When you perform a stream load or MySQL load, both CRLF and LF are always used as line delimiters, even if you set keep_carriage_return = true.
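
The warning amounts to one rule, sketched below in C++ under stated assumptions: the enum CsvSource and the function effective_keep_carriage_return are hypothetical and do not appear in the Doris code base. They only restate which load paths honor the option.

```cpp
// Hypothetical names for illustration only; not part of Doris.
enum class CsvSource { kTvf, kStreamLoad, kMysqlLoad };

// Returns whether a '\r' before '\n' is actually preserved for the given source.
// Per the warning above, only TVF reads honor keep_carriage_return = true;
// stream load and MySQL load always treat both CRLF and LF as delimiters.
constexpr bool effective_keep_carriage_return(CsvSource source, bool keep_carriage_return) {
    return source == CsvSource::kTvf && keep_carriage_return;
}
```

In other words, requesting keep_carriage_return = true on a stream load behaves the same as false.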

Issue Number: close #xxx

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@hubgeter (Contributor Author)

run buildall

@github-actions github-actions bot (Contributor) left a comment

clang-tidy made some suggestions

 }

-[[nodiscard]] inline size_t line_delimiter_length() const final { return line_delimiter_len; }
+[[nodiscard]] inline size_t line_delimiter_length() const {

warning: annotate this function with 'override' or (rarely) 'final' [modernize-use-override]

Suggested change
-[[nodiscard]] inline size_t line_delimiter_length() const {
+[[nodiscard]] inline size_t line_delimiter_length() const override {
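
For context on the clang-tidy hint, the following is a small self-contained sketch of what override provides. The classes LineReaderBase and CrlfAwareReader are hypothetical stand-ins, not the classes touched by this PR.

```cpp
#include <cstddef>

// Hypothetical base class for illustration; not the Doris line-reader hierarchy.
class LineReaderBase {
public:
    virtual ~LineReaderBase() = default;
    [[nodiscard]] virtual std::size_t line_delimiter_length() const = 0;
};

// Hypothetical derived class.
class CrlfAwareReader : public LineReaderBase {
public:
    explicit CrlfAwareReader(std::size_t len) : _line_delimiter_len(len) {}

    // `override` makes the compiler verify that this declaration matches
    // a virtual function in the base class.
    [[nodiscard]] std::size_t line_delimiter_length() const override {
        return _line_delimiter_len;
    }

private:
    std::size_t _line_delimiter_len;
};
```

If the derived signature drifted (for example, by dropping const), the override specifier would turn the silent mismatch into a compile error.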

@hubgeter hubgeter force-pushed the support_csv_crlf_lf branch from 4a05198 to 6fc9f08 July 11, 2024 15:33
@hubgeter (Contributor Author)

run buildall

@hubgeter (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 40112 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7a951eaa70e9c1e53958bb8a6db118b6ea788df4, data reload: false

------ Round 1 ----------------------------------
q1	17836	4536	4301	4301
q2	2020	193	185	185
q3	10527	1187	1077	1077
q4	10197	864	837	837
q5	7572	2694	2656	2656
q6	220	138	141	138
q7	971	606	609	606
q8	9301	2127	2084	2084
q9	8801	6622	6600	6600
q10	8852	3752	3794	3752
q11	456	233	243	233
q12	396	226	226	226
q13	17779	3015	2999	2999
q14	272	238	245	238
q15	539	490	494	490
q16	495	396	380	380
q17	984	662	712	662
q18	8173	7563	7330	7330
q19	7359	1521	1436	1436
q20	663	333	324	324
q21	4867	3272	3924	3272
q22	367	286	290	286
Total cold run time: 118647 ms
Total hot run time: 40112 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4398	4272	4263	4263
q2	377	267	277	267
q3	3161	2891	2928	2891
q4	1929	1706	1767	1706
q5	5627	5552	5444	5444
q6	227	137	135	135
q7	2293	1860	1868	1860
q8	3324	3479	3451	3451
q9	8809	8826	8856	8826
q10	4152	3785	3880	3785
q11	605	510	493	493
q12	808	649	603	603
q13	17025	3154	3209	3154
q14	310	304	277	277
q15	541	489	478	478
q16	501	430	447	430
q17	1840	1562	1505	1505
q18	8121	8054	7778	7778
q19	1768	1458	1586	1458
q20	2233	1890	1870	1870
q21	5876	4898	4685	4685
q22	606	504	515	504
Total cold run time: 74531 ms
Total hot run time: 55863 ms
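
The totals in the TPC-H report above can be reproduced from the rows, assuming each row lists the cold run, two hot runs, and the smaller of the two hot runs (the bot does not label its columns, so this reading is inferred from the numbers). A minimal C++ sketch using the Round 1 rows, with hypothetical names QueryTimes, tpch_round1_rows, total_cold, and total_hot:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One table row in milliseconds: cold run, then two hot runs.
// The column meaning is an assumption inferred from the data.
struct QueryTimes {
    std::int64_t cold;
    std::int64_t hot1;
    std::int64_t hot2;
};

// Round 1 rows q1..q22, copied from the first TPC-H report above.
std::vector<QueryTimes> tpch_round1_rows() {
    return {
        {17836, 4536, 4301}, {2020, 193, 185},    {10527, 1187, 1077},
        {10197, 864, 837},   {7572, 2694, 2656},  {220, 138, 141},
        {971, 606, 609},     {9301, 2127, 2084},  {8801, 6622, 6600},
        {8852, 3752, 3794},  {456, 233, 243},     {396, 226, 226},
        {17779, 3015, 2999}, {272, 238, 245},     {539, 490, 494},
        {495, 396, 380},     {984, 662, 712},     {8173, 7563, 7330},
        {7359, 1521, 1436},  {663, 333, 324},     {4867, 3272, 3924},
        {367, 286, 290},
    };
}

// Sum of the cold-run column.
std::int64_t total_cold(const std::vector<QueryTimes>& rows) {
    std::int64_t sum = 0;
    for (const QueryTimes& r : rows) sum += r.cold;
    return sum;
}

// Sum of the faster of the two hot runs per query.
std::int64_t total_hot(const std::vector<QueryTimes>& rows) {
    std::int64_t sum = 0;
    for (const QueryTimes& r : rows) sum += std::min(r.hot1, r.hot2);
    return sum;
}
```

With these rows, total_cold gives 118647 and total_hot gives 40112, matching the Round 1 totals the bot printed.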

@doris-robot

TPC-DS: Total hot run time: 174019 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7a951eaa70e9c1e53958bb8a6db118b6ea788df4, data reload: false

query1	928	378	364	364
query2	6425	1889	1786	1786
query3	6638	205	219	205
query4	28124	17568	17302	17302
query5	3654	470	482	470
query6	266	170	167	167
query7	4580	297	281	281
query8	249	197	197	197
query9	8445	2368	2348	2348
query10	429	262	262	262
query11	11931	10019	10135	10019
query12	114	85	85	85
query13	1627	359	366	359
query14	10315	7549	7645	7549
query15	224	165	163	163
query16	7454	311	308	308
query17	1819	557	558	557
query18	1851	272	275	272
query19	198	157	157	157
query20	86	79	80	79
query21	198	124	123	123
query22	4374	4055	3944	3944
query23	34073	33714	33510	33510
query24	11088	2992	2917	2917
query25	579	388	391	388
query26	700	148	150	148
query27	2302	277	285	277
query28	6506	2021	2036	2021
query29	890	656	651	651
query30	253	155	157	155
query31	993	769	770	769
query32	97	51	55	51
query33	766	310	292	292
query34	997	500	515	500
query35	695	583	580	580
query36	1151	1003	956	956
query37	150	81	84	81
query38	3032	2868	2852	2852
query39	902	804	818	804
query40	207	127	120	120
query41	51	46	45	45
query42	118	102	108	102
query43	529	453	489	453
query44	1192	743	730	730
query45	201	173	159	159
query46	1102	714	733	714
query47	1862	1738	1801	1738
query48	363	294	299	294
query49	850	420	434	420
query50	772	398	393	393
query51	6952	6750	6824	6750
query52	104	94	93	93
query53	359	291	290	290
query54	873	449	467	449
query55	77	75	73	73
query56	314	288	303	288
query57	1143	1114	1111	1111
query58	271	271	271	271
query59	2781	2678	2599	2599
query60	323	297	298	297
query61	118	117	119	117
query62	812	643	641	641
query63	323	302	296	296
query64	9213	2292	7561	2292
query65	3150	3088	3101	3088
query66	766	344	341	341
query67	15408	15132	15031	15031
query68	5326	524	530	524
query69	717	462	358	358
query70	1164	1163	1165	1163
query71	479	285	281	281
query72	8675	5575	5569	5569
query73	769	325	327	325
query74	6090	5557	5552	5552
query75	4232	2680	2672	2672
query76	3741	937	945	937
query77	676	308	308	308
query78	12384	9478	8978	8978
query79	11404	517	568	517
query80	1354	475	476	475
query81	573	224	223	223
query82	513	134	137	134
query83	320	170	165	165
query84	274	85	84	84
query85	721	300	291	291
query86	460	311	299	299
query87	3336	3137	3099	3099
query88	5580	2469	2485	2469
query89	481	372	389	372
query90	1933	191	194	191
query91	131	99	99	99
query92	65	50	49	49
query93	4985	495	488	488
query94	1238	206	211	206
query95	412	322	323	322
query96	609	276	273	273
query97	3224	2986	3023	2986
query98	214	196	204	196
query99	1609	1270	1301	1270
Total cold run time: 299026 ms
Total hot run time: 174019 ms

@doris-robot

ClickBench: Total hot run time: 30.32 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7a951eaa70e9c1e53958bb8a6db118b6ea788df4, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.03	0.04
query3	0.22	0.05	0.05
query4	1.66	0.08	0.07
query5	0.50	0.48	0.48
query6	1.13	0.72	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.54	0.49	0.50
query10	0.55	0.55	0.54
query11	0.16	0.12	0.12
query12	0.14	0.12	0.13
query13	0.59	0.58	0.58
query14	0.75	0.78	0.77
query15	0.84	0.81	0.81
query16	0.36	0.36	0.36
query17	1.02	0.95	1.04
query18	0.22	0.22	0.22
query19	1.91	1.81	1.79
query20	0.02	0.01	0.00
query21	15.40	0.75	0.65
query22	4.72	6.88	1.63
query23	18.28	1.31	1.32
query24	2.11	0.23	0.22
query25	0.16	0.09	0.08
query26	0.29	0.21	0.21
query27	0.46	0.23	0.23
query28	13.34	1.01	1.00
query29	12.62	3.33	3.28
query30	0.26	0.06	0.06
query31	2.86	0.39	0.39
query32	3.28	0.47	0.48
query33	2.90	2.90	2.88
query34	17.15	4.34	4.37
query35	4.41	4.38	4.38
query36	0.66	0.46	0.46
query37	0.19	0.15	0.14
query38	0.16	0.15	0.14
query39	0.05	0.04	0.03
query40	0.15	0.11	0.12
query41	0.10	0.04	0.05
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 110.45 s
Total hot run time: 30.32 s

@hubgeter (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 39956 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e13733074bf4b1af26392d64aef141f2533dc3c7, data reload: false

------ Round 1 ----------------------------------
q1	18547	4405	4292	4292
q2	2025	193	188	188
q3	10527	1256	1138	1138
q4	10224	798	783	783
q5	7537	2681	2677	2677
q6	221	140	147	140
q7	980	596	612	596
q8	9217	2084	2103	2084
q9	8750	6588	6605	6588
q10	8838	3746	3830	3746
q11	456	237	239	237
q12	400	223	221	221
q13	17766	2980	2986	2980
q14	277	231	233	231
q15	521	481	467	467
q16	490	386	380	380
q17	989	616	711	616
q18	8097	7576	7380	7380
q19	5853	1499	1406	1406
q20	647	308	316	308
q21	4920	3217	3241	3217
q22	346	281	290	281
Total cold run time: 117628 ms
Total hot run time: 39956 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4456	4276	4243	4243
q2	381	282	278	278
q3	3023	2943	2918	2918
q4	1998	1691	1682	1682
q5	5707	5548	5492	5492
q6	227	137	156	137
q7	2198	1889	1855	1855
q8	3279	3439	3455	3439
q9	8733	8813	8908	8813
q10	4148	3776	3801	3776
q11	591	484	516	484
q12	826	662	619	619
q13	17243	3218	3203	3203
q14	314	284	274	274
q15	534	485	497	485
q16	504	440	456	440
q17	1837	1541	1497	1497
q18	8211	7892	7926	7892
q19	1722	1634	1580	1580
q20	2137	1876	1861	1861
q21	5169	4776	4723	4723
q22	623	547	520	520
Total cold run time: 73861 ms
Total hot run time: 56211 ms

@doris-robot

TPC-DS: Total hot run time: 174024 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e13733074bf4b1af26392d64aef141f2533dc3c7, data reload: false

query1	932	376	365	365
query2	6462	1885	1901	1885
query3	6630	204	216	204
query4	27922	17661	17235	17235
query5	3639	499	465	465
query6	257	200	169	169
query7	4554	290	291	290
query8	251	189	192	189
query9	8269	2350	2342	2342
query10	424	308	284	284
query11	11638	10084	10119	10084
query12	119	86	84	84
query13	1616	372	379	372
query14	10403	7828	7837	7828
query15	232	172	176	172
query16	7912	325	355	325
query17	1755	543	522	522
query18	2039	276	277	276
query19	196	151	145	145
query20	88	83	82	82
query21	202	149	126	126
query22	4415	4081	4072	4072
query23	34131	33698	33665	33665
query24	10951	2894	2970	2894
query25	590	406	387	387
query26	694	152	153	152
query27	2211	274	270	270
query28	5910	2017	2038	2017
query29	894	650	673	650
query30	249	149	150	149
query31	938	776	760	760
query32	93	52	54	52
query33	674	317	300	300
query34	886	487	515	487
query35	686	584	603	584
query36	1109	977	966	966
query37	152	88	94	88
query38	2935	2900	2847	2847
query39	872	858	858	858
query40	202	134	120	120
query41	79	47	52	47
query42	107	98	103	98
query43	503	458	449	449
query44	1136	729	740	729
query45	204	164	164	164
query46	1094	695	739	695
query47	1854	1783	1778	1778
query48	366	286	286	286
query49	837	406	415	406
query50	789	393	390	390
query51	6937	6833	6822	6822
query52	106	96	93	93
query53	362	291	294	291
query54	847	447	447	447
query55	73	74	72	72
query56	290	270	280	270
query57	1137	1080	1074	1074
query58	237	252	274	252
query59	2877	2687	2637	2637
query60	303	292	283	283
query61	132	95	96	95
query62	770	642	661	642
query63	317	298	289	289
query64	9160	2214	1664	1664
query65	3155	3109	3126	3109
query66	753	331	339	331
query67	15430	15066	15030	15030
query68	4489	528	530	528
query69	540	392	403	392
query70	1192	1109	1088	1088
query71	376	280	281	280
query72	7380	5562	5952	5562
query73	760	320	323	320
query74	5889	5594	5518	5518
query75	3401	2691	2690	2690
query76	2403	954	928	928
query77	469	306	315	306
query78	9827	9155	8927	8927
query79	3176	521	516	516
query80	1508	474	462	462
query81	557	224	226	224
query82	761	140	142	140
query83	193	171	173	171
query84	268	89	87	87
query85	1256	329	315	315
query86	460	328	286	286
query87	3323	3113	3090	3090
query88	4360	2439	2463	2439
query89	483	388	379	379
query90	1754	201	201	201
query91	142	117	111	111
query92	65	125	48	48
query93	4160	505	502	502
query94	1027	213	223	213
query95	414	323	327	323
query96	614	271	273	271
query97	3213	3033	3086	3033
query98	227	198	193	193
query99	1728	1276	1279	1276
Total cold run time: 280402 ms
Total hot run time: 174024 ms

@doris-robot

ClickBench: Total hot run time: 31.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e13733074bf4b1af26392d64aef141f2533dc3c7, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.03
query3	0.23	0.06	0.05
query4	1.69	0.07	0.06
query5	0.51	0.48	0.48
query6	1.16	0.73	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.54	0.49	0.49
query10	0.54	0.55	0.55
query11	0.15	0.12	0.11
query12	0.14	0.11	0.13
query13	0.59	0.59	0.58
query14	0.75	0.79	0.76
query15	0.85	0.82	0.81
query16	0.35	0.36	0.38
query17	1.05	1.01	1.01
query18	0.24	0.22	0.21
query19	1.91	1.75	1.73
query20	0.01	0.00	0.01
query21	15.40	0.73	0.65
query22	3.66	7.19	2.85
query23	18.25	1.41	1.24
query24	2.15	0.24	0.22
query25	0.16	0.09	0.09
query26	0.29	0.21	0.21
query27	0.45	0.22	0.23
query28	13.22	1.01	1.00
query29	12.59	3.34	3.29
query30	0.25	0.06	0.05
query31	2.89	0.39	0.40
query32	3.26	0.50	0.47
query33	2.88	2.99	2.91
query34	17.07	4.34	4.34
query35	4.40	4.40	4.38
query36	0.65	0.49	0.47
query37	0.20	0.15	0.15
query38	0.16	0.15	0.15
query39	0.04	0.04	0.04
query40	0.15	0.12	0.13
query41	0.09	0.05	0.04
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.22 s
Total hot run time: 31.51 s

@hubgeter (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 40221 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f71f19c9026e0c5e378abcba172bf3d4f4c0591b, data reload: false

------ Round 1 ----------------------------------
q1	17869	4520	4260	4260
q2	2017	191	184	184
q3	10630	1183	1126	1126
q4	10213	843	842	842
q5	7558	2817	2732	2732
q6	223	138	141	138
q7	960	612	606	606
q8	9225	2078	2109	2078
q9	8820	6585	6590	6585
q10	8753	3829	3826	3826
q11	487	235	240	235
q12	394	228	220	220
q13	17754	2993	2983	2983
q14	270	229	244	229
q15	522	493	479	479
q16	499	387	376	376
q17	975	727	714	714
q18	8128	7608	7432	7432
q19	6022	1414	1440	1414
q20	671	316	326	316
q21	4881	3164	3271	3164
q22	341	282	285	282
Total cold run time: 117212 ms
Total hot run time: 40221 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4446	4303	4284	4284
q2	387	259	259	259
q3	3008	2956	2936	2936
q4	2072	1731	1726	1726
q5	5665	5558	5497	5497
q6	227	132	134	132
q7	2211	1856	1862	1856
q8	3305	3457	3464	3457
q9	8843	8794	8948	8794
q10	4097	3955	3734	3734
q11	596	487	488	487
q12	814	643	619	619
q13	15863	3168	3186	3168
q14	322	277	290	277
q15	528	490	487	487
q16	496	454	435	435
q17	1811	1529	1501	1501
q18	8088	7983	7723	7723
q19	1794	1644	1599	1599
q20	2143	1936	1859	1859
q21	7895	4764	4861	4764
q22	563	508	523	508
Total cold run time: 75174 ms
Total hot run time: 56102 ms

@doris-robot

TPC-DS: Total hot run time: 172871 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f71f19c9026e0c5e378abcba172bf3d4f4c0591b, data reload: false

query1	930	368	358	358
query2	6483	1868	1867	1867
query3	6636	208	217	208
query4	28346	17513	17076	17076
query5	3660	493	475	475
query6	261	179	149	149
query7	4573	292	283	283
query8	242	202	199	199
query9	8537	2374	2364	2364
query10	440	279	265	265
query11	10547	9977	10079	9977
query12	118	81	86	81
query13	1641	373	369	369
query14	10256	7820	7741	7741
query15	226	169	174	169
query16	7679	320	323	320
query17	1799	570	536	536
query18	1944	283	287	283
query19	195	152	158	152
query20	104	84	85	84
query21	210	132	133	132
query22	4151	4000	4041	4000
query23	33944	33685	33675	33675
query24	10691	2972	2927	2927
query25	615	439	400	400
query26	713	162	155	155
query27	2136	274	280	274
query28	6041	2062	2071	2062
query29	908	682	636	636
query30	256	159	161	159
query31	982	802	767	767
query32	95	61	59	59
query33	702	326	308	308
query34	889	503	518	503
query35	702	580	617	580
query36	1146	1006	1014	1006
query37	148	80	85	80
query38	2961	2887	2895	2887
query39	991	857	858	857
query40	210	117	116	116
query41	47	44	45	44
query42	114	104	98	98
query43	495	477	479	477
query44	1114	731	729	729
query45	192	159	161	159
query46	1088	733	720	720
query47	1872	1754	1789	1754
query48	372	305	291	291
query49	821	394	410	394
query50	778	389	393	389
query51	6908	6817	6661	6661
query52	100	94	95	94
query53	352	290	293	290
query54	867	445	440	440
query55	77	73	76	73
query56	298	267	310	267
query57	1123	1084	1053	1053
query58	250	241	246	241
query59	2896	2649	2664	2649
query60	296	279	277	277
query61	98	91	92	91
query62	826	643	619	619
query63	314	286	289	286
query64	9125	2217	1669	1669
query65	3172	3123	3080	3080
query66	741	335	326	326
query67	15561	14771	14923	14771
query68	7627	530	531	530
query69	770	433	343	343
query70	1269	1075	1165	1075
query71	512	282	285	282
query72	8834	6126	5111	5111
query73	816	324	325	324
query74	5975	5622	5704	5622
query75	4497	2717	2657	2657
query76	4629	920	942	920
query77	785	297	305	297
query78	9739	9722	8969	8969
query79	8370	529	517	517
query80	1064	478	481	478
query81	576	215	224	215
query82	708	137	132	132
query83	339	168	163	163
query84	270	86	86	86
query85	1280	324	296	296
query86	410	321	330	321
query87	3316	3115	3130	3115
query88	4761	2517	2468	2468
query89	492	392	397	392
query90	2049	198	187	187
query91	130	101	99	99
query92	64	48	52	48
query93	6308	507	504	504
query94	1297	214	216	214
query95	404	319	311	311
query96	623	276	275	275
query97	3170	3003	3040	3003
query98	223	203	197	197
query99	1547	1286	1239	1239
Total cold run time: 295859 ms
Total hot run time: 172871 ms

@doris-robot

ClickBench: Total hot run time: 31.28 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f71f19c9026e0c5e378abcba172bf3d4f4c0591b, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.04
query3	0.23	0.05	0.05
query4	1.66	0.11	0.10
query5	0.50	0.48	0.50
query6	1.14	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.05	0.05
query9	0.55	0.47	0.50
query10	0.53	0.53	0.53
query11	0.15	0.12	0.12
query12	0.15	0.13	0.12
query13	0.59	0.58	0.60
query14	0.77	0.77	0.78
query15	0.86	0.81	0.83
query16	0.34	0.36	0.37
query17	0.99	0.96	0.99
query18	0.23	0.22	0.22
query19	1.80	1.71	1.78
query20	0.01	0.02	0.01
query21	15.41	0.74	0.67
query22	4.02	6.68	2.44
query23	18.34	1.43	1.33
query24	2.14	0.22	0.22
query25	0.16	0.08	0.09
query26	0.29	0.21	0.20
query27	0.46	0.23	0.23
query28	13.27	1.02	1.00
query29	12.63	3.35	3.32
query30	0.25	0.06	0.06
query31	2.87	0.41	0.39
query32	3.27	0.48	0.46
query33	2.94	2.90	2.93
query34	16.88	4.44	4.39
query35	4.44	4.39	4.47
query36	0.65	0.47	0.48
query37	0.20	0.16	0.17
query38	0.15	0.15	0.14
query39	0.04	0.04	0.04
query40	0.15	0.12	0.12
query41	0.09	0.05	0.04
query42	0.06	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 109.44 s
Total hot run time: 31.28 s

@hubgeter (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 39651 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 38eb0b76cd5f5fb8d00b8af491d243e15bdfed93, data reload: false

------ Round 1 ----------------------------------
q1	18927	7075	4291	4291
q2	2016	192	191	191
q3	10580	1167	1055	1055
q4	10359	756	827	756
q5	7637	2729	2674	2674
q6	221	138	137	137
q7	958	602	604	602
q8	9211	2057	2101	2057
q9	8781	6515	6562	6515
q10	8627	3787	3753	3753
q11	446	228	240	228
q12	393	228	229	228
q13	18839	2969	3020	2969
q14	285	235	245	235
q15	529	466	476	466
q16	507	404	379	379
q17	973	629	659	629
q18	8037	7446	7433	7433
q19	6548	1379	1285	1285
q20	653	319	330	319
q21	4928	3266	3176	3176
q22	343	285	273	273
Total cold run time: 119798 ms
Total hot run time: 39651 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4393	4230	4226	4226
q2	370	279	298	279
q3	3173	2983	2921	2921
q4	2023	1781	1662	1662
q5	5569	5482	5584	5482
q6	233	134	134	134
q7	2232	1816	1821	1816
q8	3295	3420	3405	3405
q9	8788	8985	8770	8770
q10	4041	3777	3869	3777
q11	609	503	492	492
q12	811	632	628	628
q13	16199	3149	3155	3149
q14	304	273	302	273
q15	540	479	499	479
q16	473	441	422	422
q17	1809	1519	1524	1519
q18	8212	7932	7847	7847
q19	1729	1693	1628	1628
q20	2236	1887	1853	1853
q21	5094	4863	4867	4863
q22	596	537	516	516
Total cold run time: 72729 ms
Total hot run time: 56141 ms

@doris-robot

TPC-DS: Total hot run time: 173674 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 38eb0b76cd5f5fb8d00b8af491d243e15bdfed93, data reload: false

query1	914	383	369	369
query2	6402	1900	1838	1838
query3	6637	208	215	208
query4	28232	17345	17319	17319
query5	3628	487	495	487
query6	281	179	164	164
query7	4590	297	289	289
query8	245	194	193	193
query9	8491	2339	2336	2336
query10	428	279	274	274
query11	10678	9924	10039	9924
query12	123	81	81	81
query13	1640	360	360	360
query14	10190	7794	7604	7604
query15	225	170	169	169
query16	7715	315	310	310
query17	1524	569	532	532
query18	1823	282	284	282
query19	196	156	158	156
query20	91	84	80	80
query21	213	131	131	131
query22	4321	4047	3955	3955
query23	34064	33711	33743	33711
query24	11246	2930	2982	2930
query25	633	418	413	413
query26	854	162	160	160
query27	2351	289	285	285
query28	6715	2040	2018	2018
query29	936	665	692	665
query30	263	160	162	160
query31	946	744	761	744
query32	102	59	61	59
query33	774	334	312	312
query34	896	515	510	510
query35	741	590	616	590
query36	1144	1014	994	994
query37	151	85	89	85
query38	2989	2864	2841	2841
query39	952	862	843	843
query40	224	125	123	123
query41	50	45	47	45
query42	116	99	95	95
query43	502	455	476	455
query44	1247	735	745	735
query45	195	164	165	164
query46	1106	730	713	713
query47	1826	1767	1746	1746
query48	376	313	296	296
query49	855	420	441	420
query50	776	393	396	393
query51	6980	6790	6779	6779
query52	107	94	94	94
query53	371	302	302	302
query54	891	462	458	458
query55	80	76	76	76
query56	317	286	301	286
query57	1083	1040	1045	1040
query58	247	257	296	257
query59	2804	2721	2679	2679
query60	326	298	302	298
query61	122	115	119	115
query62	792	658	646	646
query63	327	391	292	292
query64	9158	2223	1671	1671
query65	3319	3126	3120	3120
query66	764	334	332	332
query67	15509	14965	14802	14802
query68	8111	539	550	539
query69	783	481	363	363
query70	1239	1155	1123	1123
query71	530	289	292	289
query72	9272	5532	5653	5532
query73	831	323	324	323
query74	6085	5691	5633	5633
query75	4924	2716	2753	2716
query76	4764	893	921	893
query77	772	320	312	312
query78	9678	9244	8900	8900
query79	10428	538	528	528
query80	1139	487	490	487
query81	583	226	219	219
query82	771	137	134	134
query83	328	170	163	163
query84	275	89	84	84
query85	1350	343	303	303
query86	404	311	295	295
query87	3324	3140	3141	3140
query88	4795	2470	2434	2434
query89	554	389	376	376
query90	2090	197	199	197
query91	130	107	101	101
query92	66	51	49	49
query93	6877	512	510	510
query94	1319	213	221	213
query95	414	310	325	310
query96	621	279	273	273
query97	3205	2977	2991	2977
query98	222	217	233	217
query99	1553	1228	1301	1228
Total cold run time: 302251 ms
Total hot run time: 173674 ms

@doris-robot

ClickBench: Total hot run time: 30.17 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 38eb0b76cd5f5fb8d00b8af491d243e15bdfed93, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.06	0.07
query4	1.64	0.09	0.09
query5	0.49	0.50	0.47
query6	1.14	0.72	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.55	0.49	0.50
query10	0.55	0.54	0.55
query11	0.15	0.11	0.11
query12	0.14	0.13	0.12
query13	0.59	0.58	0.58
query14	0.77	0.75	0.78
query15	0.86	0.83	0.82
query16	0.36	0.36	0.36
query17	1.01	0.98	0.94
query18	0.22	0.21	0.22
query19	1.78	1.69	1.69
query20	0.01	0.01	0.01
query21	15.40	0.77	0.66
query22	4.76	7.47	1.40
query23	18.20	1.36	1.34
query24	2.13	0.22	0.22
query25	0.14	0.08	0.08
query26	0.30	0.21	0.21
query27	0.46	0.23	0.23
query28	13.34	1.03	1.02
query29	12.65	3.30	3.27
query30	0.24	0.06	0.06
query31	2.87	0.40	0.38
query32	3.27	0.47	0.47
query33	2.82	2.98	2.92
query34	17.26	4.37	4.35
query35	4.43	4.44	4.41
query36	0.66	0.48	0.49
query37	0.18	0.16	0.16
query38	0.16	0.15	0.14
query39	0.05	0.03	0.03
query40	0.15	0.12	0.12
query41	0.09	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 110.34 s
Total hot run time: 30.17 s

@hubgeter (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 39877 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7a7cde2c2f05a7b1df265d3f8b9bdd045f2f7833, data reload: false

------ Round 1 ----------------------------------
q1	18849	6963	4286	4286
q2	2025	193	190	190
q3	10591	1142	1100	1100
q4	10356	818	767	767
q5	7552	2724	2683	2683
q6	221	143	143	143
q7	957	611	607	607
q8	9213	2090	2058	2058
q9	8601	6549	6515	6515
q10	8802	3831	3795	3795
q11	452	233	243	233
q12	393	227	230	227
q13	17774	2959	3007	2959
q14	276	239	240	239
q15	526	476	476	476
q16	493	388	385	385
q17	974	705	711	705
q18	7984	7488	7300	7300
q19	6643	1452	1424	1424
q20	670	329	324	324
q21	4910	3183	3232	3183
q22	341	288	278	278
Total cold run time: 118603 ms
Total hot run time: 39877 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4412	4253	4275	4253
q2	365	275	264	264
q3	3203	2942	2943	2942
q4	2017	1698	1704	1698
q5	5574	5494	5588	5494
q6	224	137	139	137
q7	2275	1866	1853	1853
q8	3289	3423	3434	3423
q9	8819	9001	8772	8772
q10	4013	3809	3846	3809
q11	595	510	503	503
q12	791	610	642	610
q13	16144	3182	3173	3173
q14	327	294	304	294
q15	564	477	499	477
q16	509	444	444	444
q17	1796	1512	1513	1512
q18	8106	8001	7716	7716
q19	1748	1595	1596	1595
q20	2946	1860	1869	1860
q21	5085	4771	4845	4771
q22	567	514	484	484
Total cold run time: 73369 ms
Total hot run time: 56084 ms

@doris-robot

TPC-DS: Total hot run time: 174213 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7a7cde2c2f05a7b1df265d3f8b9bdd045f2f7833, data reload: false

query1	926	381	374	374
query2	6473	1877	1809	1809
query3	6631	204	213	204
query4	28124	17788	17343	17343
query5	3773	475	481	475
query6	277	176	163	163
query7	4578	294	275	275
query8	233	198	194	194
query9	8456	2397	2390	2390
query10	452	278	274	274
query11	10873	10130	9925	9925
query12	114	85	86	85
query13	1641	361	366	361
query14	10379	7062	7796	7062
query15	223	166	165	165
query16	7363	322	325	322
query17	1309	530	517	517
query18	1834	276	275	275
query19	196	144	149	144
query20	91	78	82	78
query21	211	135	128	128
query22	4299	4179	3992	3992
query23	34042	33970	33707	33707
query24	9247	2916	2920	2916
query25	596	400	415	400
query26	711	156	151	151
query27	2189	276	284	276
query28	6077	2087	2096	2087
query29	908	643	666	643
query30	290	152	155	152
query31	1012	774	762	762
query32	95	54	58	54
query33	634	293	302	293
query34	919	509	503	503
query35	677	618	625	618
query36	1149	996	985	985
query37	148	92	93	92
query38	2930	2843	2837	2837
query39	901	858	878	858
query40	221	130	127	127
query41	49	47	47	47
query42	121	106	103	103
query43	506	463	456	456
query44	1138	731	735	731
query45	197	171	166	166
query46	1092	765	736	736
query47	1865	1773	1756	1756
query48	375	296	290	290
query49	867	424	424	424
query50	774	390	390	390
query51	6878	6754	6829	6754
query52	100	93	92	92
query53	355	287	300	287
query54	885	467	481	467
query55	80	79	78	78
query56	309	283	285	283
query57	1127	1042	1082	1042
query58	260	263	260	260
query59	2961	2651	2531	2531
query60	322	295	289	289
query61	117	115	116	115
query62	801	630	640	630
query63	330	291	297	291
query64	9240	2301	1750	1750
query65	3180	3100	3092	3092
query66	733	343	350	343
query67	15548	14961	15054	14961
query68	5660	548	563	548
query69	710	451	377	377
query70	1200	1134	1120	1120
query71	450	305	298	298
query72	9266	5536	5546	5536
query73	769	336	323	323
query74	6114	5667	5685	5667
query75	4429	2717	2730	2717
query76	3572	962	903	903
query77	681	302	318	302
query78	9756	14506	9893	9893
query79	6557	531	529	529
query80	1677	483	477	477
query81	596	225	225	225
query82	361	143	138	138
query83	314	209	166	166
query84	281	85	92	85
query85	712	324	304	304
query86	468	330	306	306
query87	3369	3068	3104	3068
query88	4127	2380	2398	2380
query89	483	404	391	391
query90	1992	198	200	198
query91	131	99	102	99
query92	68	54	51	51
query93	1290	523	516	516
query94	1370	219	212	212
query95	443	330	320	320
query96	593	275	276	275
query97	3209	2986	2992	2986
query98	217	206	192	192
query99	1567	1246	1231	1231
Total cold run time: 283844 ms
Total hot run time: 174213 ms

@doris-robot

ClickBench: Total hot run time: 31.23 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7a7cde2c2f05a7b1df265d3f8b9bdd045f2f7833, data reload: false

query1	0.04	0.04	0.02
query2	0.08	0.04	0.04
query3	0.22	0.06	0.06
query4	1.66	0.09	0.09
query5	0.50	0.50	0.50
query6	1.13	0.73	0.74
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.54	0.49	0.47
query10	0.55	0.55	0.54
query11	0.15	0.12	0.11
query12	0.15	0.12	0.12
query13	0.60	0.59	0.59
query14	0.76	0.79	0.76
query15	0.86	0.82	0.81
query16	0.37	0.37	0.37
query17	1.03	1.06	1.00
query18	0.23	0.23	0.22
query19	1.84	1.70	1.69
query20	0.01	0.01	0.02
query21	15.40	0.73	0.65
query22	4.04	7.33	2.18
query23	18.32	1.41	1.84
query24	2.17	0.23	0.23
query25	0.16	0.09	0.09
query26	0.31	0.22	0.22
query27	0.46	0.24	0.24
query28	13.23	1.04	1.00
query29	12.59	3.31	3.31
query30	0.25	0.06	0.06
query31	2.89	0.41	0.40
query32	3.23	0.49	0.47
query33	2.87	2.94	2.95
query34	16.97	4.37	4.39
query35	4.40	4.42	4.39
query36	0.65	0.50	0.50
query37	0.19	0.16	0.16
query38	0.16	0.16	0.15
query39	0.04	0.03	0.04
query40	0.15	0.12	0.12
query41	0.08	0.04	0.05
query42	0.05	0.05	0.05
query43	0.05	0.04	0.05
Total cold run time: 109.45 s
Total hot run time: 31.23 s

@morningman (Contributor)

run p0

@morningman (Contributor)

run p1

@morningman morningman (Contributor) left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 16, 2024
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@github-actions (Contributor)

PR approved by anyone and no changes requested.

@Hastyshell Hastyshell self-requested a review July 17, 2024 12:19
@suxiaogang223 suxiaogang223 (Contributor) left a comment

LGTM

@kaka11chen kaka11chen (Contributor) left a comment

LGTM

@Hastyshell Hastyshell (Collaborator) left a comment

LGTM

@morningman morningman merged commit 5c3d39c into apache:master Jul 18, 2024
hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 18, 2024
…arators. (apache#37687)

hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 18, 2024
…arators. (apache#37687)

morningman pushed a commit to morningman/doris that referenced this pull request Jul 23, 2024
…arators. (apache#37687)


Labels

approved, dev/2.1.6-merged, dev/3.0.2-merged, meta-change, reviewed

7 participants