@seawinde seawinde commented Nov 12, 2023

Proposed changes

In Nereids, infer the result column name for a select outfile statement when the select item is an expression that has no user-specified alias.
The name inference strategy is the same as in #24990.

Disable name inference for ordinary queries, because it causes incorrect behavior in BI query scenarios.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 64ada780f0d6eb02b1204a6beeb2580c0cd6f0d6, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5192	5025	5020	5020
q2	371	180	199	180
q3	2076	2038	2039	2038
q4	1455	1441	1441	1441
q5	4117	4127	4104	4104
q6	252	137	133	133
q7	2074	1602	1617	1602
q8	2753	2738	2746	2738
q9	10393	10289	10217	10217
q10	3482	3569	3546	3546
q11	374	252	255	252
q12	462	295	292	292
q13	4519	4137	4093	4093
q14	331	291	284	284
q15	669	567	569	567
q16	705	628	590	590
q17	1137	1080	1076	1076
q18	7736	7439	7349	7349
q19	1682	1693	1670	1670
q20	585	372	359	359
q21	4897	4545	4528	4528
q22	538	437	433	433
Total cold run time: 55800 ms
Total hot run time: 52512 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5015	4983	4986	4983
q2	327	241	237	237
q3	4019	3926	3944	3926
q4	2773	2729	2747	2729
q5	6677	6586	6680	6586
q6	244	125	124	124
q7	3182	2719	2686	2686
q8	4823	4814	4815	4814
q9	17740	17657	17600	17600
q10	4084	4155	4172	4155
q11	758	658	642	642
q12	992	813	845	813
q13	4265	3924	3895	3895
q14	404	375	352	352
q15	642	570	579	570
q16	789	694	708	694
q17	3886	3859	3939	3859
q18	9283	9106	9217	9106
q19	1809	1787	1791	1787
q20	2407	2057	2047	2047
q21	8683	8543	8689	8543
q22	969	864	862	862
Total cold run time: 83771 ms
Total hot run time: 81010 ms
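
The four numeric columns in the bot's tables are not labeled. A minimal sketch, assuming they are the cold run, two hot runs, and the faster of the two hot runs (all in milliseconds): under that reading, the two reported totals are the sum of the cold column and the sum of the per-query best hot time. The helper names are hypothetical, and the data is copied from the default-configuration table above.

```python
# Default-configuration table from the report on commit 64ada78.
# Assumed column order: query, cold run, hot run 1, hot run 2, best hot (ms).
REPORT = """
q1 5192 5025 5020 5020
q2 371 180 199 180
q3 2076 2038 2039 2038
q4 1455 1441 1441 1441
q5 4117 4127 4104 4104
q6 252 137 133 133
q7 2074 1602 1617 1602
q8 2753 2738 2746 2738
q9 10393 10289 10217 10217
q10 3482 3569 3546 3546
q11 374 252 255 252
q12 462 295 292 292
q13 4519 4137 4093 4093
q14 331 291 284 284
q15 669 567 569 567
q16 705 628 590 590
q17 1137 1080 1076 1076
q18 7736 7439 7349 7349
q19 1682 1693 1670 1670
q20 585 372 359 359
q21 4897 4545 4528 4528
q22 538 437 433 433
"""


def parse_report(text):
    """Return a list of (query, cold, hot1, hot2, best) tuples."""
    rows = []
    for line in text.split("\n"):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append((fields[0], *map(int, fields[1:])))
    return rows


def totals(rows):
    """Sum the cold column and the faster of the two hot runs per query."""
    cold_total = sum(cold for _, cold, _, _, _ in rows)
    hot_total = sum(min(hot1, hot2) for _, _, hot1, hot2, _ in rows)
    return cold_total, hot_total


rows = parse_report(REPORT)
cold_total, hot_total = totals(rows)
print(f"Total cold run time: {cold_total} ms")
print(f"Total hot run time: {hot_total} ms")
```

For this table the sketch gives 55800 ms and 52512 ms, which matches the totals the bot printed for the default-configuration run.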

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.72 seconds
stream load tsv: 553 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162419737 Bytes
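
The throughput figures in this report can be reproduced from the byte counts and durations it lists. A minimal sketch, assuming the bot treats 1 MB as 1024 * 1024 bytes and truncates to a whole number; the bot's actual formula is not shown in this conversation, and the function names are hypothetical.

```python
MIB = 1024 * 1024  # assumption: "MB/s" in the report means MiB/s


def load_rate_mb_per_s(total_bytes, seconds):
    """Truncated throughput in MiB/s for a load of total_bytes taking seconds."""
    return total_bytes // (seconds * MIB)


def insert_rate_k_ops(row_count, seconds):
    """Truncated insert rate in thousands of rows per second."""
    return int(row_count / seconds) // 1000


# Figures copied from the clickbench report above.
print("tsv:", load_rate_mb_per_s(74807831229, 553), "MB/s")
print("json:", load_rate_mb_per_s(2358488459, 21), "MB/s")
print("orc:", load_rate_mb_per_s(1101869774, 64), "MB/s")
print("parquet:", load_rate_mb_per_s(861443392, 33), "MB/s")
print("insert into select:", insert_rate_k_ops(10000000, 29.3), "K ops/s")
```

Under these assumptions the results are 129, 107, 16, and 24 MB/s and 341K ops/s, the same values the bot reported.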

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@seawinde seawinde force-pushed the file_sink_support_infer_column_name branch from f9aac53 to 64ada78 on November 13, 2023 03:46
@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 34404ec7b4b26c5c3849f6c249d10eec6e05ed6b, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5285	5020	5051	5020
q2	368	177	203	177
q3	2090	2049	2049	2049
q4	1505	1430	1437	1430
q5	4092	4130	4087	4087
q6	251	130	130	130
q7	1518	933	954	933
q8	2833	2845	2824	2824
q9	9798	9631	9461	9461
q10	3528	3588	3580	3580
q11	391	280	263	263
q12	458	301	295	295
q13	4530	4135	4154	4135
q14	324	288	284	284
q15	586	559	551	551
q16	695	589	606	589
q17	1152	1106	1072	1072
q18	8008	7386	7551	7386
q19	1706	1700	1727	1700
q20	604	391	357	357
q21	4990	4603	4660	4603
q22	530	430	423	423
Total cold run time: 55242 ms
Total hot run time: 51349 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5029	5025	4947	4947
q2	354	222	236	222
q3	4031	4022	4022	4022
q4	2823	2789	2769	2769
q5	9552	9547	9474	9474
q6	249	123	126	123
q7	3051	2526	2544	2526
q8	4781	4773	4756	4756
q9	13035	12784	12770	12770
q10	4118	4206	4184	4184
q11	773	657	642	642
q12	1011	811	795	795
q13	4303	3910	3905	3905
q14	383	361	353	353
q15	584	546	554	546
q16	751	731	701	701
q17	3939	3887	3891	3887
q18	9577	9431	9466	9431
q19	1908	1816	1788	1788
q20	2404	2082	2049	2049
q21	8953	8896	8717	8717
q22	960	898	917	898
Total cold run time: 82569 ms
Total hot run time: 79505 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.46 seconds
stream load tsv: 557 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162289627 Bytes

@seawinde seawinde force-pushed the file_sink_support_infer_column_name branch from 34404ec to 8bd8fce on November 20, 2023 14:01
@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 8bd8fce26da0b905a4ff6afd95a161082dc72bdb, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4926	4649	4649	4649
q2	357	173	159	159
q3	2028	1923	1898	1898
q4	1391	1247	1231	1231
q5	4009	3987	4045	3987
q6	249	128	129	128
q7	1407	899	899	899
q8	2753	2808	2773	2773
q9	9729	9809	9945	9809
q10	3496	3551	3560	3551
q11	373	249	240	240
q12	440	292	295	292
q13	4585	3821	3830	3821
q14	311	278	283	278
q15	595	531	536	531
q16	663	590	578	578
q17	1133	950	935	935
q18	7767	7310	7294	7294
q19	1671	1681	1672	1672
q20	562	301	318	301
q21	4392	3959	4003	3959
q22	474	370	366	366
Total cold run time: 53311 ms
Total hot run time: 49351 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4583	4582	4574	4574
q2	327	241	273	241
q3	4035	4008	4023	4008
q4	2705	2688	2687	2687
q5	9731	9855	9827	9827
q6	247	122	122	122
q7	3026	2475	2460	2460
q8	4470	4465	4485	4465
q9	13231	13045	13079	13045
q10	4111	4221	4197	4197
q11	751	663	642	642
q12	987	829	806	806
q13	4280	3558	3548	3548
q14	383	351	351	351
q15	567	520	519	519
q16	724	652	690	652
q17	3920	3811	3891	3811
q18	9642	8863	8983	8863
q19	1823	1767	1789	1767
q20	2388	2092	2044	2044
q21	8821	8572	8636	8572
q22	873	765	766	765
Total cold run time: 81625 ms
Total hot run time: 77966 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.84 seconds
stream load tsv: 581 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17099753522 Bytes

@seawinde seawinde closed this Nov 21, 2023
@seawinde seawinde force-pushed the file_sink_support_infer_column_name branch from 8bd8fce to 840f3b6 on November 21, 2023 02:46
@seawinde seawinde reopened this Nov 21, 2023
@seawinde
Contributor Author

run buildall

@seawinde
Contributor Author

run buildall

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit c1aa544dccb973155d421960e7e5384682c6b1b6, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4930	4669	4668	4668
q2	365	154	159	154
q3	2048	1926	1927	1926
q4	1389	1288	1259	1259
q5	3904	3922	3999	3922
q6	246	130	132	130
q7	1407	898	897	897
q8	2739	2776	2749	2749
q9	9784	9531	9478	9478
q10	10270	3543	3534	3534
q11	378	244	244	244
q12	443	290	287	287
q13	4559	3818	3800	3800
q14	332	301	280	280
q15	610	546	537	537
q16	665	589	580	580
q17	1127	944	925	925
q18	7717	7358	7352	7352
q19	1649	1685	1685	1685
q20	578	300	291	291
q21	4393	3974	3993	3974
q22	470	370	367	367
Total cold run time: 60003 ms
Total hot run time: 49039 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4576	4610	4573	4573
q2	332	222	259	222
q3	4037	3996	3997	3996
q4	2704	2685	2692	2685
q5	9760	9719	9727	9719
q6	240	124	124	124
q7	3009	2476	2465	2465
q8	4428	4407	4448	4407
q9	13220	13195	13082	13082
q10	4093	4191	4168	4168
q11	812	645	675	645
q12	974	811	804	804
q13	4284	3557	3577	3557
q14	378	346	360	346
q15	574	521	521	521
q16	734	671	658	658
q17	3811	3934	3902	3902
q18	9521	9106	9094	9094
q19	1803	1765	1758	1758
q20	2430	2093	2049	2049
q21	8830	8659	8516	8516
q22	869	810	766	766
Total cold run time: 81419 ms
Total hot run time: 78057 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.44 seconds
stream load tsv: 574 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 27.9 seconds inserted 10000000 Rows, about 358K ops/s
storage size: 17099406237 Bytes

@seawinde seawinde force-pushed the file_sink_support_infer_column_name branch from 5bcbaf5 to 609dd7b on November 22, 2023 08:53
@seawinde seawinde force-pushed the file_sink_support_infer_column_name branch from 609dd7b to 6ee45c4 on November 22, 2023 08:54
@seawinde
Contributor Author

run buildall

@seawinde seawinde changed the title from "[opt](nereids) infer result column name in select outfile stmt" to "[opt](nereids) infer result column name in select outfile stmt and disable infer name when query" on Nov 22, 2023
@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.96 seconds
stream load tsv: 573 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17100131251 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 6ee45c4ded3278045937d9a3beddc6ba788658b0, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4869	4638	4709	4638
q2	356	150	159	150
q3	2029	1925	1871	1871
q4	1379	1258	1230	1230
q5	3958	3892	3988	3892
q6	247	131	128	128
q7	1405	867	891	867
q8	2718	2757	2740	2740
q9	9727	9546	9570	9546
q10	3450	3528	3517	3517
q11	379	232	245	232
q12	428	286	293	286
q13	4549	3815	3804	3804
q14	329	289	282	282
q15	586	539	525	525
q16	668	581	580	580
q17	1130	941	920	920
q18	7761	7319	7297	7297
q19	1660	1696	1678	1678
q20	551	312	303	303
q21	4401	3933	3958	3933
q22	468	362	368	362
Total cold run time: 53048 ms
Total hot run time: 48781 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4581	4572	4563	4563
q2	332	225	267	225
q3	4017	3982	3999	3982
q4	2705	2693	2688	2688
q5	9706	9562	9664	9562
q6	242	124	121	121
q7	3030	2489	2493	2489
q8	4447	4433	4433	4433
q9	13132	13023	13069	13023
q10	4073	4171	4178	4171
q11	738	646	651	646
q12	979	806	823	806
q13	4284	3592	3601	3592
q14	381	353	347	347
q15	572	528	530	528
q16	725	679	681	679
q17	3853	3922	3810	3810
q18	9385	9045	9096	9045
q19	1804	1766	1780	1766
q20	2386	2074	2052	2052
q21	8783	8725	8419	8419
q22	919	818	770	770
Total cold run time: 81074 ms
Total hot run time: 77717 ms

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 14cea6a4b11ad1e3605659a827e120d24f376390, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4948	4656	4674	4656
q2	359	152	150	150
q3	2041	1933	1919	1919
q4	1406	1260	1276	1260
q5	3967	3949	4039	3949
q6	269	131	135	131
q7	1443	896	889	889
q8	2804	2805	2784	2784
q9	10280	9493	9654	9493
q10	3454	3504	3507	3504
q11	379	252	250	250
q12	439	291	294	291
q13	4580	3835	3837	3835
q14	322	294	289	289
q15	585	534	535	534
q16	660	588	584	584
q17	1141	967	918	918
q18	7918	7378	7499	7378
q19	1668	1674	1690	1674
q20	552	317	339	317
q21	4395	3984	4020	3984
q22	476	390	373	373
Total cold run time: 54086 ms
Total hot run time: 49162 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4591	4578	4577	4577
q2	341	220	237	220
q3	4019	4001	4005	4001
q4	2708	2690	2696	2690
q5	9527	9592	9603	9592
q6	251	124	125	124
q7	3044	2514	2478	2478
q8	4469	4485	4456	4456
q9	12926	12735	12888	12735
q10	4078	4142	4157	4142
q11	802	648	629	629
q12	971	804	810	804
q13	4279	3588	3601	3588
q14	375	346	346	346
q15	559	533	529	529
q16	740	692	685	685
q17	3813	3943	3897	3897
q18	9653	9163	9021	9021
q19	1836	1782	1791	1782
q20	2370	2071	2039	2039
q21	8859	8713	8684	8684
q22	864	792	861	792
Total cold run time: 81075 ms
Total hot run time: 77811 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.28 seconds
stream load tsv: 567 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17098593884 Bytes

@morrySnow morrySnow marked this pull request as draft November 28, 2023 02:24
@morrySnow morrySnow closed this Mar 8, 2024