Conversation

@DongLiang-0
Contributor

Proposed changes

Add projection pushdown to the avro-jni scanner.
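
For readers unfamiliar with the term, projection pushdown means the scanner is told which columns the query needs and returns only those. The sketch below is a minimal, self-contained illustration of that column selection and is not the code from this PR. `ProjectionSketch` and `projectRow` are hypothetical names, and rows are modeled as plain maps rather than Avro records. A real Avro scanner would more likely pass a reduced reader schema to the Avro reader so that unneeded fields are skipped during decoding.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectionSketch {
    // Hypothetical helper: keep only the required columns of one row,
    // in the order the query asked for them.
    static Map<String, Object> projectRow(Map<String, Object> row, List<String> requiredFields) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String field : requiredFields) {
            if (!row.containsKey(field)) {
                // Assumption: asking for a column the file does not have is an error.
                throw new IllegalArgumentException("Unknown field: " + field);
            }
            projected.put(field, row.get(field));
        }
        return projected;
    }

    public static void main(String[] args) {
        // A row with three columns; the query only needs two of them.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "alice");
        row.put("age", 30);

        Map<String, Object> projected = projectRow(row, Arrays.asList("age", "id"));
        System.out.println(projected);
    }
}
```

Selecting columns after decoding, as done here, saves memory and transfer cost only. Skipping them at decode time is what also saves read cost.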

@DongLiang-0 DongLiang-0 marked this pull request as draft November 13, 2023 09:04
@DongLiang-0 DongLiang-0 marked this pull request as ready for review November 14, 2023 12:37
@DongLiang-0
Contributor Author

run buildall

@DongLiang-0 DongLiang-0 force-pushed the avro-projection branch 3 times, most recently from b35b230 to bbb5363 Compare November 14, 2023 13:01
@DongLiang-0
Contributor Author

run buildall

@DongLiang-0 DongLiang-0 force-pushed the avro-projection branch 2 times, most recently from c438782 to 69a782a Compare November 15, 2023 02:46
@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.22 seconds
stream load tsv: 553 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17094884620 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 69a782a1d9c38940153ca369c79835c3dac90d1f, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5351	5024	5067	5024
q2	360	166	160	160
q3	2040	2011	1990	1990
q4	1404	1358	1356	1356
q5	3964	3961	4012	3961
q6	254	128	127	127
q7	1471	871	893	871
q8	2781	2775	2766	2766
q9	9790	9612	9609	9609
q10	3454	3543	3533	3533
q11	384	253	254	253
q12	426	282	280	280
q13	4563	4109	4108	4108
q14	316	285	295	285
q15	612	534	538	534
q16	676	582	579	579
q17	1136	1074	1074	1074
q18	8164	7639	7625	7625
q19	1674	1698	1680	1680
q20	572	300	303	300
q21	4722	4363	4339	4339
q22	514	400	407	400
Total cold run time: 54628 ms
Total hot run time: 50854 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5045	5038	4963	4963
q2	347	221	226	221
q3	4033	4035	4012	4012
q4	2796	2744	2745	2744
q5	9656	9639	9657	9639
q6	246	120	119	119
q7	3065	2514	2509	2509
q8	4896	4874	4835	4835
q9	13300	13023	13143	13023
q10	4094	4179	4197	4179
q11	742	684	657	657
q12	969	830	802	802
q13	4285	3895	3928	3895
q14	387	364	372	364
q15	628	545	532	532
q16	820	674	672	672
q17	3940	3880	3845	3845
q18	9694	9432	9486	9432
q19	1846	1785	1763	1763
q20	2391	2064	2063	2063
q21	8743	8659	8749	8659
q22	911	831	878	831
Total cold run time: 82834 ms
Total hot run time: 79759 ms

@DongLiang-0
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit e15768ed727b15031b2b2a16406482893e301a6c, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5329	5221	5180	5180
q2	359	166	155	155
q3	2040	2011	1990	1990
q4	1424	1373	1357	1357
q5	3978	4001	3990	3990
q6	256	128	134	128
q7	1448	881	908	881
q8	2798	2777	2776	2776
q9	10082	9693	9718	9693
q10	3454	3565	3520	3520
q11	387	246	245	245
q12	439	302	287	287
q13	4588	4183	4156	4156
q14	315	284	283	283
q15	624	558	531	531
q16	683	588	592	588
q17	1134	1106	1097	1097
q18	7952	7636	7673	7636
q19	1666	1712	1696	1696
q20	537	316	315	315
q21	4700	4354	4420	4354
q22	518	423	413	413
Total cold run time: 54711 ms
Total hot run time: 51271 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5100	5086	5124	5086
q2	357	254	245	245
q3	4192	4083	4110	4083
q4	2853	2841	2823	2823
q5	9604	9620	9657	9620
q6	249	125	123	123
q7	3093	2511	2556	2511
q8	4772	4800	4792	4792
q9	13342	13153	13181	13153
q10	4080	4183	4181	4181
q11	757	650	663	650
q12	1016	829	831	829
q13	4276	3930	3893	3893
q14	381	367	366	366
q15	620	553	559	553
q16	787	718	694	694
q17	3851	3843	3856	3843
q18	9779	9534	9646	9534
q19	1907	1790	1775	1775
q20	2469	2066	2066	2066
q21	8999	8903	8784	8784
q22	931	875	836	836
Total cold run time: 83415 ms
Total hot run time: 80440 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.58 seconds
stream load tsv: 552 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17097104253 Bytes

@DongLiang-0
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.41 seconds
stream load tsv: 551 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17096852983 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 4c961fd079ad55ad1793b200e0ced3706d2a3de0, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5373	5146	5038	5038
q2	369	142	139	139
q3	2054	1923	1856	1856
q4	1388	1278	1276	1276
q5	3971	3969	4045	3969
q6	253	133	137	133
q7	1413	864	895	864
q8	2754	2782	2770	2770
q9	28225	9875	9606	9606
q10	10249	3521	3534	3521
q11	375	258	250	250
q12	497	281	297	281
q13	4575	3831	3805	3805
q14	320	287	293	287
q15	617	575	550	550
q16	683	585	589	585
q17	1132	952	944	944
q18	7871	7413	7290	7290
q19	1691	1651	1677	1651
q20	569	326	329	326
q21	4692	4051	4038	4038
q22	544	424	424	424
Total cold run time: 79615 ms
Total hot run time: 49603 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5034	5042	5006	5006
q2	335	221	236	221
q3	4046	4006	3988	3988
q4	2796	2765	2778	2765
q5	9501	9491	9499	9491
q6	241	126	125	125
q7	2687	2319	2309	2309
q8	4820	4791	4794	4791
q9	13259	13134	13149	13134
q10	4095	4181	4169	4169
q11	727	643	668	643
q12	988	859	882	859
q13	4299	3571	3583	3571
q14	398	349	342	342
q15	624	555	562	555
q16	746	686	672	672
q17	3971	3908	3833	3833
q18	9602	9089	9122	9089
q19	1809	1766	1792	1766
q20	2384	2083	2039	2039
q21	8709	8596	8423	8423
q22	931	886	854	854
Total cold run time: 82002 ms
Total hot run time: 78645 ms

@DongLiang-0
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.49 seconds
stream load tsv: 565 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17098126512 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit ff14b0583d7d5f2d39ce44fedb5b305df2b0daef, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4916	4695	4642	4642
q2	359	167	163	163
q3	2027	1944	1851	1851
q4	1378	1285	1222	1222
q5	3948	3959	3999	3959
q6	248	126	128	126
q7	1402	871	880	871
q8	2746	2783	2759	2759
q9	9797	9676	9563	9563
q10	3474	3510	3504	3504
q11	375	253	237	237
q12	437	294	286	286
q13	4560	3799	3772	3772
q14	321	283	289	283
q15	588	547	529	529
q16	662	581	580	580
q17	1126	981	922	922
q18	7796	7365	7339	7339
q19	1680	1686	1674	1674
q20	538	301	292	292
q21	4433	3979	3993	3979
q22	475	377	371	371
Total cold run time: 53286 ms
Total hot run time: 48924 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4578	4572	4595	4572
q2	333	224	255	224
q3	3996	3989	3981	3981
q4	2691	2696	2684	2684
q5	9537	9460	9577	9460
q6	245	121	123	121
q7	2583	2227	2299	2227
q8	4441	4466	4469	4466
q9	13199	13054	13090	13054
q10	4091	4170	4173	4170
q11	771	649	618	618
q12	971	825	823	823
q13	4292	3568	3561	3561
q14	388	349	351	349
q15	578	531	520	520
q16	721	674	670	670
q17	3963	3905	3890	3890
q18	9485	9158	9081	9081
q19	1827	1772	1779	1772
q20	2366	2070	2063	2063
q21	8724	8400	8404	8400
q22	875	821	832	821
Total cold run time: 80655 ms
Total hot run time: 77527 ms

</exclusion>
</exclusions>
</dependency>
<dependency>
Member

Why add dependencies in preload-extensions?

Contributor Author

I've fixed it.
It seems that the current class loader and the system class loader are not consistent.
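
One way to observe this kind of mismatch is to compare the calling thread's context class loader with the JVM's system class loader. The sketch below is only an illustration and is not the fix applied in this PR. `ClassLoaderCheck` and `contextMatchesSystem` are hypothetical names, and the empty `URLClassLoader` merely stands in for whatever isolated loader a scanner might be loaded through.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderCheck {
    // Hypothetical helper: true when the calling thread's context class loader
    // is the same object as the JVM's system (application) class loader.
    static boolean contextMatchesSystem() {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        return context == ClassLoader.getSystemClassLoader();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("before swap: " + contextMatchesSystem());

        ClassLoader original = Thread.currentThread().getContextClassLoader();
        // Assumption: an empty child loader is enough to demonstrate the mismatch.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], original)) {
            Thread.currentThread().setContextClassLoader(isolated);
            System.out.println("after swap: " + contextMatchesSystem());
        } finally {
            // Restore the original loader so later code is unaffected.
            Thread.currentThread().setContextClassLoader(original);
        }
    }
}
```

When the two loaders differ, a class visible through one may not be visible through the other, which is one plausible reason a dependency could end up declared in an unexpected module.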

@DongLiang-0
Contributor Author

run buildall

@DongLiang-0
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 107eddff2f0edbfd634044390f31d39357e844d3, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4931	4695	4645	4645
q2	356	158	160	158
q3	2015	1869	1854	1854
q4	1382	1258	1208	1208
q5	3966	3965	4028	3965
q6	249	130	125	125
q7	1437	883	889	883
q8	2768	2797	2773	2773
q9	10037	9664	9634	9634
q10	3492	3606	3555	3555
q11	379	253	248	248
q12	435	295	290	290
q13	4593	3810	3791	3791
q14	318	295	282	282
q15	580	526	519	519
q16	671	580	581	580
q17	1152	977	925	925
q18	7900	7400	7305	7305
q19	1659	1664	1719	1664
q20	533	308	294	294
q21	4378	3952	4007	3952
q22	476	372	379	372
Total cold run time: 53707 ms
Total hot run time: 49022 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4589	4580	4587	4580
q2	345	213	253	213
q3	4019	4014	4008	4008
q4	2684	2690	2686	2686
q5	9746	9762	9783	9762
q6	236	123	124	123
q7	3031	2462	2469	2462
q8	4476	4499	4470	4470
q9	13222	13136	13127	13127
q10	4117	4202	4188	4188
q11	776	643	634	634
q12	987	805	819	805
q13	4312	3582	3538	3538
q14	384	353	344	344
q15	580	516	522	516
q16	754	660	689	660
q17	3871	3829	3953	3829
q18	9617	9080	9032	9032
q19	1867	1773	1767	1767
q20	2417	2060	2054	2054
q21	8760	8449	8633	8449
q22	868	799	815	799
Total cold run time: 81658 ms
Total hot run time: 78046 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 50.03 seconds
stream load tsv: 584 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 30.3 seconds inserted 10000000 Rows, about 330K ops/s
storage size: 17099195955 Bytes

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.21 seconds
stream load tsv: 579 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17100914948 Bytes

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.67 seconds
stream load tsv: 575 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17099208154 Bytes

@DongLiang-0
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit a26b1b2d44cad8ce8c88fe4e96cf431c9ad5738b, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4932	4693	4687	4687
q2	361	154	161	154
q3	2024	1940	1899	1899
q4	1390	1237	1244	1237
q5	3978	3928	3970	3928
q6	247	130	126	126
q7	1430	875	889	875
q8	2739	2816	2754	2754
q9	9873	9796	10137	9796
q10	3453	3545	3542	3542
q11	373	242	247	242
q12	440	297	292	292
q13	4563	3820	3827	3820
q14	320	279	283	279
q15	600	536	520	520
q16	668	581	589	581
q17	1141	987	952	952
q18	7787	7477	7427	7427
q19	1688	1680	1677	1677
q20	526	309	311	309
q21	4411	3966	3910	3910
q22	471	376	365	365
Total cold run time: 53415 ms
Total hot run time: 49372 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4636	4583	4559	4559
q2	341	235	252	235
q3	4026	3991	4002	3991
q4	2708	2700	2689	2689
q5	9710	9677	9662	9662
q6	243	124	120	120
q7	3014	2454	2489	2454
q8	4475	4487	4519	4487
q9	13173	13155	13095	13095
q10	4108	4179	4204	4179
q11	811	629	641	629
q12	983	809	809	809
q13	4316	3587	3574	3574
q14	371	348	352	348
q15	579	525	533	525
q16	736	660	676	660
q17	3853	3870	3891	3870
q18	9605	9074	9021	9021
q19	1822	1774	1794	1774
q20	2384	2077	2074	2074
q21	8779	8598	8491	8491
q22	913	766	802	766
Total cold run time: 81586 ms
Total hot run time: 78012 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.18 seconds
stream load tsv: 575 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17099965921 Bytes

@AshinGau
Member

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 24, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@zy-kkk zy-kkk merged commit cd6c613 into apache:master Nov 27, 2023
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
DongLiang-0 added a commit to DongLiang-0/doris that referenced this pull request Dec 27, 2023
Labels

approved Indicates a PR has been approved by one committer. reviewed

4 participants