
924060929 (Contributor) commented Mar 21, 2024

Proposed changes

This PR improves the performance of the Nereids planner in the planning stage.

  1. Refactor the expression rewriter to use pattern matching, so that the many expression rewrite rules can be interleaved within one large bottom-up iteration and applied until the expression becomes stable. More cases are now handled, because the original rewriter had no loop and sometimes processed only the top-level expression, e.g. SimplifyArithmeticRule.
  2. Replace Collection.stream() with ImmutableXxx.Builder to avoid unnecessary method calls.
  3. Unroll loops in some code, e.g. Expression.<init> and PlanTreeRewriteBottomUpJob.pushChildrenJobs.
  4. Use type- and arity-specialized code, e.g. OneRangePartitionEvaluator.toNereidsLiterals(), PartitionRangeExpander.tryExpandRange(), and PartitionRangeExpander.enumerableCount().
  5. Refactor ExtractCommonFactorRule so that it can extract common factors in more cases, and fix an infinite loop when ExtractCommonFactorRule and SimplifyRange run in the same iteration: SimplifyRange generates a right-deep tree, but ExtractCommonFactorRule generates a left-deep tree.
  6. Refactor FoldConstantRuleOnFE to support both visitor and pattern-match modes. In ExpressionNormalization, the pattern-match mode can be interleaved with other rules; in PartitionPruner, the visitor mode evaluates expressions faster.
  7. Lazily compute and cache some operations.
  8. Use int fields to compare dates.
  9. Use a BitSet to look up disableNereidsRules.
  10. A two-level loop is usually faster than building a Multimap when binding slots in Scope, so I reverted that code.
  11. PlanTreeRewriteBottomUpJob no longer needs clearStatePhase.
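Item 1 describes rewriting bottom-up and re-applying the rules until the expression stops changing. The Java sketch below illustrates that idea on a toy expression model. `Expr`, `Lit`, `Col`, `Add`, `Mul` and `FixpointRewriter` are hypothetical names, not the Nereids classes, and the real rewriter's rule and pattern APIs are not reproduced. It assumes every rule returns the very same instance when its pattern does not match, and that the rule set terminates.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Toy expression model for illustration only (not the Nereids Expression hierarchy).
interface Expr {}
record Lit(long value) implements Expr {}
record Col(String name) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Mul(Expr left, Expr right) implements Expr {}

final class FixpointRewriter {
    // Each rule returns its input instance unchanged when the pattern does not match.
    private final List<UnaryOperator<Expr>> rules;

    FixpointRewriter(List<UnaryOperator<Expr>> rules) {
        this.rules = rules;
    }

    static List<UnaryOperator<Expr>> defaultRules() {
        return List.of(
            // Constant folding: lit + lit, lit * lit.
            e -> e instanceof Add a && a.left() instanceof Lit x && a.right() instanceof Lit y
                    ? new Lit(x.value() + y.value()) : e,
            e -> e instanceof Mul m && m.left() instanceof Lit x && m.right() instanceof Lit y
                    ? new Lit(x.value() * y.value()) : e,
            // Identities: x + 0 -> x, x * 1 -> x.
            e -> e instanceof Add a && a.right() instanceof Lit z && z.value() == 0 ? a.left() : e,
            e -> e instanceof Mul m && m.right() instanceof Lit one && one.value() == 1 ? m.left() : e);
    }

    // Rewrite children first, then keep applying all rules to this node until none fires.
    Expr rewrite(Expr e) {
        Expr node = rewriteChildren(e);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (UnaryOperator<Expr> rule : rules) {
                Expr next = rule.apply(node);
                if (next != node) {
                    // The new node may expose new matches below it, so rewrite its children again.
                    node = rewriteChildren(next);
                    changed = true;
                }
            }
        }
        return node;
    }

    // Bottom-up step: keep the same instance when no child changed.
    private Expr rewriteChildren(Expr e) {
        if (e instanceof Add a) {
            Expr l = rewrite(a.left());
            Expr r = rewrite(a.right());
            return l == a.left() && r == a.right() ? a : new Add(l, r);
        }
        if (e instanceof Mul m) {
            Expr l = rewrite(m.left());
            Expr r = rewrite(m.right());
            return l == m.left() && r == m.right() ? m : new Mul(l, r);
        }
        return e;
    }
}
```

On `(c + 2 * 0) * (1 + 0)` the folds and identities cascade down to `c`, whereas applying the same rules once to the root alone leaves the expression untouched, similar to the top-expression-only behavior item 1 mentions for SimplifyArithmeticRule.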
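Item 5 attributes the infinite loop to SimplifyRange producing right-deep trees while ExtractCommonFactorRule produces left-deep ones. A minimal illustration of why that can loop, using hypothetical `Pred`, `Atom` and `And` types: both shapes hold the same conjuncts yet are not structurally equal, so an "iterate until nothing changes" driver can flip between them forever if each rule rebuilds its own preferred shape. Normalizing to one shape, as `canonical` does here, is one generic remedy; it is not necessarily how the PR fixed it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical predicate nodes for illustration only.
interface Pred {}
record Atom(String text) implements Pred {}
record And(Pred left, Pred right) implements Pred {}

final class ConjunctShapes {
    // Collect the leaves of a nested AND tree in left-to-right order.
    static List<Pred> flatten(Pred p) {
        List<Pred> out = new ArrayList<>();
        collect(p, out);
        return out;
    }

    private static void collect(Pred p, List<Pred> out) {
        if (p instanceof And a) {
            collect(a.left(), out);
            collect(a.right(), out);
        } else {
            out.add(p);
        }
    }

    // ((a AND b) AND c)
    static Pred leftDeep(List<Pred> cs) {
        Pred acc = cs.get(0);
        for (int i = 1; i < cs.size(); i++) {
            acc = new And(acc, cs.get(i));
        }
        return acc;
    }

    // (a AND (b AND c))
    static Pred rightDeep(List<Pred> cs) {
        Pred acc = cs.get(cs.size() - 1);
        for (int i = cs.size() - 2; i >= 0; i--) {
            acc = new And(cs.get(i), acc);
        }
        return acc;
    }

    // One canonical shape regardless of input shape, so two rules cannot undo each other.
    static Pred canonical(Pred p) {
        return leftDeep(flatten(p));
    }
}
```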
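For item 8, a sketch of what comparing dates through int fields can look like, assuming a hypothetical `SimpleDate` that stores year, month and day as primitive ints; the actual Nereids date literal classes are not shown in this PR text. Comparison needs no parsing and no temporary objects, and `packed()` shows the alternative of folding the date into a single `yyyymmdd` int.

```java
// Hypothetical date value holding primitive int fields (not the Nereids date literal classes).
final class SimpleDate implements Comparable<SimpleDate> {
    final int year;
    final int month;
    final int day;

    SimpleDate(int year, int month, int day) {
        if (month < 1 || month > 12 || day < 1 || day > 31) {
            throw new IllegalArgumentException("bad date: " + year + "-" + month + "-" + day);
        }
        this.year = year;
        this.month = month;
        this.day = day;
    }

    // Field-by-field int comparison: no string parsing, no allocation.
    @Override
    public int compareTo(SimpleDate o) {
        if (year != o.year) return Integer.compare(year, o.year);
        if (month != o.month) return Integer.compare(month, o.month);
        return Integer.compare(day, o.day);
    }

    // Alternative: one yyyymmdd int whose natural order matches compareTo for years 0..9999.
    int packed() {
        return year * 10_000 + month * 100 + day;
    }
}
```

With the bounds from the test query, `new SimpleDate(2024, 2, 15).compareTo(new SimpleDate(2024, 3, 4))` is negative.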
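For item 9, a sketch of looking up disabled rules in a `java.util.BitSet` indexed by enum ordinal, so each check is a single bit read instead of a scan over names. `RuleId` and `DisabledRules` are hypothetical stand-ins; how Doris actually parses and stores disableNereidsRules is not shown here.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Locale;

// Hypothetical rule identifiers; the planner's real rule enum is not reproduced.
enum RuleId { SIMPLIFY_ARITHMETIC, EXTRACT_COMMON_FACTOR, SIMPLIFY_RANGE, FOLD_CONSTANT }

final class DisabledRules {
    private final BitSet disabled = new BitSet(RuleId.values().length);

    // Parse the names once; an unknown name makes RuleId.valueOf throw IllegalArgumentException.
    DisabledRules(List<String> names) {
        for (String name : names) {
            disabled.set(RuleId.valueOf(name.trim().toUpperCase(Locale.ROOT)).ordinal());
        }
    }

    // O(1) membership test: one bit read per lookup.
    boolean isDisabled(RuleId rule) {
        return disabled.get(rule.ordinal());
    }
}
```

The design trades a one-time parse for cheap repeated lookups, which fits a check that may run for every rule application during planning.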

test case

100 threads continuously send the following SQL in parallel; it queries an empty table. Tested on my Mac (M2 chip, 8 cores) with SQL cache enabled.

```sql
select count(1), date_format(time_col, '%Y%m%d'), varchar_col1
from tbl
where partition_date > '2024-02-15' and (varchar_col2 = '73130' or varchar_col3 = '73130') and time_col > '2024-03-04'
  and time_col < '2024-03-05'
group by date_format(time_col, '%Y%m%d'), varchar_col1
order by date_format(time_col, '%Y%m%d') desc, varchar_col1 desc, count(1) asc
limit 1000
```

Before this PR: 3100 peak QPS, about 2700 average QPS
After this PR: 4800 peak QPS, about 4400 average QPS

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38033 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit eb5415a077293360df435cfa4c279dd49484bfa0, data reload: false

------ Round 1 ----------------------------------
q1	17603	4397	4149	4149
q2	2022	149	148	148
q3	11227	1195	1234	1195
q4	10330	777	788	777
q5	7706	3109	3042	3042
q6	206	124	128	124
q7	1101	612	600	600
q8	9501	2062	2004	2004
q9	7185	6621	6561	6561
q10	9418	3366	3478	3366
q11	419	219	209	209
q12	381	194	189	189
q13	17785	2851	2884	2851
q14	227	210	210	210
q15	500	454	445	445
q16	481	366	356	356
q17	962	552	568	552
q18	7455	6652	6490	6490
q19	3039	1434	1393	1393
q20	542	262	243	243
q21	3577	2829	2993	2829
q22	341	336	300	300
Total cold run time: 112008 ms
Total hot run time: 38033 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4138	4099	4067	4067
q2	318	214	218	214
q3	2976	2861	2865	2861
q4	1840	1562	1545	1545
q5	5275	5307	5299	5299
q6	195	116	117	116
q7	2264	1869	1879	1869
q8	3178	3307	3297	3297
q9	8665	8584	8593	8584
q10	3753	3712	3653	3653
q11	538	442	439	439
q12	730	557	587	557
q13	16915	2873	2844	2844
q14	267	248	251	248
q15	481	459	446	446
q16	462	407	417	407
q17	1742	1508	1484	1484
q18	7436	7111	7136	7111
q19	1638	1512	1596	1512
q20	1897	1685	1705	1685
q21	4946	4803	4716	4716
q22	523	452	432	432
Total cold run time: 70177 ms
Total hot run time: 53386 ms
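Reading these reports: judging from the totals, each row appears to be query, cold run, hot run 1, hot run 2 and best hot run (ms), where the best hot run is the smaller of the two hot runs and the summary lines sum the cold and best-hot columns. This column layout is inferred from the numbers, not documented in the robot output. A short Java check that recomputes a round's totals under that assumption:

```java
import java.util.List;

// Recomputes a report's summary lines from its per-query rows.
// Assumed columns (inferred from the totals): query, cold run, hot run 1, hot run 2, best hot run (ms).
final class BenchTotals {
    // Returns {total cold run ms, total hot run ms}.
    static long[] totals(List<String> rows) {
        long cold = 0;
        long hot = 0;
        for (String row : rows) {
            String[] f = row.trim().split("\\s+");
            long coldRun = Long.parseLong(f[1]);
            long hot1 = Long.parseLong(f[2]);
            long hot2 = Long.parseLong(f[3]);
            long best = Long.parseLong(f[4]);
            // Sanity check of the assumed layout: the last column is the faster hot run.
            if (best != Math.min(hot1, hot2)) {
                throw new IllegalArgumentException("unexpected row: " + row);
            }
            cold += coldRun;
            hot += best;
        }
        return new long[] {cold, hot};
    }
}
```

Fed the 22 Round 1 rows of the first TPC-H report above, it returns 112008 ms cold and 38033 ms hot, matching the reported totals.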

@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 37866 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b555386589af0f898593a9f1164abfe9810f4bcd, data reload: false

------ Round 1 ----------------------------------
q1	17622	6181	4232	4232
q2	2025	156	144	144
q3	10777	1095	1198	1095
q4	10424	758	696	696
q5	7494	3039	3005	3005
q6	210	126	126	126
q7	1081	586	567	567
q8	9310	1973	2015	1973
q9	7085	6490	6514	6490
q10	8393	3403	3521	3403
q11	431	219	221	219
q12	368	194	190	190
q13	17806	2854	2881	2854
q14	235	207	207	207
q15	507	463	464	463
q16	472	358	344	344
q17	966	565	539	539
q18	7325	6566	6556	6556
q19	2856	1396	1382	1382
q20	550	259	239	239
q21	3536	2820	2980	2820
q22	365	332	322	322
Total cold run time: 109838 ms
Total hot run time: 37866 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4287	4123	4097	4097
q2	315	228	226	226
q3	3078	2882	2806	2806
q4	1908	1620	1541	1541
q5	5246	5246	5234	5234
q6	200	118	118	118
q7	2281	1856	1845	1845
q8	3146	3312	3319	3312
q9	8598	8570	8568	8568
q10	3730	3714	3735	3714
q11	533	443	451	443
q12	718	535	541	535
q13	16897	2889	2847	2847
q14	280	246	254	246
q15	477	446	448	446
q16	469	418	400	400
q17	1750	1505	1488	1488
q18	7493	7313	7151	7151
q19	1649	1528	1537	1528
q20	1897	1729	1731	1729
q21	4864	4757	4658	4658
q22	526	495	474	474
Total cold run time: 70342 ms
Total hot run time: 53406 ms

924060929 force-pushed the expression_pattern_match branch from b555386 to e0de4d3 on March 25, 2024 08:00
@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38610 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c69b999ad69007612861d271ec74b20031bbe87b, data reload: false

------ Round 1 ----------------------------------
q1	17619	4434	4163	4163
q2	2024	153	148	148
q3	10956	1254	1244	1244
q4	10923	788	815	788
q5	8804	3078	3104	3078
q6	211	130	127	127
q7	1076	644	601	601
q8	9648	2128	2052	2052
q9	7472	6769	6721	6721
q10	8423	3455	3590	3455
q11	439	221	225	221
q12	402	206	205	205
q13	17795	2865	2882	2865
q14	244	199	208	199
q15	505	456	448	448
q16	492	363	368	363
q17	960	651	588	588
q18	7181	6629	6540	6540
q19	2723	1435	1499	1435
q20	552	252	255	252
q21	3610	2986	2819	2819
q22	343	298	310	298
Total cold run time: 112402 ms
Total hot run time: 38610 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4190	4070	4056	4056
q2	324	224	238	224
q3	2986	2845	2860	2845
q4	1849	1568	1538	1538
q5	5354	5365	5369	5365
q6	196	115	118	115
q7	2266	1856	1832	1832
q8	3215	3295	3329	3295
q9	8754	8712	8707	8707
q10	3769	3846	3814	3814
q11	549	440	436	436
q12	707	526	548	526
q13	16986	2953	2869	2869
q14	279	261	257	257
q15	492	448	453	448
q16	467	447	416	416
q17	1764	1470	1467	1467
q18	7543	7318	7071	7071
q19	1610	1497	1455	1455
q20	1906	1722	1711	1711
q21	4754	4741	4756	4741
q22	515	448	444	444
Total cold run time: 70475 ms
Total hot run time: 53632 ms

@doris-robot

TPC-DS: Total hot run time: 185821 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c69b999ad69007612861d271ec74b20031bbe87b, data reload: false

query1	925	352	348	348
query2	7370	2014	1902	1902
query3	6704	211	212	211
query4	32014	21242	21102	21102
query5	4384	393	413	393
query6	267	176	168	168
query7	4621	282	302	282
query8	237	166	168	166
query9	9209	2260	2257	2257
query10	600	278	276	276
query11	15495	14459	14452	14452
query12	188	114	106	106
query13	1626	418	416	416
query14	15160	11210	10741	10741
query15	272	213	201	201
query16	8233	257	243	243
query17	1986	571	547	547
query18	2109	285	281	281
query19	334	154	159	154
query20	128	115	124	115
query21	201	129	129	129
query22	5083	4837	4905	4837
query23	33622	33035	32694	32694
query24	10761	2886	2879	2879
query25	612	373	363	363
query26	1355	152	152	152
query27	2957	347	343	343
query28	7780	1865	1861	1861
query29	895	624	625	624
query30	295	168	148	148
query31	967	714	714	714
query32	94	53	51	51
query33	776	249	235	235
query34	1053	479	489	479
query35	840	610	590	590
query36	1003	848	882	848
query37	120	76	74	74
query38	3566	3459	3454	3454
query39	1495	1414	1437	1414
query40	202	108	110	108
query41	46	44	44	44
query42	99	95	97	95
query43	479	449	448	448
query44	1227	720	717	717
query45	279	253	247	247
query46	1112	705	710	705
query47	1957	1837	1875	1837
query48	433	353	356	353
query49	1109	340	337	337
query50	760	383	370	370
query51	6673	6660	6607	6607
query52	113	85	94	85
query53	347	277	282	277
query54	304	244	254	244
query55	86	73	81	73
query56	249	223	233	223
query57	1218	1161	1177	1161
query58	246	207	209	207
query59	2713	2527	2457	2457
query60	266	240	257	240
query61	108	105	109	105
query62	658	454	458	454
query63	302	276	275	275
query64	5868	4203	4123	4123
query65	3118	3045	3019	3019
query66	1475	357	348	348
query67	15718	14735	14906	14735
query68	5400	513	520	513
query69	606	359	371	359
query70	1249	1178	1160	1160
query71	431	287	282	282
query72	6451	2829	2646	2646
query73	707	310	316	310
query74	8141	6714	6589	6589
query75	3506	2797	2828	2797
query76	3991	951	854	854
query77	616	261	243	243
query78	10968	10287	10142	10142
query79	8406	515	526	515
query80	1656	388	384	384
query81	536	215	214	214
query82	1628	193	203	193
query83	203	137	143	137
query84	282	78	78	78
query85	1508	311	283	283
query86	461	276	295	276
query87	3746	3533	3560	3533
query88	4953	2229	2241	2229
query89	508	372	356	356
query90	1982	184	171	171
query91	169	127	155	127
query92	59	46	47	46
query93	6901	493	471	471
query94	1151	176	164	164
query95	429	346	329	329
query96	587	272	266	266
query97	3078	2888	2890	2888
query98	247	221	213	213
query99	1185	925	909	909
Total cold run time: 314243 ms
Total hot run time: 185821 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit c69b999ad69007612861d271ec74b20031bbe87b with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.3 seconds inserted 10000000 Rows, about 469K ops/s
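The throughput figures in the load reports are consistent with simple truncated arithmetic: bytes / seconds / 2^20 for the "MB/s" column (so it behaves like MiB/s), and rows / seconds / 1000 for "K ops/s". This is inferred from the numbers, not from the load tool's source. A sketch of that arithmetic:

```java
// Reproduces the load report's rate figures (inferred formulas, truncated to integers).
final class LoadRate {
    // bytes / seconds / 2^20, integer division: e.g. 2358488459 bytes in 19 s -> 118.
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / seconds / (1L << 20);
    }

    // rows / seconds / 1000, truncated: e.g. 10000000 rows in 21.3 s -> 469.
    static long kiloRowsPerSecond(long rows, double seconds) {
        return (long) (rows / seconds / 1000);
    }
}
```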

@924060929
Contributor Author

run buildall

924060929 force-pushed the expression_pattern_match branch from 6842b3c to d590399 on March 26, 2024 03:30
@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 37945 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d5903994b9ad460c39b913fe956d8655e2e4da2f, data reload: false

------ Round 1 ----------------------------------
q1	17618	4165	4114	4114
q2	2039	169	152	152
q3	10570	1188	1234	1188
q4	10231	741	751	741
q5	7466	3019	2965	2965
q6	205	127	125	125
q7	1058	609	570	570
q8	9326	2037	2018	2018
q9	7292	6578	6575	6575
q10	8393	3430	3567	3430
q11	434	226	214	214
q12	437	197	195	195
q13	17781	2843	2868	2843
q14	256	197	206	197
q15	509	476	451	451
q16	517	378	356	356
q17	956	522	628	522
q18	7284	6514	6363	6363
q19	1618	1453	1384	1384
q20	555	255	255	255
q21	3678	2993	3235	2993
q22	342	307	294	294
Total cold run time: 108565 ms
Total hot run time: 37945 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4087	4051	4044	4044
q2	319	227	234	227
q3	2948	2853	2873	2853
q4	1831	1510	1577	1510
q5	5291	5350	5319	5319
q6	192	114	115	114
q7	2236	1865	1888	1865
q8	3142	3284	3257	3257
q9	8691	8560	8678	8560
q10	3772	3935	3768	3768
q11	571	438	440	438
q12	711	552	540	540
q13	16920	2887	2850	2850
q14	276	250	261	250
q15	497	458	453	453
q16	472	430	414	414
q17	1730	1486	1470	1470
q18	7565	7170	7155	7155
q19	1624	1475	1512	1475
q20	1903	1704	1703	1703
q21	4793	4805	4827	4805
q22	522	470	452	452
Total cold run time: 70093 ms
Total hot run time: 53522 ms

@doris-robot

TPC-DS: Total hot run time: 180954 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d5903994b9ad460c39b913fe956d8655e2e4da2f, data reload: false

query1	922	349	344	344
query2	6525	2034	1849	1849
query3	6703	204	219	204
query4	31702	21438	21239	21239
query5	4311	399	394	394
query6	261	175	170	170
query7	4620	296	285	285
query8	225	164	163	163
query9	9107	2357	2313	2313
query10	601	253	253	253
query11	14688	14151	14137	14137
query12	128	88	87	87
query13	1620	419	416	416
query14	9998	7618	7038	7038
query15	260	194	192	192
query16	8174	258	255	255
query17	1961	557	525	525
query18	2106	271	274	271
query19	290	148	151	148
query20	92	85	82	82
query21	200	127	119	119
query22	5006	4818	4763	4763
query23	33609	32836	32895	32836
query24	10770	2872	2885	2872
query25	628	374	373	373
query26	1399	155	156	155
query27	3015	363	352	352
query28	7698	1871	1871	1871
query29	898	633	629	629
query30	296	149	148	148
query31	994	720	740	720
query32	94	58	54	54
query33	765	249	239	239
query34	1069	477	493	477
query35	835	611	607	607
query36	999	894	887	887
query37	109	64	64	64
query38	3575	3428	3416	3416
query39	1474	1420	1410	1410
query40	208	112	110	110
query41	49	47	52	47
query42	101	92	94	92
query43	477	447	457	447
query44	1219	733	714	714
query45	274	261	249	249
query46	1120	700	701	700
query47	1938	1876	1846	1846
query48	442	341	342	341
query49	1131	325	331	325
query50	761	368	371	368
query51	6835	6941	6713	6713
query52	116	92	89	89
query53	349	278	274	274
query54	309	249	235	235
query55	92	71	79	71
query56	245	230	229	229
query57	1210	1137	1153	1137
query58	232	199	210	199
query59	2723	2573	2562	2562
query60	268	241	250	241
query61	109	110	111	110
query62	645	437	445	437
query63	301	276	275	275
query64	5866	4083	4012	4012
query65	3218	3028	3050	3028
query66	1430	358	348	348
query67	15236	14846	14783	14783
query68	5701	526	515	515
query69	598	399	377	377
query70	1200	1191	1135	1135
query71	456	267	269	267
query72	6838	2716	2534	2534
query73	705	315	323	315
query74	7638	6402	6530	6402
query75	3203	2208	2192	2192
query76	4130	841	972	841
query77	622	250	249	249
query78	11023	10349	10157	10157
query79	7928	513	522	513
query80	1839	367	382	367
query81	567	217	210	210
query82	1568	87	86	86
query83	340	141	149	141
query84	288	78	80	78
query85	1631	297	291	291
query86	488	312	302	302
query87	3744	3673	3685	3673
query88	5030	2379	2388	2379
query89	489	374	386	374
query90	1993	177	180	177
query91	168	133	135	133
query92	63	47	47	47
query93	5878	504	487	487
query94	1235	181	179	179
query95	435	338	338	338
query96	605	279	276	276
query97	2650	2515	2483	2483
query98	231	215	204	204
query99	1258	910	893	893
Total cold run time: 305173 ms
Total hot run time: 180954 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit d5903994b9ad460c39b913fe956d8655e2e4da2f with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.4 seconds inserted 10000000 Rows, about 467K ops/s

@924060929
Contributor Author

run buildall

924060929 force-pushed the expression_pattern_match branch from b35d902 to 54f9e96 on March 26, 2024 10:03
@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 37755 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 761534cc184a6bc967b856699ad49381b31eb671, data reload: false

------ Round 1 ----------------------------------
q1	17608	4194	4256	4194
q2	2018	149	144	144
q3	10601	1137	1165	1137
q4	10226	743	762	743
q5	7482	3034	2958	2958
q6	208	126	125	125
q7	1064	577	556	556
q8	9334	1971	1966	1966
q9	7207	6587	6557	6557
q10	8418	3399	3608	3399
q11	443	220	215	215
q12	432	196	190	190
q13	17784	2834	2832	2832
q14	247	214	206	206
q15	517	471	456	456
q16	507	369	364	364
q17	943	548	582	548
q18	7066	6411	6456	6411
q19	4829	1430	1452	1430
q20	544	268	237	237
q21	3516	2861	2787	2787
q22	345	301	300	300
Total cold run time: 111339 ms
Total hot run time: 37755 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4142	4055	4074	4055
q2	326	229	232	229
q3	2947	2807	2831	2807
q4	1848	1580	1522	1522
q5	5282	5321	5307	5307
q6	198	116	116	116
q7	2253	1836	1859	1836
q8	3135	3297	3260	3260
q9	8660	8632	8666	8632
q10	3793	3758	3788	3758
q11	539	440	446	440
q12	700	527	559	527
q13	16906	2859	2856	2856
q14	289	246	272	246
q15	506	454	448	448
q16	474	416	411	411
q17	1739	1477	1464	1464
q18	7521	7226	7094	7094
q19	1604	1479	1552	1479
q20	1944	1729	1708	1708
q21	4871	4591	4607	4591
q22	517	442	435	435
Total cold run time: 70194 ms
Total hot run time: 53221 ms

@doris-robot

TPC-DS: Total hot run time: 181208 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 761534cc184a6bc967b856699ad49381b31eb671, data reload: false

query1	917	364	361	361
query2	6536	1878	1775	1775
query3	6704	221	218	218
query4	31483	21315	21254	21254
query5	4275	393	386	386
query6	274	173	187	173
query7	4634	288	291	288
query8	231	164	166	164
query9	9148	2226	2263	2226
query10	600	258	252	252
query11	14660	14247	14236	14236
query12	133	91	82	82
query13	1627	438	423	423
query14	9053	8043	7872	7872
query15	251	202	195	195
query16	8238	255	248	248
query17	1999	548	540	540
query18	2077	270	276	270
query19	352	150	147	147
query20	89	81	87	81
query21	196	130	127	127
query22	4983	4793	4733	4733
query23	33415	32889	33074	32889
query24	10772	2865	2880	2865
query25	622	372	389	372
query26	1243	155	158	155
query27	2770	345	348	345
query28	7373	1838	1845	1838
query29	878	663	617	617
query30	307	144	144	144
query31	997	730	723	723
query32	91	57	55	55
query33	776	264	249	249
query34	1000	472	485	472
query35	834	606	614	606
query36	1002	865	869	865
query37	128	62	66	62
query38	3522	3450	3420	3420
query39	1491	1421	1426	1421
query40	216	114	113	113
query41	50	48	48	48
query42	113	96	96	96
query43	487	457	456	456
query44	1150	735	725	725
query45	279	280	268	268
query46	1099	699	689	689
query47	1905	1828	1827	1827
query48	450	363	368	363
query49	1104	350	323	323
query50	752	373	369	369
query51	6677	6568	6482	6482
query52	110	90	94	90
query53	348	274	280	274
query54	302	237	241	237
query55	78	81	81	81
query56	250	221	233	221
query57	1206	1130	1137	1130
query58	236	209	210	209
query59	2769	2538	2533	2533
query60	265	244	254	244
query61	112	120	110	110
query62	661	475	434	434
query63	311	280	278	278
query64	5778	4087	4058	4058
query65	3131	3054	3042	3042
query66	1436	364	350	350
query67	15264	14651	14632	14632
query68	9017	548	528	528
query69	646	395	403	395
query70	1394	1099	1143	1099
query71	518	267	268	267
query72	6802	2690	2512	2512
query73	1653	316	316	316
query74	7934	6271	6434	6271
query75	3827	2254	2229	2229
query76	5503	874	945	874
query77	627	259	259	259
query78	11005	10160	10159	10159
query79	11107	525	515	515
query80	1786	366	356	356
query81	522	213	218	213
query82	323	82	86	82
query83	218	144	153	144
query84	284	81	79	79
query85	1113	296	292	292
query86	365	309	293	293
query87	3656	3601	3568	3568
query88	5085	2364	2358	2358
query89	488	377	370	370
query90	2050	176	173	173
query91	172	132	128	128
query92	61	46	48	46
query93	6812	490	482	482
query94	1323	174	172	172
query95	439	332	334	332
query96	617	273	269	269
query97	2678	2501	2500	2500
query98	238	219	202	202
query99	1151	894	933	894
Total cold run time: 311634 ms
Total hot run time: 181208 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 761534cc184a6bc967b856699ad49381b31eb671 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       21.8 seconds inserted 10000000 Rows, about 458K ops/s

@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 37990 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 68ab0f95f39a510d316d09260dedac4b98f771f8, data reload: false

------ Round 1 ----------------------------------
q1	17772	4403	4267	4267
q2	2343	159	154	154
q3	11027	1170	1218	1170
q4	10761	782	766	766
q5	7797	3038	3043	3038
q6	208	128	127	127
q7	1069	625	586	586
q8	9331	2056	1968	1968
q9	7206	6628	6561	6561
q10	8342	3395	3577	3395
q11	413	222	216	216
q12	375	199	195	195
q13	17793	2848	2851	2848
q14	237	198	200	198
q15	508	457	458	457
q16	452	368	359	359
q17	942	565	573	565
q18	7224	6434	6386	6386
q19	1571	1429	1495	1429
q20	563	256	240	240
q21	3526	2933	2779	2779
q22	345	286	293	286
Total cold run time: 109805 ms
Total hot run time: 37990 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4076	4090	4183	4090
q2	328	220	218	218
q3	2982	2831	2864	2831
q4	1833	1533	1575	1533
q5	5323	5322	5326	5322
q6	198	117	118	117
q7	2232	1824	1869	1824
q8	3135	3267	3266	3266
q9	8750	8699	8685	8685
q10	3785	3783	3771	3771
q11	564	448	436	436
q12	700	529	549	529
q13	16226	2810	2862	2810
q14	282	248	263	248
q15	494	457	464	457
q16	473	414	444	414
q17	1731	1466	1451	1451
q18	7431	7094	7006	7006
q19	1605	1485	1532	1485
q20	1921	1701	1678	1678
q21	4879	4720	4718	4718
q22	506	437	455	437
Total cold run time: 69454 ms
Total hot run time: 53326 ms

@doris-robot

TPC-DS: Total hot run time: 180560 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 68ab0f95f39a510d316d09260dedac4b98f771f8, data reload: false

query1	960	353	338	338
query2	6531	1935	1841	1841
query3	6704	211	219	211
query4	31710	21147	21091	21091
query5	4338	395	383	383
query6	263	185	190	185
query7	4614	288	298	288
query8	225	165	168	165
query9	9430	2362	2332	2332
query10	597	258	263	258
query11	15157	14093	14351	14093
query12	134	90	82	82
query13	1627	407	429	407
query14	10762	7426	7937	7426
query15	248	199	194	194
query16	8044	252	257	252
query17	2008	556	526	526
query18	1374	272	267	267
query19	331	153	157	153
query20	99	86	82	82
query21	196	122	122	122
query22	4982	4840	4805	4805
query23	33552	32788	32962	32788
query24	12131	2825	2889	2825
query25	665	380	385	380
query26	1771	162	164	162
query27	2984	354	353	353
query28	7044	1921	1923	1921
query29	963	619	624	619
query30	301	146	143	143
query31	1005	718	723	718
query32	88	57	58	57
query33	768	249	247	247
query34	1131	480	501	480
query35	827	610	600	600
query36	1013	869	878	869
query37	261	67	64	64
query38	3509	3430	3392	3392
query39	1467	1415	1399	1399
query40	284	115	109	109
query41	51	48	47	47
query42	105	97	95	95
query43	482	452	458	452
query44	1215	738	741	738
query45	286	268	268	268
query46	1100	716	695	695
query47	1899	1839	1829	1829
query48	454	350	364	350
query49	1227	341	338	338
query50	758	371	376	371
query51	6631	6591	6584	6584
query52	105	94	97	94
query53	341	273	273	273
query54	326	240	242	240
query55	78	72	78	72
query56	243	223	228	223
query57	1220	1115	1130	1115
query58	236	210	209	209
query59	2798	2483	2595	2483
query60	276	244	269	244
query61	116	114	112	112
query62	628	451	439	439
query63	301	278	279	278
query64	6976	4122	4092	4092
query65	3128	3044	3022	3022
query66	1355	346	352	346
query67	15391	15106	14692	14692
query68	8911	553	543	543
query69	621	379	392	379
query70	1339	1188	1150	1150
query71	505	278	273	273
query72	6523	2718	2494	2494
query73	1484	323	340	323
query74	7851	6398	6332	6332
query75	3819	2218	2184	2184
query76	5250	921	854	854
query77	629	261	257	257
query78	10993	10145	10077	10077
query79	10504	526	527	526
query80	2053	366	362	362
query81	505	211	222	211
query82	661	81	83	81
query83	230	138	141	138
query84	285	81	76	76
query85	1178	289	292	289
query86	395	306	308	306
query87	3799	3494	3523	3494
query88	4785	2322	2339	2322
query89	497	361	363	361
query90	2053	172	170	170
query91	165	137	132	132
query92	59	46	46	46
query93	6898	505	510	505
query94	1220	174	172	172
query95	411	337	333	333
query96	601	273	263	263
query97	2638	2497	2452	2452
query98	234	219	201	201
query99	1214	950	926	926
Total cold run time: 316134 ms
Total hot run time: 180560 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 68ab0f95f39a510d316d09260dedac4b98f771f8 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       14.1 seconds inserted 10000000 Rows, about 709K ops/s

924060929 force-pushed the expression_pattern_match branch from dfc4406 to c2c4b34 on March 30, 2024 16:50
@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38823 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c2c4b34376af0a8821388ff73effbbe6ed5c0e70, data reload: false

------ Round 1 ----------------------------------
q1	17635	4204	4265	4204
q2	1999	176	177	176
q3	10505	1186	1392	1186
q4	10193	887	954	887
q5	7509	2955	2898	2898
q6	213	131	137	131
q7	1105	624	594	594
q8	9401	2229	2015	2015
q9	6721	6197	6131	6131
q10	8442	3528	3482	3482
q11	411	230	230	230
q12	382	215	215	215
q13	17782	2910	2894	2894
q14	271	245	248	245
q15	521	478	479	478
q16	518	377	399	377
q17	949	900	884	884
q18	7104	6569	6598	6569
q19	2612	1569	1549	1549
q20	609	322	317	317
q21	3506	3124	3058	3058
q22	349	303	305	303
Total cold run time: 108737 ms
Total hot run time: 38823 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4093	4025	4055	4025
q2	333	222	217	217
q3	2934	2948	2956	2948
q4	1863	1825	1834	1825
q5	5236	5188	5185	5185
q6	209	126	126	126
q7	2241	1791	1810	1791
q8	3219	3299	3289	3289
q9	8447	8455	8443	8443
q10	3753	3801	3812	3801
q11	548	451	454	451
q12	733	524	559	524
q13	16810	2911	2878	2878
q14	285	258	265	258
q15	515	476	459	459
q16	462	421	396	396
q17	1706	1669	1678	1669
q18	7575	7768	7575	7575
q19	1680	1659	1694	1659
q20	2020	1867	1876	1867
q21	5294	5084	4962	4962
q22	501	453	445	445
Total cold run time: 70457 ms
Total hot run time: 54793 ms

@github-actions
Contributor

github-actions bot commented Apr 1, 2024

PR approved by at least one committer and no changes requested.

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Apr 1, 2024
@github-actions
Contributor

github-actions bot commented Apr 1, 2024

PR approved by anyone and no changes requested.

924060929 merged commit 7338683 into apache:master on Apr 1, 2024
924060929 deleted the expression_pattern_match branch on April 1, 2024 05:10
924060929 added a commit that referenced this pull request Apr 2, 2024
#32617 introduced a bug: the rewrite may not work when a plan's arity is >= 3.
This PR fixes it.
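The follow-up commit states only that rewriting may fail for plans with arity >= 3; its cause is not described here. One generic way such a bug can arise with the arity-specialized, loop-unrolled code of item 3 in #32617 is a fast path for 0, 1 and 2 children without a correct general case. A hypothetical sketch (`Node` and `ChildJobs` are illustrative names, not `PlanTreeRewriteBottomUpJob`):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical plan node (not the Nereids Plan interface).
record Node(String name, List<Node> children) {
    static Node of(String name, Node... children) {
        return new Node(name, List.of(children));
    }
}

final class ChildJobs {
    // Arity-specialized push: fast paths for 0, 1 and 2 children, plus a general case.
    static void pushChildren(Node node, Deque<Node> stack) {
        List<Node> cs = node.children();
        switch (cs.size()) {
            case 0:
                return;
            case 1:
                stack.push(cs.get(0));
                return;
            case 2:
                // Push right first so the left child is popped (processed) first.
                stack.push(cs.get(1));
                stack.push(cs.get(0));
                return;
            default:
                // Arity >= 3: without this branch, such children would silently never be visited.
                for (int i = cs.size() - 1; i >= 0; i--) {
                    stack.push(cs.get(i));
                }
        }
    }

    // Pre-order traversal driven by an explicit stack; returns node names in visit order.
    static List<String> visitOrder(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        List<String> order = new ArrayList<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            order.add(n.name());
            pushChildren(n, stack);
        }
        return order;
    }
}
```

A test that includes a node with three or more children (for example a multi-child union) exercises the general branch that the unrolled cases never reach.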
924060929 added a commit to 924060929/incubator-doris that referenced this pull request Apr 10, 2024
…pache#32617)

This PR improves the performance of the Nereids planner in the planning stage.

1. Refactor the expression rewriter to use pattern matching, so that the many expression rewrite rules can be interleaved within one large bottom-up iteration and applied until the expression becomes stable. More cases are now handled, because the original rewriter had no loop and sometimes processed only the top-level expression, e.g. `SimplifyArithmeticRule`.
2. Replace `Collection.stream()` with `ImmutableXxx.Builder` to avoid unnecessary method calls.
3. Unroll loops in some code, e.g. `Expression.<init>` and `PlanTreeRewriteBottomUpJob.pushChildrenJobs`.
4. Use type- and arity-specialized code, e.g. `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()`, and `PartitionRangeExpander.enumerableCount()`.
5. Refactor `ExtractCommonFactorRule` so that it can extract common factors in more cases, and fix an infinite loop when `ExtractCommonFactorRule` and `SimplifyRange` run in the same iteration: `SimplifyRange` generates a right-deep tree, but `ExtractCommonFactorRule` generates a left-deep tree.
6. Refactor `FoldConstantRuleOnFE` to support both visitor and pattern-match modes. In ExpressionNormalization, the pattern-match mode can be interleaved with other rules; in PartitionPruner, the visitor mode evaluates expressions faster.
7. Lazily compute and cache some operations.
8. Use int fields to compare dates.
9. Use a BitSet to look up disableNereidsRules.
10. A two-level loop is usually faster than building a Multimap when binding slots in Scope, so I reverted that code.
11. `PlanTreeRewriteBottomUpJob` no longer needs clearStatePhase.

### test case
100 threads continuously send the following SQL in parallel; it queries an empty table. Tested on my Mac (M2 chip, 8 cores) with SQL cache enabled.
```sql
select  count(1),date_format(time_col,'%Y%m%d'),varchar_col1
from tbl
where  partition_date>'2024-02-15'  and (varchar_col2 ='73130' or varchar_col3='73130') and time_col>'2024-03-04'
  and  time_col<'2024-03-05'
group by date_format(time_col,'%Y%m%d'),varchar_col1
order by date_format(time_col,'%Y%m%d') desc, varchar_col1 desc,count(1) asc
limit 1000
```

Before this PR: 3100 peak QPS, about 2700 average QPS
After this PR: 4800 peak QPS, about 4400 average QPS

(cherry picked from commit 7338683)
924060929 added a commit to 924060929/incubator-doris that referenced this pull request Apr 10, 2024
apache#32617 introduced a bug: the rewrite may not work when a plan's arity is >= 3.
This PR fixes it.

(cherry picked from commit 8b070d1)
924060929 added a commit to 924060929/incubator-doris that referenced this pull request Apr 10, 2024
…pache#32617)

this pr can improve the performance of the nereids planner, in plan stage.

1. Refactor the expression rewriter to use pattern matching, so that many expression rewrite rules can be interleaved in one large bottom-up iteration that repeats until the expression is stable. This handles more cases than before: the original rewriter had no loop and sometimes processed only the top-level expression, e.g. `SimplifyArithmeticRule`.
2. Replace `Collection.stream()` with `ImmutableXxx.Builder` to avoid unnecessary method calls.
3. Unroll loops in some code, e.g. `Expression.<init>` and `PlanTreeRewriteBottomUpJob.pushChildrenJobs`.
4. Use type- and arity-specialized code, e.g. `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()` and `PartitionRangeExpander.enumerableCount()`.
5. Refactor `ExtractCommonFactorRule` so it can extract common factors in more cases, and fix an infinite loop when `ExtractCommonFactorRule` and `SimplifyRange` run in the same iteration: `SimplifyRange` generates right-deep trees, while `ExtractCommonFactorRule` generates left-deep trees.
6. Refactor `FoldConstantRuleOnFE` to support both visitor and pattern-match modes: in `ExpressionNormalization`, pattern-match mode lets it interleave with other rules; in `PartitionPruner`, visitor mode evaluates expressions faster.
7. Compute some operations lazily and cache the results.
8. Use int fields to compare dates.
9. Use a `BitSet` to look up `disableNereidsRules`.
10. Revert slot binding in `Scope` to a two-level loop, which is usually faster than building a `Multimap`.
11. `PlanTreeRewriteBottomUpJob` no longer needs `clearStatePhase`.
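Items 1 and 9 can be illustrated together with a small, self-contained model. This is a hypothetical sketch, not Nereids code: `Expr`, `Var`, `Lit`, `Add`, `Mul`, `Rule`, `FOLD`, `IDENTITY`, `rewriteBottomUp` and `rewriteToFixedPoint` are made-up names, and tracking disabled rules as bits indexed by a rule id is an assumption about how a `BitSet` might be used. Children are rewritten first, every enabled rule is retried on each node until none applies, and whole passes repeat until the tree stops changing.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; none of these types are the real Nereids classes.
public class FixedPointRewriteSketch {
    // Minimal immutable expression tree. Records give structural equals(),
    // which the fixed-point check relies on.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Lit(long value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    // A rule has an id (its bit in the disabled-rule BitSet) and a
    // pattern-matching function that returns empty when the node does not match.
    record Rule(int id, Function<Expr, Optional<Expr>> fn) {}

    // Constant folding: lit + lit and lit * lit become a single literal.
    static final Rule FOLD = new Rule(0, e -> {
        if (e instanceof Add a && a.left() instanceof Lit l && a.right() instanceof Lit r) {
            return Optional.of(new Lit(l.value() + r.value()));
        }
        if (e instanceof Mul m && m.left() instanceof Lit l && m.right() instanceof Lit r) {
            return Optional.of(new Lit(l.value() * r.value()));
        }
        return Optional.empty();
    });

    // Identity elimination: x + 0 becomes x, and x * 1 becomes x.
    static final Rule IDENTITY = new Rule(1, e -> {
        if (e instanceof Add a && a.right() instanceof Lit r && r.value() == 0) {
            return Optional.of(a.left());
        }
        if (e instanceof Mul m && m.right() instanceof Lit r && r.value() == 1) {
            return Optional.of(m.left());
        }
        return Optional.empty();
    });

    static final List<Rule> RULES = List.of(FOLD, IDENTITY);

    // One bottom-up pass: rewrite the children first, then keep trying every
    // enabled rule on this node until none of them changes it.
    static Expr rewriteBottomUp(Expr e, List<Rule> rules, BitSet disabled) {
        if (e instanceof Add a) {
            e = new Add(rewriteBottomUp(a.left(), rules, disabled), rewriteBottomUp(a.right(), rules, disabled));
        } else if (e instanceof Mul m) {
            e = new Mul(rewriteBottomUp(m.left(), rules, disabled), rewriteBottomUp(m.right(), rules, disabled));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                if (disabled.get(rule.id())) {
                    continue; // constant-time bit test per rule
                }
                Optional<Expr> result = rule.fn().apply(e);
                if (result.isPresent() && !result.get().equals(e)) {
                    e = result.get();
                    changed = true;
                }
            }
        }
        return e;
    }

    // Repeat full passes until one pass leaves the tree unchanged (a fixed point).
    static Expr rewriteToFixedPoint(Expr e, List<Rule> rules, BitSet disabled) {
        while (true) {
            Expr next = rewriteBottomUp(e, rules, disabled);
            if (next.equals(e)) {
                return next;
            }
            e = next;
        }
    }

    // (a + (2 + -2)) * 1
    static Expr sample() {
        return new Mul(new Add(new Var("a"), new Add(new Lit(2), new Lit(-2))), new Lit(1));
    }

    public static void main(String[] args) {
        BitSet disabled = new BitSet();
        System.out.println(rewriteToFixedPoint(sample(), RULES, disabled));
        disabled.set(IDENTITY.id()); // disable one rule by its id
        System.out.println(rewriteToFixedPoint(sample(), RULES, disabled));
    }
}
```

In `main`, the first call reduces `(a + (2 + -2)) * 1` to `a`: folding `2 + -2` into `0` is what lets the identity rule fire at the parent and then at the grandparent, which a single application at the top-level node would miss. With `IDENTITY` disabled by its bit, only the constant folds, leaving `(a + 0) * 1`.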

### test case
100 threads continuously send this SQL in parallel; it queries an empty table. Tested on my Mac (M2 chip, 8 cores) with SQL cache enabled.
```sql
select  count(1),date_format(time_col,'%Y%m%d'),varchar_col1
from tbl
where  partition_date>'2024-02-15'  and (varchar_col2 ='73130' or varchar_col3='73130') and time_col>'2024-03-04'
  and  time_col<'2024-03-05'
group by date_format(time_col,'%Y%m%d'),varchar_col1
order by date_format(time_col,'%Y%m%d') desc, varchar_col1 desc,count(1) asc
limit 1000
```

Before this PR: 3100 QPS at peak, about 2700 QPS on average
After this PR: 4800 QPS at peak, about 4400 QPS on average
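Item 8 in the list above is terse. One common way to make date comparison cheap, and only an assumption about what the change does, is to pack year, month and day into a single int so that a predicate like `partition_date > '2024-02-15'` in the test query becomes one integer comparison. `PackedDateSketch`, `pack` and `compareFields` are hypothetical names, not Doris classes.

```java
// Hypothetical sketch, not Doris's date classes: pack a calendar date into
// one int so ordering two dates is a single integer comparison.
public final class PackedDateSketch {
    // year * 10000 + month * 100 + day preserves chronological order because
    // month (1..12) and day (1..31) always fit in two decimal digits.
    static int pack(int year, int month, int day) {
        return year * 10000 + month * 100 + day;
    }

    // Field-by-field comparison that an unpacked representation would need.
    static int compareFields(int y1, int m1, int d1, int y2, int m2, int d2) {
        if (y1 != y2) {
            return Integer.compare(y1, y2);
        }
        if (m1 != m2) {
            return Integer.compare(m1, m2);
        }
        return Integer.compare(d1, d2);
    }

    public static void main(String[] args) {
        // Mirrors the shape of `partition_date > '2024-02-15'` from the test query.
        int bound = pack(2024, 2, 15);
        System.out.println(pack(2024, 3, 4) > bound);   // true
        System.out.println(pack(2024, 2, 15) > bound);  // false
        System.out.println(pack(2023, 12, 31) > bound); // false
    }
}
```

The packed form and the field-by-field comparison always agree in sign; the packed form simply replaces up to three comparisons with one.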

(cherry picked from commit 7338683)
yiguolei pushed a commit that referenced this pull request Apr 10, 2024
…33460)

* [enhancement](Nereids) refactor expression rewriter to pattern match (#32617)

(cherry picked from commit 7338683)

* [fix](Nereids) fix link children failed (#33134)

#32617 introduced a bug: rewriting may not work when a plan's arity is >= 3.
This PR fixes it.

(cherry picked from commit 8b070d1)
yiguolei pushed a commit that referenced this pull request Apr 10, 2024
…32617)

(cherry picked from commit 7338683)
yiguolei pushed a commit that referenced this pull request Apr 10, 2024
#32617 introduced a bug: rewriting may not work when a plan's arity is >= 3.
This PR fixes it.

(cherry picked from commit 8b070d1)
Labels: approved, dev/2.1.3-merged, reviewed

5 participants