@keanji-x
Contributor

@keanji-x keanji-x commented Jun 7, 2024

Proposed changes

This PR pushes an aggregation down through a join when the aggregation groups by the join's foreign key. Aggregating the foreign-key side first shrinks the join input, which reduces the work done at and above the join and improves both query speed and resource usage.

Transformation Example:

Before Optimization:

Aggregation(group by fk)
     |
   Join(pk = fk)
   /  \
  pk  fk

After Optimization:

 Join(pk = fk)
 /     \
pk  Aggregation(group by fk)
       |
      fk
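
To make the equivalence concrete, here is a minimal sketch in plain Java. It is not the Doris rule itself: the class and method names are hypothetical, tables are modeled as in-memory collections, and only a COUNT aggregate is shown. It assumes the pk column is unique, so each fk row matches at most one pk row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of pushing "GROUP BY fk" below a pk = fk join. All names are illustrative. */
class AggPushDownSketch {

    /** Before: join every fk row with its pk row, then count rows per fk. */
    static Map<Integer, Long> joinThenAggregate(Map<Integer, String> pkTable, List<Integer> fkColumn) {
        List<Integer> joined = new ArrayList<>();
        for (int fk : fkColumn) {
            // Inner join: pk is unique, so each fk row matches at most one pk row.
            if (pkTable.containsKey(fk)) {
                joined.add(fk);
            }
        }
        Map<Integer, Long> counts = new TreeMap<>();
        for (int fk : joined) {
            counts.merge(fk, 1L, Long::sum);
        }
        return counts;
    }

    /** After: count rows per fk first, then join the smaller grouped result with pk. */
    static Map<Integer, Long> aggregateThenJoin(Map<Integer, String> pkTable, List<Integer> fkColumn) {
        Map<Integer, Long> grouped = new TreeMap<>();
        for (int fk : fkColumn) {
            grouped.merge(fk, 1L, Long::sum);
        }
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : grouped.entrySet()) {
            // The join now sees one row per distinct fk value instead of every fk row.
            if (pkTable.containsKey(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> pk = new LinkedHashMap<>();
        pk.put(1, "a");
        pk.put(2, "b");
        pk.put(3, "c");
        List<Integer> fk = List.of(1, 1, 2, 2, 2, 3);
        System.out.println(joinThenAggregate(pk, fk));
        System.out.println(aggregateThenJoin(pk, fk));
    }
}
```

Both calls print the same map. In the second plan the join probes three grouped rows rather than six raw fk rows, which is the saving the transformation is after.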

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@keanji-x
Contributor Author

keanji-x commented Jun 7, 2024

run buildall

@keanji-x force-pushed the add_agg_push_foreign branch from ddf7409 to d4631c4 on June 7, 2024 06:27
@keanji-x
Contributor Author

keanji-x commented Jun 7, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 41103 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d4631c41bcf0857b42c032d658f3efb8f358e8e9, data reload: false

------ Round 1 ----------------------------------
q1	17616	4515	4354	4354
q2	2041	201	210	201
q3	10421	1254	1112	1112
q4	10204	817	839	817
q5	7496	2709	2669	2669
q6	224	139	137	137
q7	973	635	615	615
q8	9224	2149	2101	2101
q9	9122	6708	6685	6685
q10	9266	3937	3979	3937
q11	456	257	238	238
q12	472	248	252	248
q13	17233	3296	3183	3183
q14	285	225	235	225
q15	525	463	474	463
q16	531	389	411	389
q17	1003	840	643	643
q18	8450	7871	7774	7774
q19	7149	1373	1250	1250
q20	642	330	336	330
q21	5125	3382	4050	3382
q22	413	350	350	350
Total cold run time: 118871 ms
Total hot run time: 41103 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4582	4464	4459	4459
q2	374	281	273	273
q3	3162	2913	2936	2913
q4	1959	1734	1662	1662
q5	5303	5725	5545	5545
q6	230	128	128	128
q7	2209	1811	1807	1807
q8	3255	3366	3377	3366
q9	8706	8570	8626	8570
q10	4080	3838	3889	3838
q11	589	487	478	478
q12	766	606	616	606
q13	16092	3035	3192	3035
q14	324	266	286	266
q15	518	477	510	477
q16	477	434	430	430
q17	1801	1529	1468	1468
q18	8027	7782	7276	7276
q19	1762	1654	1495	1495
q20	2264	1791	1781	1781
q21	7673	4762	4760	4760
q22	636	534	546	534
Total cold run time: 74789 ms
Total hot run time: 55167 ms
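
The result rows carry no header. Reading each row as query, cold run, two hot runs, and the faster of the two hot runs is an assumption, but it is consistent with the reported totals. The sketch below (hypothetical class and method names, Round 1 rows copied from the table above) recomputes both totals by summing columns.

```java
import java.util.Arrays;
import java.util.List;

/** Recomputes the cold/hot totals from result rows. Column meanings are inferred, not documented. */
class BenchTotalsSketch {

    /** Round 1 rows: query, cold, hot run 1, hot run 2, faster hot run (assumed layout). */
    static final List<String> ROUND_1 = Arrays.asList(
        "q1 17616 4515 4354 4354", "q2 2041 201 210 201",
        "q3 10421 1254 1112 1112", "q4 10204 817 839 817",
        "q5 7496 2709 2669 2669", "q6 224 139 137 137",
        "q7 973 635 615 615", "q8 9224 2149 2101 2101",
        "q9 9122 6708 6685 6685", "q10 9266 3937 3979 3937",
        "q11 456 257 238 238", "q12 472 248 252 248",
        "q13 17233 3296 3183 3183", "q14 285 225 235 225",
        "q15 525 463 474 463", "q16 531 389 411 389",
        "q17 1003 840 643 643", "q18 8450 7871 7774 7774",
        "q19 7149 1373 1250 1250", "q20 642 330 336 330",
        "q21 5125 3382 4050 3382", "q22 413 350 350 350");

    /** Sums one whitespace-separated column (0 = query name, 1 = first timing, and so on). */
    static long sumColumn(List<String> rows, int column) {
        long total = 0;
        for (String row : rows) {
            String[] fields = row.trim().split("\\s+");
            total += Long.parseLong(fields[column]);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("cold: " + sumColumn(ROUND_1, 1) + " ms"); // cold: 118871 ms
        System.out.println("hot:  " + sumColumn(ROUND_1, 4) + " ms"); // hot:  41103 ms
    }
}
```

The same column sums can be applied to the later runs to compare hot totals across commits.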

@doris-robot

TPC-DS: Total hot run time: 173088 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d4631c41bcf0857b42c032d658f3efb8f358e8e9, data reload: false

query1	919	377	371	371
query2	6457	2399	2302	2302
query3	6638	206	211	206
query4	20316	17446	17206	17206
query5	4118	466	468	466
query6	252	168	161	161
query7	4556	306	304	304
query8	330	287	289	287
query9	8392	2425	2420	2420
query10	454	307	306	306
query11	10594	10180	9990	9990
query12	136	84	85	84
query13	1626	377	369	369
query14	9546	7508	7696	7508
query15	227	189	182	182
query16	7841	262	266	262
query17	1705	538	526	526
query18	1973	286	276	276
query19	200	158	152	152
query20	92	84	85	84
query21	204	134	131	131
query22	4228	4046	4086	4046
query23	33400	32915	32905	32905
query24	11683	2848	2812	2812
query25	647	356	361	356
query26	1740	156	160	156
query27	2964	331	318	318
query28	7418	2066	2054	2054
query29	1058	621	610	610
query30	265	149	150	149
query31	925	732	729	729
query32	96	54	54	54
query33	748	290	306	290
query34	934	494	472	472
query35	729	608	642	608
query36	1081	949	915	915
query37	286	71	76	71
query38	2879	2719	2696	2696
query39	850	801	800	800
query40	260	124	123	123
query41	61	52	55	52
query42	132	98	100	98
query43	583	546	543	543
query44	1220	729	754	729
query45	199	172	166	166
query46	1072	727	720	720
query47	1866	1769	1760	1760
query48	372	300	298	298
query49	1006	414	425	414
query50	794	393	398	393
query51	6867	6681	6656	6656
query52	134	92	93	92
query53	356	288	294	288
query54	909	450	447	447
query55	77	73	74	73
query56	288	260	262	260
query57	1133	1079	1070	1070
query58	260	251	255	251
query59	3362	3159	3059	3059
query60	300	275	286	275
query61	94	90	91	90
query62	635	452	435	435
query63	312	296	285	285
query64	9831	2241	1772	1772
query65	3357	3093	3134	3093
query66	1346	334	348	334
query67	15450	15214	14943	14943
query68	4385	541	546	541
query69	465	309	315	309
query70	1167	1146	1163	1146
query71	400	278	276	276
query72	6799	5443	5482	5443
query73	752	322	332	322
query74	5899	5460	5486	5460
query75	3365	2619	2653	2619
query76	2648	939	890	890
query77	448	303	304	303
query78	10173	9898	9780	9780
query79	1796	516	518	516
query80	2210	483	470	470
query81	608	222	226	222
query82	1051	107	110	107
query83	313	172	176	172
query84	266	86	84	84
query85	1303	279	278	278
query86	401	325	325	325
query87	3248	3159	3095	3095
query88	2860	2340	2358	2340
query89	481	395	387	387
query90	1811	196	199	196
query91	139	113	109	109
query92	59	54	55	54
query93	1492	527	511	511
query94	1227	198	203	198
query95	488	320	316	316
query96	582	264	265	264
query97	3215	2990	3085	2990
query98	226	202	201	201
query99	1267	811	831	811
Total cold run time: 272327 ms
Total hot run time: 173088 ms

@doris-robot

ClickBench: Total hot run time: 31.15 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d4631c41bcf0857b42c032d658f3efb8f358e8e9, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.24	0.06	0.06
query4	1.66	0.08	0.08
query5	0.51	0.49	0.49
query6	1.13	0.72	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.54	0.50	0.49
query10	0.54	0.54	0.55
query11	0.17	0.11	0.12
query12	0.15	0.12	0.13
query13	0.60	0.59	0.58
query14	0.76	0.78	0.78
query15	0.85	0.83	0.81
query16	0.36	0.37	0.36
query17	0.95	0.95	0.96
query18	0.20	0.23	0.28
query19	1.79	1.68	1.82
query20	0.01	0.02	0.01
query21	15.54	0.67	0.65
query22	4.12	7.28	2.52
query23	18.26	1.47	1.40
query24	2.18	0.23	0.21
query25	0.16	0.09	0.08
query26	0.27	0.18	0.18
query27	0.09	0.07	0.07
query28	13.21	1.01	0.99
query29	13.60	3.30	3.20
query30	0.24	0.06	0.06
query31	2.85	0.40	0.39
query32	3.25	0.47	0.48
query33	2.90	2.96	2.86
query34	17.03	4.47	4.41
query35	4.58	4.50	4.60
query36	0.68	0.45	0.45
query37	0.17	0.14	0.15
query38	0.16	0.15	0.15
query39	0.04	0.03	0.03
query40	0.17	0.14	0.15
query41	0.08	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.04	0.03
Total cold run time: 110.33 s
Total hot run time: 31.15 s

@keanji-x force-pushed the add_agg_push_foreign branch from d4631c4 to c2f6800 on June 7, 2024 08:59
@keanji-x
Contributor Author

keanji-x commented Jun 7, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39773 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c2f6800c97d630e4164d5ea8b6525b52d54ffee1, data reload: false

------ Round 1 ----------------------------------
q1	17909	4549	4369	4369
q2	2637	190	200	190
q3	11712	1116	1186	1116
q4	10539	819	836	819
q5	7745	2652	2649	2649
q6	222	134	133	133
q7	967	621	599	599
q8	9562	2051	2076	2051
q9	8716	6476	6501	6476
q10	8878	3740	3752	3740
q11	467	250	238	238
q12	428	228	224	224
q13	17749	2951	2961	2951
q14	264	208	222	208
q15	518	475	456	456
q16	500	375	371	371
q17	940	716	746	716
q18	8117	7444	7379	7379
q19	2939	1483	1306	1306
q20	650	315	306	306
q21	4895	3151	3892	3151
q22	380	325	330	325
Total cold run time: 116734 ms
Total hot run time: 39773 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4405	4220	4270	4220
q2	375	260	257	257
q3	2983	2766	2731	2731
q4	1873	1585	1622	1585
q5	5256	5267	5269	5267
q6	220	125	124	124
q7	2070	1739	1749	1739
q8	3188	3326	3308	3308
q9	8321	8326	8308	8308
q10	3845	3680	3644	3644
q11	574	489	493	489
q12	766	610	587	587
q13	16639	2978	2967	2967
q14	305	270	276	270
q15	516	480	469	469
q16	469	409	419	409
q17	1787	1502	1480	1480
q18	7666	7419	7335	7335
q19	2946	1514	1550	1514
q20	1998	1801	1771	1771
q21	4794	4671	4740	4671
q22	618	525	551	525
Total cold run time: 71614 ms
Total hot run time: 53670 ms

@doris-robot

TPC-DS: Total hot run time: 172670 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c2f6800c97d630e4164d5ea8b6525b52d54ffee1, data reload: false

query1	943	376	375	375
query2	6456	2432	2302	2302
query3	6652	204	216	204
query4	19524	17297	17347	17297
query5	4146	467	448	448
query6	260	156	156	156
query7	4602	297	296	296
query8	310	278	288	278
query9	8679	2404	2365	2365
query10	435	290	268	268
query11	10638	10120	10112	10112
query12	138	83	88	83
query13	1619	361	362	361
query14	9340	6961	7641	6961
query15	231	190	178	178
query16	7829	262	264	262
query17	1829	531	531	531
query18	1952	284	279	279
query19	206	162	163	162
query20	95	85	81	81
query21	200	133	130	130
query22	4284	4127	3975	3975
query23	33632	33332	33186	33186
query24	11990	2841	2792	2792
query25	694	372	374	372
query26	1810	160	161	160
query27	3045	319	317	317
query28	7704	2057	2045	2045
query29	1178	606	605	605
query30	284	150	152	150
query31	986	733	714	714
query32	91	51	57	51
query33	765	280	269	269
query34	1047	468	462	462
query35	740	652	630	630
query36	1116	919	895	895
query37	295	67	76	67
query38	2854	2753	2787	2753
query39	871	827	833	827
query40	273	123	121	121
query41	53	50	52	50
query42	124	96	95	95
query43	580	553	543	543
query44	1246	736	744	736
query45	190	163	160	160
query46	1073	733	701	701
query47	1836	1733	1832	1733
query48	375	291	291	291
query49	1202	406	428	406
query50	780	382	377	377
query51	6817	6703	6592	6592
query52	106	92	94	92
query53	351	287	286	286
query54	994	445	447	445
query55	76	77	71	71
query56	272	256	256	256
query57	1142	1060	1076	1060
query58	265	247	263	247
query59	3395	3096	3182	3096
query60	288	267	279	267
query61	97	89	118	89
query62	646	458	456	456
query63	312	289	293	289
query64	9819	2252	1701	1701
query65	3126	3232	3123	3123
query66	1376	336	331	331
query67	15252	15240	14881	14881
query68	4518	543	602	543
query69	513	412	358	358
query70	1094	1139	1117	1117
query71	391	270	279	270
query72	7157	5635	5555	5555
query73	751	320	316	316
query74	5824	5537	5527	5527
query75	3420	2651	2641	2641
query76	2635	962	901	901
query77	493	303	295	295
query78	10401	9847	9774	9774
query79	1940	505	508	505
query80	2140	465	465	465
query81	603	220	214	214
query82	1001	105	107	105
query83	318	174	173	173
query84	272	83	88	83
query85	1257	296	273	273
query86	388	330	292	292
query87	3248	3055	3053	3053
query88	3378	2330	2319	2319
query89	478	397	378	378
query90	1780	190	185	185
query91	129	97	96	96
query92	58	49	50	49
query93	1519	520	501	501
query94	1181	195	188	188
query95	398	306	306	306
query96	578	273	263	263
query97	3164	3026	2997	2997
query98	220	197	195	195
query99	1201	839	857	839
Total cold run time: 273984 ms
Total hot run time: 172670 ms

@doris-robot

ClickBench: Total hot run time: 31.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c2f6800c97d630e4164d5ea8b6525b52d54ffee1, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.24	0.05	0.05
query4	1.67	0.07	0.10
query5	0.50	0.49	0.50
query6	1.12	0.73	0.73
query7	0.02	0.02	0.02
query8	0.06	0.04	0.05
query9	0.55	0.51	0.50
query10	0.55	0.56	0.55
query11	0.16	0.11	0.12
query12	0.15	0.12	0.12
query13	0.60	0.59	0.60
query14	0.75	0.78	0.78
query15	0.83	0.81	0.81
query16	0.35	0.37	0.35
query17	1.05	1.05	1.05
query18	0.22	0.25	0.23
query19	1.73	1.67	1.69
query20	0.02	0.01	0.01
query21	15.45	0.69	0.65
query22	4.34	6.67	2.59
query23	18.29	1.41	1.24
query24	2.11	0.22	0.21
query25	0.14	0.09	0.08
query26	0.26	0.17	0.17
query27	0.07	0.08	0.09
query28	13.26	1.01	1.01
query29	13.18	3.27	3.27
query30	0.25	0.07	0.05
query31	2.86	0.39	0.39
query32	3.26	0.47	0.48
query33	2.89	2.94	2.88
query34	17.43	4.44	4.42
query35	4.47	4.50	4.47
query36	0.65	0.46	0.48
query37	0.17	0.16	0.15
query38	0.15	0.15	0.14
query39	0.04	0.03	0.04
query40	0.16	0.14	0.14
query41	0.09	0.04	0.04
query42	0.05	0.05	0.04
query43	0.03	0.04	0.04
Total cold run time: 110.29 s
Total hot run time: 31.26 s

@keanji-x force-pushed the add_agg_push_foreign branch from c2f6800 to 7fa9d2b on June 11, 2024 05:49
@keanji-x
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39239 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7fa9d2b6ea137bee03c24b92d0c314ef8375c25c, data reload: false

------ Round 1 ----------------------------------
q1	17627	4290	4231	4231
q2	2033	196	191	191
q3	10462	1149	1114	1114
q4	10182	760	700	700
q5	7478	2644	2731	2644
q6	217	135	135	135
q7	939	593	597	593
q8	9230	2034	2066	2034
q9	8822	6428	6443	6428
q10	8950	3671	3701	3671
q11	453	233	245	233
q12	507	224	216	216
q13	18826	2977	2962	2962
q14	264	216	216	216
q15	508	487	492	487
q16	517	373	378	373
q17	961	656	640	640
q18	7884	7386	7329	7329
q19	4844	1555	1376	1376
q20	655	297	296	296
q21	4844	3086	3037	3037
q22	384	335	333	333
Total cold run time: 116587 ms
Total hot run time: 39239 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4348	4169	4208	4169
q2	363	265	268	265
q3	2958	2749	2903	2749
q4	1944	1725	1681	1681
q5	5503	5494	5497	5494
q6	219	128	135	128
q7	2190	1759	1820	1759
q8	3295	3377	3382	3377
q9	8721	8643	8746	8643
q10	4068	3828	3807	3807
q11	591	465	484	465
q12	764	600	608	600
q13	17158	3194	3158	3158
q14	299	297	272	272
q15	520	473	492	473
q16	488	420	414	414
q17	1846	1537	1479	1479
q18	8113	7968	7748	7748
q19	4507	1614	1628	1614
q20	2095	1923	1876	1876
q21	5114	4866	4830	4830
q22	623	534	553	534
Total cold run time: 75727 ms
Total hot run time: 55535 ms

@doris-robot

TPC-DS: Total hot run time: 172819 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7fa9d2b6ea137bee03c24b92d0c314ef8375c25c, data reload: false

query1	941	386	382	382
query2	6480	2577	2399	2399
query3	6626	203	208	203
query4	19039	17430	17329	17329
query5	3636	451	442	442
query6	239	155	158	155
query7	4582	292	285	285
query8	335	306	294	294
query9	8493	2364	2357	2357
query10	561	300	278	278
query11	10446	10092	10028	10028
query12	122	96	83	83
query13	1636	379	371	371
query14	10036	7025	6851	6851
query15	236	187	182	182
query16	7878	289	266	266
query17	1888	561	530	530
query18	2030	278	282	278
query19	200	156	156	156
query20	92	85	85	85
query21	209	134	127	127
query22	4178	3960	3957	3957
query23	33731	33580	33463	33463
query24	10760	2951	2926	2926
query25	612	391	380	380
query26	773	156	153	153
query27	2217	325	341	325
query28	5864	2139	2106	2106
query29	858	634	603	603
query30	263	158	162	158
query31	964	759	758	758
query32	95	51	56	51
query33	684	284	280	280
query34	879	490	470	470
query35	733	661	643	643
query36	1134	978	984	978
query37	150	71	71	71
query38	2908	2823	2817	2817
query39	888	837	814	814
query40	206	134	127	127
query41	56	52	52	52
query42	117	103	117	103
query43	601	570	554	554
query44	1062	708	724	708
query45	191	155	170	155
query46	1080	707	720	707
query47	1830	1776	1786	1776
query48	355	296	306	296
query49	859	401	410	401
query50	761	386	380	380
query51	6736	6601	6781	6601
query52	96	90	94	90
query53	354	288	282	282
query54	871	437	428	428
query55	74	74	70	70
query56	272	272	249	249
query57	1133	1016	1020	1016
query58	252	239	244	239
query59	3736	3199	3337	3199
query60	312	269	257	257
query61	89	87	89	87
query62	601	433	437	433
query63	311	281	284	281
query64	8462	2233	1764	1764
query65	3128	3107	3111	3107
query66	741	322	328	322
query67	15337	15096	14784	14784
query68	4634	530	529	529
query69	544	291	301	291
query70	1189	1044	1085	1044
query71	440	266	281	266
query72	7654	5295	5683	5295
query73	738	326	324	324
query74	5960	5448	5649	5448
query75	4336	2629	2655	2629
query76	3084	933	966	933
query77	648	287	286	286
query78	10343	9747	9813	9747
query79	2423	513	510	510
query80	2587	446	445	445
query81	562	216	215	215
query82	968	100	98	98
query83	339	171	168	168
query84	261	81	86	81
query85	2036	273	272	272
query86	475	324	296	296
query87	3246	3069	3073	3069
query88	4238	2369	2374	2369
query89	467	373	380	373
query90	1854	186	185	185
query91	122	93	98	93
query92	57	48	47	47
query93	2493	496	488	488
query94	1199	181	179	179
query95	390	304	312	304
query96	588	268	268	268
query97	3215	3023	3021	3021
query98	244	197	194	194
query99	1103	850	852	850
Total cold run time: 271456 ms
Total hot run time: 172819 ms

@doris-robot

ClickBench: Total hot run time: 30.7 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7fa9d2b6ea137bee03c24b92d0c314ef8375c25c, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.23	0.04	0.05
query4	1.68	0.07	0.06
query5	0.48	0.49	0.48
query6	1.12	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.53	0.49	0.49
query10	0.55	0.54	0.53
query11	0.15	0.11	0.11
query12	0.14	0.12	0.12
query13	0.59	0.59	0.60
query14	0.76	0.78	0.77
query15	0.82	0.82	0.82
query16	0.37	0.36	0.37
query17	1.02	1.04	1.01
query18	0.22	0.26	0.22
query19	1.77	1.69	1.66
query20	0.01	0.01	0.01
query21	15.44	0.64	0.64
query22	4.18	7.33	2.24
query23	18.31	1.29	1.28
query24	2.20	0.23	0.21
query25	0.17	0.08	0.09
query26	0.26	0.17	0.17
query27	0.08	0.08	0.07
query28	13.18	1.02	0.99
query29	12.66	3.26	3.26
query30	0.26	0.07	0.05
query31	2.85	0.39	0.38
query32	3.29	0.47	0.46
query33	2.92	2.95	2.88
query34	16.93	4.38	4.43
query35	4.46	4.45	4.51
query36	0.65	0.46	0.46
query37	0.18	0.16	0.15
query38	0.16	0.14	0.15
query39	0.04	0.03	0.03
query40	0.18	0.14	0.14
query41	0.10	0.05	0.04
query42	0.06	0.06	0.05
query43	0.04	0.03	0.04
Total cold run time: 109.23 s
Total hot run time: 30.7 s

@keanji-x force-pushed the add_agg_push_foreign branch from 7fa9d2b to c235383 on June 18, 2024 07:06
@keanji-x
Contributor Author

run buildall

@keanji-x
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39697 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f15731d317de1249758f759b3e9307eda422dd50, data reload: false

------ Round 1 ----------------------------------
q1	17625	4378	4285	4285
q2	2010	187	190	187
q3	10472	1081	1137	1081
q4	10189	774	809	774
q5	7560	2665	2644	2644
q6	219	137	137	137
q7	964	601	584	584
q8	9219	2056	2094	2056
q9	8904	6481	6460	6460
q10	8869	3680	3747	3680
q11	443	237	246	237
q12	445	230	221	221
q13	17763	2957	3006	2957
q14	267	218	228	218
q15	524	475	491	475
q16	536	375	377	375
q17	950	645	715	645
q18	8021	7327	7335	7327
q19	6872	1472	1471	1471
q20	684	320	327	320
q21	4900	3235	3791	3235
q22	389	346	328	328
Total cold run time: 117825 ms
Total hot run time: 39697 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4441	4304	4266	4266
q2	372	275	287	275
q3	2995	2953	2925	2925
q4	1962	1701	1694	1694
q5	5515	5515	5468	5468
q6	224	132	132	132
q7	2240	1865	1830	1830
q8	3336	3395	3450	3395
q9	8785	8843	8854	8843
q10	4174	3855	3829	3829
q11	603	523	508	508
q12	822	655	630	630
q13	15969	3238	3200	3200
q14	307	271	305	271
q15	528	518	481	481
q16	534	441	433	433
q17	1839	1489	1577	1489
q18	8161	8289	7663	7663
q19	1826	1627	1725	1627
q20	2100	1991	1992	1991
q21	5338	5182	5013	5013
q22	681	647	562	562
Total cold run time: 72752 ms
Total hot run time: 56525 ms

@doris-robot

TPC-DS: Total hot run time: 173455 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f15731d317de1249758f759b3e9307eda422dd50, data reload: false

query1	1947	387	371	371
query2	7508	2480	2310	2310
query3	6638	205	205	205
query4	19982	17213	17483	17213
query5	3618	467	457	457
query6	240	160	156	156
query7	4554	301	288	288
query8	348	295	281	281
query9	8357	2388	2371	2371
query10	557	323	287	287
query11	10595	10141	9894	9894
query12	119	86	82	82
query13	1611	368	371	368
query14	10220	7775	7655	7655
query15	262	188	193	188
query16	8124	269	271	269
query17	2036	535	504	504
query18	2095	281	271	271
query19	328	156	152	152
query20	94	84	82	82
query21	214	128	128	128
query22	4347	4063	3934	3934
query23	33839	33511	33406	33406
query24	10253	2955	2905	2905
query25	572	379	361	361
query26	701	152	153	152
query27	2171	326	321	321
query28	5746	2116	2118	2116
query29	891	614	632	614
query30	252	154	157	154
query31	995	762	756	756
query32	91	54	53	53
query33	656	286	300	286
query34	910	479	457	457
query35	755	652	632	632
query36	1128	982	1000	982
query37	144	72	73	72
query38	2910	2833	2825	2825
query39	902	871	833	833
query40	203	140	129	129
query41	54	53	57	53
query42	110	99	107	99
query43	598	552	558	552
query44	1095	705	729	705
query45	183	165	165	165
query46	1070	741	725	725
query47	1844	1745	1739	1739
query48	379	295	296	295
query49	817	405	427	405
query50	766	389	392	389
query51	6849	6660	6729	6660
query52	105	88	92	88
query53	357	292	292	292
query54	867	436	462	436
query55	77	74	73	73
query56	273	264	253	253
query57	1095	1014	1030	1014
query58	235	244	247	244
query59	3498	3328	3191	3191
query60	297	270	271	270
query61	91	89	91	89
query62	606	440	466	440
query63	315	292	302	292
query64	8546	2233	1728	1728
query65	3189	3079	3102	3079
query66	737	323	326	323
query67	15535	15160	14894	14894
query68	4626	542	545	542
query69	610	461	377	377
query70	1185	1156	1139	1139
query71	435	275	273	273
query72	7170	5373	5745	5373
query73	765	317	322	317
query74	5777	5448	5393	5393
query75	3402	2686	2664	2664
query76	2875	941	874	874
query77	617	300	287	287
query78	10369	9815	9669	9669
query79	2249	513	514	513
query80	1533	463	455	455
query81	589	211	219	211
query82	765	102	99	99
query83	260	170	162	162
query84	258	87	86	86
query85	1249	277	284	277
query86	466	320	304	304
query87	3234	3102	3043	3043
query88	3773	2343	2341	2341
query89	480	394	393	393
query90	1722	185	194	185
query91	127	99	94	94
query92	58	51	49	49
query93	1902	494	493	493
query94	1036	182	186	182
query95	394	317	311	311
query96	589	266	266	266
query97	3257	3061	3051	3051
query98	224	192	190	190
query99	1309	829	846	829
Total cold run time: 269806 ms
Total hot run time: 173455 ms

@doris-robot

ClickBench: Total hot run time: 30.65 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f15731d317de1249758f759b3e9307eda422dd50, data reload: false

query1	0.05	0.03	0.03
query2	0.07	0.03	0.04
query3	0.23	0.06	0.06
query4	1.66	0.06	0.08
query5	0.49	0.49	0.48
query6	1.12	0.73	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.55	0.50	0.49
query10	0.56	0.56	0.55
query11	0.16	0.12	0.12
query12	0.14	0.12	0.12
query13	0.59	0.58	0.60
query14	0.78	0.78	0.77
query15	0.82	0.80	0.82
query16	0.37	0.34	0.36
query17	0.99	1.01	0.99
query18	0.23	0.27	0.22
query19	1.79	1.75	1.70
query20	0.02	0.01	0.01
query21	15.40	0.64	0.64
query22	4.46	6.88	2.00
query23	18.31	1.38	1.20
query24	2.13	0.22	0.22
query25	0.15	0.10	0.09
query26	0.26	0.19	0.17
query27	0.09	0.08	0.09
query28	13.30	1.00	1.00
query29	12.64	3.31	3.26
query30	0.26	0.06	0.06
query31	2.86	0.39	0.38
query32	3.28	0.47	0.47
query33	2.89	2.91	2.87
query34	17.27	4.52	4.52
query35	4.56	4.54	4.56
query36	0.67	0.45	0.47
query37	0.18	0.15	0.15
query38	0.16	0.14	0.14
query39	0.05	0.03	0.04
query40	0.16	0.14	0.15
query41	0.10	0.05	0.05
query42	0.05	0.04	0.04
query43	0.04	0.03	0.04
Total cold run time: 109.96 s
Total hot run time: 30.65 s

@keanji-x force-pushed the add_agg_push_foreign branch from f15731d to fb93934 on June 21, 2024 07:25
@keanji-x
Contributor Author

run buildall

@github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Jun 27, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 40274 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 08c0b69e4584c271b25371ff89dc2bebbd6ed8eb, data reload: false

------ Round 1 ----------------------------------
q1	17588	4337	4248	4248
q2	2015	196	200	196
q3	10461	1255	1118	1118
q4	10195	816	799	799
q5	7480	2657	2597	2597
q6	215	136	131	131
q7	934	595	613	595
q8	9225	2064	2027	2027
q9	8950	6507	6419	6419
q10	8845	3645	3726	3645
q11	451	243	240	240
q12	415	234	225	225
q13	17774	3002	2962	2962
q14	260	228	224	224
q15	514	474	472	472
q16	483	369	367	367
q17	965	641	634	634
q18	7989	7435	7411	7411
q19	5139	1460	1445	1445
q20	644	309	320	309
q21	4867	3874	3921	3874
q22	393	336	338	336
Total cold run time: 115802 ms
Total hot run time: 40274 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4370	4186	4238	4186
q2	363	257	261	257
q3	2983	2771	2846	2771
q4	2006	1746	1644	1644
q5	5646	5527	5493	5493
q6	225	133	127	127
q7	2203	1834	1899	1834
q8	3231	3391	3373	3373
q9	8706	8662	8779	8662
q10	4076	3981	3765	3765
q11	576	490	505	490
q12	789	649	639	639
q13	16307	3155	3162	3155
q14	303	272	279	272
q15	552	497	504	497
q16	476	433	433	433
q17	1817	1483	1482	1482
q18	8017	8004	7766	7766
q19	1832	1577	1424	1424
q20	2105	1893	1880	1880
q21	7958	4967	4984	4967
q22	647	564	553	553
Total cold run time: 75188 ms
Total hot run time: 55670 ms

@doris-robot

TPC-DS: Total hot run time: 173730 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 08c0b69e4584c271b25371ff89dc2bebbd6ed8eb, data reload: false

query1	906	391	377	377
query2	6478	2490	2288	2288
query3	6636	214	215	214
query4	20580	17482	17331	17331
query5	3626	480	475	475
query6	261	165	158	158
query7	4588	291	283	283
query8	328	274	270	270
query9	8629	2340	2326	2326
query10	549	304	275	275
query11	10634	9898	9939	9898
query12	114	84	86	84
query13	1647	369	365	365
query14	10246	7502	6696	6696
query15	232	184	186	184
query16	7967	271	261	261
query17	1908	536	521	521
query18	2047	272	266	266
query19	200	149	152	149
query20	88	82	84	82
query21	207	134	124	124
query22	4345	4100	3898	3898
query23	33928	33510	33579	33510
query24	10959	2881	2940	2881
query25	585	372	364	364
query26	709	164	155	155
query27	2256	334	316	316
query28	5824	2096	2101	2096
query29	877	632	609	609
query30	263	167	157	157
query31	934	780	776	776
query32	93	58	56	56
query33	754	289	279	279
query34	946	474	489	474
query35	772	637	611	611
query36	1122	982	971	971
query37	149	75	75	75
query38	3010	2917	2835	2835
query39	880	820	829	820
query40	207	136	125	125
query41	59	53	57	53
query42	110	97	99	97
query43	607	543	550	543
query44	1159	742	734	734
query45	192	175	167	167
query46	1081	732	717	717
query47	1888	1801	1789	1789
query48	362	300	301	300
query49	860	408	423	408
query50	779	385	376	376
query51	6949	6888	6734	6734
query52	106	88	96	88
query53	362	292	288	288
query54	861	441	445	441
query55	75	76	74	74
query56	285	254	252	252
query57	1100	1069	1057	1057
query58	259	262	243	243
query59	3425	3447	3077	3077
query60	290	278	275	275
query61	97	91	94	91
query62	615	452	441	441
query63	311	286	291	286
query64	8538	2254	1742	1742
query65	3154	3096	3103	3096
query66	748	332	338	332
query67	15427	15055	15002	15002
query68	4450	542	530	530
query69	593	434	396	396
query70	1192	1125	1115	1115
query71	383	272	262	262
query72	7221	5630	5775	5630
query73	748	328	328	328
query74	5897	5578	5538	5538
query75	3337	2682	2683	2682
query76	2170	982	922	922
query77	482	305	306	305
query78	10484	9995	9786	9786
query79	2740	511	506	506
query80	2581	490	492	490
query81	571	225	221	221
query82	1012	105	103	103
query83	329	201	173	173
query84	274	93	96	93
query85	1174	340	325	325
query86	444	312	340	312
query87	3233	3116	3120	3116
query88	3838	2452	2450	2450
query89	476	375	394	375
query90	1642	189	190	189
query91	139	110	112	110
query92	60	51	54	51
query93	1952	500	491	491
query94	1085	195	197	195
query95	411	319	325	319
query96	586	273	269	269
query97	3219	3067	3102	3067
query98	220	204	198	198
query99	1154	845	864	845
Total cold run time: 270280 ms
Total hot run time: 173730 ms

@doris-robot

ClickBench: Total hot run time: 30.69 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 08c0b69e4584c271b25371ff89dc2bebbd6ed8eb, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.05
query3	0.23	0.05	0.06
query4	1.69	0.09	0.08
query5	0.50	0.49	0.49
query6	1.14	0.72	0.73
query7	0.02	0.02	0.01
query8	0.05	0.05	0.04
query9	0.56	0.49	0.50
query10	0.54	0.56	0.53
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.58	0.58	0.60
query14	0.76	0.78	0.77
query15	0.83	0.82	0.81
query16	0.36	0.36	0.37
query17	1.03	1.05	0.98
query18	0.23	0.25	0.24
query19	1.79	1.67	1.66
query20	0.02	0.01	0.01
query21	15.45	0.74	0.66
query22	4.56	7.07	2.06
query23	18.25	1.34	1.24
query24	2.06	0.23	0.20
query25	0.14	0.09	0.08
query26	0.26	0.17	0.18
query27	0.08	0.08	0.07
query28	13.30	1.01	1.01
query29	12.65	3.29	3.25
query30	0.25	0.06	0.05
query31	2.87	0.38	0.39
query32	3.28	0.46	0.48
query33	2.86	2.89	2.90
query34	16.99	4.46	4.52
query35	4.54	4.51	4.53
query36	0.66	0.48	0.46
query37	0.17	0.15	0.15
query38	0.15	0.14	0.15
query39	0.04	0.03	0.03
query40	0.18	0.16	0.16
query41	0.09	0.05	0.05
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.68 s
Total hot run time: 30.69 s

}

/**
* This class flattens nested join clusters and optimizes aggregation pushdown.
Contributor


Currently this is a customized form of RBO-level join reorder. In the future we may extend it into more general utilities that support other scenarios, such as over-64 join reorder using heuristic methods.

dataroaring pushed a commit that referenced this pull request Jun 28, 2024
1. Get the Arrow Flight result schema using the query id instead of the instance id.
2. Getting the Arrow Flight result is a synchronous method that needs to wait for the data to be ready before returning the result; introduced by #36035 36667.
TODO: waiting for data will block the pipeline, so use a request pool to hold the requests that are waiting for data.
@keanji-x
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39547 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7fe6542fc6cfa6d6f86cac2cb381911ae5249a03, data reload: false

------ Round 1 ----------------------------------
q1	17640	4287	4195	4195
q2	2018	193	193	193
q3	10454	1263	1109	1109
q4	10190	776	840	776
q5	7476	2674	2575	2575
q6	219	136	138	136
q7	949	594	603	594
q8	9223	2066	2072	2066
q9	8949	6491	6463	6463
q10	8910	3716	3711	3711
q11	463	233	237	233
q12	444	233	228	228
q13	17778	2997	3001	2997
q14	275	216	222	216
q15	518	477	470	470
q16	495	380	368	368
q17	965	716	662	662
q18	7966	7501	7357	7357
q19	5223	1532	1427	1427
q20	641	308	317	308
q21	4846	3851	3132	3132
q22	396	341	331	331
Total cold run time: 116038 ms
Total hot run time: 39547 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4348	4253	4187	4187
q2	375	279	258	258
q3	3001	2695	2882	2695
q4	1977	1744	1758	1744
q5	5634	5516	5493	5493
q6	231	130	131	130
q7	2200	1878	1857	1857
q8	3280	3402	3455	3402
q9	8717	8659	8824	8659
q10	4123	3915	3647	3647
q11	581	503	499	499
q12	803	650	637	637
q13	16001	3165	3173	3165
q14	312	275	284	275
q15	525	483	469	469
q16	493	429	418	418
q17	1829	1508	1510	1508
q18	8044	7851	7804	7804
q19	1844	1605	1623	1605
q20	3027	1957	1843	1843
q21	9325	4970	4841	4841
q22	647	530	538	530
Total cold run time: 77317 ms
Total hot run time: 55666 ms

@doris-robot

TPC-DS: Total hot run time: 173561 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7fe6542fc6cfa6d6f86cac2cb381911ae5249a03, data reload: false

query1	917	388	374	374
query2	6450	2506	2437	2437
query3	6642	207	213	207
query4	18856	17395	17258	17258
query5	3650	485	466	466
query6	276	168	171	168
query7	4589	304	284	284
query8	320	305	312	305
query9	8515	2402	2378	2378
query10	572	309	288	288
query11	10544	10001	10204	10001
query12	119	89	86	86
query13	1649	381	367	367
query14	10205	7077	6283	6283
query15	220	185	182	182
query16	7796	267	268	267
query17	1885	544	511	511
query18	1947	260	273	260
query19	196	148	149	148
query20	91	81	81	81
query21	205	141	122	122
query22	4309	4138	3971	3971
query23	33699	33555	33667	33555
query24	11118	2963	2937	2937
query25	615	382	374	374
query26	975	159	179	159
query27	2336	318	330	318
query28	7018	2150	2118	2118
query29	890	647	626	626
query30	252	161	158	158
query31	976	787	760	760
query32	96	53	59	53
query33	761	307	277	277
query34	1004	481	502	481
query35	746	640	632	632
query36	1167	976	985	976
query37	142	75	71	71
query38	2966	2920	2839	2839
query39	869	821	837	821
query40	219	140	125	125
query41	53	50	56	50
query42	107	101	105	101
query43	599	561	552	552
query44	1174	727	725	725
query45	190	167	166	166
query46	1070	754	708	708
query47	1851	1763	1759	1759
query48	375	291	291	291
query49	835	421	417	417
query50	760	383	392	383
query51	6870	6805	6763	6763
query52	97	97	88	88
query53	364	294	284	284
query54	874	439	436	436
query55	73	74	73	73
query56	277	271	283	271
query57	1144	1068	1074	1068
query58	239	246	241	241
query59	3509	3537	3196	3196
query60	288	272	282	272
query61	104	92	93	92
query62	632	451	437	437
query63	321	293	286	286
query64	8634	2242	1827	1827
query65	3172	3082	3081	3081
query66	741	327	324	324
query67	15614	15026	14894	14894
query68	5252	544	538	538
query69	635	437	385	385
query70	1193	1111	1150	1111
query71	447	263	279	263
query72	7252	5714	5685	5685
query73	771	323	331	323
query74	5920	5530	5478	5478
query75	3796	2680	2663	2663
query76	3198	1004	873	873
query77	630	296	286	286
query78	10318	9716	9826	9716
query79	2422	520	518	518
query80	1968	470	473	470
query81	596	216	221	216
query82	1427	108	104	104
query83	325	173	178	173
query84	275	84	90	84
query85	1424	286	315	286
query86	465	332	288	288
query87	3261	3072	3094	3072
query88	4007	2368	2366	2366
query89	464	380	394	380
query90	1718	189	194	189
query91	138	110	114	110
query92	58	53	54	53
query93	2318	514	515	514
query94	1102	201	200	200
query95	421	330	328	328
query96	594	274	269	269
query97	3248	3049	3099	3049
query98	225	207	203	203
query99	1281	859	861	859
Total cold run time: 272918 ms
Total hot run time: 173561 ms

@doris-robot

ClickBench: Total hot run time: 31.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7fe6542fc6cfa6d6f86cac2cb381911ae5249a03, data reload: false

query1	0.04	0.03	0.03
query2	0.09	0.04	0.04
query3	0.22	0.05	0.06
query4	1.65	0.09	0.11
query5	0.50	0.52	0.50
query6	1.14	0.73	0.73
query7	0.02	0.02	0.01
query8	0.06	0.05	0.04
query9	0.54	0.49	0.49
query10	0.55	0.53	0.54
query11	0.15	0.12	0.11
query12	0.15	0.12	0.12
query13	0.59	0.59	0.60
query14	0.77	0.80	0.78
query15	0.85	0.81	0.82
query16	0.37	0.34	0.36
query17	1.02	1.00	1.04
query18	0.22	0.24	0.23
query19	1.86	1.76	1.68
query20	0.01	0.01	0.02
query21	15.43	0.74	0.65
query22	4.03	6.93	2.40
query23	18.26	1.38	1.31
query24	2.22	0.22	0.22
query25	0.14	0.09	0.08
query26	0.26	0.18	0.18
query27	0.08	0.08	0.07
query28	13.24	1.02	1.02
query29	12.62	3.27	3.26
query30	0.26	0.06	0.05
query31	2.87	0.40	0.41
query32	3.26	0.46	0.48
query33	2.91	2.94	2.88
query34	17.01	4.39	4.42
query35	4.52	4.51	4.52
query36	0.66	0.46	0.46
query37	0.18	0.16	0.16
query38	0.15	0.14	0.15
query39	0.05	0.03	0.03
query40	0.18	0.15	0.13
query41	0.09	0.05	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.03
Total cold run time: 109.32 s
Total hot run time: 31.11 s

@keanji-x keanji-x merged commit 6889225 into apache:master Jul 1, 2024
dataroaring pushed a commit that referenced this pull request Jul 2, 2024
…in on foreign key (#36035)

## Proposed changes

This PR optimizes query performance by pushing down aggregations through
joins when grouped by a foreign key. This adjustment reduces data
processing overhead above the join, improving both speed and resource
efficiency.

Transformation Example:

Before Optimization:
```
Aggregation(group by fk)
     |
   Join(pk = fk)
   /  \
  pk  fk
```
After Optimization:
```
 Join(pk = fk)
 /     \
pk  Aggregation(group by fk)
       |
      fk
```
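
A minimal sketch of why the rewrite above preserves results when grouping by the foreign key. It runs on Python's built-in `sqlite3` module rather than on Doris. The tables `pk_t` and `fk_t`, their columns, the sample rows, and the `SUM` aggregate are all hypothetical choices for illustration, and the second query is a hand-written version of the pushed-down plan, not optimizer output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pk_t (pk INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fk_t (fk INTEGER REFERENCES pk_t(pk), amount INTEGER);
INSERT INTO pk_t VALUES (1, 'a'), (2, 'b');
INSERT INTO fk_t VALUES (1, 10), (1, 20), (2, 5);
""")

# Before: the aggregation sits above the join.
before = conn.execute("""
    SELECT fk_t.fk, SUM(fk_t.amount)
    FROM pk_t JOIN fk_t ON pk_t.pk = fk_t.fk
    GROUP BY fk_t.fk
    ORDER BY fk_t.fk
""").fetchall()

# After: the foreign-key side is aggregated first, then joined.
after = conn.execute("""
    SELECT agg.fk, agg.total
    FROM pk_t JOIN (
        SELECT fk, SUM(amount) AS total FROM fk_t GROUP BY fk
    ) AS agg ON pk_t.pk = agg.fk
    ORDER BY agg.fk
""").fetchall()

print(before)
print(after)
```

Each foreign-key row matches at most one primary-key row, so the join neither duplicates nor merges groups, and both queries return the same rows. In the second form the join processes one row per group instead of one row per foreign-key record.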
@morrySnow morrySnow added the need more test Add more test label Jul 5, 2024
keanji-x added a commit that referenced this pull request Jul 8, 2024
#37343)

intro by #36035

This PR refines the LogicalJoin class by introducing robust input
validation. Key improvements:

* Implement precise checks for join input validity
* Ensure consistency between input slots and output sets
* Gracefully handle various join scenarios (left/right)

These enhancements bolster query integrity and optimize join operations.
xinyiZzz added a commit to xinyiZzz/incubator-doris that referenced this pull request Jul 12, 2024
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
#37343)

morrySnow pushed a commit that referenced this pull request Jan 9, 2026
Related PR: #36035

Problem Summary:
The aggregation's group-by key must include the primary key of the
primary-key table (or a unique key that forms a bijection with the
primary key) for the aggregation to be pushed down to the foreign-key table.
Before this PR, Doris returned wrong results in the following situation:

drop table if exists customer_test;
drop table if exists store_sales_test;

CREATE TABLE customer_test (
    c_customer_sk INT not null ,
    c_first_name VARCHAR(50),
    c_last_name VARCHAR(50)
);

CREATE TABLE store_sales_test (
    ss_customer_sk INT,
    ss_date DATE
);

INSERT INTO customer_test VALUES (1, 'John', 'Smith');
INSERT INTO customer_test VALUES (2, 'John', 'Smith');  

INSERT INTO store_sales_test VALUES (1, '2024-01-01');
INSERT INTO store_sales_test VALUES (2, '2024-01-01');

alter table customer_test add constraint c_pk primary key (c_customer_sk);
alter table store_sales_test add constraint ss_c_fk foreign key (ss_customer_sk) references customer_test(c_customer_sk);
show constraints from customer_test;
show constraints from store_sales_test;

SELECT DISTINCT c_last_name, c_first_name, ss_date
FROM store_sales_test inner join customer_test
on store_sales_test.ss_customer_sk = customer_test.c_customer_sk;

-- Disable the rule:
set disable_nereids_rules='PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK';
-- Clear the list so the rule is enabled again:
set disable_nereids_rules='';

Before this PR, the query returned different results depending on
whether PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK was enabled or disabled.
This is because Agg(group by c_last_name, c_first_name, ss_date) should
not be pushed down below the join.
The original transformation was:

Agg(group by c_last_name, c_first_name, ss_date )
  +--Join(c_customer_sk=ss_customer_sk)
     +--scan(customer_test)
     +--scan(store_sales_test)
->
Join
  +--scan(customer_test)
  +--Agg(group by ss_customer_sk,ss_date)
    +--scan(store_sales_test)

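This rewrite is incorrect because the two plans are not equivalent: the pushed-down aggregation groups by ss_customer_sk, so rows that share the same c_last_name, c_first_name and ss_date but belong to different customers are no longer merged.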
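
The mismatch can be reproduced outside Doris. The sketch below uses Python's built-in `sqlite3` module with the tables and rows from the script above. The second query is a hand-written equivalent of the incorrect plan, not the plan Doris produced, and the column types are adapted to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_test (
    c_customer_sk INTEGER PRIMARY KEY,
    c_first_name TEXT,
    c_last_name TEXT
);
CREATE TABLE store_sales_test (ss_customer_sk INTEGER, ss_date TEXT);
INSERT INTO customer_test VALUES (1, 'John', 'Smith'), (2, 'John', 'Smith');
INSERT INTO store_sales_test VALUES (1, '2024-01-01'), (2, '2024-01-01');
""")

# Original query: deduplicate after the join.
expected = conn.execute("""
    SELECT DISTINCT c_last_name, c_first_name, ss_date
    FROM store_sales_test JOIN customer_test
      ON ss_customer_sk = c_customer_sk
""").fetchall()

# Hand-written equivalent of the incorrect plan: group the sales rows by
# (ss_customer_sk, ss_date) first, then join, with no aggregation on top.
pushed_down = conn.execute("""
    SELECT c_last_name, c_first_name, agg.ss_date
    FROM (
        SELECT ss_customer_sk, ss_date
        FROM store_sales_test
        GROUP BY ss_customer_sk, ss_date
    ) AS agg
    JOIN customer_test ON agg.ss_customer_sk = c_customer_sk
""").fetchall()

print(expected)     # one row
print(pushed_down)  # two identical rows
```

The two customers have different keys but identical names, so grouping by ss_customer_sk keeps them apart while the original DISTINCT merges them.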
This PR corrects the rewrite, allowing the aggregation to be pushed down
below the join only when there is a bijective relationship between the
group-by key from the primary table and the fields in the foreign table
(if a functional dependency exists from a to b and also from b to a, then
a and b have a bijective relationship).
For example:

Agg(group by c_customer_sk, c_first_name, ss_date )
  +--Join(c_customer_sk=ss_customer_sk)
     +--scan(customer_test)
     +--scan(store_sales_test)
->
Join(c_customer_sk=ss_customer_sk)
  +--scan(customer_test)
  +--Agg(group by ss_customer_sk,ss_date)
    +--scan(store_sales_test)

Since c_customer_sk is the primary key, c_first_name in the group by
clause can be removed (based on functional dependencies).
Furthermore, due to the equality relationship c_customer_sk =
ss_customer_sk, there is a bijective relationship between c_customer_sk
and ss_customer_sk. In this case, `group by c_customer_sk, ss_date` can
be replaced with `group by ss_customer_sk, ss_date`.
The aggregation's group-by key is then entirely replaced with output
columns of the foreign table. Because a primary key-foreign key join does
not expand the rows of the foreign table, the aggregation can be pushed
down in this situation.
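
The valid case can be checked the same way. This is a sketch on Python's built-in `sqlite3` module, not on Doris. The extra sales rows are hypothetical, added so that the aggregation has duplicates to remove, and the second query is a hand-written version of the pushed-down plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_test (
    c_customer_sk INTEGER PRIMARY KEY,
    c_first_name TEXT,
    c_last_name TEXT
);
CREATE TABLE store_sales_test (ss_customer_sk INTEGER, ss_date TEXT);
INSERT INTO customer_test VALUES (1, 'John', 'Smith'), (2, 'John', 'Smith');
INSERT INTO store_sales_test VALUES
    (1, '2024-01-01'), (1, '2024-01-01'),
    (2, '2024-01-01'), (2, '2024-01-02');
""")

# Aggregation above the join; the group-by key contains the primary key.
agg_above = conn.execute("""
    SELECT c_customer_sk, c_first_name, ss_date
    FROM customer_test JOIN store_sales_test
      ON c_customer_sk = ss_customer_sk
    GROUP BY c_customer_sk, c_first_name, ss_date
""").fetchall()

# Aggregation pushed below the join, grouping by foreign-table columns only.
agg_below = conn.execute("""
    SELECT c_customer_sk, c_first_name, agg.ss_date
    FROM customer_test JOIN (
        SELECT ss_customer_sk, ss_date
        FROM store_sales_test
        GROUP BY ss_customer_sk, ss_date
    ) AS agg ON c_customer_sk = agg.ss_customer_sk
""").fetchall()

# Row order is not guaranteed, so compare the sorted results.
print(sorted(agg_above) == sorted(agg_below))
```

Here the group-by key contains c_customer_sk, so every group maps to exactly one ss_customer_sk value and the join after the pushed-down aggregation cannot produce duplicate groups.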
github-actions bot pushed a commit that referenced this pull request Jan 9, 2026
zzzxl1993 pushed a commit to zzzxl1993/doris that referenced this pull request Jan 13, 2026
…#59498)

Labels

approved Indicates a PR has been approved by one committer. dev/3.0.0-merged need more test Add more test not-merge/2.1 reviewed

5 participants