@yujun777 yujun777 commented Jan 20, 2026

Introduced by #59116.

BE requires the repeat node's output slot order to be consistent with its input expressions.
That is, output slots = input expressions + GroupingID + other grouping functions.
But the physical translator does not ensure this requirement, so the repeat node may sometimes hit a bad cast exception.
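
The ordering rule above can be sketched as follows. This is a minimal illustration in Python, not the Doris translator code: `build_repeat_output_slots` and the string slot names are hypothetical, `GROUPING_ID` merely stands in for the grouping ID slot, and the example assumes the repeat node's input expressions are the two grouping columns of the query in this description.

```python
from typing import List

GROUPING_ID = "GROUPING_ID"  # hypothetical placeholder name for the grouping ID slot


def build_repeat_output_slots(input_exprs: List[str],
                              grouping_functions: List[str]) -> List[str]:
    """Return output slots as: input expressions + GroupingID + other grouping functions."""
    # Input expressions keep their original relative order, so position i of the
    # output corresponds to position i of the input.
    return list(input_exprs) + [GROUPING_ID] + list(grouping_functions)


slots = build_repeat_output_slots(
    ["col_datetime_6__undef_signed", "col_varchar_50__undef_signed"],
    [],  # assumption: the example query uses no extra grouping functions
)
print(slots)
```

The point of the sketch is only that the input expressions form an order-preserving prefix of the output, with the grouping ID and any grouping functions appended after them.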

For the following SQL:

SELECT 100000
FROM db2.table_9_50_undef_partitions2_keys3_properties4_distributed_by5
GROUP BY GROUPING SETS (
        (col_datetime_6__undef_signed, col_varchar_50__undef_signed)
        , ()
        , (col_varchar_50__undef_signed)
        , (col_datetime_6__undef_signed, col_varchar_50__undef_signed)
);

The above SQL produces the wrong output slot order:

screenshot-1

The BE then throws an exception:

(1105, 'errCode = 2, detailMessage = (172.20.57.146)
[E-7412]assert cast err:[E-7412] Bad cast from type:doris::vectorized::ColumnVector<(doris::PrimitiveType)26> to doris::vectorized::ColumnStr<unsigned int>
0#doris::Exception::Exception(int, std::basic_string_view<char, std::char_traits<char> > const&, bool) at /home/zcp/repo_center/doris_master/doris/be/src/common/exception.cpp:0\n\t1#
...
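
One way to see why a wrong slot order can surface as a bad cast: if a consumer reads columns by position and expects the type of the input expression at that position, two differently typed columns in swapped positions fail the type check. The sketch below is a toy model under that assumption. `read_by_position`, the slot names, and the type names are hypothetical and do not reflect the actual BE code.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (slot name, type name); both purely illustrative


def read_by_position(expected: List[Column], actual: List[Column]) -> None:
    """Compare each position's actual type with the type expected at that position."""
    for pos, ((exp_name, exp_type), (act_name, act_type)) in enumerate(zip(expected, actual)):
        if exp_type != act_type:
            raise TypeError(
                f"bad cast at position {pos}: from {act_type} ({act_name}) "
                f"to {exp_type} ({exp_name})"
            )


expected = [("col_datetime", "DATETIME"), ("col_varchar", "VARCHAR")]
swapped = [("col_varchar", "VARCHAR"), ("col_datetime", "DATETIME")]

read_by_position(expected, expected)  # same order: no error is raised

try:
    read_by_position(expected, swapped)
    mismatch = None
except TypeError as err:
    mismatch = str(err)
print(mismatch)
```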

Related PR: #59116 may cause the repeat node's group-by expression order to differ from its output expression order.

Release note

None

@yujun777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31367 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d22d3613c67f56a8a04958f6a7493ecb5004a4c1, data reload: false

------ Round 1 ----------------------------------
q1	17677	4268	4095	4095
q2	2074	357	262	262
q3	10083	1271	709	709
q4	10202	823	295	295
q5	7517	2047	1855	1855
q6	186	169	134	134
q7	913	772	667	667
q8	9274	1370	1130	1130
q9	4881	4587	4558	4558
q10	6792	1798	1391	1391
q11	531	299	294	294
q12	719	733	582	582
q13	17773	3793	3062	3062
q14	285	288	291	288
q15	584	506	525	506
q16	693	666	635	635
q17	647	777	513	513
q18	6952	6374	6486	6374
q19	1093	985	614	614
q20	378	350	249	249
q21	3003	2467	2200	2200
q22	1015	1019	954	954
Total cold run time: 103272 ms
Total hot run time: 31367 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4160	4050	4050	4050
q2	330	407	311	311
q3	2072	2581	2202	2202
q4	1307	1740	1333	1333
q5	4090	3949	4016	3949
q6	209	172	130	130
q7	1848	1809	1705	1705
q8	2867	2614	2461	2461
q9	7690	7281	7233	7233
q10	2488	2801	2405	2405
q11	555	484	483	483
q12	745	828	689	689
q13	3794	4223	3763	3763
q14	299	313	290	290
q15	535	514	495	495
q16	640	708	627	627
q17	1181	1330	1564	1330
q18	7992	7735	7803	7735
q19	856	890	811	811
q20	2210	2087	1950	1950
q21	4659	4133	4046	4046
q22	1079	1007	966	966
Total cold run time: 51606 ms
Total hot run time: 48964 ms

@doris-robot

TPC-DS: Total hot run time: 173815 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d22d3613c67f56a8a04958f6a7493ecb5004a4c1, data reload: false

query5	4416	619	466	466
query6	314	214	201	201
query7	4202	453	259	259
query8	345	248	238	238
query9	8671	2875	2877	2875
query10	500	358	326	326
query11	15313	15112	14902	14902
query12	181	115	115	115
query13	1264	497	385	385
query14	6328	3008	2757	2757
query14_1	2644	2605	2628	2605
query15	208	196	174	174
query16	988	466	461	461
query17	1088	660	561	561
query18	2454	427	338	338
query19	220	220	196	196
query20	125	116	111	111
query21	210	140	118	118
query22	3825	3955	4129	3955
query23	15919	15723	15326	15326
query23_1	15481	15465	15351	15351
query24	7147	1555	1184	1184
query24_1	1177	1165	1159	1159
query25	543	447	414	414
query26	1243	270	148	148
query27	2768	438	287	287
query28	4581	2170	2163	2163
query29	782	539	428	428
query30	312	243	210	210
query31	801	631	544	544
query32	83	78	75	75
query33	532	360	304	304
query34	928	879	539	539
query35	732	761	680	680
query36	899	897	740	740
query37	136	94	81	81
query38	2712	2684	2628	2628
query39	780	745	749	745
query39_1	723	721	694	694
query40	221	179	113	113
query41	63	79	74	74
query42	108	102	101	101
query43	470	438	433	433
query44	1317	745	733	733
query45	194	187	180	180
query46	829	939	569	569
query47	1440	1473	1347	1347
query48	313	313	238	238
query49	608	443	341	341
query50	629	273	203	203
query51	3760	3757	3778	3757
query52	106	108	100	100
query53	288	330	269	269
query54	282	256	251	251
query55	82	78	80	78
query56	324	306	299	299
query57	1038	975	919	919
query58	267	253	266	253
query59	1992	2039	2041	2039
query60	337	323	304	304
query61	146	139	149	139
query62	374	349	320	320
query63	288	259	258	258
query64	4913	1260	956	956
query65	3853	3736	3734	3734
query66	1483	436	304	304
query67	15654	15656	15409	15409
query68	2400	1094	761	761
query69	437	366	315	315
query70	986	856	941	856
query71	318	308	281	281
query72	5270	3410	3221	3221
query73	600	723	307	307
query74	8743	8855	8607	8607
query75	3008	2817	2476	2476
query76	2278	1041	650	650
query77	353	374	307	307
query78	9716	9881	9189	9189
query79	1106	907	596	596
query80	1299	573	482	482
query81	538	268	238	238
query82	1008	143	111	111
query83	316	246	243	243
query84	255	109	88	88
query85	888	486	423	423
query86	408	320	280	280
query87	2846	2862	2772	2772
query88	3484	2575	2558	2558
query89	390	342	324	324
query90	1984	175	161	161
query91	173	157	133	133
query92	77	70	64	64
query93	1042	924	536	536
query94	664	317	277	277
query95	581	406	307	307
query96	638	494	228	228
query97	2364	2411	2330	2330
query98	209	200	195	195
query99	588	586	510	510
Total cold run time: 247163 ms
Total hot run time: 173815 ms

@doris-robot

ClickBench: Total hot run time: 26.69 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d22d3613c67f56a8a04958f6a7493ecb5004a4c1, data reload: false

query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.09	0.08
query4	1.60	0.12	0.11
query5	0.27	0.26	0.25
query6	1.16	0.67	0.65
query7	0.03	0.03	0.02
query8	0.05	0.04	0.04
query9	0.58	0.51	0.50
query10	0.57	0.55	0.55
query11	0.14	0.10	0.10
query12	0.14	0.11	0.10
query13	0.61	0.60	0.59
query14	0.95	0.96	0.96
query15	0.79	0.77	0.77
query16	0.40	0.40	0.40
query17	1.04	1.06	0.98
query18	0.23	0.21	0.21
query19	2.01	1.83	1.79
query20	0.02	0.02	0.01
query21	15.45	0.29	0.14
query22	5.30	0.04	0.05
query23	16.13	0.30	0.10
query24	1.54	0.26	0.54
query25	0.10	0.05	0.09
query26	0.15	0.15	0.14
query27	0.11	0.08	0.06
query28	4.64	1.08	0.87
query29	12.62	3.87	3.12
query30	0.27	0.14	0.13
query31	2.80	0.61	0.39
query32	3.25	0.56	0.46
query33	3.06	2.99	3.04
query34	16.29	5.04	4.42
query35	4.51	4.46	4.46
query36	0.66	0.50	0.49
query37	0.10	0.07	0.06
query38	0.07	0.04	0.04
query39	0.04	0.04	0.04
query40	0.17	0.14	0.12
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 98.44 s
Total hot run time: 26.69 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (23/23) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (23/23) 🎉
Increment coverage report
Complete coverage report


Copilot AI left a comment

Pull request overview

This pull request fixes a critical bug in the Repeat node's output slot ordering in the Nereids planner. The BE requires that the repeat node's output slots follow a specific order: input expressions + GroupingID + other grouping functions. The previous physical translator implementation did not guarantee this ordering, which caused bad cast exceptions in the backend when executing queries with GROUPING SETS.

Changes:

  • Refactored the PhysicalPlanTranslator to ensure correct output slot ordering for Repeat nodes
  • Added comprehensive regression tests covering the fixed scenarios
  • Fixed the logic to explicitly add group by expressions first, followed by aggregate function slots, then GroupingID, and finally grouping function slots
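
The ordering described in the last bullet can be illustrated with a stable sort on a slot category rank. This is a hedged sketch, not the code in `PhysicalPlanTranslator`: the `SlotKind` enum, the `order_repeat_slots` helper, and the slot names are all hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple


class SlotKind(IntEnum):
    """Hypothetical slot categories, ranked in the order the list above describes."""
    GROUP_BY_EXPR = 0
    AGG_FUNC_SLOT = 1
    GROUPING_ID = 2
    GROUPING_FUNC = 3


Slot = Tuple[str, SlotKind]


def order_repeat_slots(slots: List[Slot]) -> List[Slot]:
    # sorted() is stable, so slots of the same kind keep their relative order.
    return sorted(slots, key=lambda slot: slot[1])


unordered = [
    ("GROUPING_ID", SlotKind.GROUPING_ID),
    ("k2", SlotKind.GROUP_BY_EXPR),
    ("grouping_k1", SlotKind.GROUPING_FUNC),
    ("v1", SlotKind.AGG_FUNC_SLOT),
    ("k1", SlotKind.GROUP_BY_EXPR),
]
ordered_names = [name for name, _ in order_repeat_slots(unordered)]
print(ordered_names)  # ['k2', 'k1', 'v1', 'GROUPING_ID', 'grouping_k1']
```

A stable sort is used here so that group-by expressions stay in the order they were given, which is what keeps them aligned with the input expressions.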

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java | Refactored visitPhysicalRepeat method to ensure correct output slot ordering: group by expressions → aggregate function slots → grouping ID → grouping function slots |
| regression-test/suites/nereids_p0/repeat/test_repeat_output_slot.groovy | Added regression test with two SQL queries using GROUPING SETS to verify the fix |
| regression-test/data/nereids_p0/repeat/test_repeat_output_slot.out | Expected output for the regression tests, including query plans and results |

Contributor

@morrySnow morrySnow left a comment

Please add a UT for translating the repeat node.

@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31381 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d85660b3af1f2e89191debc707eb91591395319c, data reload: false

------ Round 1 ----------------------------------
q1	17655	4257	4038	4038
q2	2001	377	247	247
q3	10147	1289	711	711
q4	10206	856	313	313
q5	7497	2109	1855	1855
q6	179	164	136	136
q7	933	786	660	660
q8	9273	1365	1112	1112
q9	4977	4637	4560	4560
q10	6791	1808	1393	1393
q11	525	277	281	277
q12	721	745	562	562
q13	17779	3918	3097	3097
q14	291	292	271	271
q15	601	513	501	501
q16	679	683	649	649
q17	662	703	590	590
q18	7156	6571	6309	6309
q19	1378	996	632	632
q20	400	362	245	245
q21	3036	2530	2248	2248
q22	1029	1017	975	975
Total cold run time: 103916 ms
Total hot run time: 31381 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4108	4047	4038	4038
q2	338	408	346	346
q3	2083	2599	2221	2221
q4	1347	1759	1337	1337
q5	4050	4054	4156	4054
q6	202	166	131	131
q7	1928	1808	1883	1808
q8	2588	2500	2554	2500
q9	7242	7325	7239	7239
q10	2553	2691	2321	2321
q11	557	492	465	465
q12	705	767	586	586
q13	3694	4183	3739	3739
q14	274	294	280	280
q15	548	522	508	508
q16	636	689	671	671
q17	1154	1422	1432	1422
q18	8219	7649	7794	7649
q19	963	841	854	841
q20	1982	2085	1958	1958
q21	4778	4502	4247	4247
q22	1066	1003	939	939
Total cold run time: 51015 ms
Total hot run time: 49300 ms

@doris-robot

TPC-DS: Total hot run time: 173591 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d85660b3af1f2e89191debc707eb91591395319c, data reload: false

query5	4377	621	474	474
query6	326	222	212	212
query7	4219	460	254	254
query8	328	257	245	245
query9	8699	2859	2819	2819
query10	478	369	343	343
query11	15289	15271	15081	15081
query12	184	109	112	109
query13	1229	458	364	364
query14	6047	3075	2778	2778
query14_1	2762	2609	2651	2609
query15	205	192	175	175
query16	975	463	384	384
query17	1086	641	531	531
query18	2426	420	329	329
query19	228	221	186	186
query20	121	112	115	112
query21	208	133	116	116
query22	3857	4052	3951	3951
query23	15961	15702	15196	15196
query23_1	15453	15425	15471	15425
query24	7218	1517	1147	1147
query24_1	1168	1159	1153	1153
query25	501	438	383	383
query26	1234	258	146	146
query27	2782	445	270	270
query28	4627	2127	2121	2121
query29	728	496	405	405
query30	307	241	206	206
query31	792	632	568	568
query32	89	71	73	71
query33	528	334	295	295
query34	926	857	525	525
query35	708	760	690	690
query36	848	915	810	810
query37	136	100	86	86
query38	2723	2732	2682	2682
query39	759	740	743	740
query39_1	696	715	685	685
query40	221	132	120	120
query41	71	65	65	65
query42	100	104	106	104
query43	443	479	421	421
query44	1328	740	751	740
query45	193	184	183	183
query46	831	947	589	589
query47	1459	1411	1308	1308
query48	325	331	259	259
query49	612	429	345	345
query50	612	272	203	203
query51	3765	3794	3752	3752
query52	107	111	95	95
query53	288	327	270	270
query54	303	276	274	274
query55	80	81	82	81
query56	317	325	329	325
query57	1018	1000	987	987
query58	276	262	258	258
query59	2156	2034	2093	2034
query60	355	335	328	328
query61	172	165	166	165
query62	414	382	328	328
query63	301	271	267	267
query64	4900	1341	1019	1019
query65	3867	3745	3781	3745
query66	1477	427	343	343
query67	15511	15681	15506	15506
query68	2462	1107	764	764
query69	460	377	330	330
query70	1020	944	825	825
query71	337	320	301	301
query72	5304	3143	3228	3143
query73	597	738	320	320
query74	8736	8711	8547	8547
query75	2744	2791	2499	2499
query76	2252	1052	675	675
query77	364	383	300	300
query78	9774	9940	9176	9176
query79	1065	897	567	567
query80	1303	556	469	469
query81	547	262	237	237
query82	990	151	110	110
query83	327	253	242	242
query84	250	116	90	90
query85	901	477	422	422
query86	407	297	294	294
query87	2850	2899	2791	2791
query88	3492	2584	2554	2554
query89	379	355	321	321
query90	1977	172	178	172
query91	169	160	134	134
query92	74	76	70	70
query93	948	873	532	532
query94	635	323	292	292
query95	572	385	315	315
query96	636	505	228	228
query97	2367	2347	2354	2347
query98	222	198	214	198
query99	581	580	514	514
Total cold run time: 247234 ms
Total hot run time: 173591 ms

@doris-robot

ClickBench: Total hot run time: 26.56 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d85660b3af1f2e89191debc707eb91591395319c, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.05	0.05
query3	0.25	0.09	0.08
query4	1.61	0.11	0.11
query5	0.27	0.26	0.25
query6	1.15	0.67	0.64
query7	0.03	0.02	0.02
query8	0.06	0.04	0.04
query9	0.56	0.51	0.50
query10	0.55	0.54	0.54
query11	0.15	0.10	0.10
query12	0.14	0.10	0.11
query13	0.61	0.59	0.59
query14	0.95	0.95	0.96
query15	0.80	0.77	0.77
query16	0.40	0.41	0.43
query17	1.06	1.07	0.96
query18	0.23	0.22	0.21
query19	2.00	1.81	1.85
query20	0.02	0.01	0.01
query21	15.45	0.27	0.14
query22	5.30	0.05	0.05
query23	16.13	0.28	0.10
query24	1.44	0.24	0.75
query25	0.11	0.14	0.05
query26	0.13	0.14	0.13
query27	0.10	0.07	0.06
query28	5.01	1.05	0.88
query29	12.54	3.90	3.14
query30	0.28	0.14	0.12
query31	2.80	0.63	0.39
query32	3.24	0.56	0.45
query33	2.94	3.00	3.10
query34	15.79	5.05	4.38
query35	4.49	4.40	4.46
query36	0.68	0.51	0.49
query37	0.10	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.18	0.15	0.13
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 97.98 s
Total hot run time: 26.56 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (21/21) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 95.24% (20/21) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 22, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit eca6bbe into apache:master Jan 23, 2026
31 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 23, 2026
… input expression (#60045)

Possibly related PR: #21168

BE requires the repeat node's output slot order to be
consistent with its input expressions.
That is, output slots = input expressions + GroupingID + other grouping
functions.
But the physical translator does not ensure this requirement, so the
repeat node may sometimes hit a bad cast exception.

For the following SQL:
SELECT 100000
FROM db2.table_9_50_undef_partitions2_keys3_properties4_distributed_by5
GROUP BY GROUPING SETS (
        (col_datetime_6__undef_signed, col_varchar_50__undef_signed)
        , ()
        , (col_varchar_50__undef_signed)
        , (col_datetime_6__undef_signed, col_varchar_50__undef_signed)
);

The above SQL produces the wrong output slot order.

The BE then throws an exception:
(1105, 'errCode = 2, detailMessage = (172.20.57.146)
[E-7412]assert cast err:[E-7412] Bad cast from type:doris::vectorized::ColumnVector<(doris::PrimitiveType)26> to doris::vectorized::ColumnStr<unsigned int>
0#doris::Exception::Exception(int, std::basic_string_view<char, std::char_traits<char> > const&, bool) at /home/zcp/repo_center/doris_master/doris/be/src/common/exception.cpp:0\n\t1#
...
yujun777 added a commit to yujun777/doris that referenced this pull request Jan 23, 2026
… input expression (apache#60045)
