@liaoxin01
Contributor

@liaoxin01 liaoxin01 commented Sep 14, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
When a load specifies only SET expressions (e.g., 'kd01=20230102') and no explicit file column mapping, the system auto-generates file columns for every table column, including columns that are populated only by SET expressions.

Root cause:

  • When specifyFileFieldNames=false, the system auto-generates file columns for all table columns.
  • Columns with SET expressions (stored in userMappingColumns) should be skipped during auto-generation, since they do not exist in the source file.
  • Because this skip logic was missing, SET-only columns were treated as file columns.
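
The root cause above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Doris implementation: the class `FileColumnSketch`, the method `autoGenerateFileColumns`, the plain string column lists, and the column names `k00` and `k01` are hypothetical. Only `userMappingColumns` and the column `kd01` come from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FileColumnSketch {
    // Hypothetical helper: derive the file columns to read when the user
    // gave no explicit column list (the specifyFileFieldNames=false case).
    static List<String> autoGenerateFileColumns(List<String> tableColumns,
                                                Set<String> userMappingColumns) {
        List<String> fileColumns = new ArrayList<>();
        for (String column : tableColumns) {
            // Columns populated by a SET expression are not in the source file,
            // so they must not be registered as file columns.
            if (userMappingColumns.contains(column)) {
                continue;
            }
            fileColumns.add(column);
        }
        return fileColumns;
    }

    public static void main(String[] args) {
        // Assumed table layout; kd01 is filled by set(kd01=20240123).
        List<String> tableColumns = Arrays.asList("k00", "k01", "kd01");
        Set<String> userMappingColumns = new HashSet<>(Arrays.asList("kd01"));
        System.out.println(autoGenerateFileColumns(tableColumns, userMappingColumns));
    }
}
```

In this sketch, `kd01` is left out of the generated file columns, so a reader of the source file would not be asked for a slot that the file cannot provide.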

Issues caused:

  1. BE reports 'failed to find default value expr for slot' when SET columns are missing from the source file
  2. Incorrect slot mapping leading to data quality errors

LOAD LABEL test_s3_load_82852cae_b5c2_463b_9336_a2f0cd767c81_1 (
    DATA INFILE("s3://bucket/basic_data.csv")
    INTO TABLE s3_load_with_set
    COLUMNS TERMINATED BY "|"
    LINES TERMINATED BY "\n"
    FORMAT AS "CSV"
    set(kd01=20240123)
)
WITH S3 ()
msg:quality not good enough to cancel

LOAD LABEL test_s3_load_4c0e7678_08cc_47cb_aa6d_7bfceab8090d_0 (
    DATA INFILE("s3://bucket/basic_data.orc")
    INTO TABLE s3_load_with_set
    FORMAT AS "orc"
    set(kd01=20240123)
)
WITH S3 ()
msg:errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]failed to find default value expr for slot: kd01.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01
Contributor Author

run buildall

dataroaring previously approved these changes Sep 14, 2025
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Sep 14, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 34648 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 25fb3f27a0c4d58c08b7d618e6fa83ee6ca81520, data reload: false

------ Round 1 ----------------------------------
q1	17597	5143	5046	5046
q2	1996	321	208	208
q3	10239	1312	713	713
q4	10224	1015	532	532
q5	7527	2407	2331	2331
q6	197	172	142	142
q7	936	785	633	633
q8	9366	1333	1096	1096
q9	6901	5129	5174	5129
q10	6965	2392	1977	1977
q11	500	312	288	288
q12	374	372	238	238
q13	17765	3668	3030	3030
q14	241	241	212	212
q15	588	494	487	487
q16	1007	1000	938	938
q17	615	858	372	372
q18	7616	7272	7042	7042
q19	1239	950	578	578
q20	345	347	251	251
q21	3828	3225	2410	2410
q22	1065	1034	995	995
Total cold run time: 107131 ms
Total hot run time: 34648 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5219	5113	5102	5102
q2	248	330	234	234
q3	2198	2678	2313	2313
q4	1329	1840	1323	1323
q5	4221	4567	4572	4567
q6	236	172	134	134
q7	2013	2021	1864	1864
q8	2632	2683	2601	2601
q9	7270	7471	7302	7302
q10	3129	3337	2894	2894
q11	592	639	546	546
q12	687	772	642	642
q13	3441	3884	3302	3302
q14	301	318	282	282
q15	541	477	503	477
q16	1109	1093	1060	1060
q17	1181	1559	1401	1401
q18	8058	7546	7680	7546
q19	814	892	1099	892
q20	2013	2082	1936	1936
q21	4974	4399	4214	4214
q22	1089	1051	985	985
Total cold run time: 53295 ms
Total hot run time: 51617 ms

@doris-robot

TPC-DS: Total hot run time: 186012 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 25fb3f27a0c4d58c08b7d618e6fa83ee6ca81520, data reload: false

query1	1053	439	393	393
query2	6553	1679	1732	1679
query3	6754	219	214	214
query4	25983	23636	22926	22926
query5	4411	603	463	463
query6	339	231	236	231
query7	4698	531	309	309
query8	324	267	255	255
query9	8971	2744	2753	2744
query10	507	349	293	293
query11	15758	15036	15229	15036
query12	177	118	112	112
query13	1678	560	443	443
query14	11179	9197	9302	9197
query15	216	212	176	176
query16	7691	687	509	509
query17	1247	771	655	655
query18	2050	448	415	415
query19	201	198	181	181
query20	134	124	119	119
query21	210	132	113	113
query22	4088	4136	3992	3992
query23	34160	33183	33013	33013
query24	8460	2447	2417	2417
query25	561	505	438	438
query26	1244	272	162	162
query27	2711	508	351	351
query28	4328	2237	2216	2216
query29	775	617	480	480
query30	292	224	200	200
query31	919	803	729	729
query32	79	74	71	71
query33	607	377	317	317
query34	807	871	528	528
query35	808	822	765	765
query36	972	1025	905	905
query37	113	109	79	79
query38	3528	3523	3488	3488
query39	1506	1429	1447	1429
query40	221	132	121	121
query41	66	62	64	62
query42	126	120	116	116
query43	516	496	488	488
query44	1348	879	841	841
query45	191	193	179	179
query46	865	1013	642	642
query47	1804	1776	1733	1733
query48	395	413	325	325
query49	782	499	389	389
query50	653	691	409	409
query51	3909	4145	3863	3863
query52	113	112	105	105
query53	244	269	191	191
query54	602	598	529	529
query55	87	89	86	86
query56	329	328	307	307
query57	1181	1163	1149	1149
query58	275	272	267	267
query59	2547	2632	2543	2543
query60	332	337	322	322
query61	174	164	167	164
query62	802	721	699	699
query63	235	195	192	192
query64	4404	1168	854	854
query65	4037	3982	3955	3955
query66	1084	454	349	349
query67	15473	15345	15187	15187
query68	9418	919	584	584
query69	484	314	279	279
query70	1303	1323	1227	1227
query71	564	354	316	316
query72	5979	2638	5395	2638
query73	793	771	359	359
query74	9173	9028	8726	8726
query75	4447	3339	2741	2741
query76	3951	1163	743	743
query77	1007	423	315	315
query78	10151	9801	8839	8839
query79	1795	827	591	591
query80	727	585	501	501
query81	474	260	225	225
query82	238	172	135	135
query83	298	265	248	248
query84	304	105	98	98
query85	859	462	413	413
query86	336	319	312	312
query87	3726	3738	3650	3650
query88	2820	2237	2222	2222
query89	410	338	306	306
query90	2106	210	207	207
query91	162	180	136	136
query92	88	66	62	62
query93	1219	985	656	656
query94	689	423	340	340
query95	400	316	314	314
query96	477	577	278	278
query97	2941	3006	2887	2887
query98	237	216	204	204
query99	1412	1414	1350	1350
Total cold run time: 277875 ms
Total hot run time: 186012 ms

@doris-robot

ClickBench: Total hot run time: 29.65 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 25fb3f27a0c4d58c08b7d618e6fa83ee6ca81520, data reload: false

query1	0.06	0.04	0.05
query2	0.10	0.06	0.06
query3	0.26	0.08	0.09
query4	1.61	0.12	0.12
query5	0.28	0.28	0.25
query6	1.17	0.66	0.65
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.62	0.54	0.52
query10	0.57	0.58	0.57
query11	0.17	0.12	0.11
query12	0.16	0.11	0.12
query13	0.64	0.62	0.63
query14	1.05	1.05	1.05
query15	0.88	0.85	0.86
query16	0.40	0.39	0.42
query17	1.07	1.10	1.05
query18	0.22	0.19	0.20
query19	1.90	1.80	1.82
query20	0.02	0.01	0.02
query21	15.42	0.94	0.58
query22	0.77	1.21	0.78
query23	14.78	1.37	0.62
query24	6.93	1.48	0.38
query25	0.46	0.14	0.08
query26	0.59	0.16	0.14
query27	0.08	0.06	0.06
query28	10.12	0.95	0.45
query29	12.55	3.91	3.24
query30	0.30	0.14	0.13
query31	2.84	0.61	0.39
query32	3.25	0.58	0.48
query33	3.18	3.10	3.10
query34	16.03	5.51	4.86
query35	4.92	4.94	4.96
query36	0.72	0.51	0.50
query37	0.11	0.07	0.07
query38	0.07	0.05	0.04
query39	0.04	0.03	0.04
query40	0.18	0.15	0.13
query41	0.09	0.03	0.03
query42	0.04	0.04	0.03
query43	0.04	0.04	0.04
Total cold run time: 104.78 s
Total hot run time: 29.65 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/7) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 85.71% (6/7) 🎉
Increment coverage report
Complete coverage report

@liaoxin01
Contributor Author

run buildall

@github-actions github-actions bot removed the approved label Sep 15, 2025
@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/8) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 34846 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 698222fa59516790c25bd5c4773afaf3527d4393, data reload: false

------ Round 1 ----------------------------------
q1	17628	5119	5030	5030
q2	2019	325	222	222
q3	10256	1316	714	714
q4	10233	1036	546	546
q5	7566	2394	2405	2394
q6	183	174	139	139
q7	939	767	651	651
q8	9364	1335	1137	1137
q9	6987	5154	5185	5154
q10	6975	2408	1989	1989
q11	516	316	289	289
q12	375	414	241	241
q13	17789	3675	3034	3034
q14	243	243	219	219
q15	584	490	496	490
q16	1013	1010	948	948
q17	608	872	373	373
q18	7573	7233	7106	7106
q19	1101	954	566	566
q20	342	343	238	238
q21	3783	3223	2378	2378
q22	1062	1012	988	988
Total cold run time: 107139 ms
Total hot run time: 34846 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5125	5492	5088	5088
q2	300	328	229	229
q3	2201	2664	2273	2273
q4	1314	1784	1310	1310
q5	4210	4577	4548	4548
q6	222	181	132	132
q7	2039	2043	1961	1961
q8	2681	2559	2637	2559
q9	7549	7346	7296	7296
q10	3157	3324	2861	2861
q11	600	525	481	481
q12	712	749	628	628
q13	3452	4147	3227	3227
q14	280	334	268	268
q15	501	478	475	475
q16	1112	1152	1102	1102
q17	1237	1658	1434	1434
q18	7923	7677	7609	7609
q19	861	836	935	836
q20	1961	1944	1805	1805
q21	4626	4400	4170	4170
q22	1108	1054	998	998
Total cold run time: 53171 ms
Total hot run time: 51290 ms

@doris-robot

TPC-DS: Total hot run time: 187928 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 698222fa59516790c25bd5c4773afaf3527d4393, data reload: false

query1	1099	427	411	411
query2	6564	1703	1715	1703
query3	6747	223	223	223
query4	26202	23208	22886	22886
query5	4373	597	471	471
query6	342	255	226	226
query7	4646	505	307	307
query8	307	261	272	261
query9	8686	2658	2618	2618
query10	511	362	283	283
query11	15253	15081	14773	14773
query12	173	120	115	115
query13	1713	560	449	449
query14	11156	9112	9151	9112
query15	206	197	178	178
query16	7711	681	527	527
query17	1277	737	620	620
query18	2103	418	331	331
query19	209	201	170	170
query20	131	122	121	121
query21	213	136	142	136
query22	4189	4179	4114	4114
query23	34030	33053	32932	32932
query24	8507	2432	2417	2417
query25	601	529	445	445
query26	1241	281	162	162
query27	2727	514	366	366
query28	4400	2260	2219	2219
query29	768	624	489	489
query30	294	228	197	197
query31	905	814	729	729
query32	82	76	71	71
query33	621	378	347	347
query34	798	866	539	539
query35	837	831	754	754
query36	986	1034	901	901
query37	119	109	86	86
query38	3567	3525	3481	3481
query39	1488	1427	1456	1427
query40	224	128	121	121
query41	66	61	61	61
query42	123	112	124	112
query43	511	510	484	484
query44	1310	865	857	857
query45	186	175	179	175
query46	854	1007	678	678
query47	1762	1834	1757	1757
query48	396	428	318	318
query49	773	508	417	417
query50	646	702	403	403
query51	3957	3894	3874	3874
query52	120	115	105	105
query53	242	270	201	201
query54	605	590	546	546
query55	95	83	84	83
query56	349	328	287	287
query57	1175	1212	1108	1108
query58	280	268	271	268
query59	2619	2656	2478	2478
query60	325	329	333	329
query61	157	173	189	173
query62	818	761	671	671
query63	232	207	206	206
query64	4583	1265	945	945
query65	4047	3985	4007	3985
query66	1111	430	335	335
query67	15390	15401	15125	15125
query68	8739	922	585	585
query69	492	314	287	287
query70	1371	1232	1332	1232
query71	550	340	311	311
query72	6117	4993	5006	4993
query73	708	619	366	366
query74	8938	9171	8665	8665
query75	4082	3326	2785	2785
query76	3676	1168	749	749
query77	809	409	320	320
query78	9519	9746	8894	8894
query79	2061	842	618	618
query80	726	565	519	519
query81	484	259	224	224
query82	311	156	134	134
query83	297	270	256	256
query84	303	108	95	95
query85	864	468	435	435
query86	432	316	312	312
query87	3754	3851	3598	3598
query88	2817	2223	2197	2197
query89	421	320	292	292
query90	2088	209	213	209
query91	171	165	138	138
query92	82	65	60	60
query93	1776	1004	660	660
query94	689	453	341	341
query95	386	324	319	319
query96	560	580	274	274
query97	2944	3006	2858	2858
query98	230	221	232	221
query99	1433	1401	1282	1282
Total cold run time: 276568 ms
Total hot run time: 187928 ms

@doris-robot

ClickBench: Total hot run time: 30.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 698222fa59516790c25bd5c4773afaf3527d4393, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.06	0.05
query3	0.25	0.08	0.09
query4	1.60	0.11	0.12
query5	0.29	0.28	0.26
query6	1.16	0.69	0.68
query7	0.03	0.02	0.02
query8	0.05	0.04	0.04
query9	0.63	0.53	0.52
query10	0.58	0.58	0.58
query11	0.17	0.12	0.11
query12	0.15	0.12	0.12
query13	0.68	0.64	0.64
query14	1.04	1.06	1.06
query15	0.90	0.88	0.88
query16	0.43	0.42	0.41
query17	1.04	1.08	1.06
query18	0.22	0.20	0.20
query19	2.00	1.90	1.89
query20	0.02	0.01	0.01
query21	15.40	0.94	0.60
query22	0.76	1.21	0.61
query23	14.98	1.42	0.64
query24	7.81	0.85	0.71
query25	0.50	0.20	0.18
query26	0.60	0.16	0.13
query27	0.07	0.05	0.06
query28	9.02	0.96	0.45
query29	12.55	4.06	3.24
query30	0.28	0.14	0.11
query31	2.84	0.63	0.41
query32	3.26	0.59	0.50
query33	3.09	3.16	3.09
query34	16.17	5.66	4.92
query35	4.94	4.99	4.97
query36	0.70	0.52	0.52
query37	0.10	0.07	0.08
query38	0.07	0.05	0.05
query39	0.04	0.03	0.03
query40	0.19	0.16	0.15
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 104.93 s
Total hot run time: 30.3 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (8/8) 🎉
Increment coverage report
Complete coverage report

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved label Sep 15, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@sollhui sollhui left a comment

LGTM

@liaoxin01 liaoxin01 merged commit ce71c34 into apache:master Sep 16, 2025
32 of 34 checks passed
@liaoxin01 liaoxin01 deleted the fix_load_with_set branch September 16, 2025 04:38
Comment on lines +175 to +179
// Only track columns with constant expressions (e.g., "k1 = 'constant'")
// Non-constant expressions (e.g., "k1 = k1 + 1") still need to read from file
if (importColumnDesc.getExpr().isConstant()) {
    constantMappingColumns.add(mappingColumnName);
}
Contributor

Can this logic correctly handle a case like column = now()?

liaoxin01 pushed a commit that referenced this pull request Nov 27, 2025
### What problem does this PR solve?

Related PR: [#xxx](#56041)

Problem Summary:

Improves #56041.

Before this PR, only literal constant values such as `c1 = 72105107105` and `c2 = "ykiko"` could be used in load.
After this PR, constant functions such as `c1 = uuid()`, `c2 = abs(-2)+1`, and `c3 = now()` can be used as well.
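
One way to read the distinction drawn in this thread is that a SET expression needs file data only if it references a column somewhere in its expression tree. The sketch below illustrates that reading and nothing more. The `Expr`, `Literal`, `ColumnRef`, and `FunctionCall` types and the `referencesColumn` method are hypothetical stand-ins, not the Doris expression classes, and the thread does not state whether Doris's `isConstant()` behaves this way.

```java
import java.util.Arrays;
import java.util.List;

public class ConstantExprSketch {
    // Hypothetical, minimal expression model for illustration only.
    interface Expr {}

    static final class Literal implements Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
    }

    static final class ColumnRef implements Expr {
        final String name;
        ColumnRef(String name) { this.name = name; }
    }

    static final class FunctionCall implements Expr {
        final String name;
        final List<Expr> args;
        FunctionCall(String name, Expr... args) {
            this.name = name;
            this.args = Arrays.asList(args);
        }
    }

    // True if any node in the tree is a column reference, meaning the
    // expression cannot be evaluated without reading from the file.
    static boolean referencesColumn(Expr expr) {
        if (expr instanceof ColumnRef) {
            return true;
        }
        if (expr instanceof FunctionCall) {
            for (Expr arg : ((FunctionCall) expr).args) {
                if (referencesColumn(arg)) {
                    return true;
                }
            }
        }
        return false; // literals and argument-free calls such as now()
    }

    public static void main(String[] args) {
        Expr literal = new Literal(72105107105L);                       // c1 = 72105107105
        Expr nowCall = new FunctionCall("now");                         // c3 = now()
        Expr absPlusOne = new FunctionCall("add",
                new FunctionCall("abs", new Literal(-2)), new Literal(1)); // c2 = abs(-2)+1
        Expr increment = new FunctionCall("add",
                new ColumnRef("k1"), new Literal(1));                   // k1 = k1 + 1

        System.out.println("literal needs file column: " + referencesColumn(literal));
        System.out.println("now() needs file column: " + referencesColumn(nowCall));
        System.out.println("abs(-2)+1 needs file column: " + referencesColumn(absPlusOne));
        System.out.println("k1 + 1 needs file column: " + referencesColumn(increment));
    }
}
```

Under this model only `k1 + 1` reports a column dependency; the literal and both function-based expressions can be evaluated without a matching column in the source file.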
github-actions bot pushed a commit that referenced this pull request Nov 27, 2025
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request Dec 14, 2025
dataroaring pushed a commit that referenced this pull request Dec 18, 2025
github-actions bot pushed a commit that referenced this pull request Dec 18, 2025
Labels

approved, reviewed

8 participants