
@morningman
Contributor

bp #49931

1. In apache#49036, we added support for the Hive serde dialect on the BE side only. However, some constant expressions are evaluated and output on the FE side, so the FE needs to support it as well.

2. Refactor how the FE side obtains the string-format value for all literal types.

There are two kinds of string-format values for a literal: one for queries and one for Stream Load. They differ in several ways, for example:

- `NullLiteral`: for a query it should be `null`; for a load it should be `\N`.
- `StructLiteral`: for a query it should be `{"k1":"v1", "k2":null, "k3":"", "k4":"a"}`; for a load it should be `{"v1", null, "", "a"}`.

So we need two different methods to distinguish them: `getStringValueForQuery` and `getStringValueForStreamLoad`. I also removed or renamed some old, messy methods.
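
To make the split concrete, here is a minimal sketch of a literal hierarchy exposing both renderings. Only the names `getStringValueForQuery` and `getStringValueForStreamLoad` come from this PR; the classes `Lit`, `NullLit`, `StrLit`, `StructLit` and the helper `nestedValue` are hypothetical simplifications, not the actual Doris FE classes. The sketch reproduces the `NullLiteral` and `StructLiteral` examples listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, NOT Doris FE code: only the two method names come from the PR.
class LiteralFormatSketch {

    abstract static class Lit {
        // Rendering when the FE outputs the value as a query result.
        abstract String getStringValueForQuery();

        // Rendering when the value is written as a Stream Load CSV field.
        abstract String getStringValueForStreamLoad();

        // Rendering when the value is nested inside a complex type (assumed helper).
        abstract String nestedValue();
    }

    static final class NullLit extends Lit {
        @Override String getStringValueForQuery() { return "null"; }
        @Override String getStringValueForStreamLoad() { return "\\N"; } // CSV null marker
        @Override String nestedValue() { return "null"; }
    }

    static final class StrLit extends Lit {
        private final String value;
        StrLit(String value) { this.value = value; }
        @Override String getStringValueForQuery() { return value; }
        @Override String getStringValueForStreamLoad() { return value; }
        @Override String nestedValue() { return "\"" + value + "\""; } // quoted when nested
    }

    static final class StructLit extends Lit {
        private final LinkedHashMap<String, Lit> fields; // preserves field order
        StructLit(LinkedHashMap<String, Lit> fields) { this.fields = fields; }

        @Override String getStringValueForQuery() {
            // Query output includes field names: {"k1":"v1", "k2":null}
            StringJoiner sj = new StringJoiner(", ", "{", "}");
            for (Map.Entry<String, Lit> e : fields.entrySet()) {
                sj.add("\"" + e.getKey() + "\":" + e.getValue().nestedValue());
            }
            return sj.toString();
        }

        @Override String getStringValueForStreamLoad() {
            // Stream Load input lists only the values, in field order: {"v1", null}
            StringJoiner sj = new StringJoiner(", ", "{", "}");
            for (Lit v : fields.values()) {
                sj.add(v.nestedValue());
            }
            return sj.toString();
        }

        // Structs nested in structs are not modeled; this just reuses the query form.
        @Override String nestedValue() { return getStringValueForQuery(); }
    }

    // Builds the struct used as the example in the PR description.
    static StructLit sampleStruct() {
        LinkedHashMap<String, Lit> f = new LinkedHashMap<>();
        f.put("k1", new StrLit("v1"));
        f.put("k2", new NullLit());
        f.put("k3", new StrLit(""));
        f.put("k4", new StrLit("a"));
        return new StructLit(f);
    }

    public static void main(String[] args) {
        System.out.println(new NullLit().getStringValueForQuery());      // null
        System.out.println(new NullLit().getStringValueForStreamLoad()); // \N
        System.out.println(sampleStruct().getStringValueForQuery());
        // {"k1":"v1", "k2":null, "k3":"", "k4":"a"}
        System.out.println(sampleStruct().getStringValueForStreamLoad());
        // {"v1", null, "", "a"}
    }
}
```

Keeping the two renderings as separate methods means each literal type decides independently how it looks in a result set and in CSV input, rather than callers patching one shared string afterwards.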

**Examples**

- `Doris`/`Hive`/`Presto`: the format of query results for each column type when `serde_dialect` is set to that value.
- `Stream Load`: the format the value should have in a CSV file when loading it into the table.

| Type | Doris | Hive | Presto | Stream Load | Comment |
| --- | --- | --- | --- | --- | --- |
| `Bool` | `1`, `0` | `1`, `0` | `1`, `0` | `1\|0`, `true\|false` | |
| `Integer` | `1`, `1000` | `1`, `1000` | `1`, `1000` | `1\|1000` | |
| `Float/Decimal` | `1.2`, `3.00` | `1.2`, `3.00` | `1.2`, `3.00` | `1.2\|3.00` | |
| `Date/Datetime` | `2025-01-01`, `2025-01-01 10:11:11` | `2025-01-01`, `2025-01-01 10:11:11` | `2025-01-01`, `2025-01-01 10:11:11` | `2025-01-01\|2025-01-01 10:11:11` | |
| `String` | `abc`, `中国` | `abc`, `中国` | `abc`, `中国` | `abc,中国` | |
| `Null` | `null` | `null` | `NULL` | `\N` | |
| `Array<bool>` | `[1, 0]` | `[true,false]` | `[1, 0]` | `[1, 0]`, `[true, false]` | |
| `Array<int>` | `[1, 1000]` | `[1,1000]` | `[1, 1000]` | `[1, 1000]` | |
| `Array<string>` | `["abc", "中国"]` | `["abc","中国"]` | `["abc", "中国"]` | `["abc", "中国"]` | |
| `Array<date/datetime>` | `["2025-01-01", "2025-01-01 10:11:11"]` | `["2025-01-01","2025-01-01 10:11:11"]` | `["2025-01-01", "2025-01-01 10:11:11"]` | `["2025-01-01", "2025-01-01 10:11:11"]` | |
| `Array<null>` | `[null]` | `[null]` | `[NULL]` | `[null]` | |
| `Map<int, string>` | `{1:"abc", 2:"中国"}` | `{1:"abc",2:"中国"}` | `{1=abc, 2=中国}` | `{1:"abc", 2:"中国"}` | |
| `Map<string, date/datetime>` | `{"k1":"2022-10-01", "k2":"2022-10-01 10:10:10"}` | `{"k1":"2022-10-01","k2":"2022-10-01 10:10:10"}` | `{k1=2022-10-01, k2=2022-10-01 10:10:10}` | `{"k1":"2022-10-01", "k2":"2022-10-01 10:10:10"}` | |
| `Map<int, null>` | `{1:null, 2:null}` | `{1:null,2:null}` | `{1=NULL, 2=NULL}` | `{1:null, 2:null}` | |
| `Struct<>` | Same as map | Same as map | Same as map | Same as map | |
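
The three dialect columns for the `Map` rows differ in only three things: the entry separator (Hive drops the space), the key/value separator (Presto uses `=`), and the rendering of strings and nulls (Presto leaves strings unquoted and prints `NULL`). The sketch below captures just those rules. The `SerdeDialect` enum and the `formatMap`/`element` helpers are hypothetical illustrations, not Doris code. It covers maps only, because arrays follow different rules (for example, Presto quotes strings inside arrays but not inside maps).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration of the Map rows of the table above; not Doris code.
class MapDialectSketch {

    enum SerdeDialect { DORIS, HIVE, PRESTO }

    // Render a single key or value as it appears inside a map.
    static String element(Object v, SerdeDialect d) {
        if (v == null) {
            return d == SerdeDialect.PRESTO ? "NULL" : "null";
        }
        if (v instanceof String && d != SerdeDialect.PRESTO) {
            return "\"" + v + "\""; // Doris and Hive quote strings; Presto does not
        }
        return String.valueOf(v);
    }

    static String formatMap(Map<?, ?> m, SerdeDialect d) {
        String sep = d == SerdeDialect.HIVE ? "," : ", "; // Hive omits the space
        String kv = d == SerdeDialect.PRESTO ? "=" : ":"; // Presto uses key=value
        StringJoiner sj = new StringJoiner(sep, "{", "}");
        for (Map.Entry<?, ?> e : m.entrySet()) {
            sj.add(element(e.getKey(), d) + kv + element(e.getValue(), d));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        // Dates are modeled as plain strings here, matching their rendered form.
        Map<String, String> m = new LinkedHashMap<>();
        m.put("k1", "2022-10-01");
        m.put("k2", "2022-10-01 10:10:10");
        for (SerdeDialect d : SerdeDialect.values()) {
            System.out.println(d + " " + formatMap(m, d));
        }
        // DORIS {"k1":"2022-10-01", "k2":"2022-10-01 10:10:10"}
        // HIVE {"k1":"2022-10-01","k2":"2022-10-01 10:10:10"}
        // PRESTO {k1=2022-10-01, k2=2022-10-01 10:10:10}
    }
}
```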

3. Fix a bug in batch insert transactions: `trim_double_quotas` should be set to `false`.
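
The PR does not say why this option must be off. One plausible reading, which is our assumption, is that an option that strips the outermost double quotes from each CSV field would alter a string value that itself begins and ends with `"`. The sketch uses a hypothetical `trimOuterQuotes` helper, not a Doris API, to show that effect.

```java
// Hypothetical illustration; trimOuterQuotes is NOT a Doris API.
class TrimQuotesSketch {

    // Strips one pair of outermost double quotes from a CSV field, which is
    // what a "trim double quotes" load option is assumed to do.
    static String trimOuterQuotes(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    public static void main(String[] args) {
        // A STRING value whose content is "hello", quotes included.
        String value = "\"hello\"";
        System.out.println(value);                  // "hello"
        System.out.println(trimOuterQuotes(value)); // hello  (quotes lost)
        // Values without surrounding quotes pass through unchanged.
        System.out.println(trimOuterQuotes("abc")); // abc
    }
}
```
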
@morningman morningman requested a review from dataroaring as a code owner May 14, 2025 02:38
@Thearas
Contributor

Thearas commented May 14, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40426 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 95a67da5766d6fc4428b7a0b5a53a0460d941adf, data reload: false

------ Round 1 ----------------------------------
q1	17613	6972	6579	6579
q2	2080	182	186	182
q3	10691	1085	1196	1085
q4	10489	797	781	781
q5	7754	2850	2829	2829
q6	219	135	136	135
q7	991	616	617	616
q8	9372	1943	2053	1943
q9	6660	6425	6437	6425
q10	6994	2277	2341	2277
q11	476	267	258	258
q12	409	216	217	216
q13	17767	2988	2989	2988
q14	241	209	209	209
q15	501	458	458	458
q16	672	593	600	593
q17	1002	568	622	568
q18	7689	6823	6774	6774
q19	1412	1122	1132	1122
q20	462	208	206	206
q21	4054	3372	3231	3231
q22	1112	951	968	951
Total cold run time: 108660 ms
Total hot run time: 40426 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6568	6539	6594	6539
q2	341	240	244	240
q3	2961	2790	2931	2790
q4	2045	1950	1817	1817
q5	5831	5781	5754	5754
q6	211	130	127	127
q7	2253	1824	1794	1794
q8	3404	3592	3558	3558
q9	8958	8950	9000	8950
q10	3553	3518	3539	3518
q11	609	494	498	494
q12	851	624	609	609
q13	8835	3276	3114	3114
q14	289	277	290	277
q15	515	460	474	460
q16	685	663	671	663
q17	1877	1645	1621	1621
q18	8370	7792	7791	7791
q19	1670	1532	1541	1532
q20	2053	1880	1897	1880
q21	5537	5511	5394	5394
q22	1163	1029	972	972
Total cold run time: 68579 ms
Total hot run time: 59894 ms

@doris-robot

TPC-DS: Total hot run time: 197737 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 95a67da5766d6fc4428b7a0b5a53a0460d941adf, data reload: false

query1	1325	912	883	883
query2	6317	2077	2035	2035
query3	10812	4283	4172	4172
query4	61557	28973	23580	23580
query5	5190	465	447	447
query6	404	176	186	176
query7	5455	307	300	300
query8	300	235	210	210
query9	8448	2624	2608	2608
query10	471	271	261	261
query11	17715	15190	15697	15190
query12	160	106	104	104
query13	1444	456	437	437
query14	10446	7434	7522	7434
query15	209	187	190	187
query16	7125	476	431	431
query17	1174	572	572	572
query18	1841	322	295	295
query19	224	154	153	153
query20	117	108	113	108
query21	219	103	103	103
query22	4710	4374	4722	4374
query23	34747	34289	34050	34050
query24	6158	2888	2878	2878
query25	585	434	420	420
query26	697	172	167	167
query27	1983	376	353	353
query28	4398	2508	2457	2457
query29	708	476	451	451
query30	253	168	165	165
query31	1016	800	833	800
query32	73	54	59	54
query33	410	291	306	291
query34	929	524	533	524
query35	871	748	730	730
query36	1092	954	972	954
query37	119	69	71	69
query38	4062	4136	4022	4022
query39	1511	1477	1500	1477
query40	203	108	104	104
query41	52	49	50	49
query42	115	107	99	99
query43	546	497	487	487
query44	1201	832	843	832
query45	188	175	171	171
query46	1164	732	734	732
query47	2008	1933	1925	1925
query48	496	394	398	394
query49	780	441	418	418
query50	868	434	422	422
query51	7407	7303	7273	7273
query52	119	94	97	94
query53	261	192	188	188
query54	601	474	479	474
query55	79	81	83	81
query56	261	272	257	257
query57	1256	1163	1132	1132
query58	228	220	212	212
query59	3372	3148	3053	3053
query60	277	248	243	243
query61	136	110	112	110
query62	755	694	680	680
query63	219	188	193	188
query64	1912	712	662	662
query65	3261	3238	3207	3207
query66	746	298	305	298
query67	15939	15553	15495	15495
query68	4221	574	555	555
query69	443	280	262	262
query70	1199	1050	1104	1050
query71	335	268	263	263
query72	6369	4013	4149	4013
query73	752	346	359	346
query74	10190	9040	9062	9040
query75	3368	2630	2668	2630
query76	2115	1064	1048	1048
query77	496	273	276	273
query78	10687	9624	9699	9624
query79	1958	580	593	580
query80	1369	431	419	419
query81	516	242	242	242
query82	1276	93	85	85
query83	274	142	152	142
query84	279	87	78	78
query85	1022	322	294	294
query86	405	305	307	305
query87	4438	4253	4268	4253
query88	3759	2406	2373	2373
query89	421	288	292	288
query90	1978	184	187	184
query91	183	151	154	151
query92	70	49	50	49
query93	2408	555	545	545
query94	787	304	270	270
query95	362	258	254	254
query96	621	288	286	286
query97	3287	3149	3138	3138
query98	219	201	201	201
query99	1578	1294	1312	1294
Total cold run time: 317110 ms
Total hot run time: 197737 ms

@doris-robot

ClickBench: Total hot run time: 33.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 95a67da5766d6fc4428b7a0b5a53a0460d941adf, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.06
query4	1.62	0.10	0.11
query5	0.52	0.53	0.53
query6	1.12	0.73	0.73
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.57	0.50	0.50
query10	0.57	0.56	0.56
query11	0.15	0.11	0.11
query12	0.15	0.12	0.13
query13	0.62	0.60	0.59
query14	2.70	2.84	2.83
query15	0.89	0.82	0.83
query16	0.42	0.37	0.38
query17	1.07	1.08	1.03
query18	0.24	0.22	0.23
query19	1.94	1.74	2.08
query20	0.01	0.02	0.01
query21	15.37	0.58	0.60
query22	3.04	2.25	1.92
query23	17.02	1.11	0.90
query24	3.00	1.69	1.57
query25	0.13	0.22	0.18
query26	0.60	0.14	0.14
query27	0.04	0.03	0.03
query28	9.28	0.54	0.48
query29	12.59	3.21	3.20
query30	0.24	0.06	0.05
query31	2.86	0.38	0.39
query32	3.24	0.47	0.45
query33	3.00	3.02	3.02
query34	16.98	4.46	4.49
query35	4.58	4.52	4.46
query36	0.68	0.47	0.49
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.12	0.14
query41	0.08	0.02	0.03
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.14 s
Total hot run time: 33.04 s

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 40.98% (10811/26381) |
| Line Coverage | 31.82% (92314/290089) |
| Region Coverage | 30.91% (47658/154194) |
| Branch Coverage | 27.41% (24425/89100) |

Contributor

@dataroaring dataroaring left a comment


LGTM

@dataroaring dataroaring merged commit 7adf3bf into apache:branch-3.0 May 14, 2025
22 of 24 checks passed