@LiBinfeng-01
Contributor

@LiBinfeng-01 LiBinfeng-01 commented Mar 21, 2025

pick: #49087

Related PR: #40441

Problem Summary:

Some string functions miscalculate the length of emoji characters when the FE performs constant folding. For example:

select STRLEFT('😊😉👍', 2);

should return 😊😉, but the FE returns only 😊 when it folds the constant.

Fixed functions:

  • left
  • strleft
  • right
  • strright
  • locate
  • character_length
  • split_by_string
  • overlay
  • replace_empty
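The patch itself is not shown in this conversation, so the following is only a minimal sketch of one likely cause, assuming the FE (which is written in Java) counted UTF-16 code units rather than Unicode code points. Each of the emoji above is a surrogate pair (two `char` values), which is consistent with `STRLEFT(..., 2)` returning a single emoji. The class and method names (`EmojiLeftSketch`, `naiveLeft`, `codePointLeft`) are hypothetical and are not taken from the Doris code base.

```java
// Hypothetical illustration only; not the actual Doris implementation.
class EmojiLeftSketch {

    // Counts UTF-16 code units, so an emoji outside the BMP counts as 2.
    static String naiveLeft(String s, int n) {
        if (n <= 0) {
            return "";
        }
        return s.substring(0, Math.min(n, s.length()));
    }

    // Counts Unicode code points, so each emoji counts as 1.
    static String codePointLeft(String s, int n) {
        if (n <= 0) {
            return "";
        }
        int codePoints = s.codePointCount(0, s.length());
        if (n >= codePoints) {
            return s;
        }
        // Translate a code point count into a UTF-16 index before slicing.
        return s.substring(0, s.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        // The three emoji from the example, written as surrogate pairs:
        // U+1F60A, U+1F609, U+1F44D.
        String s = "\uD83D\uDE0A\uD83D\uDE09\uD83D\uDC4D";

        System.out.println(s.length());                      // 6 code units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.println(naiveLeft(s, 2));                 // first emoji only
        System.out.println(codePointLeft(s, 2));             // first two emoji
    }
}
```

Under this assumption, the same index translation would apply to the other listed functions: any character position or length argument has to be converted from code points to UTF-16 offsets before the string is sliced or searched.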


@LiBinfeng-01
Contributor Author

run buildall

@Thearas
Contributor

Thearas commented Mar 21, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 40055 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6325eeb926cbd2266de56a2534eee51a5aa3a44b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	18033	6818	7366	6818
q2	2072	176	181	176
q3	10710	1076	1115	1076
q4	10411	710	766	710
q5	7761	2850	2753	2753
q6	222	135	131	131
q7	981	610	600	600
q8	9355	1937	1973	1937
q9	6572	6440	6435	6435
q10	7030	2228	2288	2228
q11	457	260	253	253
q12	400	209	211	209
q13	17778	3013	2985	2985
q14	239	211	215	211
q15	503	458	463	458
q16	664	589	584	584
q17	978	526	550	526
q18	7214	6556	6551	6551
q19	1403	1058	1165	1058
q20	469	205	200	200
q21	3939	3162	3197	3162
q22	1136	994	1013	994
Total cold run time: 108327 ms
Total hot run time: 40055 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6597	6627	6530	6530
q2	328	231	238	231
q3	2907	2737	2953	2737
q4	2057	1819	1791	1791
q5	5737	5713	5694	5694
q6	212	128	124	124
q7	2226	1825	1795	1795
q8	3367	3526	3486	3486
q9	8776	8852	8797	8797
q10	3586	3563	3485	3485
q11	593	483	500	483
q12	818	571	606	571
q13	11177	3171	3158	3158
q14	306	286	272	272
q15	524	477	462	462
q16	677	684	657	657
q17	1871	1612	1590	1590
q18	8199	7776	7606	7606
q19	1660	1601	1510	1510
q20	2122	1882	1904	1882
q21	5619	5369	5346	5346
q22	1132	1056	1046	1046
Total cold run time: 70491 ms
Total hot run time: 59253 ms

@doris-robot

TPC-DS: Total hot run time: 196927 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6325eeb926cbd2266de56a2534eee51a5aa3a44b, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1350	937	940	937
query2	6222	2018	2002	2002
query3	10880	4397	4606	4397
query4	66482	27780	23288	23288
query5	5014	452	445	445
query6	394	173	170	170
query7	5555	311	333	311
query8	337	229	230	229
query9	8649	2615	2623	2615
query10	432	288	261	261
query11	17383	15143	15850	15143
query12	151	102	100	100
query13	1453	482	432	432
query14	11185	6896	7445	6896
query15	206	183	175	175
query16	7088	504	523	504
query17	1105	584	607	584
query18	1843	343	314	314
query19	204	162	165	162
query20	122	117	116	116
query21	205	106	104	104
query22	4486	4526	4684	4526
query23	34361	34030	34130	34030
query24	6106	2885	2947	2885
query25	543	436	457	436
query26	708	173	172	172
query27	1870	346	372	346
query28	4405	2482	2476	2476
query29	725	482	458	458
query30	247	168	180	168
query31	1039	829	835	829
query32	66	56	54	54
query33	449	302	328	302
query34	924	515	518	515
query35	863	735	735	735
query36	1097	950	953	950
query37	121	67	66	66
query38	4226	3960	4012	3960
query39	1546	1464	1475	1464
query40	203	106	102	102
query41	49	47	50	47
query42	117	112	99	99
query43	538	496	497	496
query44	1198	845	832	832
query45	184	165	172	165
query46	1143	741	715	715
query47	2017	1938	1936	1936
query48	489	407	404	404
query49	735	403	389	389
query50	842	446	434	434
query51	7286	7305	7082	7082
query52	95	88	90	88
query53	249	176	196	176
query54	573	448	445	445
query55	77	76	78	76
query56	252	235	248	235
query57	1236	1140	1090	1090
query58	218	214	208	208
query59	3205	2989	2964	2964
query60	276	252	244	244
query61	104	105	130	105
query62	778	662	675	662
query63	213	195	198	195
query64	1374	658	635	635
query65	3256	3168	3205	3168
query66	707	287	293	287
query67	15895	15525	15560	15525
query68	3951	596	583	583
query69	426	290	258	258
query70	1163	1092	1089	1089
query71	362	257	252	252
query72	6375	4063	3951	3951
query73	753	359	359	359
query74	10469	9150	9212	9150
query75	3345	2632	2652	2632
query76	2048	1052	1118	1052
query77	495	277	280	277
query78	10509	9640	9528	9528
query79	1533	622	587	587
query80	885	431	427	427
query81	525	240	238	238
query82	1270	90	91	90
query83	250	151	151	151
query84	284	87	86	86
query85	877	319	300	300
query86	356	301	297	297
query87	4571	4264	4271	4264
query88	3823	2414	2378	2378
query89	423	289	293	289
query90	2019	186	187	186
query91	190	149	147	147
query92	73	51	50	50
query93	1915	557	567	557
query94	829	308	249	249
query95	357	260	260	260
query96	605	282	283	282
query97	3388	3199	3136	3136
query98	214	204	210	204
query99	1621	1290	1257	1257
Total cold run time: 319255 ms
Total hot run time: 196927 ms

@doris-robot

ClickBench: Total hot run time: 31.97 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6325eeb926cbd2266de56a2534eee51a5aa3a44b, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.02	0.03
query3	0.23	0.07	0.06
query4	1.63	0.10	0.10
query5	0.52	0.52	0.49
query6	1.14	0.72	0.74
query7	0.02	0.02	0.02
query8	0.04	0.03	0.04
query9	0.57	0.50	0.49
query10	0.55	0.55	0.54
query11	0.14	0.11	0.13
query12	0.14	0.12	0.12
query13	0.61	0.59	0.59
query14	2.80	2.74	2.77
query15	0.88	0.84	0.82
query16	0.39	0.38	0.38
query17	0.95	1.01	1.02
query18	0.24	0.22	0.22
query19	1.89	1.74	1.98
query20	0.02	0.01	0.01
query21	15.37	0.60	0.58
query22	2.69	2.79	1.91
query23	16.97	0.84	0.91
query24	3.18	0.65	1.68
query25	0.23	0.11	0.13
query26	0.55	0.15	0.14
query27	0.04	0.03	0.04
query28	10.03	0.48	0.45
query29	12.54	3.28	3.25
query30	0.25	0.06	0.06
query31	2.86	0.38	0.39
query32	3.24	0.47	0.46
query33	2.97	3.01	3.08
query34	17.21	4.50	4.53
query35	4.53	4.58	4.58
query36	0.69	0.49	0.47
query37	0.10	0.06	0.07
query38	0.04	0.04	0.03
query39	0.04	0.02	0.02
query40	0.16	0.13	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.71 s
Total hot run time: 31.97 s

@morrySnow morrySnow changed the title from "[fix](Nereids) fold constant for string function process emoji character by mistake (#49087)" to "branch-3.0: [fix](Nereids) fold constant for string function process emoji character by mistake #49087" on Mar 25, 2025
Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 4879493 into apache:branch-3.0 Mar 27, 2025
27 of 28 checks passed
dataroaring pushed a commit that referenced this pull request Mar 30, 2025
Cherry-picked from #49061
case of initcap has been added at:
(#49346)
zclllyybb pushed a commit to apache/doris-website that referenced this pull request Dec 1, 2025
The behavior of APPEND_TRAILING_CHAR_IF_ABSENT has been updated in the
3.x branch.
Documentation needs to be updated accordingly to reflect the new
behavior.

## Related Doris PRs
[#50659](apache/doris#50659)
[#49346](apache/doris#49346)