@Mryange
Contributor

@Mryange Mryange commented Jul 22, 2024

Proposed changes

mysql [test]>select ngram_search('123456789' , '12345' , 3);
+---------------------------------------+
| ngram_search('123456789', '12345', 3) |
+---------------------------------------+
|                                   0.6 |
+---------------------------------------+
1 row in set (0.01 sec)

mysql [test]>select ngram_search("abababab","babababa",2);
+-----------------------------------------+
| ngram_search('abababab', 'babababa', 2) |
+-----------------------------------------+
|                                       1 |
+-----------------------------------------+
1 row in set (0.01 sec)

Doc: apache/doris-website#899
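
Both results above are consistent with a set-based Dice coefficient over character n-grams: twice the number of shared distinct n-grams divided by the sum of the distinct n-gram counts of the two strings. The sketch below is only an illustration under that assumption; the helper names `char_ngrams` and `ngram_similarity` are hypothetical, and the actual Doris implementation may differ (for example in how it hashes n-grams or handles multi-byte characters).

```python
def char_ngrams(s, n):
    """Return the set of distinct length-n substrings of s."""
    # range() is empty when len(s) < n, so short strings yield an empty set.
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(text, pattern, n):
    """Set-based Dice coefficient over character n-grams (assumed formula)."""
    a = char_ngrams(text, n)
    b = char_ngrams(pattern, n)
    if not a or not b:
        # Assumption: no n-grams on either side means no similarity.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# '123456789' has 7 distinct 3-grams, '12345' has 3, and all 3 are shared.
print(ngram_similarity("123456789", "12345", 3))    # 0.6
# Both strings reduce to the same 2-gram set {'ab', 'ba'}.
print(ngram_similarity("abababab", "babababa", 2))  # 1.0
```

Note that a multiset (count-aware) variant would give a value below 1 for the second query, so the distinct-set interpretation is the one that matches the output shown.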

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Mryange
Contributor Author

Mryange commented Jul 22, 2024

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39887 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit afc921460a1bb935e5e0af4ecd331b4ed7f7f358, data reload: false

------ Round 1 ----------------------------------
q1	17618	4329	4270	4270
q2	2012	196	186	186
q3	10443	1214	1193	1193
q4	10187	813	871	813
q5	7565	2699	2625	2625
q6	222	135	135	135
q7	952	591	599	591
q8	9234	2079	2073	2073
q9	8745	6570	6574	6570
q10	8826	3772	3762	3762
q11	475	231	233	231
q12	475	226	219	219
q13	18009	2968	2979	2968
q14	290	237	233	233
q15	533	483	475	475
q16	493	384	380	380
q17	963	719	710	710
q18	8000	7417	7424	7417
q19	5494	1425	1312	1312
q20	697	312	326	312
q21	4948	3128	3177	3128
q22	356	284	286	284
Total cold run time: 116537 ms
Total hot run time: 39887 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4350	4269	4208	4208
q2	385	268	259	259
q3	2971	2819	2890	2819
q4	1963	1750	1696	1696
q5	5667	5516	5517	5516
q6	223	131	128	128
q7	2193	1817	1825	1817
q8	3266	3406	3416	3406
q9	8964	8745	8888	8745
q10	4120	3911	3796	3796
q11	613	474	468	468
q12	801	649	625	625
q13	16844	3188	3193	3188
q14	314	282	308	282
q15	550	486	495	486
q16	491	425	420	420
q17	1819	1503	1505	1503
q18	8170	7893	7898	7893
q19	1883	1580	1535	1535
q20	2200	1921	1876	1876
q21	10097	4953	4779	4779
q22	584	503	507	503
Total cold run time: 78468 ms
Total hot run time: 55948 ms
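
The column layout of these tables is not labelled. Judging from the numbers, each row appears to hold the query name, one cold run, two hot runs, and the smaller of the two hot runs, with the totals being column sums. The sketch below checks that reading against the Round 1 rows; the name `totals` and the column interpretation are assumptions, not something the bot output states.

```python
# Round 1 rows copied from the report: query, cold run, hot run 1, hot run 2 (ms).
ROUND1 = """
q1 17618 4329 4270
q2 2012 196 186
q3 10443 1214 1193
q4 10187 813 871
q5 7565 2699 2625
q6 222 135 135
q7 952 591 599
q8 9234 2079 2073
q9 8745 6570 6574
q10 8826 3772 3762
q11 475 231 233
q12 475 226 219
q13 18009 2968 2979
q14 290 237 233
q15 533 483 475
q16 493 384 380
q17 963 719 710
q18 8000 7417 7424
q19 5494 1425 1312
q20 697 312 326
q21 4948 3128 3177
q22 356 284 286
"""


def totals(table):
    """Return (cold_total, hot_total), taking the faster hot run per query."""
    cold_total = 0
    hot_total = 0
    for line in table.strip().splitlines():
        _name, cold, hot1, hot2 = line.split()
        cold_total += int(cold)
        hot_total += min(int(hot1), int(hot2))
    return cold_total, hot_total


cold_ms, hot_ms = totals(ROUND1)
print(cold_ms, hot_ms)  # 116537 39887
```

Both sums match the reported "Total cold run time" and "Total hot run time" for Round 1, which supports the assumed column meaning.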

@doris-robot

TPC-DS: Total hot run time: 174096 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit afc921460a1bb935e5e0af4ecd331b4ed7f7f358, data reload: false

query1	916	369	384	369
query2	6445	1971	1833	1833
query3	6636	208	218	208
query4	28181	17509	17350	17350
query5	3670	477	502	477
query6	264	174	156	156
query7	4571	288	282	282
query8	242	200	204	200
query9	8778	2415	2381	2381
query10	453	294	261	261
query11	11173	10077	10083	10077
query12	119	84	84	84
query13	1639	373	359	359
query14	10252	7612	7609	7609
query15	209	165	170	165
query16	7749	499	467	467
query17	1546	548	519	519
query18	1939	276	273	273
query19	189	153	145	145
query20	93	81	88	81
query21	207	128	122	122
query22	4558	4168	4060	4060
query23	34276	34021	33548	33548
query24	10924	2969	2926	2926
query25	608	399	389	389
query26	700	148	146	146
query27	2295	272	272	272
query28	6273	2055	2017	2017
query29	885	654	629	629
query30	254	154	148	148
query31	990	746	778	746
query32	95	53	53	53
query33	644	332	344	332
query34	953	473	504	473
query35	866	755	772	755
query36	1163	961	980	961
query37	139	86	86	86
query38	2896	2775	2769	2769
query39	886	816	809	809
query40	198	151	116	116
query41	45	43	42	42
query42	121	97	98	97
query43	525	458	475	458
query44	1067	714	715	714
query45	192	161	160	160
query46	1072	728	717	717
query47	1863	1798	1806	1798
query48	362	286	287	286
query49	817	407	413	407
query50	770	383	382	382
query51	6848	6606	6782	6606
query52	104	97	96	96
query53	350	286	286	286
query54	870	449	435	435
query55	74	72	71	71
query56	282	258	281	258
query57	1126	1034	1028	1028
query58	241	262	246	246
query59	3011	2714	2544	2544
query60	302	271	277	271
query61	97	130	92	92
query62	794	660	663	660
query63	312	282	290	282
query64	9120	2225	4137	2225
query65	3141	3094	3118	3094
query66	724	326	327	326
query67	15429	15130	15005	15005
query68	4554	521	518	518
query69	458	325	343	325
query70	1173	1108	1176	1108
query71	429	279	276	276
query72	7137	5708	6027	5708
query73	752	313	314	313
query74	6152	5718	5633	5633
query75	3408	2653	2677	2653
query76	2654	867	903	867
query77	446	311	356	311
query78	12325	9423	8970	8970
query79	4093	518	503	503
query80	2011	470	466	466
query81	591	222	219	219
query82	625	137	140	137
query83	275	163	158	158
query84	280	88	82	82
query85	739	305	295	295
query86	492	326	312	312
query87	3295	3117	3141	3117
query88	4238	2335	2338	2335
query89	498	380	385	380
query90	1917	188	183	183
query91	127	100	99	99
query92	64	52	50	50
query93	4110	486	481	481
query94	1167	280	293	280
query95	398	314	381	314
query96	612	273	270	270
query97	3230	3029	3046	3029
query98	224	194	191	191
query99	1531	1283	1274	1274
Total cold run time: 284538 ms
Total hot run time: 174096 ms

@doris-robot

ClickBench: Total hot run time: 30.78 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit afc921460a1bb935e5e0af4ecd331b4ed7f7f358, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.04
query4	1.68	0.06	0.07
query5	0.50	0.50	0.50
query6	1.13	0.73	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.56	0.50	0.50
query10	0.56	0.55	0.54
query11	0.15	0.11	0.11
query12	0.15	0.13	0.13
query13	0.61	0.59	0.58
query14	0.78	0.76	0.78
query15	0.86	0.81	0.81
query16	0.36	0.36	0.35
query17	1.05	0.97	1.03
query18	0.23	0.22	0.22
query19	1.91	1.84	1.81
query20	0.01	0.00	0.00
query21	15.40	0.73	0.66
query22	4.46	7.21	2.06
query23	18.72	1.43	1.27
query24	2.18	0.22	0.21
query25	0.15	0.08	0.09
query26	0.31	0.21	0.20
query27	0.45	0.23	0.22
query28	13.24	1.01	1.00
query29	12.61	3.30	3.30
query30	0.26	0.07	0.05
query31	2.85	0.38	0.39
query32	3.26	0.47	0.47
query33	2.87	2.96	2.91
query34	17.12	4.33	4.35
query35	4.41	4.40	4.40
query36	0.66	0.46	0.47
query37	0.20	0.15	0.16
query38	0.15	0.15	0.14
query39	0.04	0.03	0.03
query40	0.15	0.12	0.12
query41	0.09	0.05	0.04
query42	0.06	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 110.65 s
Total hot run time: 30.78 s

@Mryange Mryange changed the title [draft 2.0] [feature](function) support ngram_search function Jul 23, 2024
@Mryange Mryange marked this pull request as ready for review July 23, 2024 03:44
@Mryange
Contributor Author

Mryange commented Jul 23, 2024

run buildall

@Mryange
Contributor Author

Mryange commented Jul 23, 2024

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40172 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 19ea8afc149d16633cc4a1846d3341a9aa882df6, data reload: false

------ Round 1 ----------------------------------
q1	17635	4393	4288	4288
q2	2042	191	187	187
q3	10445	1173	1140	1140
q4	10229	792	869	792
q5	8074	2768	2741	2741
q6	224	139	138	138
q7	971	621	606	606
q8	9220	2107	2089	2089
q9	8824	6597	6510	6510
q10	8704	3782	3844	3782
q11	456	244	243	243
q12	397	226	217	217
q13	18859	2983	2986	2983
q14	274	254	232	232
q15	542	498	491	491
q16	481	398	379	379
q17	963	689	759	689
q18	8175	7480	7416	7416
q19	5724	1376	1478	1376
q20	714	329	332	329
q21	4923	3260	3286	3260
q22	342	291	284	284
Total cold run time: 118218 ms
Total hot run time: 40172 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4395	4242	4227	4227
q2	373	267	276	267
q3	3052	2745	2772	2745
q4	1864	1622	1593	1593
q5	5297	5343	5324	5324
q6	224	138	130	130
q7	2172	1780	1765	1765
q8	3267	3436	3385	3385
q9	8499	8482	8440	8440
q10	3923	3694	3696	3694
q11	581	492	501	492
q12	761	604	608	604
q13	16759	2996	2969	2969
q14	302	268	290	268
q15	514	487	484	484
q16	492	420	421	420
q17	1822	1538	1506	1506
q18	7565	7540	7405	7405
q19	6531	1521	1517	1517
q20	2032	1769	1770	1769
q21	4858	4739	4809	4739
q22	589	496	500	496
Total cold run time: 75872 ms
Total hot run time: 54239 ms

@doris-robot

TPC-DS: Total hot run time: 173509 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 19ea8afc149d16633cc4a1846d3341a9aa882df6, data reload: false

query1	930	368	367	367
query2	6450	1905	1882	1882
query3	6652	206	214	206
query4	24907	17409	17469	17409
query5	4253	467	479	467
query6	279	190	167	167
query7	4608	284	282	282
query8	247	192	215	192
query9	8704	2385	2355	2355
query10	449	270	260	260
query11	10357	10022	10058	10022
query12	140	83	80	80
query13	1645	369	355	355
query14	10210	7577	7465	7465
query15	220	161	163	161
query16	7812	457	498	457
query17	1576	559	544	544
query18	1965	287	281	281
query19	203	158	151	151
query20	90	81	83	81
query21	215	127	130	127
query22	4319	4084	4002	4002
query23	33904	33114	33197	33114
query24	12128	2860	2848	2848
query25	683	385	387	385
query26	1795	153	149	149
query27	2959	269	273	269
query28	7637	1968	1957	1957
query29	1182	648	632	632
query30	289	152	146	146
query31	993	762	739	739
query32	103	55	56	55
query33	786	347	346	346
query34	922	474	484	474
query35	861	745	754	745
query36	1116	929	909	909
query37	287	80	85	80
query38	2886	2767	2745	2745
query39	849	803	784	784
query40	274	119	121	119
query41	52	47	50	47
query42	115	104	102	102
query43	486	458	468	458
query44	1200	716	707	707
query45	196	164	172	164
query46	1086	719	719	719
query47	1883	1776	1791	1776
query48	360	295	288	288
query49	1236	422	424	422
query50	792	394	390	390
query51	6744	6693	6726	6693
query52	109	95	93	93
query53	370	289	287	287
query54	943	530	439	439
query55	76	72	72	72
query56	282	265	271	265
query57	1143	1061	1083	1061
query58	256	254	254	254
query59	2882	2630	2681	2630
query60	332	299	274	274
query61	96	93	95	93
query62	839	644	656	644
query63	324	285	282	282
query64	10407	2244	7403	2244
query65	3130	3137	3105	3105
query66	1400	322	338	322
query67	15668	15015	15265	15015
query68	8299	542	548	542
query69	719	477	369	369
query70	1197	1156	1139	1139
query71	492	280	280	280
query72	7712	5599	5490	5490
query73	801	322	328	322
query74	6249	5681	5661	5661
query75	4426	2671	2685	2671
query76	4767	948	923	923
query77	702	302	306	302
query78	9748	9065	15351	9065
query79	5602	505	508	505
query80	1100	479	464	464
query81	600	217	218	217
query82	286	137	138	137
query83	226	166	168	166
query84	282	81	86	81
query85	1379	327	299	299
query86	395	300	305	300
query87	3302	3179	3068	3068
query88	3796	2379	2345	2345
query89	470	385	369	369
query90	1967	191	188	188
query91	129	98	101	98
query92	64	53	49	49
query93	1023	491	487	487
query94	1212	286	280	280
query95	402	313	313	313
query96	596	276	266	266
query97	3211	3047	3062	3047
query98	219	208	190	190
query99	1605	1270	1221	1221
Total cold run time: 291270 ms
Total hot run time: 173509 ms

@doris-robot

ClickBench: Total hot run time: 30.6 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 19ea8afc149d16633cc4a1846d3341a9aa882df6, data reload: false

query1	0.05	0.04	0.04
query2	0.07	0.04	0.04
query3	0.22	0.06	0.05
query4	1.69	0.09	0.08
query5	0.51	0.49	0.50
query6	1.13	0.74	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.57	0.50	0.49
query10	0.55	0.56	0.52
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.59	0.58	0.57
query14	0.76	0.78	0.79
query15	0.85	0.81	0.81
query16	0.36	0.37	0.37
query17	1.05	0.98	1.01
query18	0.22	0.21	0.22
query19	1.83	1.79	1.64
query20	0.01	0.01	0.01
query21	15.42	0.74	0.65
query22	4.60	7.23	1.82
query23	18.34	1.44	1.29
query24	2.12	0.24	0.21
query25	0.16	0.08	0.08
query26	0.30	0.22	0.21
query27	0.45	0.23	0.22
query28	13.30	1.01	1.01
query29	12.58	3.31	3.32
query30	0.25	0.06	0.05
query31	2.87	0.39	0.39
query32	3.28	0.48	0.47
query33	2.90	2.90	2.91
query34	16.96	4.36	4.36
query35	4.44	4.47	4.50
query36	0.65	0.48	0.47
query37	0.19	0.16	0.16
query38	0.15	0.15	0.15
query39	0.04	0.04	0.04
query40	0.14	0.12	0.12
query41	0.10	0.05	0.05
query42	0.05	0.05	0.06
query43	0.04	0.04	0.03
Total cold run time: 110.17 s
Total hot run time: 30.6 s

Contributor

@superdiaodiao superdiaodiao left a comment

It seems we should add the function in this file:
gensrc/script/doris_builtins_functions.py

@Mryange
Contributor Author

Mryange commented Jul 23, 2024

run feut

@Mryange
Contributor Author

Mryange commented Jul 23, 2024

> It seems we should add the function in this file: gensrc/script/doris_builtins_functions.py

doris_builtins_functions.py is used by the original planner, but the original planner is about to be removed.

@superdiaodiao
Contributor

superdiaodiao commented Jul 23, 2024

> doris_builtins_functions.py is used by the original planner, but the original planner is about to be removed.

Got it ✔
Thanks!

Contributor

@HappenLee HappenLee left a comment

LGTM

@HappenLee
Contributor

Please add documentation.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 24, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@HappenLee HappenLee merged commit a59fdf4 into apache:master Jul 26, 2024
Mryange added a commit to Mryange/doris that referenced this pull request Sep 18, 2024
Mryange added a commit to Mryange/doris that referenced this pull request Sep 18, 2024
yiguolei pushed a commit that referenced this pull request Sep 21, 2024
#38226 
morningman pushed a commit to apache/doris-website that referenced this pull request Sep 27, 2024
dataroaring pushed a commit that referenced this pull request Oct 9, 2024
@gavinchou gavinchou mentioned this pull request Nov 26, 2024
Labels

approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.3-merged reviewed

7 participants