Conversation

@zzzxl1993
Contributor

@zzzxl1993 zzzxl1993 commented Apr 12, 2024

Proposed changes

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@xiaokang xiaokang left a comment

Please add a test case.

return 0;
};
re2::RE2 pattern(terms[0]);
std::vector<std::string> results;
Contributor

The original name, terms, is easier to understand.

std::vector<Term*> _terms;
std::vector<TermDocs*> _term_docs;
std::vector<TermIterator> _term_iterators;
// std::vector<Term*> _terms;
Contributor

Just delete these lines instead of commenting them out.

// std::vector<TermDocs*> _term_docs;
// std::vector<TermIterator> _term_iterators;

std::wstring _field_name;
Contributor

It's better to put the common fields, _field_name and _terms, in the parent class Query.
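
A minimal sketch of what this suggestion could look like, assuming the shared fields are a wide-string field name and a list of string terms. The subclass name PhraseLikeQuery, the accessors, and the element type of _terms are hypothetical and are not taken from the PR.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: owns the fields that every query type shares.
class Query {
public:
    Query(std::wstring field_name, std::vector<std::string> terms)
            : _field_name(std::move(field_name)), _terms(std::move(terms)) {}
    virtual ~Query() = default;

    const std::wstring& field_name() const { return _field_name; }
    const std::vector<std::string>& terms() const { return _terms; }

protected:
    std::wstring _field_name;         // shared by all subclasses
    std::vector<std::string> _terms;  // element type is an assumption
};

// Hypothetical subclass: inherits the shared fields instead of redeclaring them.
class PhraseLikeQuery : public Query {
public:
    using Query::Query;  // reuse the base-class constructor
    std::size_t term_count() const { return _terms.size(); }
};
```

With the fields declared once in the base class, each concrete query only adds the state that is specific to it.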

for (int i = 0; i < _terms.size(); i++) {
if (i == 0) {
func(iter, true);
func(_terms[i], true);
Contributor

Use a single call, func(_terms[i], i == 0), instead of the if/else.
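
As a sketch of the suggested change, assuming func takes a term plus a flag that marks the first term. TermFunc, for_each_term, and trace_calls are illustrative names, not identifiers from the PR.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-term callback in the diff: it receives a
// term and a flag that says whether this is the first term.
using TermFunc = std::function<void(const std::string&, bool)>;

// Visit every term through a single call site; `i == 0` supplies the flag
// that the if/else branches previously passed as literals.
void for_each_term(const std::vector<std::string>& terms, const TermFunc& func) {
    for (std::size_t i = 0; i < terms.size(); ++i) {
        func(terms[i], i == 0);
    }
}

// Illustration helper: records the (term, is_first) pairs that func receives.
std::vector<std::pair<std::string, bool>> trace_calls(
        const std::vector<std::string>& terms) {
    std::vector<std::pair<std::string, bool>> calls;
    for_each_term(terms, [&calls](const std::string& term, bool is_first) {
        calls.emplace_back(term, is_first);
    });
    return calls;
}
```

Using std::size_t for the index also avoids the signed/unsigned comparison that `int i < _terms.size()` introduces.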

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@zzzxl1993 zzzxl1993 changed the title from "[opt](inverted index) match_regexp with PartialMatch enables partial term matching." to "[opt](inverted index) prevent excessive memory usage caused by too many terms." on Apr 16, 2024
@zzzxl1993
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.50% (8907/25092)
Line Coverage: 27.22% (73151/268715)
Region Coverage: 26.35% (37829/143538)
Branch Coverage: 23.13% (19271/83320)
Coverage Report: http://coverage.selectdb-in.cc/coverage/64fba1ded67b2cac23f0083d7165a2d0efadf2a1_64fba1ded67b2cac23f0083d7165a2d0efadf2a1/report/index.html

@doris-robot

TPC-H: Total hot run time: 38352 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 64fba1ded67b2cac23f0083d7165a2d0efadf2a1, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17620	4349	4211	4211
q2	2015	191	188	188
q3	10431	1161	1142	1142
q4	10190	750	750	750
q5	7500	2694	2623	2623
q6	219	135	131	131
q7	1001	611	585	585
q8	9225	2053	2035	2035
q9	7483	6622	6524	6524
q10	8616	3515	3505	3505
q11	457	229	234	229
q12	520	215	217	215
q13	17773	2986	2944	2944
q14	262	232	233	232
q15	517	484	485	484
q16	509	410	372	372
q17	952	657	689	657
q18	7456	6806	6739	6739
q19	7627	1552	1468	1468
q20	644	308	304	304
q21	3558	2701	2986	2701
q22	360	313	315	313
Total cold run time: 114935 ms
Total hot run time: 38352 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4329	4246	4237	4237
q2	370	269	265	265
q3	2982	2792	2763	2763
q4	1835	1560	1592	1560
q5	5269	5290	5288	5288
q6	211	122	123	122
q7	2235	1875	1859	1859
q8	3236	3338	3343	3338
q9	8608	8552	8696	8552
q10	4034	3894	4028	3894
q11	619	506	492	492
q12	830	678	675	675
q13	15990	3260	3164	3164
q14	318	290	298	290
q15	538	506	486	486
q16	476	449	444	444
q17	1847	1547	1511	1511
q18	8002	7888	7888	7888
q19	1681	1582	1574	1574
q20	2071	1878	1832	1832
q21	8833	4948	4889	4889
q22	540	471	460	460
Total cold run time: 74854 ms
Total hot run time: 55583 ms

@doris-robot

TPC-DS: Total hot run time: 185122 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 64fba1ded67b2cac23f0083d7165a2d0efadf2a1, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	884	357	358	357
query2	6188	2600	2349	2349
query3	6653	202	201	201
query4	24744	21389	21261	21261
query5	4129	396	410	396
query6	292	184	190	184
query7	4582	288	309	288
query8	232	177	174	174
query9	8573	2348	2337	2337
query10	418	237	254	237
query11	14736	14182	14136	14136
query12	131	88	85	85
query13	1627	354	354	354
query14	9474	7447	7544	7447
query15	238	179	180	179
query16	8089	258	257	257
query17	1946	569	541	541
query18	2062	268	267	267
query19	193	144	146	144
query20	90	84	83	83
query21	193	132	120	120
query22	5044	4859	4857	4857
query23	33880	33128	33274	33128
query24	11372	3027	2988	2988
query25	601	383	381	381
query26	706	157	163	157
query27	2455	364	368	364
query28	6299	2101	2091	2091
query29	864	622	624	622
query30	281	176	199	176
query31	1038	765	757	757
query32	97	52	52	52
query33	765	257	254	254
query34	1114	491	506	491
query35	865	747	717	717
query36	1090	933	942	933
query37	115	76	76	76
query38	3464	3358	3338	3338
query39	1668	1642	1577	1577
query40	169	137	129	129
query41	47	43	45	43
query42	101	93	100	93
query43	603	557	520	520
query44	1241	756	752	752
query45	285	268	262	262
query46	1102	742	716	716
query47	2045	1928	1937	1928
query48	382	319	307	307
query49	827	389	391	389
query50	800	406	419	406
query51	6959	6783	6690	6690
query52	98	89	88	88
query53	343	286	282	282
query54	295	221	227	221
query55	77	71	74	71
query56	246	242	228	228
query57	1173	1130	1137	1130
query58	216	190	199	190
query59	3563	3096	3192	3096
query60	259	242	230	230
query61	90	89	86	86
query62	611	453	432	432
query63	304	284	284	284
query64	4925	3708	3736	3708
query65	3174	3087	3036	3036
query66	760	329	328	328
query67	15464	15080	14995	14995
query68	7671	549	544	544
query69	550	312	309	309
query70	1248	1159	1181	1159
query71	1476	1265	1270	1265
query72	6492	2607	2492	2492
query73	741	328	332	328
query74	6797	6298	6319	6298
query75	3945	2647	2708	2647
query76	4563	997	976	976
query77	650	269	272	269
query78	10944	10292	10117	10117
query79	7478	512	516	512
query80	1055	454	467	454
query81	507	252	243	243
query82	622	180	99	99
query83	199	174	167	167
query84	255	87	81	81
query85	864	275	261	261
query86	366	303	291	291
query87	3473	3339	3305	3305
query88	4676	2417	2407	2407
query89	479	362	370	362
query90	1936	184	184	184
query91	121	93	97	93
query92	54	47	47	47
query93	6007	513	512	512
query94	1085	182	183	182
query95	387	293	300	293
query96	599	265	260	260
query97	3105	2950	2916	2916
query98	235	207	224	207
query99	1171	846	874	846
Total cold run time: 293102 ms
Total hot run time: 185122 ms

@doris-robot

ClickBench: Total hot run time: 30 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 64fba1ded67b2cac23f0083d7165a2d0efadf2a1, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.03
query3	0.22	0.05	0.05
query4	1.68	0.08	0.08
query5	0.49	0.47	0.50
query6	1.48	0.72	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.57	0.49	0.49
query10	0.53	0.56	0.53
query11	0.15	0.12	0.11
query12	0.14	0.12	0.12
query13	0.63	0.59	0.59
query14	0.76	0.78	0.77
query15	0.84	0.81	0.81
query16	0.36	0.35	0.37
query17	0.98	0.94	0.95
query18	0.21	0.23	0.22
query19	1.75	1.79	1.69
query20	0.01	0.02	0.01
query21	15.61	0.64	0.64
query22	4.81	7.81	1.65
query23	18.28	1.37	1.23
query24	1.57	0.27	0.27
query25	0.15	0.08	0.08
query26	0.28	0.17	0.16
query27	0.09	0.08	0.08
query28	13.29	1.00	0.97
query29	12.64	3.28	3.24
query30	0.28	0.07	0.05
query31	2.87	0.39	0.39
query32	3.26	0.46	0.46
query33	2.85	2.85	2.80
query34	17.18	4.40	4.46
query35	4.46	4.50	4.48
query36	0.64	0.46	0.46
query37	0.18	0.15	0.15
query38	0.16	0.14	0.15
query39	0.04	0.03	0.04
query40	0.18	0.14	0.14
query41	0.09	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.03	0.03
Total cold run time: 110 s
Total hot run time: 30 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 64fba1ded67b2cac23f0083d7165a2d0efadf2a1 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       14.3 seconds inserted 10000000 Rows, about 699K ops/s

Contributor

@xiaokang xiaokang left a comment

LGTM

@github-actions github-actions bot added the approved label (indicates a PR has been approved by one committer) on Apr 18, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Labels

approved, reviewed

4 participants