Skip to content

Conversation

@amorynan
Copy link
Contributor

@amorynan amorynan commented Mar 21, 2024

Proposed changes

in this pr we will do:

  1. support expr to eval inverted index
  2. inverted index reader param should be generic for expr

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

return true;
}

bool SegmentIterator::_check_apply_by_inverted_index(ColumnId col_id) {

This comment was marked as abuse.


const std::string& expr_name() const override { return _expr_name; }

bool is_all_ones(const roaring::Roaring& r) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: method 'is_all_ones' can be made static [readability-convert-member-functions-to-static]

Suggested change
bool is_all_ones(const roaring::Roaring& r) {
static bool is_all_ones(const roaring::Roaring& r) {

// 2. when meet 'and' conjunct, function with column b can not apply inverted index
// eg. a and hash(b)=1, if b can apply index, but hash(b)=1 is not for index, so b should not be extracted
// but a and array_contains(b, 1), b can be applied inverted index, which b can be extracted
Status eval_inverted_index(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: method 'eval_inverted_index' can be made static [readability-convert-member-functions-to-static]

Suggested change
Status eval_inverted_index(
static Status eval_inverted_index(

be/src/vec/exprs/vcompound_pred.h:74:

-             uint32_t num_rows, roaring::Roaring* bitmap) const override {
+             uint32_t num_rows, roaring::Roaring* bitmap) override {

std::vector<size_t>& args) override;
Status eval_inverted_index(
VExprContext* context,
const std::unordered_map<ColumnId, std::pair<vectorized::NameAndTypePair,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: parameter 2 is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change
const std::unordered_map<ColumnId, std::pair<vectorized::NameAndTypePair,
std::unordered_map<ColumnId, std::pair<vectorized::NameAndTypePair,

* @return status not ok means execute failed.
*/
[[nodiscard]] Status eval_inverted_indexs(
const std::unordered_map<ColumnId, std::pair<vectorized::NameAndTypePair,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: parameter 1 is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change
const std::unordered_map<ColumnId, std::pair<vectorized::NameAndTypePair,
std::unordered_map<ColumnId, std::pair<vectorized::NameAndTypePair,

return Status::OK();
}

Status eval_inverted_index(FunctionContext* context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: method 'eval_inverted_index' can be made static [readability-convert-member-functions-to-static]

Suggested change
Status eval_inverted_index(FunctionContext* context,
static Status eval_inverted_index(FunctionContext* context,

be/src/vec/functions/function.h:411:

-                                roaring::Roaring* bitmap) const override {
+                                roaring::Roaring* bitmap) override {

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

// 2. when meet 'and' conjunct, function with column b can not apply inverted index
// eg. a and hash(b)=1, if b can apply index, but hash(b)=1 is not for index, so b should not be extracted
// but a and array_contains(b, 1), b can be applied inverted index, which b can be extracted
Status eval_inverted_index(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: method 'eval_inverted_index' can be made static [readability-convert-member-functions-to-static]

Suggested change
Status eval_inverted_index(
static Status eval_inverted_index(

be/src/vec/exprs/vcompound_pred.h:64:

-             uint32_t num_rows, roaring::Roaring* bitmap) const override {
+             uint32_t num_rows, roaring::Roaring* bitmap) override {

// specific language governing permissions and limitations
// under the License.

#include <gtest/gtest-message.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'gtest/gtest-message.h' file not found [clang-diagnostic-error]

#include <gtest/gtest-message.h>
         ^

return create_query_value<TYPE_STRING>(value, result_param);
case PrimitiveType::TYPE_IPV4:
return create_query_value<TYPE_IPV4>(value, result_param);
case PrimitiveType::TYPE_IPV6:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not support HLL/IPV6/IPV4 type for inverted index

std::unordered_map<ColumnId, std::pair<vectorized::NameAndTypePair, InvertedIndexIterator*>>
iter_map;

for (auto col_id : _common_expr_columns) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about those non-array expr be processed in this situation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just will return NotSupport

@amorynan
Copy link
Contributor Author

run buildall

@amorynan
Copy link
Contributor Author

run buildall

1 similar comment
@amorynan
Copy link
Contributor Author

run buildall

@amorynan
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.41% (8924/25205)
Line Coverage: 27.10% (73298/270428)
Region Coverage: 26.24% (37871/144311)
Branch Coverage: 23.05% (19289/83700)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d48a1de18b8bbccec9595127d750d5e7c6c03ecc_d48a1de18b8bbccec9595127d750d5e7c6c03ecc/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 37236 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d48a1de18b8bbccec9595127d750d5e7c6c03ecc, data reload: false

------ Round 1 ----------------------------------
q1	17600	4932	4204	4204
q2	2056	235	222	222
q3	10472	1135	1249	1135
q4	10252	818	769	769
q5	7500	2750	2686	2686
q6	218	134	133	133
q7	1020	616	595	595
q8	9228	2078	2063	2063
q9	7391	6603	6563	6563
q10	8472	3554	3544	3544
q11	446	234	225	225
q12	504	259	265	259
q13	17881	2977	3004	2977
q14	264	234	231	231
q15	524	482	486	482
q16	545	410	394	394
q17	969	616	723	616
q18	7469	6810	6681	6681
q19	276	227	246	227
q20	669	360	334	334
q21	3396	2823	2900	2823
q22	102	76	73	73
Total cold run time: 107254 ms
Total hot run time: 37236 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7470	4271	4278	4271
q2	381	286	282	282
q3	3027	2754	2757	2754
q4	1901	1598	1671	1598
q5	5381	5315	5341	5315
q6	217	125	127	125
q7	2280	1843	1917	1843
q8	3222	3377	3366	3366
q9	8595	8556	8627	8556
q10	3968	3922	3934	3922
q11	607	505	488	488
q12	918	694	670	670
q13	16521	3256	3165	3165
q14	339	295	279	279
q15	530	485	481	481
q16	488	506	451	451
q17	1834	1538	1541	1538
q18	8050	8110	7840	7840
q19	1074	1201	1083	1083
q20	1960	1846	1890	1846
q21	5238	5018	5080	5018
q22	166	166	151	151
Total cold run time: 74167 ms
Total hot run time: 55042 ms

eldenmoon
eldenmoon previously approved these changes Apr 28, 2024
Copy link
Member

@eldenmoon eldenmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 28, 2024
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@amorynan
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 28, 2024
eldenmoon
eldenmoon previously approved these changes Apr 28, 2024
Copy link
Member

@eldenmoon eldenmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 28, 2024
@doris-robot
Copy link

TPC-DS: Total hot run time: 186909 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 56b4e88bd84d23bdba38f3db5d34d3115a6196de, data reload: false

query1	900	369	366	366
query2	6252	2333	2415	2333
query3	6668	215	209	209
query4	23957	21821	21797	21797
query5	3865	431	425	425
query6	276	181	176	176
query7	4571	309	302	302
query8	253	193	190	190
query9	8746	2446	2440	2440
query10	424	250	261	250
query11	15237	14782	14887	14782
query12	120	84	88	84
query13	1658	363	366	363
query14	10697	6997	6834	6834
query15	260	167	172	167
query16	8166	264	259	259
query17	1699	560	554	554
query18	2094	271	266	266
query19	289	150	145	145
query20	89	82	84	82
query21	194	125	125	125
query22	5054	4808	4809	4808
query23	33972	33322	33218	33218
query24	10710	3026	2904	2904
query25	603	364	360	360
query26	1157	156	151	151
query27	2358	321	327	321
query28	7128	2001	1985	1985
query29	859	607	601	601
query30	235	151	149	149
query31	933	720	735	720
query32	93	50	59	50
query33	732	247	242	242
query34	1028	471	480	471
query35	821	671	685	671
query36	1063	891	870	870
query37	142	65	67	65
query38	3122	3020	3042	3020
query39	1612	1563	1545	1545
query40	198	126	128	126
query41	43	38	40	38
query42	103	95	96	95
query43	570	540	538	538
query44	1177	725	729	725
query45	277	261	271	261
query46	1068	755	720	720
query47	1950	1852	1814	1814
query48	379	299	291	291
query49	891	406	396	396
query50	756	388	395	388
query51	6853	6647	6545	6545
query52	104	93	93	93
query53	349	283	289	283
query54	313	249	245	245
query55	80	74	73	73
query56	268	231	232	231
query57	1209	1138	1121	1121
query58	232	208	215	208
query59	3516	3087	3089	3087
query60	263	241	248	241
query61	142	92	90	90
query62	648	449	439	439
query63	315	283	281	281
query64	8504	7212	7182	7182
query65	3121	3075	3057	3057
query66	1334	326	337	326
query67	15620	15350	14996	14996
query68	5268	539	549	539
query69	480	345	306	306
query70	1093	1098	1139	1098
query71	419	271	273	271
query72	7118	2637	2429	2429
query73	714	329	327	327
query74	6418	6106	6108	6106
query75	3371	2688	2692	2688
query76	3141	991	923	923
query77	556	265	279	265
query78	10874	10227	10193	10193
query79	4639	517	522	517
query80	1264	438	445	438
query81	502	225	224	224
query82	1308	90	107	90
query83	201	174	174	174
query84	270	88	97	88
query85	1366	272	274	272
query86	459	305	293	293
query87	3277	3099	3075	3075
query88	5076	2434	2432	2432
query89	489	387	376	376
query90	2039	188	190	188
query91	127	100	100	100
query92	60	50	48	48
query93	4830	512	499	499
query94	1141	185	186	185
query95	406	312	309	309
query96	590	272	274	272
query97	3170	2956	2939	2939
query98	247	217	216	216
query99	1238	856	839	839
Total cold run time: 290471 ms
Total hot run time: 186909 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.51% (8929/25148)
Line Coverage: 27.12% (73472/270874)
Region Coverage: 26.31% (37961/144287)
Branch Coverage: 23.10% (19335/83710)
Coverage Report: http://coverage.selectdb-in.cc/coverage/56b4e88bd84d23bdba38f3db5d34d3115a6196de_56b4e88bd84d23bdba38f3db5d34d3115a6196de/report/index.html

@amorynan
Copy link
Contributor Author

run p0

@amorynan
Copy link
Contributor Author

run external

@amorynan
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 28, 2024
@doris-robot
Copy link

TPC-H: Total hot run time: 41665 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e083aef64a1ad2b48ddb69adb5221e60fd8a0a2e, data reload: false

------ Round 1 ----------------------------------
q1	17931	4504	4267	4267
q2	2014	193	185	185
q3	10467	1199	1312	1199
q4	10200	714	778	714
q5	7530	2791	2721	2721
q6	221	132	130	130
q7	1062	648	623	623
q8	9221	2184	2135	2135
q9	9576	6909	6837	6837
q10	9191	3924	3928	3924
q11	447	253	238	238
q12	550	230	235	230
q13	17259	3158	3213	3158
q14	285	237	233	233
q15	502	480	462	462
q16	496	387	402	387
q17	1023	754	676	676
q18	8468	7814	7867	7814
q19	3220	1579	1525	1525
q20	631	330	319	319
q21	5482	3620	4245	3620
q22	343	268	283	268
Total cold run time: 116119 ms
Total hot run time: 41665 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4552	4448	4433	4433
q2	384	272	286	272
q3	3215	2954	2927	2927
q4	1934	1627	1589	1589
q5	5452	5527	5548	5527
q6	221	124	124	124
q7	2387	1972	1968	1968
q8	3277	3394	3427	3394
q9	8849	8875	8918	8875
q10	4027	3730	3811	3730
q11	613	480	485	480
q12	796	602	623	602
q13	16170	3086	3114	3086
q14	308	287	273	273
q15	538	481	460	460
q16	520	448	431	431
q17	1802	1548	1486	1486
q18	7685	7760	7455	7455
q19	1651	1551	1572	1551
q20	1978	1772	1781	1772
q21	5192	4782	4783	4782
q22	586	505	503	503
Total cold run time: 72137 ms
Total hot run time: 55720 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 186120 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e083aef64a1ad2b48ddb69adb5221e60fd8a0a2e, data reload: false

query1	907	362	355	355
query2	6619	2434	2482	2434
query3	6645	211	214	211
query4	23014	21212	21220	21212
query5	4115	419	420	419
query6	266	191	176	176
query7	4595	301	295	295
query8	247	221	187	187
query9	8508	2326	2298	2298
query10	440	245	271	245
query11	14796	14110	14175	14110
query12	131	89	84	84
query13	1628	378	374	374
query14	9650	8271	7470	7470
query15	268	164	169	164
query16	8024	254	256	254
query17	1776	566	542	542
query18	2078	284	264	264
query19	208	149	145	145
query20	95	96	83	83
query21	190	128	124	124
query22	5222	4911	4949	4911
query23	33608	33215	33132	33132
query24	6772	2930	2912	2912
query25	545	380	370	370
query26	700	147	144	144
query27	2045	316	314	314
query28	3802	2007	1996	1996
query29	847	610	595	595
query30	246	148	154	148
query31	982	730	715	715
query32	86	52	52	52
query33	476	260	251	251
query34	876	470	469	469
query35	770	675	662	662
query36	1067	854	926	854
query37	101	65	65	65
query38	3173	3071	3050	3050
query39	1596	1534	1560	1534
query40	201	138	130	130
query41	45	41	41	41
query42	106	95	98	95
query43	598	557	563	557
query44	1084	731	745	731
query45	296	261	282	261
query46	1072	703	719	703
query47	1977	1860	1858	1858
query48	376	306	294	294
query49	781	449	391	391
query50	777	379	393	379
query51	6657	6607	6651	6607
query52	98	90	91	90
query53	347	275	281	275
query54	264	232	236	232
query55	81	75	71	71
query56	240	222	215	215
query57	1188	1140	1160	1140
query58	213	210	198	198
query59	3370	3002	3141	3002
query60	248	231	228	228
query61	92	89	92	89
query62	591	444	455	444
query63	301	280	279	279
query64	8155	7147	7118	7118
query65	3088	3039	3029	3029
query66	803	338	336	336
query67	15360	15247	15022	15022
query68	5265	531	533	531
query69	472	312	315	312
query70	1187	1101	1111	1101
query71	380	266	261	261
query72	7250	2741	2425	2425
query73	714	323	328	323
query74	6537	6130	6063	6063
query75	3352	2687	2663	2663
query76	2697	1010	989	989
query77	380	263	260	260
query78	11192	10295	10274	10274
query79	4028	525	523	523
query80	2147	439	426	426
query81	539	224	214	214
query82	1121	90	97	90
query83	274	175	163	163
query84	268	85	83	83
query85	1860	268	272	268
query86	529	319	310	310
query87	3327	3153	3153	3153
query88	4556	2309	2313	2309
query89	491	376	363	363
query90	2056	175	177	175
query91	123	115	98	98
query92	64	47	48	47
query93	5672	509	498	498
query94	1261	182	180	180
query95	389	294	303	294
query96	596	277	265	265
query97	3123	2953	2946	2946
query98	230	221	215	215
query99	1201	863	874	863
Total cold run time: 278804 ms
Total hot run time: 186120 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.50% (8932/25164)
Line Coverage: 27.18% (73655/270941)
Region Coverage: 26.36% (38045/144330)
Branch Coverage: 23.16% (19389/83730)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e083aef64a1ad2b48ddb69adb5221e60fd8a0a2e_e083aef64a1ad2b48ddb69adb5221e60fd8a0a2e/report/index.html

@amorynan amorynan requested a review from eldenmoon April 29, 2024 01:25
Copy link
Member

@eldenmoon eldenmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 29, 2024
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. meta-change reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants