@xiaokang
Contributor

Proposed changes

Issue Number: close #xxx

  1. Allow inverted indexes on key columns of MOR unique tables.
  2. Allow inverted indexes without a parser on value columns of MOR unique tables, and do not push down the predicate when doing so could produce wrong results.
  3. Disallow inverted indexes with a parser on value columns of MOR unique tables, because MATCH without predicate pushdown is very slow.
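
The three rules can be read as a single predicate. The following is only a sketch under stated assumptions: `IndexRequest` and `index_allowed` are hypothetical names invented for illustration, the real check lives in the frontend, and the AGG_KEYS rule is taken from the existing error message rather than from this change. The "do not push down" half of rule 2 is a storage-layer behavior and is not modeled here.

```cpp
// Hypothetical, simplified model of the index validation rules (not Doris code).
enum class KeysType { DUP_KEYS, UNIQUE_KEYS, AGG_KEYS };
enum class IndexType { INVERTED, BLOOMFILTER };

struct IndexRequest {
    KeysType keys_type;
    bool merge_on_write;  // false means a MOR (merge-on-read) unique table
    bool is_key_column;
    IndexType index_type;
    bool has_parser;      // true when the inverted index is created with a parser
};

constexpr bool index_allowed(const IndexRequest& r) {
    return
        // Existing rule: AGG_KEYS tables may only index key columns.
        !(r.keys_type == KeysType::AGG_KEYS && !r.is_key_column) &&
        // Rule 3: reject an inverted index with a parser on a value column
        // of a MOR unique table. Rules 1 and 2 are everything that is left.
        !(r.keys_type == KeysType::UNIQUE_KEYS && !r.merge_on_write &&
          !r.is_key_column && r.index_type == IndexType::INVERTED && r.has_parser);
}
```

Under this model, a MOR unique table accepts an inverted index on any key column and on value columns without a parser, and rejects only the value-column-with-parser case.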

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@xiaokang
Contributor Author

run buildall

@xiaokang xiaokang requested a review from qidaye February 18, 2024 07:20
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.76% (8558/23929)
Line Coverage: 27.73% (69425/250402)
Region Coverage: 26.85% (36026/134183)
Branch Coverage: 23.66% (18424/77886)
Coverage Report: http://coverage.selectdb-in.cc/coverage/65440c76f6eafe4ca4eb52e30e4aa1a32b14e429_65440c76f6eafe4ca4eb52e30e4aa1a32b14e429/report/index.html

qidaye
qidaye previously approved these changes Feb 18, 2024
Contributor

@qidaye qidaye left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Feb 18, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

throw new AnalysisException("index should only be used in columns of DUP_KEYS/UNIQUE_KEYS table"
+ " or key columns of AGG_KEYS table. invalid index: " + indexName);
} else if (keysType == KeysType.UNIQUE_KEYS && !enableUniqueKeyMergeOnWrite
&& indexType == IndexType.INVERTED && properties != null
Contributor

Why check for an inverted index here? If the index is not an inverted index, for example a bloomfilter index (or another index type we add in the future), this is wrong.

Contributor Author

I tested and found that an index can be used on value columns of a MOR unique table, since there is a guard against possible wrong results: _should_push_down_value_predicates().

BetaRowsetReader::get_segment_iterators(...) {
    // ...
    if (_should_push_down_value_predicates()) {
        if (_read_context->value_predicates != nullptr) {
            _read_options.column_predicates.insert(_read_options.column_predicates.end(),
                                                   _read_context->value_predicates->begin(),
                                                   _read_context->value_predicates->end());
            for (auto pred : *(_read_context->value_predicates)) {
                if (_read_options.col_id_to_predicates.count(pred->column_id()) < 1) {
                    _read_options.col_id_to_predicates.insert(
                            {pred->column_id(), std::make_shared<AndBlockColumnPredicate>()});
                }
                auto single_column_block_predicate = new SingleColumnBlockPredicate(pred);
                _read_options.col_id_to_predicates[pred->column_id()]->add_column_predicate(
                        single_column_block_predicate);
            }
        }
    }
    // ...
}

bool BetaRowsetReader::_should_push_down_value_predicates() const {
    // if unique table with rowset [0-x] or [0-1] [2-y] [...],
    // value column predicates can be pushdown on rowset [0-x] or [2-y], [2-y]
    // must be compaction, not overlapping and don't have sequence column
    return _rowset->keys_type() == UNIQUE_KEYS &&
           (((_rowset->start_version() == 0 || _rowset->start_version() == 2) &&
             !_rowset->_rowset_meta->is_segments_overlapping() &&
             _read_context->sequence_id_idx == -1) ||
            _read_context->enable_unique_key_merge_on_write);
}

Contributor Author

If _should_push_down_value_predicates() returns false, predicates on value columns cannot be pushed down to the storage layer, which is where the index is applied, so it is safe to have an index on a value column. However, a MATCH query is too slow when the index is not applied, so inverted indexes with a parser are not allowed.
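
To make the guard concrete, here is a minimal, self-contained sketch of the condition in _should_push_down_value_predicates(). The `RowsetInfo` and `ReadContext` structs are hypothetical stand-ins for the real rowset metadata and read context, not Doris types; only the boolean condition mirrors the function quoted earlier in this thread.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins (assumptions for illustration only).
enum class KeysType { DUP_KEYS, UNIQUE_KEYS, AGG_KEYS };

struct RowsetInfo {
    KeysType keys_type;
    int64_t start_version;
    bool segments_overlapping;
};

struct ReadContext {
    int sequence_id_idx;                    // -1 means the table has no sequence column
    bool enable_unique_key_merge_on_write;  // true for MOW tables
};

// Same boolean shape as the quoted guard: value predicates may be pushed down
// only for a unique-key rowset that is either on a MOW table, or is a
// non-overlapping rowset starting at version 0 or 2 with no sequence column.
constexpr bool should_push_down_value_predicates(const RowsetInfo& rs,
                                                 const ReadContext& ctx) {
    return rs.keys_type == KeysType::UNIQUE_KEYS &&
           (((rs.start_version == 0 || rs.start_version == 2) &&
             !rs.segments_overlapping && ctx.sequence_id_idx == -1) ||
            ctx.enable_unique_key_merge_on_write);
}
```

Under this model, a MOR rowset that starts at version 5 yields false, so its value predicates (and therefore any value-column index) are not applied in the storage layer, while any rowset of a MOW table yields true.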

@xiaokang
Contributor Author

run buildall

@github-actions github-actions bot removed the "approved" label (Indicates a PR has been approved by one committer.) Feb 20, 2024
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.77% (8558/23925)
Line Coverage: 27.72% (69413/250434)
Region Coverage: 26.83% (36015/134232)
Branch Coverage: 23.65% (18423/77910)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b43b010f3e1fc1d2df22a6487db78803f1c03e6f_b43b010f3e1fc1d2df22a6487db78803f1c03e6f/report/index.html

@doris-robot

TPC-H: Total hot run time: 41313 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b43b010f3e1fc1d2df22a6487db78803f1c03e6f, data reload: false

------ Round 1 ----------------------------------
q1	17634	5020	4864	4864
q2	2037	137	133	133
q3	10585	1034	1001	1001
q4	4645	969	970	969
q5	7666	3214	3264	3214
q6	194	136	138	136
q7	1256	784	758	758
q8	9246	2070	2053	2053
q9	7556	6678	6663	6663
q10	8302	2640	2645	2640
q11	413	208	229	208
q12	713	329	333	329
q13	17997	3649	3669	3649
q14	287	270	261	261
q15	631	542	564	542
q16	475	406	420	406
q17	920	846	859	846
q18	7440	6594	6664	6594
q19	1537	1488	1493	1488
q20	596	342	334	334
q21	6786	3879	3935	3879
q22	882	352	346	346
Total cold run time: 107798 ms
Total hot run time: 41313 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4833	4794	4852	4794
q2	294	186	191	186
q3	3585	3566	3561	3561
q4	2481	2494	2507	2494
q5	5772	5735	5742	5735
q6	212	127	131	127
q7	2256	1643	1613	1613
q8	2984	3072	3076	3072
q9	8710	8703	8678	8678
q10	6762	4210	4223	4210
q11	515	386	386	386
q12	763	557	570	557
q13	5295	3449	3412	3412
q14	267	247	231	231
q15	628	492	495	492
q16	478	439	470	439
q17	1672	1592	1601	1592
q18	8358	7525	7584	7525
q19	1628	1631	1627	1627
q20	2124	1834	1833	1833
q21	6523	6128	6137	6128
q22	580	518	532	518
Total cold run time: 66720 ms
Total hot run time: 59210 ms

@doris-robot

TPC-DS: Total hot run time: 177143 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b43b010f3e1fc1d2df22a6487db78803f1c03e6f, data reload: false

query1	937	344	341	341
query2	6517	1693	1755	1693
query3	6687	208	201	201
query4	23245	21160	21087	21087
query5	4236	374	371	371
query6	264	167	160	160
query7	4604	295	294	294
query8	246	191	196	191
query9	8438	2818	2802	2802
query10	412	225	226	225
query11	15177	14599	14502	14502
query12	146	85	83	83
query13	1686	429	418	418
query14	9422	7712	7940	7712
query15	221	186	195	186
query16	7467	258	250	250
query17	1415	559	531	531
query18	1962	277	275	275
query19	193	147	155	147
query20	87	79	89	79
query21	185	126	118	118
query22	4913	4718	4706	4706
query23	32495	31501	31464	31464
query24	12664	3439	3353	3353
query25	647	359	361	359
query26	1873	157	162	157
query27	3044	321	323	321
query28	6623	1843	1828	1828
query29	1117	614	616	614
query30	274	138	146	138
query31	932	759	745	745
query32	91	58	55	55
query33	724	233	238	233
query34	1050	487	494	487
query35	948	821	837	821
query36	1002	891	883	883
query37	165	60	62	60
query38	3266	3185	3168	3168
query39	1371	1340	1317	1317
query40	289	107	106	106
query41	36	33	34	33
query42	103	103	101	101
query43	483	456	449	449
query44	1056	682	695	682
query45	194	184	176	176
query46	1033	775	744	744
query47	1656	1528	1636	1528
query48	411	347	350	347
query49	1214	297	296	296
query50	765	390	385	385
query51	5301	5176	5195	5176
query52	107	95	93	93
query53	400	304	306	304
query54	305	224	227	224
query55	80	79	80	79
query56	220	198	198	198
query57	1042	979	905	905
query58	211	203	207	203
query59	2316	2142	2233	2142
query60	247	220	208	208
query61	82	81	81	81
query62	602	372	369	369
query63	333	281	294	281
query64	6306	3018	3135	3018
query65	3288	3270	3237	3237
query66	1335	328	316	316
query67	14573	14107	14247	14107
query68	5172	570	556	556
query69	523	365	351	351
query70	1245	1187	1229	1187
query71	407	259	260	259
query72	6347	2766	2614	2614
query73	705	314	309	309
query74	6805	6403	6401	6401
query75	3203	2583	2552	2552
query76	3252	1104	1207	1104
query77	360	242	228	228
query78	9324	8755	8780	8755
query79	990	515	502	502
query80	659	365	357	357
query81	440	200	203	200
query82	1000	88	87	87
query83	240	128	127	127
query84	228	78	78	78
query85	1122	346	336	336
query86	295	315	309	309
query87	3456	3259	3258	3258
query88	2771	2279	2273	2273
query89	445	375	358	358
query90	2049	163	163	163
query91	154	124	131	124
query92	54	52	53	52
query93	1024	523	488	488
query94	1132	182	181	181
query95	482	8696	372	372
query96	567	264	264	264
query97	4433	4234	4276	4234
query98	220	196	194	194
query99	1111	729	707	707
Total cold run time: 270949 ms
Total hot run time: 177143 ms

@doris-robot

ClickBench: Total hot run time: 30.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b43b010f3e1fc1d2df22a6487db78803f1c03e6f, data reload: false

query1	0.03	0.02	0.02
query2	0.06	0.03	0.03
query3	0.22	0.08	0.08
query4	1.66	0.08	0.08
query5	0.49	0.47	0.47
query6	1.38	0.61	0.63
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.53	0.45	0.44
query10	0.49	0.49	0.50
query11	0.12	0.10	0.10
query12	0.12	0.10	0.10
query13	0.58	0.59	0.58
query14	0.76	0.79	0.79
query15	0.82	0.79	0.79
query16	0.33	0.33	0.32
query17	0.90	0.86	0.91
query18	0.18	0.17	0.18
query19	1.73	1.70	1.68
query20	0.02	0.01	0.01
query21	15.41	0.64	0.56
query22	2.77	3.92	2.34
query23	17.69	1.08	0.98
query24	2.01	0.36	0.36
query25	0.63	0.07	0.05
query26	0.18	0.16	0.14
query27	0.06	0.05	0.04
query28	12.18	0.77	0.84
query29	12.56	3.32	3.35
query30	0.53	0.51	0.48
query31	2.78	0.37	0.37
query32	3.36	0.48	0.47
query33	3.14	3.15	3.13
query34	15.38	4.47	4.51
query35	4.51	4.49	4.49
query36	1.06	0.95	0.94
query37	0.07	0.05	0.05
query38	0.04	0.03	0.03
query39	0.02	0.01	0.02
query40	0.17	0.14	0.15
query41	0.08	0.01	0.02
query42	0.02	0.02	0.01
query43	0.02	0.02	0.02
Total cold run time: 105.15 s
Total hot run time: 30.84 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit b43b010f3e1fc1d2df22a6487db78803f1c03e6f with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       13.8 seconds inserted 10000000 Rows, about 724K ops/s

@xiaokang
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.76% (8557/23927)
Line Coverage: 27.72% (69423/250449)
Region Coverage: 26.84% (36027/134250)
Branch Coverage: 23.64% (18424/77926)
Coverage Report: http://coverage.selectdb-in.cc/coverage/19ce1a6377acd9a0dfc8eb590ffa8b7d05a614e7_19ce1a6377acd9a0dfc8eb590ffa8b7d05a614e7/report/index.html

@doris-robot

TPC-H: Total hot run time: 41298 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 19ce1a6377acd9a0dfc8eb590ffa8b7d05a614e7, data reload: false

------ Round 1 ----------------------------------
q1	17707	4920	4984	4920
q2	2034	146	130	130
q3	10587	1033	1032	1032
q4	4649	965	978	965
q5	7693	3185	3257	3185
q6	196	131	126	126
q7	1242	771	765	765
q8	9244	2067	2056	2056
q9	7588	6694	6671	6671
q10	8311	2645	2645	2645
q11	421	208	211	208
q12	711	330	322	322
q13	17984	3678	3689	3678
q14	288	259	261	259
q15	634	509	548	509
q16	468	403	404	403
q17	926	851	843	843
q18	7422	6731	6528	6528
q19	1535	1473	1494	1473
q20	602	359	326	326
q21	6560	3919	3957	3919
q22	882	341	335	335
Total cold run time: 107684 ms
Total hot run time: 41298 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4870	4866	4834	4834
q2	294	181	179	179
q3	3591	3571	3582	3571
q4	2547	2524	2540	2524
q5	5760	5737	5759	5737
q6	213	128	123	123
q7	2232	1668	1674	1668
q8	3003	3075	3120	3075
q9	8744	8729	8667	8667
q10	6819	4249	4240	4240
q11	528	371	386	371
q12	787	543	541	541
q13	4287	3380	3431	3380
q14	261	240	233	233
q15	592	504	500	500
q16	491	429	426	426
q17	1683	1614	1620	1614
q18	8269	7696	7531	7531
q19	1625	1622	1619	1619
q20	2099	1851	1811	1811
q21	6561	6150	6155	6150
q22	574	523	535	523
Total cold run time: 65830 ms
Total hot run time: 59317 ms

@doris-robot

TPC-DS: Total hot run time: 178834 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 19ce1a6377acd9a0dfc8eb590ffa8b7d05a614e7, data reload: false

query1	921	348	340	340
query2	6522	1699	1732	1699
query3	6688	205	201	201
query4	22930	21211	21167	21167
query5	4264	455	468	455
query6	261	164	160	160
query7	4603	294	297	294
query8	252	195	194	194
query9	8420	2749	2742	2742
query10	419	209	217	209
query11	14990	14669	14633	14633
query12	148	85	86	85
query13	1705	424	430	424
query14	9120	7758	7726	7726
query15	217	194	187	187
query16	7605	282	261	261
query17	2173	584	537	537
query18	2245	266	260	260
query19	189	155	148	148
query20	85	82	85	82
query21	190	124	116	116
query22	4920	4873	4866	4866
query23	32728	32496	32402	32402
query24	13049	3412	3346	3346
query25	646	366	352	352
query26	1926	156	158	156
query27	3048	316	317	316
query28	6654	1816	1810	1810
query29	1150	617	617	617
query30	282	133	145	133
query31	927	745	767	745
query32	99	64	57	57
query33	723	240	230	230
query34	1077	492	493	492
query35	933	835	806	806
query36	971	871	879	871
query37	269	60	62	60
query38	3311	3201	3193	3193
query39	1369	1334	1327	1327
query40	290	106	108	106
query41	36	34	34	34
query42	111	99	99	99
query43	469	451	457	451
query44	1074	680	693	680
query45	196	186	179	179
query46	1058	777	748	748
query47	1658	1622	1567	1567
query48	417	332	344	332
query49	1217	304	304	304
query50	779	373	371	371
query51	5368	5158	5202	5158
query52	116	94	91	91
query53	393	298	295	295
query54	287	230	228	228
query55	84	82	80	80
query56	218	203	200	200
query57	1083	929	951	929
query58	208	202	198	198
query59	2251	2110	2150	2110
query60	235	209	221	209
query61	85	84	82	82
query62	590	384	357	357
query63	317	280	290	280
query64	6442	3078	3148	3078
query65	3288	3234	3231	3231
query66	1336	326	333	326
query67	14477	14354	14426	14354
query68	5124	552	567	552
query69	525	354	355	354
query70	1226	1168	1213	1168
query71	467	251	247	247
query72	6378	2807	2610	2610
query73	700	308	312	308
query74	6996	6435	6442	6435
query75	3263	2554	2555	2554
query76	3337	1116	1211	1116
query77	490	240	231	231
query78	9493	8834	8760	8760
query79	968	494	508	494
query80	511	361	342	342
query81	441	209	202	202
query82	233	83	84	83
query83	139	123	121	121
query84	228	77	78	77
query85	1038	345	344	344
query86	316	289	300	289
query87	3529	3338	3328	3328
query88	2678	2269	2272	2269
query89	433	351	351	351
query90	2000	161	167	161
query91	156	132	127	127
query92	56	51	47	47
query93	964	509	512	509
query94	1270	178	179	178
query95	480	377	8666	377
query96	574	261	261	261
query97	4416	4289	4253	4253
query98	212	197	198	197
query99	1077	713	738	713
Total cold run time: 271830 ms
Total hot run time: 178834 ms

@doris-robot

ClickBench: Total hot run time: 31.87 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 19ce1a6377acd9a0dfc8eb590ffa8b7d05a614e7, data reload: false

query1	0.04	0.03	0.03
query2	0.06	0.02	0.02
query3	0.23	0.08	0.08
query4	1.64	0.08	0.08
query5	0.50	0.47	0.47
query6	1.38	0.62	0.62
query7	0.02	0.01	0.02
query8	0.04	0.02	0.02
query9	0.53	0.45	0.47
query10	0.50	0.50	0.49
query11	0.13	0.10	0.09
query12	0.12	0.10	0.10
query13	0.59	0.59	0.58
query14	0.77	0.81	0.78
query15	0.83	0.81	0.79
query16	0.33	0.33	0.34
query17	0.91	0.91	0.91
query18	0.19	0.15	0.19
query19	1.68	1.65	1.64
query20	0.02	0.01	0.01
query21	15.40	0.61	0.59
query22	2.82	3.84	3.05
query23	17.38	1.09	0.98
query24	2.01	0.57	0.37
query25	0.63	0.06	0.06
query26	0.17	0.15	0.15
query27	0.05	0.05	0.06
query28	11.99	0.86	0.85
query29	12.75	3.42	3.29
query30	0.55	0.48	0.50
query31	2.78	0.37	0.38
query32	3.33	0.47	0.49
query33	3.09	3.13	3.16
query34	15.38	4.56	4.54
query35	4.58	4.59	4.59
query36	1.11	1.00	0.97
query37	0.08	0.05	0.05
query38	0.04	0.04	0.03
query39	0.03	0.02	0.01
query40	0.18	0.15	0.16
query41	0.06	0.01	0.02
query42	0.03	0.02	0.01
query43	0.03	0.02	0.02
Total cold run time: 104.98 s
Total hot run time: 31.87 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 19ce1a6377acd9a0dfc8eb590ffa8b7d05a614e7 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          60 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.6 seconds inserted 10000000 Rows, about 735K ops/s

Member

@airborne12 airborne12 left a comment

LGTM

Contributor

@qidaye qidaye left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Feb 21, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@zhannngchen zhannngchen left a comment

LGTM

@yiguolei yiguolei merged commit 602af27 into apache:master Feb 22, 2024
feiniaofeiafei pushed a commit to feiniaofeiafei/doris that referenced this pull request Feb 23, 2024
…#31051)

* [fix](index) Fix index for none key column of unique mor table  (apache#31035)

* disable INVERTED index with parser on value columns of MOR unique table

* add debug log for test_build_index

* add debug log

* only do index compaction for dup and mow
xiaokang added a commit to xiaokang/doris that referenced this pull request Feb 23, 2024
…#31051)

* [fix](index) Fix index for none key column of unique mor table  (apache#31035)

* disable INVERTED index with parser on value columns of MOR unique table

* add debug log for test_build_index

* add debug log

* only do index compaction for dup and mow
Labels

approved (Indicates a PR has been approved by one committer.), dev/2.0.5, reviewed

6 participants