@924060929
Contributor

Cherry-picked from #47608.

…#47608)

This PR speeds up partition pruning for queries with a very large `InPredicate`, for example this
SQL:
```sql
select *
from tbl
where dt in ('2024-01-02 01:00:00',  ... , '2024-05-02 02:00:00') -- about 2k literals
```

In my test case, a query with 2k literals against a table with 20k hourly
(range / list) partitions, this PR reduces the time from 12 s to 160 ms when
binary search filtering is disabled, and from 6 s to 160 ms when binary
search filtering is enabled.

The changes:
1. Add `SessionVariable.enable_binary_search_filtering_partitions` so that
binary search filtering of partitions can be disabled.
2. Cache the Set/RangeSet of `InPredicate.options` instead of iterating
over the options to evaluate them for every partition.
3. Use the big RangeSet to intersect the small RangeSet. The order
matters: because TreeRangeSet is a red-black tree, the right side has
to iterate over all of its items, while the left side can be searched quickly.
4. Skip tablet pruning if the number of `InPredicate.options` exceeds
`Config.max_distribution_pruner_recursion_depth`, because such a predicate tends to hit
all buckets.
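
Changes 2 and 3 can be illustrated with a minimal Java sketch. It is not the Doris implementation: the class `InPredicatePruningSketch`, the `Partition` type, and both method names are hypothetical, values are plain `long`s rather than Doris literals, and `java.util.TreeSet` (also a red-black tree) stands in for the `TreeRangeSet` mentioned above. The naive version rescans every option for every partition. The cached version builds the sorted option set once, then iterates the small side (one partition range) and searches the large side in logarithmic time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class InPredicatePruningSketch {
    /** Hypothetical partition covering the half-open range [lower, upper). */
    static final class Partition {
        final String name;
        final long lower;
        final long upper;

        Partition(String name, long lower, long upper) {
            this.name = name;
            this.lower = lower;
            this.upper = upper;
        }
    }

    /** Naive pruning: rescan all IN options for every partition, O(P * N). */
    static List<String> pruneNaive(List<Partition> partitions, List<Long> options) {
        List<String> kept = new ArrayList<>();
        for (Partition p : partitions) {
            for (long v : options) {
                if (v >= p.lower && v < p.upper) {
                    kept.add(p.name);
                    break; // one matching option is enough to keep the partition
                }
            }
        }
        return kept;
    }

    /** Cached pruning: sort the options once, then do one tree lookup per partition. */
    static List<String> pruneCached(List<Partition> partitions, List<Long> options) {
        // Built once per predicate (assumption: the options do not change during pruning).
        TreeSet<Long> sorted = new TreeSet<>(options);
        List<String> kept = new ArrayList<>();
        for (Partition p : partitions) {
            // Smallest option >= lower bound; the partition matches if it is below the upper bound.
            Long hit = sorted.ceiling(p.lower);
            if (hit != null && hit < p.upper) {
                kept.add(p.name);
            }
        }
        return kept;
    }
}
```

With partitions `[0, 10)`, `[10, 20)`, `[20, 30)` and options `3, 25, 30`, both methods keep only the first and third partitions. The cached version costs roughly O(N log N + P log N) rather than O(P × N), which is the kind of gap that matters at 2k options and 20k partitions.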

(cherry picked from commit 5ec7d34)
@924060929 requested a review from morrySnow as a code owner on July 1, 2025 09:04
@924060929
Contributor Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 39899 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 057ee31aabbe796e70f4493f9f3e505ae8e128e7, data reload: false

------ Round 1 ----------------------------------
q1	17608	7158	6635	6635
q2	2063	218	161	161
q3	10495	1147	1190	1147
q4	10222	804	758	758
q5	7717	2930	2880	2880
q6	216	137	137	137
q7	1017	623	616	616
q8	9460	1966	2076	1966
q9	6680	6392	6396	6392
q10	6994	2238	2287	2238
q11	455	271	271	271
q12	408	210	210	210
q13	17787	2975	2996	2975
q14	228	204	202	202
q15	502	462	469	462
q16	451	380	376	376
q17	988	629	588	588
q18	7296	6611	6614	6611
q19	1321	971	952	952
q20	496	214	208	208
q21	3999	3177	3142	3142
q22	1068	982	972	972
Total cold run time: 107471 ms
Total hot run time: 39899 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6595	6617	6569	6569
q2	337	231	233	231
q3	2908	2982	2990	2982
q4	2096	1819	1750	1750
q5	5729	5686	5724	5686
q6	204	124	129	124
q7	2162	1777	1781	1777
q8	3295	3479	3499	3479
q9	8914	8857	8869	8857
q10	3548	3519	3516	3516
q11	589	492	494	492
q12	805	625	620	620
q13	7919	3107	3075	3075
q14	293	273	269	269
q15	496	461	483	461
q16	497	440	441	440
q17	1845	1622	1626	1622
q18	8168	7747	7851	7747
q19	1725	1621	1657	1621
q20	2115	1903	1900	1900
q21	5423	4916	4886	4886
q22	1113	998	1031	998
Total cold run time: 66776 ms
Total hot run time: 59102 ms

@doris-robot

TPC-DS: Total hot run time: 191189 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 057ee31aabbe796e70f4493f9f3e505ae8e128e7, data reload: false

query1	971	377	387	377
query2	6516	1973	2000	1973
query3	6704	218	220	218
query4	34254	23590	23507	23507
query5	4335	476	464	464
query6	281	183	191	183
query7	4635	334	325	325
query8	313	234	233	233
query9	9668	2617	2618	2617
query10	506	281	262	262
query11	18175	15181	15241	15181
query12	155	108	107	107
query13	1668	434	431	431
query14	9882	6741	6862	6741
query15	228	181	193	181
query16	7933	465	497	465
query17	1630	587	572	572
query18	2072	310	320	310
query19	219	158	163	158
query20	119	112	109	109
query21	210	107	105	105
query22	4472	4145	4067	4067
query23	34239	34360	33543	33543
query24	11804	2960	2900	2900
query25	703	414	415	414
query26	1666	178	179	178
query27	3013	353	350	350
query28	7998	2157	2147	2147
query29	927	462	466	462
query30	336	164	162	162
query31	1008	806	819	806
query32	100	63	63	63
query33	806	320	320	320
query34	910	518	537	518
query35	867	729	734	729
query36	1101	959	944	944
query37	129	80	77	77
query38	3942	3874	3841	3841
query39	1492	1434	1436	1434
query40	288	109	103	103
query41	55	55	53	53
query42	121	106	104	104
query43	545	487	488	487
query44	1296	805	817	805
query45	185	174	171	171
query46	1167	728	720	720
query47	1919	1826	1835	1826
query48	446	347	349	347
query49	1291	428	413	413
query50	824	446	440	440
query51	7297	7194	7165	7165
query52	107	95	96	95
query53	272	191	190	190
query54	1177	510	484	484
query55	77	75	82	75
query56	265	250	245	245
query57	1285	1139	1151	1139
query58	242	215	215	215
query59	3116	2840	2969	2840
query60	277	262	264	262
query61	135	112	151	112
query62	904	675	707	675
query63	223	190	191	190
query64	5240	654	648	648
query65	3328	3214	3200	3200
query66	1405	316	312	312
query67	15976	15835	15663	15663
query68	4673	599	603	599
query69	463	275	272	272
query70	1187	1108	1152	1108
query71	355	263	266	263
query72	6305	4013	4018	4013
query73	759	359	369	359
query74	10265	9191	9062	9062
query75	3382	2637	2702	2637
query76	2968	1061	1059	1059
query77	415	297	283	283
query78	10480	9600	9614	9600
query79	1324	587	606	587
query80	836	453	437	437
query81	528	223	224	223
query82	415	93	90	90
query83	261	160	166	160
query84	237	82	77	77
query85	1360	308	301	301
query86	434	293	270	270
query87	4416	4266	4242	4242
query88	4443	2418	2375	2375
query89	402	296	296	296
query90	2058	196	193	193
query91	139	107	114	107
query92	69	54	59	54
query93	1645	592	573	573
query94	884	314	304	304
query95	363	280	274	274
query96	620	282	277	277
query97	3328	3168	3144	3144
query98	219	192	195	192
query99	1500	1352	1305	1305
Total cold run time: 302183 ms
Total hot run time: 191189 ms

@doris-robot

ClickBench: Total hot run time: 30.15 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 057ee31aabbe796e70f4493f9f3e505ae8e128e7, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.23	0.06	0.07
query4	1.63	0.10	0.10
query5	0.51	0.51	0.50
query6	1.13	0.73	0.73
query7	0.04	0.01	0.02
query8	0.04	0.03	0.03
query9	0.57	0.50	0.51
query10	0.56	0.55	0.56
query11	0.14	0.10	0.10
query12	0.16	0.11	0.10
query13	0.61	0.58	0.60
query14	0.78	0.78	0.81
query15	0.85	0.82	0.84
query16	0.38	0.39	0.36
query17	1.06	1.07	1.07
query18	0.23	0.22	0.21
query19	1.99	1.89	1.86
query20	0.01	0.01	0.00
query21	15.39	0.58	0.58
query22	2.78	1.75	1.46
query23	17.14	0.91	1.05
query24	2.73	1.15	1.47
query25	0.27	0.18	0.06
query26	0.39	0.15	0.15
query27	0.05	0.06	0.03
query28	10.22	0.54	0.47
query29	12.54	3.27	3.24
query30	0.25	0.06	0.06
query31	2.86	0.39	0.40
query32	3.23	0.47	0.46
query33	3.00	3.02	3.02
query34	16.94	4.48	4.48
query35	4.56	4.55	4.49
query36	0.68	0.47	0.48
query37	0.09	0.06	0.06
query38	0.04	0.04	0.04
query39	0.04	0.02	0.03
query40	0.16	0.12	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.03
query43	0.03	0.03	0.03
Total cold run time: 104.54 s
Total hot run time: 30.15 s

@morrySnow changed the title from "[opt](nereids) speedup huge InPredicate for partition pruning (#47608)" to "branch-3.1: [opt](nereids) speedup huge InPredicate for partition pruning #47608" on Jul 2, 2025
@morrySnow merged commit c1c609f into apache:branch-3.1 on Jul 2, 2025
22 of 23 checks passed