@morningman morningman commented Jul 24, 2024

PR #34032 introduced a new method to get splits batch by batch,
but it removed the logic by which BE merges scan ranges to avoid scheduling too many scan ranges.

This PR mainly changes:

  1. Add the scan range merging logic back.
  2. Change the default file split size from 8MB to 64MB, to avoid too many small splits.
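
To make the two changes concrete, here is a minimal C++ sketch. It is not the Doris BE implementation: the `ScanRange` struct and the `count_splits` and `merge_scan_ranges` helpers are hypothetical names, and grouping consecutive ranges into near-equal buckets is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a file scan range; not the Doris thrift type.
struct ScanRange {
    std::string path;
    int64_t start_offset;
    int64_t size;
};

// Number of splits produced when a file of `file_size` bytes is cut into
// pieces of at most `split_size` bytes (ceiling division).
int64_t count_splits(int64_t file_size, int64_t split_size) {
    assert(split_size > 0 && file_size >= 0);
    return (file_size + split_size - 1) / split_size;
}

// Group `ranges` into at most `max_groups` buckets while keeping input order.
// The first (n % groups) buckets receive one extra range.
std::vector<std::vector<ScanRange>> merge_scan_ranges(
        const std::vector<ScanRange>& ranges, std::size_t max_groups) {
    assert(max_groups > 0);
    std::vector<std::vector<ScanRange>> merged;
    if (ranges.empty()) {
        return merged;
    }
    const std::size_t n = ranges.size();
    const std::size_t groups = n < max_groups ? n : max_groups;
    const std::size_t base = n / groups;
    const std::size_t extra = n % groups;
    merged.reserve(groups);
    std::size_t next = 0;
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t len = base + (g < extra ? 1 : 0);
        // Copy the consecutive slice [next, next + len) into one bucket.
        merged.emplace_back(ranges.begin() + next, ranges.begin() + next + len);
        next += len;
    }
    return merged;
}
```

Under these assumptions, a 1024MB file yields 128 splits at 8MB but only 16 at 64MB, and ten scan ranges merged into at most four groups produce buckets of 3, 3, 2, and 2 ranges.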

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

LOG(INFO) << "Merge " << scan_ranges.size() << " scan ranges to " << _scan_ranges.size();
}

protected:
warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]

Suggested change (delete this line):
protected:

Additional context:

be/src/vec/exec/scan/split_source_connector.h:46: previously declared here

protected:
^
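
For readers unfamiliar with this check: `readability-redundant-access-specifiers` fires when an access specifier repeats the one already in effect. A minimal sketch follows, using hypothetical `Connector` and `TestConnector` classes that are not the contents of `split_source_connector.h`:

```cpp
#include <cassert>

// Hypothetical class for illustration only.
class Connector {
public:
    explicit Connector(int n) : _num_ranges(n) {}
    int num_ranges() const { return _num_ranges; }

protected:
    int _num_ranges;

    // A second `protected:` at this point would be redundant, because the
    // section above is already protected; clang-tidy suggests deleting it.
    // protected:
    void reset() { _num_ranges = 0; }
};

// Hypothetical subclass showing that protected members stay reachable
// after the redundant specifier is removed.
class TestConnector : public Connector {
public:
    using Connector::Connector;
    void clear() { reset(); }
};
```

Deleting the repeated specifier does not change accessibility, so subclasses keep the same access to `reset()` and `_num_ranges`.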

@morningman morningman marked this pull request as ready for review July 30, 2024 15:42
@morningman
Contributor Author

run buildall

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41631 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 21402675ed31fc49eabfae33f4e82d3e12acadae, data reload: false

------ Round 1 ----------------------------------
q1	17631	4118	4066	4066
q2	2036	212	209	209
q3	10543	1289	1371	1289
q4	10203	809	972	809
q5	7638	2936	2972	2936
q6	220	140	138	138
q7	1045	602	608	602
q8	9447	1932	1898	1898
q9	8467	6608	6626	6608
q10	8753	3821	3816	3816
q11	435	251	256	251
q12	402	230	221	221
q13	17764	2953	2958	2953
q14	270	240	242	240
q15	526	488	490	488
q16	516	382	382	382
q17	978	921	905	905
q18	8034	7300	7249	7249
q19	1859	1229	1224	1224
q20	564	327	357	327
q21	5399	4744	4738	4738
q22	348	282	282	282
Total cold run time: 113078 ms
Total hot run time: 41631 ms
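
The two totals can be reproduced from the rows above. The sketch below assumes, judging from the numbers themselves rather than from the tool's documentation, that the columns are cold run, first hot run, second hot run, and the smaller of the two hot runs, all in ms. The `Row` struct and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One report row: cold run and two hot runs, in ms (assumed column meaning).
struct Row {
    int64_t cold;
    int64_t hot1;
    int64_t hot2;
};

// Sum of the cold-run column.
int64_t total_cold(const std::vector<Row>& rows) {
    int64_t sum = 0;
    for (const Row& r : rows) sum += r.cold;
    return sum;
}

// Sum of the best (smallest) hot run per query.
int64_t total_hot(const std::vector<Row>& rows) {
    int64_t sum = 0;
    for (const Row& r : rows) sum += std::min(r.hot1, r.hot2);
    return sum;
}

// Round 1 rows q1..q22, copied from the report above.
std::vector<Row> round1_rows() {
    return {
        {17631, 4118, 4066}, {2036, 212, 209},    {10543, 1289, 1371},
        {10203, 809, 972},   {7638, 2936, 2972},  {220, 140, 138},
        {1045, 602, 608},    {9447, 1932, 1898},  {8467, 6608, 6626},
        {8753, 3821, 3816},  {435, 251, 256},     {402, 230, 221},
        {17764, 2953, 2958}, {270, 240, 242},     {526, 488, 490},
        {516, 382, 382},     {978, 921, 905},     {8034, 7300, 7249},
        {1859, 1229, 1224},  {564, 327, 357},     {5399, 4744, 4738},
        {348, 282, 282},
    };
}
```

With these rows, `total_cold` gives 113078 and `total_hot` gives 41631, matching the two totals reported for Round 1.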

----- Round 2, with runtime_filter_mode=off -----
q1	4092	4016	4026	4016
q2	329	225	226	225
q3	2971	2994	3163	2994
q4	1981	2009	1973	1973
q5	5586	5500	5448	5448
q6	221	129	129	129
q7	2154	1784	1820	1784
q8	3345	3420	3372	3372
q9	8659	8696	8710	8696
q10	3966	4057	3974	3974
q11	540	444	446	444
q12	745	584	620	584
q13	16301	3164	3107	3107
q14	311	281	282	281
q15	540	486	513	486
q16	478	417	434	417
q17	1761	1728	1728	1728
q18	8180	7801	7744	7744
q19	1838	1718	1714	1714
q20	2050	1859	1904	1859
q21	5697	5525	5401	5401
q22	518	454	463	454
Total cold run time: 72263 ms
Total hot run time: 56830 ms

@doris-robot

TPC-DS: Total hot run time: 169742 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 21402675ed31fc49eabfae33f4e82d3e12acadae, data reload: false

query1	915	371	368	368
query2	6479	1716	1670	1670
query3	6653	213	231	213
query4	20001	17266	17275	17266
query5	3706	533	522	522
query6	282	177	205	177
query7	4601	294	314	294
query8	265	215	210	210
query9	8525	2393	2379	2379
query10	424	275	280	275
query11	10534	10088	10020	10020
query12	128	98	87	87
query13	1632	391	395	391
query14	8825	7075	7460	7075
query15	202	163	170	163
query16	6802	458	406	406
query17	957	565	561	561
query18	1901	285	291	285
query19	199	151	150	150
query20	96	91	87	87
query21	205	102	102	102
query22	4085	3941	3892	3892
query23	33790	33735	33388	33388
query24	10647	3106	3082	3082
query25	712	467	453	453
query26	1832	154	159	154
query27	3055	284	285	284
query28	7512	2048	2031	2031
query29	1383	468	444	444
query30	239	160	162	160
query31	949	795	765	765
query32	109	58	59	58
query33	674	335	330	330
query34	956	495	507	495
query35	907	778	784	778
query36	1032	895	913	895
query37	295	91	85	85
query38	2933	2893	2793	2793
query39	877	836	838	836
query40	258	116	114	114
query41	50	45	46	45
query42	126	100	101	100
query43	460	407	422	407
query44	1196	738	751	738
query45	216	188	178	178
query46	1083	826	775	775
query47	1809	1722	1726	1722
query48	455	293	298	293
query49	940	420	409	409
query50	917	419	426	419
query51	6990	6738	6579	6579
query52	105	89	91	89
query53	254	183	179	179
query54	626	450	442	442
query55	78	72	76	72
query56	271	260	256	256
query57	1168	1032	1038	1032
query58	276	276	275	275
query59	2567	2379	2406	2379
query60	289	283	270	270
query61	124	96	98	96
query62	890	647	653	647
query63	216	195	185	185
query64	5649	1912	1914	1912
query65	3209	3118	3104	3104
query66	1306	340	324	324
query67	15328	14891	14862	14862
query68	5985	582	576	576
query69	785	390	309	309
query70	1161	1114	1062	1062
query71	487	276	270	270
query72	8178	2684	2508	2508
query73	896	329	332	329
query74	5981	5612	5662	5612
query75	4264	2744	2720	2720
query76	3815	1389	1393	1389
query77	715	317	317	317
query78	9453	9005	8813	8813
query79	3406	530	538	530
query80	1174	501	544	501
query81	566	230	224	224
query82	697	131	126	126
query83	337	181	172	172
query84	272	78	80	78
query85	1492	311	300	300
query86	438	285	301	285
query87	3222	3140	3099	3099
query88	3738	2408	2397	2397
query89	448	305	300	300
query90	1986	187	192	187
query91	124	102	99	99
query92	65	52	53	52
query93	5011	630	643	630
query94	925	288	306	288
query95	384	262	268	262
query96	620	279	279	279
query97	3208	3011	3067	3011
query98	225	197	188	188
query99	1603	1302	1310	1302
Total cold run time: 273836 ms
Total hot run time: 169742 ms

@doris-robot

ClickBench: Total hot run time: 30.52 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 21402675ed31fc49eabfae33f4e82d3e12acadae, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.49	0.49	0.49
query6	1.13	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.57	0.50	0.51
query10	0.55	0.55	0.56
query11	0.15	0.11	0.12
query12	0.16	0.12	0.12
query13	0.61	0.60	0.59
query14	0.78	0.80	0.78
query15	0.90	0.86	0.85
query16	0.35	0.35	0.35
query17	1.01	1.00	1.02
query18	0.21	0.21	0.21
query19	1.88	1.75	1.74
query20	0.02	0.01	0.01
query21	15.40	0.79	0.66
query22	3.96	7.30	1.94
query23	18.05	1.28	1.35
query24	2.29	0.22	0.21
query25	0.19	0.09	0.07
query26	0.32	0.22	0.21
query27	0.46	0.23	0.24
query28	13.17	1.01	0.96
query29	12.51	3.29	3.31
query30	0.27	0.05	0.05
query31	2.86	0.40	0.41
query32	3.25	0.48	0.48
query33	2.96	2.89	2.96
query34	15.47	4.25	4.25
query35	4.32	4.27	4.35
query36	0.68	0.48	0.48
query37	0.19	0.17	0.16
query38	0.16	0.16	0.14
query39	0.04	0.04	0.04
query40	0.16	0.12	0.13
query41	0.09	0.04	0.05
query42	0.05	0.05	0.06
query43	0.04	0.05	0.04
Total cold run time: 107.78 s
Total hot run time: 30.52 s

@morningman
Contributor Author

run buildall

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41571 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 16c3efd5fd162a82c1d1115d29c2df13ffa6c8ea, data reload: false

------ Round 1 ----------------------------------
q1	17601	4900	4094	4094
q2	2029	202	209	202
q3	10431	1311	1347	1311
q4	10169	813	971	813
q5	7652	3021	2980	2980
q6	219	134	136	134
q7	1038	612	595	595
q8	9452	1785	1944	1785
q9	8570	6658	6593	6593
q10	8772	3840	3835	3835
q11	437	248	246	246
q12	433	224	219	219
q13	17765	2937	2954	2937
q14	269	241	246	241
q15	534	477	472	472
q16	510	381	389	381
q17	1005	938	926	926
q18	8001	7241	7331	7241
q19	1427	1228	1220	1220
q20	557	331	336	331
q21	5265	4747	4740	4740
q22	357	289	275	275
Total cold run time: 112493 ms
Total hot run time: 41571 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4085	4046	4073	4046
q2	328	235	222	222
q3	2995	2998	3185	2998
q4	2023	2040	1968	1968
q5	5675	5482	5446	5446
q6	220	129	128	128
q7	2136	1820	1791	1791
q8	3356	3405	3404	3404
q9	8667	8666	8805	8666
q10	3936	4079	3947	3947
q11	550	464	500	464
q12	774	616	599	599
q13	16557	3130	3159	3130
q14	310	270	266	266
q15	526	475	479	475
q16	464	404	415	404
q17	1764	1764	1741	1741
q18	8152	7775	7706	7706
q19	1717	1738	1762	1738
q20	2030	1842	1819	1819
q21	5776	5499	5244	5244
q22	521	473	451	451
Total cold run time: 72562 ms
Total hot run time: 56653 ms

@doris-robot

TPC-DS: Total hot run time: 169772 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 16c3efd5fd162a82c1d1115d29c2df13ffa6c8ea, data reload: false

query1	923	367	361	361
query2	6456	1773	1787	1773
query3	6653	210	219	210
query4	20304	17341	17181	17181
query5	3619	511	513	511
query6	271	161	163	161
query7	4608	284	290	284
query8	248	198	201	198
query9	8526	2361	2366	2361
query10	432	292	274	274
query11	10546	9928	9910	9910
query12	123	88	88	88
query13	1644	374	368	368
query14	9939	7632	7027	7027
query15	205	165	171	165
query16	6964	434	472	434
query17	944	565	558	558
query18	1916	289	282	282
query19	198	162	150	150
query20	90	91	85	85
query21	209	102	101	101
query22	4483	4092	3907	3907
query23	33747	34124	33748	33748
query24	9704	3172	3072	3072
query25	655	395	391	391
query26	1187	155	162	155
query27	2643	269	276	269
query28	7612	1993	1997	1993
query29	998	442	414	414
query30	238	162	159	159
query31	965	762	785	762
query32	124	51	55	51
query33	678	318	331	318
query34	944	488	500	488
query35	881	775	736	736
query36	1057	883	853	853
query37	174	80	79	79
query38	2913	2807	2762	2762
query39	870	797	828	797
query40	204	111	109	109
query41	45	42	42	42
query42	113	97	103	97
query43	468	422	431	422
query44	1194	724	736	724
query45	213	174	176	174
query46	1084	852	771	771
query47	1808	1737	1767	1737
query48	365	291	289	289
query49	889	421	409	409
query50	899	424	424	424
query51	6867	6765	6670	6670
query52	97	88	109	88
query53	258	181	181	181
query54	596	450	440	440
query55	75	74	74	74
query56	280	250	247	247
query57	1130	1042	1052	1042
query58	284	261	261	261
query59	2658	2434	2513	2434
query60	284	275	262	262
query61	96	90	95	90
query62	883	663	658	658
query63	214	179	193	179
query64	4736	1938	1838	1838
query65	3197	3079	3162	3079
query66	964	333	330	330
query67	15205	15033	15123	15033
query68	5817	570	572	570
query69	708	392	297	297
query70	1132	1049	1065	1049
query71	457	270	277	270
query72	7726	2665	2472	2472
query73	915	325	323	323
query74	6074	5640	5557	5557
query75	3592	2746	2728	2728
query76	3334	1395	1415	1395
query77	573	304	357	304
query78	9399	8982	8956	8956
query79	2270	552	527	527
query80	857	508	506	506
query81	568	225	223	223
query82	1176	129	130	129
query83	194	169	177	169
query84	273	77	77	77
query85	1338	310	296	296
query86	472	287	278	278
query87	3257	3135	3134	3134
query88	3703	2376	2451	2376
query89	395	300	285	285
query90	1760	184	186	184
query91	124	102	97	97
query92	62	48	49	48
query93	1988	611	614	611
query94	762	285	300	285
query95	386	269	264	264
query96	606	283	275	275
query97	3185	3086	3066	3066
query98	222	203	209	203
query99	1608	1278	1314	1278
Total cold run time: 264730 ms
Total hot run time: 169772 ms

@doris-robot

ClickBench: Total hot run time: 29.73 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 16c3efd5fd162a82c1d1115d29c2df13ffa6c8ea, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.05	0.04
query3	0.22	0.05	0.05
query4	1.68	0.06	0.07
query5	0.48	0.48	0.48
query6	1.13	0.72	0.72
query7	0.02	0.01	0.01
query8	0.06	0.05	0.04
query9	0.58	0.50	0.50
query10	0.56	0.57	0.57
query11	0.16	0.11	0.12
query12	0.15	0.12	0.13
query13	0.61	0.60	0.60
query14	0.77	0.80	0.79
query15	0.91	0.85	0.86
query16	0.36	0.35	0.36
query17	1.02	1.02	0.97
query18	0.22	0.22	0.21
query19	1.80	1.77	1.69
query20	0.01	0.00	0.02
query21	15.43	0.75	0.66
query22	4.09	6.79	1.16
query23	17.67	1.32	1.35
query24	2.26	0.22	0.22
query25	0.19	0.08	0.08
query26	0.31	0.21	0.20
query27	0.45	0.24	0.24
query28	13.15	0.98	0.96
query29	12.58	3.26	3.26
query30	0.25	0.06	0.05
query31	2.87	0.41	0.41
query32	3.24	0.48	0.49
query33	2.96	2.98	2.94
query34	15.43	4.24	4.28
query35	4.31	4.28	4.28
query36	0.67	0.46	0.49
query37	0.19	0.17	0.16
query38	0.16	0.15	0.14
query39	0.04	0.04	0.03
query40	0.16	0.12	0.14
query41	0.09	0.04	0.04
query42	0.06	0.05	0.06
query43	0.05	0.04	0.04
Total cold run time: 107.47 s
Total hot run time: 29.73 s

@morningman morningman added the usercase label Aug 1, 2024
@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41714 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 340f80494f1d5f1b50465cb5fac95ebcc7cf0b8d, data reload: false

------ Round 1 ----------------------------------
q1	17715	4200	4083	4083
q2	2021	201	200	200
q3	10443	1334	1359	1334
q4	10180	831	995	831
q5	7676	2900	2982	2900
q6	221	138	142	138
q7	1069	634	631	631
q8	9445	1880	1954	1880
q9	8561	6590	6584	6584
q10	8741	3858	3854	3854
q11	431	253	254	253
q12	432	236	238	236
q13	17756	2936	2944	2936
q14	269	240	238	238
q15	523	485	494	485
q16	537	387	382	382
q17	977	918	933	918
q18	8014	7320	7246	7246
q19	1927	1203	1221	1203
q20	565	322	340	322
q21	5292	4780	4896	4780
q22	362	280	282	280
Total cold run time: 113157 ms
Total hot run time: 41714 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4111	4040	4018	4018
q2	338	227	217	217
q3	2983	3040	3152	3040
q4	2018	2086	2009	2009
q5	5648	5450	5426	5426
q6	222	136	134	134
q7	2145	1782	1788	1782
q8	3339	3401	3348	3348
q9	8697	8665	8840	8665
q10	3928	4101	3905	3905
q11	576	450	468	450
q12	773	606	575	575
q13	16392	3095	3094	3094
q14	295	273	274	273
q15	524	496	486	486
q16	457	428	417	417
q17	1757	1722	1744	1722
q18	8171	7705	7677	7677
q19	1743	1756	1709	1709
q20	2030	1865	1894	1865
q21	5770	5536	5457	5457
q22	515	452	458	452
Total cold run time: 72432 ms
Total hot run time: 56721 ms

@doris-robot

TPC-DS: Total hot run time: 169578 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 340f80494f1d5f1b50465cb5fac95ebcc7cf0b8d, data reload: false

query1	906	368	364	364
query2	6455	1761	1732	1732
query3	6660	215	224	215
query4	19323	17551	17217	17217
query5	3624	532	523	523
query6	282	172	181	172
query7	4601	293	303	293
query8	270	210	192	192
query9	8501	2407	2381	2381
query10	432	261	269	261
query11	10431	10028	9943	9943
query12	120	95	90	90
query13	1649	385	376	376
query14	8382	7123	7475	7123
query15	208	167	167	167
query16	6819	456	431	431
query17	947	577	558	558
query18	1870	286	281	281
query19	197	153	147	147
query20	92	84	88	84
query21	206	105	103	103
query22	4136	3947	3932	3932
query23	33464	33971	33386	33386
query24	10370	3135	3073	3073
query25	683	390	416	390
query26	1758	156	154	154
query27	2903	283	294	283
query28	7613	2046	2018	2018
query29	1351	445	415	415
query30	241	157	153	153
query31	937	798	761	761
query32	108	55	54	54
query33	689	317	316	316
query34	926	499	509	499
query35	873	784	754	754
query36	1054	904	914	904
query37	216	82	84	82
query38	2962	2809	2874	2809
query39	871	821	832	821
query40	247	116	111	111
query41	46	42	44	42
query42	125	98	107	98
query43	468	426	452	426
query44	1177	734	735	734
query45	225	179	178	178
query46	1116	808	777	777
query47	1808	1704	1727	1704
query48	362	289	311	289
query49	942	411	425	411
query50	890	439	435	435
query51	6736	6621	6606	6606
query52	105	89	88	88
query53	258	193	183	183
query54	634	456	450	450
query55	78	74	75	74
query56	268	242	253	242
query57	1139	1049	1021	1021
query58	258	274	276	274
query59	2557	2401	2473	2401
query60	292	279	271	271
query61	97	95	96	95
query62	886	650	651	650
query63	219	186	189	186
query64	5616	1900	1884	1884
query65	3180	3097	3105	3097
query66	1307	321	324	321
query67	15210	14814	14893	14814
query68	5875	591	591	591
query69	713	371	308	308
query70	1114	1094	1037	1037
query71	490	281	275	275
query72	7645	2683	2519	2519
query73	905	331	324	324
query74	6034	5709	5682	5682
query75	3830	2773	2741	2741
query76	3810	1273	1286	1273
query77	562	316	311	311
query78	9347	8920	8881	8881
query79	2061	531	530	530
query80	1588	505	512	505
query81	569	229	225	225
query82	782	134	131	131
query83	248	166	169	166
query84	279	83	80	80
query85	1407	324	323	323
query86	463	299	289	289
query87	3277	3081	3098	3081
query88	3550	2472	2380	2380
query89	394	300	295	295
query90	1843	194	192	192
query91	127	101	102	101
query92	59	51	52	51
query93	1892	626	621	621
query94	751	300	284	284
query95	380	265	270	265
query96	608	275	284	275
query97	3280	3066	3045	3045
query98	223	205	199	199
query99	1606	1279	1281	1279
Total cold run time: 265088 ms
Total hot run time: 169578 ms

@doris-robot

ClickBench: Total hot run time: 30.13 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 340f80494f1d5f1b50465cb5fac95ebcc7cf0b8d, data reload: false

query1	0.04	0.05	0.04
query2	0.08	0.04	0.04
query3	0.23	0.04	0.04
query4	1.70	0.06	0.06
query5	0.49	0.48	0.50
query6	1.14	0.71	0.73
query7	0.03	0.02	0.01
query8	0.05	0.05	0.04
query9	0.59	0.51	0.51
query10	0.57	0.56	0.57
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.62	0.61	0.60
query14	0.76	0.80	0.80
query15	0.90	0.86	0.86
query16	0.35	0.35	0.36
query17	0.98	1.01	0.97
query18	0.23	0.21	0.22
query19	1.81	1.74	1.72
query20	0.00	0.01	0.01
query21	15.39	0.74	0.65
query22	4.08	7.15	1.60
query23	17.76	1.28	1.20
query24	2.27	0.22	0.21
query25	0.17	0.09	0.08
query26	0.33	0.22	0.21
query27	0.46	0.23	0.23
query28	13.18	1.00	0.97
query29	12.60	3.28	3.32
query30	0.26	0.05	0.06
query31	2.88	0.41	0.40
query32	3.26	0.49	0.50
query33	2.92	2.97	2.93
query34	15.42	4.29	4.25
query35	4.28	4.28	4.30
query36	0.68	0.48	0.47
query37	0.18	0.15	0.17
query38	0.16	0.15	0.15
query39	0.04	0.03	0.04
query40	0.15	0.13	0.13
query41	0.10	0.05	0.05
query42	0.05	0.04	0.05
query43	0.05	0.04	0.04
Total cold run time: 107.54 s
Total hot run time: 30.13 s

@wuwenchi wuwenchi left a comment

LGTM

github-actions bot commented Aug 6, 2024

PR approved by anyone and no changes requested.

@kaka11chen kaka11chen left a comment

LGTM

github-actions bot commented Aug 6, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved label Aug 6, 2024
@morningman morningman merged commit 2ae2a13 into apache:master Aug 6, 2024
morningman added a commit to morningman/doris that referenced this pull request Aug 6, 2024
dataroaring pushed a commit that referenced this pull request Aug 11, 2024
dataroaring pushed a commit that referenced this pull request Aug 16, 2024
@gavinchou gavinchou mentioned this pull request Oct 13, 2024
Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.6-merged, dev/3.0.2-merged, doing, reviewed, usercase (Important user case type label)

5 participants