Conversation

@AshinGau
Member

@AshinGau AshinGau commented Apr 23, 2024

Proposed changes

When scanning a table with many files, it takes a long time to transfer the splits to the backends (20s for the 1209172 splits in the plan below).

|   0:VHIVE_SCAN_NODE(71)                                            |
|      table: level3partition                                        |
|      inputSplitNum=1209172, totalFileSize=6527616577, scanRanges=3 |
|      partition=60591/60591                                         |
|      cardinality=1, numNodes=3                                     |
|      pushdown agg=NONE                                             |
|      limit: 1                                                      |
+--------------------------------------------------------------------+

Therefore, this PR fetches the file splits in batch mode, so BE can start scanning while the remaining splits are still being fetched.
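
A minimal sketch of the idea, not the implementation in this PR: the producer side keeps the full split list and the scanning side pulls fixed-size batches, so scanning can begin as soon as the first batch arrives. `BatchFetchDemo`, `BatchSplitSource`, `nextBatch` and the batch size of 10 are hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch only; the names below are hypothetical, not Doris classes. */
final class BatchFetchDemo {

    /** Keeps all splits on the producer side and hands them out in batches. */
    static final class BatchSplitSource<T> {
        private final Iterator<T> remaining;

        BatchSplitSource(List<T> splits) {
            // Copy the list so later changes by the caller do not affect iteration.
            this.remaining = new ArrayList<>(splits).iterator();
        }

        /** Returns up to maxBatchSize splits; an empty list means the source is drained. */
        synchronized List<T> nextBatch(int maxBatchSize) {
            List<T> batch = new ArrayList<>();
            while (batch.size() < maxBatchSize && remaining.hasNext()) {
                batch.add(remaining.next());
            }
            return batch;
        }
    }

    /** Pulls batches until the source is drained; returns the number of splits seen. */
    static int scanAll(BatchSplitSource<String> source, int batchSize) {
        int scanned = 0;
        List<String> batch = source.nextBatch(batchSize);
        while (!batch.isEmpty()) {
            // A real scanner would read the files here while later batches are still pending.
            scanned += batch.size();
            batch = source.nextBatch(batchSize);
        }
        return scanned;
    }

    public static void main(String[] args) {
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            splits.add("split-" + i);
        }
        System.out.println(scanAll(new BatchSplitSource<>(splits), 10)); // prints 25
    }
}
```

With 25 splits and a batch size of 10, the consumer receives batches of 10, 10 and 5 and then an empty batch that signals the end.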

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@AshinGau AshinGau force-pushed the split_source branch 2 times, most recently from 9e436b6 to 380cea0 Compare May 7, 2024 07:34
@AshinGau AshinGau marked this pull request as ready for review May 7, 2024 07:34
@AshinGau
Member Author

AshinGau commented May 7, 2024

run buildall

@AshinGau
Member Author

AshinGau commented May 7, 2024

run buildall

@AshinGau
Member Author

AshinGau commented May 8, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.69% (8985/25178)
Line Coverage: 27.35% (74208/271362)
Region Coverage: 26.60% (38373/144259)
Branch Coverage: 23.40% (19565/83610)
Coverage Report: http://coverage.selectdb-in.cc/coverage/36185172b418fbd82887a7dfa972ff441cc57cf2_36185172b418fbd82887a7dfa972ff441cc57cf2/report/index.html

Jibing-Li
Jibing-Li previously approved these changes May 8, 2024
Contributor

@Jibing-Li Jibing-Li left a comment

LGTM

@github-actions
Contributor

github-actions bot commented May 8, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels May 8, 2024
@github-actions
Contributor

github-actions bot commented May 8, 2024

PR approved by anyone and no changes requested.

SplitSource splitSource = new SplitSource(
this::splitToScanRange, backend, locationProperties, splits, pathPartitionKeys);
splitSources.add(splitSource);
SplitSourceManager.registerSplitSource(splitSource);

Do not use a singleton.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SplitSourceManager {

Suggestions:

  1. Do not use a singleton.
  2. Extend the MasterDaemon class.
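
One possible shape for this suggestion, as a hedged sketch rather than the code that was merged: the registry is an ordinary object owned by whoever constructs it, and expired entries are dropped by a method that a periodic daemon would call. In Doris that periodic loop would come from `MasterDaemon`; this sketch only shows the method such a loop might invoke. `SplitSourceRegistry`, `register`, `removeExpired` and the TTL are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative, instance-based registry; not the class merged in the PR. */
final class SplitSourceRegistry {
    // Maps a split source id to the time (in millis) at which it expires.
    private final Map<Long, Long> expiryById = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();
    private final long ttlMillis;

    SplitSourceRegistry(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Registers a new split source and returns its id. */
    long register(long nowMillis) {
        long id = nextId.incrementAndGet();
        expiryById.put(id, nowMillis + ttlMillis);
        return id;
    }

    boolean contains(long id) {
        return expiryById.containsKey(id);
    }

    /** What a daemon's periodic run could call; returns how many entries were removed. */
    int removeExpired(long nowMillis) {
        int removed = 0;
        for (Map.Entry<Long, Long> entry : expiryById.entrySet()) {
            if (entry.getValue() <= nowMillis
                    && expiryById.remove(entry.getKey(), entry.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        SplitSourceRegistry registry = new SplitSourceRegistry(1000);
        long first = registry.register(0);    // expires at t=1000
        long second = registry.register(500); // expires at t=1500
        System.out.println(registry.removeExpired(1200)); // prints 1
        System.out.println(registry.contains(first) + " " + registry.contains(second));
    }
}
```

Because nothing is static, a test or a second catalog can create its own registry without sharing state.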

return QeProcessorImpl.INSTANCE.reportExecStatus(params, getClientAddr());
}

public TFetchSplitBatchResult fetchSplitBatch(TFetchSplitBatchRequest request) throws TException {

Missing @Override

LOG(WARNING) << "Failed to get batch of split source: {}, try to reopen" << e1.what();
RETURN_IF_ERROR(coord.reopen());
try {
coord->fetchSplitBatch(result, request);

Maybe we should not retry on failure.
If the first call fails, it is highly likely that the second will fail too.
Simply fail the query to avoid an avalanche.

@ConfField(mutable = true, masterOnly = false, description = {
"如果切片数量超过阈值,BE将通过batch方式获取scan ranges",
"If the number of splits exceeds the threshold, scan ranges will be got through batch mode."})
public static int num_splits_in_batch_mode = 10000;

Would this be better as a session variable?
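
To make the question concrete, a small sketch under stated assumptions: the global config supplies the default threshold, and a per-session value, when present, overrides it. The session-level override is hypothetical; the diff above only shows the global `num_splits_in_batch_mode` config with a default of 10000.

```java
import java.util.OptionalInt;

/** Illustrative only: decides whether a scan should fetch its splits in batch mode. */
final class BatchModeDecision {
    /** Mirrors the default of num_splits_in_batch_mode quoted above. */
    static final int DEFAULT_NUM_SPLITS_IN_BATCH_MODE = 10000;

    /**
     * Batch mode is used when the number of splits exceeds the threshold.
     * sessionThreshold is a hypothetical per-session override of the global default.
     */
    static boolean useBatchMode(int numSplits, OptionalInt sessionThreshold) {
        int threshold = sessionThreshold.orElse(DEFAULT_NUM_SPLITS_IN_BATCH_MODE);
        return numSplits > threshold;
    }

    public static void main(String[] args) {
        // The plan in the PR description has 1209172 splits, well above the default.
        System.out.println(useBatchMode(1209172, OptionalInt.empty())); // prints true
        // A session could lower the threshold for a single query.
        System.out.println(useBatchMode(5000, OptionalInt.of(1000)));   // prints true
    }
}
```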

@AshinGau
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41933 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f9d87914102858d10220d80bd03afdae5deeb48a, data reload: false

------ Round 1 ----------------------------------
q1	17615	4481	4291	4291
q2	2027	187	190	187
q3	10475	1291	1152	1152
q4	10189	870	829	829
q5	7475	2764	2816	2764
q6	225	136	133	133
q7	1044	611	619	611
q8	9252	2214	2138	2138
q9	9403	6834	6695	6695
q10	9409	4034	3931	3931
q11	457	239	250	239
q12	440	242	234	234
q13	18146	3093	3284	3093
q14	255	208	213	208
q15	500	468	478	468
q16	471	407	399	399
q17	1001	725	684	684
q18	8385	7819	7700	7700
q19	6304	1611	1544	1544
q20	632	322	326	322
q21	5253	4036	4154	4036
q22	353	289	275	275
Total cold run time: 119311 ms
Total hot run time: 41933 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4540	4423	4401	4401
q2	384	272	263	263
q3	3140	2959	2882	2882
q4	1906	1614	1617	1614
q5	5486	5510	5505	5505
q6	218	125	125	125
q7	2370	1972	2024	1972
q8	3297	3426	3447	3426
q9	8624	8707	8683	8683
q10	3976	3841	3894	3841
q11	609	501	494	494
q12	802	644	614	614
q13	17060	3192	3268	3192
q14	299	276	288	276
q15	516	457	477	457
q16	468	414	418	414
q17	1786	1493	1474	1474
q18	7782	7567	7584	7567
q19	1679	1566	1570	1566
q20	1971	1764	1744	1744
q21	11312	4893	4740	4740
q22	568	491	479	479
Total cold run time: 78793 ms
Total hot run time: 55729 ms

@doris-robot

TPC-DS: Total hot run time: 187732 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f9d87914102858d10220d80bd03afdae5deeb48a, data reload: false

query1	912	370	356	356
query2	6466	2472	2258	2258
query3	6647	217	223	217
query4	23165	21295	21288	21288
query5	4172	422	455	422
query6	258	175	173	173
query7	4582	304	292	292
query8	243	190	190	190
query9	8385	2355	2333	2333
query10	437	256	261	256
query11	14975	14184	14163	14163
query12	140	95	87	87
query13	1649	380	377	377
query14	9894	8550	7706	7706
query15	216	177	177	177
query16	7853	282	268	268
query17	1705	585	565	565
query18	1994	282	278	278
query19	205	156	155	155
query20	95	89	86	86
query21	209	132	131	131
query22	5028	4837	4860	4837
query23	34049	33619	33664	33619
query24	6663	3001	2987	2987
query25	555	421	354	354
query26	698	158	154	154
query27	1896	318	318	318
query28	3824	2025	2031	2025
query29	843	614	608	608
query30	235	156	156	156
query31	983	767	769	767
query32	95	52	54	52
query33	503	263	248	248
query34	898	474	495	474
query35	772	681	669	669
query36	1038	949	905	905
query37	103	67	69	67
query38	2898	2758	2732	2732
query39	1599	1592	1598	1592
query40	197	128	126	126
query41	42	37	39	37
query42	101	95	97	95
query43	612	550	580	550
query44	1078	733	755	733
query45	270	255	247	247
query46	1062	713	702	702
query47	1973	1866	1891	1866
query48	381	322	296	296
query49	777	400	405	400
query50	776	382	393	382
query51	6756	6605	6603	6603
query52	101	100	89	89
query53	354	281	289	281
query54	529	432	429	429
query55	75	72	74	72
query56	241	221	227	221
query57	1234	1148	1154	1148
query58	222	201	201	201
query59	3558	3338	3272	3272
query60	263	237	261	237
query61	92	88	90	88
query62	563	459	487	459
query63	308	289	284	284
query64	8377	7394	7400	7394
query65	3123	3088	3140	3088
query66	797	352	333	333
query67	15452	14992	15047	14992
query68	4725	533	531	531
query69	476	303	309	303
query70	1239	1143	1154	1143
query71	384	282	276	276
query72	7367	2577	2331	2331
query73	693	328	324	324
query74	6467	6139	6062	6062
query75	3306	2688	2640	2640
query76	2290	1029	952	952
query77	416	271	275	271
query78	10608	10333	10237	10237
query79	2578	517	514	514
query80	1099	499	444	444
query81	521	225	220	220
query82	723	93	97	93
query83	238	174	172	172
query84	248	92	85	85
query85	1269	278	268	268
query86	509	323	284	284
query87	3275	3097	3118	3097
query88	4405	2421	2431	2421
query89	485	392	402	392
query90	2036	192	195	192
query91	125	98	102	98
query92	63	52	49	49
query93	1953	510	500	500
query94	1149	184	186	184
query95	403	309	309	309
query96	604	281	268	268
query97	3227	3007	2988	2988
query98	239	222	220	220
query99	1238	897	907	897
Total cold run time: 270554 ms
Total hot run time: 187732 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.66% (8990/25208)
Line Coverage: 27.32% (74295/271957)
Region Coverage: 26.55% (38402/144616)
Branch Coverage: 23.37% (19579/83794)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f9d87914102858d10220d80bd03afdae5deeb48a_f9d87914102858d10220d80bd03afdae5deeb48a/report/index.html

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 14, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@kaka11chen kaka11chen left a comment

LGTM

@AshinGau AshinGau merged commit cc457a2 into apache:master May 14, 2024
AshinGau added a commit to AshinGau/incubator-doris that referenced this pull request May 21, 2024
When scanning a table with many files, it takes a long time to transfer the splits to the backends (20s for 1209172 splits). Therefore, the file splits are fetched in batch mode, so BE can start scanning while the remaining splits are still being fetched.
morningman added a commit that referenced this pull request Aug 6, 2024
PR #34032 introduced a new method to get splits batch by batch,
but it removed the logic by which BE merges scan ranges to avoid
scheduling too many scan ranges.

This PR mainly changes:
1. Add the scan range merging logic back.
2. Change the default file split size from 8MB to 64MB, to avoid too
many small splits.
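
A rough illustration of both changes, not the BE's actual code: the first method counts the splits a file yields for a given split size using plain ceiling division, and the second merges many scan ranges into at most a fixed number of groups by round-robin assignment. The ceiling-division rule, the round-robin policy, the method names and the group limit are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch; names and policies are assumptions, not Doris code. */
final class ScanRangeMergeDemo {
    static final long MB = 1024L * 1024L;

    /** Number of splits for one file, assuming simple ceiling division. */
    static long splitCount(long fileSizeBytes, long splitSizeBytes) {
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    /** Distributes ranges round-robin into at most maxGroups merged groups. */
    static <T> List<List<T>> mergeRanges(List<T> ranges, int maxGroups) {
        if (maxGroups < 1) {
            throw new IllegalArgumentException("maxGroups must be at least 1");
        }
        int groups = Math.min(maxGroups, ranges.size());
        List<List<T>> merged = new ArrayList<>(groups);
        for (int i = 0; i < groups; i++) {
            merged.add(new ArrayList<>());
        }
        for (int i = 0; i < ranges.size(); i++) {
            merged.get(i % groups).add(ranges.get(i));
        }
        return merged;
    }

    public static void main(String[] args) {
        // A 1 GB file: 128 splits at 8MB versus 16 splits at 64MB.
        System.out.println(splitCount(1024 * MB, 8 * MB));
        System.out.println(splitCount(1024 * MB, 64 * MB));

        List<Integer> ranges = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ranges.add(i);
        }
        // Ten ranges merged into three groups of sizes 4, 3 and 3.
        System.out.println(mergeRanges(ranges, 3));
    }
}
```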
morningman added a commit to morningman/doris that referenced this pull request Aug 6, 2024
dataroaring pushed a commit that referenced this pull request Aug 11, 2024
dataroaring pushed a commit that referenced this pull request Aug 16, 2024
Labels

approved Indicates a PR has been approved by one committer. dev/2.1.4-merged dev/3.0.0-merged meta-change reviewed

7 participants