@zhangbutao
Contributor

Proposed changes

#22923 added a useful optimization for Iceberg count queries. I think we can end the split-generation loop early, because a single split is enough when the statement can push down count. This can reduce query time when an Iceberg table has many splits.

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

// iceberg use integer to store date,
// we need transform it to string
value = DateTimeUtil.daysToIsoDate((Integer) obj);
for (CombinedScanTask taskGrp : combinedScanTasks) {
Contributor Author

I couldn't find a clean way to end the loop early inside forEach, so I replaced forEach with a plain for loop. :(

partitionPathSet.add(structLike.toString());
// End loop early as one split is enough if the statement can push down count
if (canPushCount) {
break;
Contributor Author

This is the actual change: end the entire loop early, to avoid generating a lot of useless splits when the statement can push down count.
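
For readers following the two comments above, here is a minimal, self-contained sketch of the idea, not the PR's actual code. `break` is not allowed inside a `forEach` lambda (it is a compile error), which is why a plain `for` loop is needed. The class name `EarlyExitSplits`, the method `collectSplits`, and the use of nested string lists in place of Iceberg's `CombinedScanTask` groups are all hypothetical stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: nested lists of file names stand in for
// the scan task groups that the real code iterates over.
class EarlyExitSplits {

    static List<String> collectSplits(List<List<String>> taskGroups, boolean canPushCount) {
        List<String> splits = new ArrayList<>();
        // A labeled break leaves both loops at once; a bare break in the
        // inner loop would only end the current task group.
        outer:
        for (List<String> group : taskGroups) {
            for (String file : group) {
                splits.add(file);
                // One split is enough when count can be pushed down.
                if (canPushCount) {
                    break outer;
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<List<String>> groups = List.of(
                List.of("a.parquet", "b.parquet"),
                List.of("c.parquet"));
        System.out.println(collectSplits(groups, true));   // [a.parquet]
        System.out.println(collectSplits(groups, false));  // [a.parquet, b.parquet, c.parquet]
    }
}
```

With `canPushCount` set, the loop stops after the first split no matter how many task groups exist, which is where the time saving on tables with many splits would come from.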

@zhangbutao
Contributor Author

run buildall

@zhangbutao
Contributor Author

@wuwenchi Could you share any suggestions about this change? Thanks.

@doris-robot

TPC-H: Total hot run time: 39914 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a664ff983a3f98b80d64dd28650c6dc60a333a7e, data reload: false

------ Round 1 ----------------------------------
q1	7603	4317	4250	4250
q2	1506	188	193	188
q3	9035	1126	1169	1126
q4	1076	764	799	764
q5	2770	2926	2898	2898
q6	232	141	134	134
q7	1054	585	550	550
q8	1948	2051	2047	2047
q9	6688	6564	6434	6434
q10	3823	3677	3714	3677
q11	361	231	233	231
q12	380	223	213	213
q13	16860	3059	2946	2946
q14	250	217	211	211
q15	525	486	474	474
q16	464	395	407	395
q17	952	699	639	639
q18	7941	7391	7379	7379
q19	1584	1541	1526	1526
q20	514	312	311	311
q21	4909	3240	3968	3240
q22	353	281	287	281
Total cold run time: 70828 ms
Total hot run time: 39914 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4289	4203	4244	4203
q2	373	268	270	268
q3	2971	2774	2741	2741
q4	1855	1608	1602	1602
q5	5262	5291	5265	5265
q6	211	126	124	124
q7	2246	1907	1820	1820
q8	3217	3334	3325	3325
q9	8348	8366	8342	8342
q10	3880	3709	3668	3668
q11	609	495	490	490
q12	763	601	579	579
q13	15958	2986	3011	2986
q14	306	272	256	256
q15	519	471	476	471
q16	474	410	407	407
q17	1767	1485	1485	1485
q18	7552	7423	7351	7351
q19	1653	1535	1599	1535
q20	2001	1782	1738	1738
q21	4856	4824	4754	4754
q22	573	502	514	502
Total cold run time: 69683 ms
Total hot run time: 53912 ms

@doris-robot

TPC-DS: Total hot run time: 187944 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a664ff983a3f98b80d64dd28650c6dc60a333a7e, data reload: false

query1	904	371	344	344
query2	6442	2489	2377	2377
query3	6676	207	219	207
query4	24348	21286	21242	21242
query5	4128	418	425	418
query6	271	180	175	175
query7	4584	292	283	283
query8	244	193	194	193
query9	8486	2440	2415	2415
query10	428	253	256	253
query11	14841	14322	14120	14120
query12	137	89	85	85
query13	1636	367	353	353
query14	10572	8448	8519	8448
query15	261	177	179	177
query16	8168	292	267	267
query17	1847	556	535	535
query18	2111	272	263	263
query19	206	149	149	149
query20	90	84	84	84
query21	193	126	125	125
query22	5077	4827	4866	4827
query23	34200	33509	33584	33509
query24	11256	2893	2804	2804
query25	617	359	375	359
query26	1550	152	154	152
query27	2954	313	329	313
query28	7208	2065	2056	2056
query29	954	621	586	586
query30	295	147	152	147
query31	973	771	731	731
query32	96	52	54	52
query33	748	252	246	246
query34	1093	484	475	475
query35	816	685	675	675
query36	1101	898	957	898
query37	134	67	63	63
query38	2932	2791	2746	2746
query39	1636	1565	1558	1558
query40	275	128	129	128
query41	45	41	41	41
query42	102	99	98	98
query43	584	583	527	527
query44	1151	729	732	729
query45	258	254	251	251
query46	1095	727	721	721
query47	1958	1882	1910	1882
query48	368	305	297	297
query49	1138	393	397	393
query50	775	383	386	383
query51	6840	6779	6792	6779
query52	112	93	90	90
query53	351	279	282	279
query54	1005	435	426	426
query55	79	72	74	72
query56	242	223	221	221
query57	1239	1179	1156	1156
query58	225	200	194	194
query59	3314	3163	3305	3163
query60	255	226	252	226
query61	89	94	92	92
query62	670	490	479	479
query63	302	287	279	279
query64	9764	7403	7336	7336
query65	3218	3144	3099	3099
query66	1396	344	340	340
query67	15389	15117	14912	14912
query68	4741	532	544	532
query69	526	300	306	300
query70	1184	1139	1150	1139
query71	415	258	261	258
query72	7880	2644	2360	2360
query73	718	327	324	324
query74	6437	6124	6099	6099
query75	3474	2641	2650	2641
query76	3063	1052	1013	1013
query77	644	264	262	262
query78	10743	10159	10307	10159
query79	2335	517	516	516
query80	1278	434	428	428
query81	485	216	222	216
query82	679	94	96	94
query83	190	165	163	163
query84	267	86	81	81
query85	1426	270	260	260
query86	411	315	325	315
query87	3340	3100	3124	3100
query88	3852	2349	2347	2347
query89	484	390	379	379
query90	1976	198	184	184
query91	123	99	96	96
query92	57	51	47	47
query93	3007	516	513	513
query94	1130	181	190	181
query95	401	296	293	293
query96	593	323	271	271
query97	3222	2968	3010	2968
query98	242	210	225	210
query99	1339	908	892	892
Total cold run time: 289114 ms
Total hot run time: 187944 ms

Contributor

@morningman morningman left a comment

If you want to do this optimization, why not just create a dummy IcebergSplit?
Then we don't even need to call TableScanUtil.planTasks.

@zhangbutao
Contributor Author

> If you want to do this optimization, why not just create a dummy IcebergSplit? Then we don't even need to call TableScanUtil.planTasks.

@morningman Thanks for your suggestion! You are right, creating a dummy IcebergSplit is a better approach than this PR. However, I found that the BE needs a real Iceberg split for some of its logic, so we would have to add some odd checks on the BE side to make it accept the dummy IcebergSplit.

Actually, I think count pushdown is a metadata operation that can be done on the FE side, so we can skip initializing the splits and avoid the BE-side work entirely. This would make the count operation more efficient.

I just submitted a new PR #34928, Please take a look if you have time.
Thanks.
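
As a rough illustration of the "metadata operation on the FE side" idea, here is a minimal sketch and not the implementation in #34928. It assumes the count is read from an Iceberg snapshot summary, which Iceberg exposes as a `Map<String, String>` via `Snapshot.summary()`, using the summary keys `total-records`, `total-position-deletes`, and `total-equality-deletes`. The class `SnapshotCount`, the method `countFromSummary`, and the `-1` fallback convention are hypothetical names chosen for this example, and a plain map is used so the block runs without the Iceberg dependency.

```java
import java.util.Map;

// Hypothetical sketch: derive COUNT(*) from snapshot summary metadata,
// falling back (return -1) whenever metadata alone may be inaccurate.
class SnapshotCount {
    // Summary keys assumed from Iceberg's snapshot summary; verify against
    // the Iceberg version in use.
    static final String TOTAL_RECORDS = "total-records";
    static final String TOTAL_POSITION_DELETES = "total-position-deletes";
    static final String TOTAL_EQUALITY_DELETES = "total-equality-deletes";

    static long countFromSummary(Map<String, String> summary) {
        String records = summary.get(TOTAL_RECORDS);
        if (records == null) {
            return -1; // no record count in metadata; a real scan is needed
        }
        long positionDeletes = Long.parseLong(summary.getOrDefault(TOTAL_POSITION_DELETES, "0"));
        long equalityDeletes = Long.parseLong(summary.getOrDefault(TOTAL_EQUALITY_DELETES, "0"));
        // Conservative choice: with any delete files present, the live row
        // count cannot be trusted from the summary alone.
        if (positionDeletes > 0 || equalityDeletes > 0) {
            return -1;
        }
        return Long.parseLong(records);
    }

    public static void main(String[] args) {
        System.out.println(countFromSummary(Map.of(TOTAL_RECORDS, "42")));  // 42
        System.out.println(countFromSummary(
                Map.of(TOTAL_RECORDS, "42", TOTAL_POSITION_DELETES, "1")));  // -1
    }
}
```

Returning a sentinel when delete files exist keeps the sketch conservative: the caller would fall back to the normal split-based scan in that case.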

@morningman morningman closed this Jul 12, 2024