@seawinde
Contributor

Proposed changes

Add an id-to-statistics map to the statement context for use in later cost estimation.
This increases the likelihood that a materialized view is used when querying a single table with aggregation and many filters.

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See Doris Document.

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41317 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 34737db79c591559e9c381c183a01198e4ef9b54, data reload: false

------ Round 1 ----------------------------------
q1	17611	4356	4301	4301
q2	2021	183	185	183
q3	10476	1238	1180	1180
q4	10194	804	808	804
q5	7478	2709	2691	2691
q6	229	132	134	132
q7	964	611	614	611
q8	9211	2140	2098	2098
q9	9168	6663	6677	6663
q10	9286	3859	3888	3859
q11	452	263	236	236
q12	524	243	224	224
q13	17270	3303	3241	3241
q14	268	220	227	220
q15	529	481	492	481
q16	519	411	411	411
q17	994	704	688	688
q18	8375	7816	7767	7767
q19	5481	1565	1543	1543
q20	656	328	310	310
q21	5221	3399	4114	3399
q22	352	278	275	275
Total cold run time: 117279 ms
Total hot run time: 41317 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4537	4407	4449	4407
q2	395	277	255	255
q3	3106	2917	2879	2879
q4	1985	1771	1719	1719
q5	5259	5440	5497	5440
q6	216	133	128	128
q7	2179	1814	1807	1807
q8	3253	3386	3379	3379
q9	8561	8580	8654	8580
q10	4089	3939	3725	3725
q11	613	487	497	487
q12	770	608	612	608
q13	16125	3175	3187	3175
q14	299	279	269	269
q15	518	495	490	490
q16	508	460	444	444
q17	1817	1538	1539	1538
q18	7951	7622	7500	7500
q19	1676	1484	1486	1484
q20	2024	1787	1827	1787
q21	10140	4884	4773	4773
q22	550	514	513	513
Total cold run time: 76571 ms
Total hot run time: 55387 ms

// A lookup may return null, which means the statistics for that id should be calculated normally
// rather than taken from this map.
// The id may be a relation id, a CTE id, or another type of id.
private final Map<Pair<Id, Class<? extends Id>>, Statistics> idToStatisticsMap = new LinkedHashMap<>();
Contributor

just support relation id now
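
To make the lookup contract in the comment above concrete, here is a minimal sketch, not the actual Doris code. `Id`, `RelationId`, `CteId` and `Statistics` are hypothetical stand-ins, `Map.Entry` replaces `Pair`, and `getOrCompute` is an invented helper. In this sketch ids compare by numeric value only (an assumption), so the class in the key keeps a relation id and a CTE id with the same number apart.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; none of these types are the real Doris classes.
class IdStatisticsSketch {
    // Stand-in id type. Equality is by numeric value only (an assumption).
    static class Id {
        final int value;

        Id(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).value == value;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(value);
        }
    }

    static final class RelationId extends Id {
        RelationId(int value) {
            super(value);
        }
    }

    static final class CteId extends Id {
        CteId(int value) {
            super(value);
        }
    }

    // Stand-in statistics type carrying only a row count.
    static final class Statistics {
        final double rowCount;

        Statistics(double rowCount) {
            this.rowCount = rowCount;
        }
    }

    // Map.Entry plays the role of Pair<Id, Class<? extends Id>> from the snippet.
    private final Map<Map.Entry<Id, Class<? extends Id>>, Statistics> idToStatisticsMap =
            new LinkedHashMap<>();

    private static Map.Entry<Id, Class<? extends Id>> key(Id id) {
        Class<? extends Id> type = id.getClass();
        return new SimpleImmutableEntry<Id, Class<? extends Id>>(id, type);
    }

    void put(Id id, Statistics statistics) {
        idToStatisticsMap.put(key(id), statistics);
    }

    // Returns null when nothing was recorded for this id and id type.
    Statistics get(Id id) {
        return idToStatisticsMap.get(key(id));
    }

    // A null lookup falls back to the normal calculation.
    Statistics getOrCompute(Id id, Supplier<Statistics> normalCalculation) {
        Statistics recorded = get(id);
        return recorded != null ? recorded : normalCalculation.get();
    }

    public static void main(String[] args) {
        IdStatisticsSketch context = new IdStatisticsSketch();
        context.put(new RelationId(1), new Statistics(42.0));
        System.out.println(context.get(new RelationId(1)).rowCount);
        System.out.println(context.get(new CteId(1)));
        System.out.println(context.getOrCompute(new CteId(1), () -> new Statistics(7.0)).rowCount);
    }
}
```

If only relation ids are supported, as the review comment suggests, the class component of the key is not strictly needed yet, but it leaves room for other id types later.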

@doris-robot

TPC-DS: Total hot run time: 169253 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 34737db79c591559e9c381c183a01198e4ef9b54, data reload: false

query1	899	376	372	372
query2	7249	2361	2240	2240
query3	6635	209	204	204
query4	19881	17471	17297	17297
query5	4118	419	410	410
query6	249	156	156	156
query7	4578	298	300	298
query8	235	186	179	179
query9	8632	2373	2352	2352
query10	452	279	267	267
query11	10488	10164	9983	9983
query12	134	94	89	89
query13	1653	379	387	379
query14	10115	7041	7656	7041
query15	207	170	167	167
query16	7677	260	260	260
query17	1640	539	518	518
query18	1762	271	267	267
query19	197	153	156	153
query20	99	85	91	85
query21	201	133	130	130
query22	4114	3983	3859	3859
query23	33367	32960	33044	32960
query24	7018	2873	2886	2873
query25	570	368	355	355
query26	721	157	156	156
query27	2197	319	330	319
query28	4867	2071	2066	2066
query29	849	608	603	603
query30	246	150	151	150
query31	983	765	736	736
query32	87	52	56	52
query33	513	275	264	264
query34	867	473	481	473
query35	703	619	594	594
query36	1056	931	907	907
query37	102	69	69	69
query38	2897	2773	2761	2761
query39	840	843	795	795
query40	195	127	130	127
query41	47	43	45	43
query42	103	97	98	97
query43	591	548	536	536
query44	1076	736	756	736
query45	177	162	156	156
query46	1054	703	709	703
query47	1801	1742	1753	1742
query48	361	299	305	299
query49	849	381	392	381
query50	763	391	379	379
query51	7009	6713	6749	6713
query52	105	93	88	88
query53	348	289	285	285
query54	554	434	435	434
query55	74	72	75	72
query56	260	271	252	252
query57	1102	1034	1052	1034
query58	224	208	211	208
query59	3374	3264	3164	3164
query60	275	258	265	258
query61	93	90	91	90
query62	609	455	447	447
query63	309	283	282	282
query64	8528	2213	1741	1741
query65	3184	3096	3126	3096
query66	790	331	321	321
query67	15013	14946	14694	14694
query68	4605	516	531	516
query69	447	268	273	268
query70	1168	1073	1140	1073
query71	362	271	274	271
query72	7440	5303	2772	2772
query73	723	324	324	324
query74	5913	5531	5583	5531
query75	3265	2634	2635	2634
query76	2262	965	1007	965
query77	426	261	269	261
query78	10139	9902	9691	9691
query79	2015	529	520	520
query80	1073	440	444	440
query81	513	224	216	216
query82	650	89	95	89
query83	242	170	169	169
query84	248	85	86	85
query85	1812	285	341	285
query86	473	327	316	316
query87	3277	3095	3109	3095
query88	4019	2457	2442	2442
query89	486	389	380	380
query90	2058	186	187	186
query91	121	94	99	94
query92	58	49	50	49
query93	2715	503	490	490
query94	1267	185	186	185
query95	403	324	316	316
query96	590	273	274	273
query97	3197	3032	3059	3032
query98	245	224	212	212
query99	1184	859	856	856
Total cold run time: 261362 ms
Total hot run time: 169253 ms

@doris-robot

ClickBench: Total hot run time: 31.16 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 34737db79c591559e9c381c183a01198e4ef9b54, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.04	0.04
query3	0.23	0.04	0.04
query4	1.68	0.06	0.07
query5	0.51	0.49	0.50
query6	1.13	0.73	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.54	0.48	0.48
query10	0.54	0.55	0.55
query11	0.17	0.11	0.10
query12	0.14	0.11	0.12
query13	0.60	0.59	0.60
query14	0.80	0.76	0.79
query15	0.81	0.81	0.80
query16	0.36	0.36	0.37
query17	0.94	1.00	0.97
query18	0.22	0.27	0.24
query19	1.86	1.66	1.75
query20	0.02	0.01	0.01
query21	15.73	0.66	0.66
query22	4.54	6.76	2.50
query23	18.25	1.37	1.27
query24	1.29	0.30	0.24
query25	0.13	0.09	0.08
query26	0.24	0.16	0.16
query27	0.07	0.08	0.08
query28	13.65	1.10	1.08
query29	13.13	3.32	3.28
query30	0.25	0.06	0.05
query31	2.95	0.38	0.39
query32	3.22	0.47	0.47
query33	2.92	2.89	2.93
query34	17.07	4.40	4.47
query35	4.49	4.66	4.50
query36	0.66	0.46	0.48
query37	0.17	0.16	0.16
query38	0.15	0.14	0.15
query39	0.04	0.04	0.04
query40	0.16	0.15	0.16
query41	0.10	0.04	0.04
query42	0.05	0.04	0.05
query43	0.04	0.04	0.03
Total cold run time: 110.04 s
Total hot run time: 31.16 s

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41713 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2dbd9caa882851272c004e96b2232ce0d40de1f0, data reload: false

------ Round 1 ----------------------------------
q1	17593	4330	4242	4242
q2	2039	180	180	180
q3	10474	1225	1198	1198
q4	10188	821	828	821
q5	7494	2709	2706	2706
q6	226	144	136	136
q7	965	606	611	606
q8	9219	2110	2083	2083
q9	9132	6649	6693	6649
q10	9117	3858	3851	3851
q11	448	236	245	236
q12	450	216	216	216
q13	18431	3273	3232	3232
q14	277	225	232	225
q15	509	478	480	478
q16	499	372	387	372
q17	953	739	659	659
q18	8431	7715	7681	7681
q19	4487	1581	1550	1550
q20	638	308	320	308
q21	5065	4038	4006	4006
q22	351	281	278	278
Total cold run time: 116986 ms
Total hot run time: 41713 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4517	4423	4460	4423
q2	388	267	269	267
q3	3140	2929	2986	2929
q4	1925	1612	1670	1612
q5	5443	5524	5483	5483
q6	213	123	128	123
q7	2242	1789	1807	1789
q8	3272	3400	3394	3394
q9	8669	8673	8701	8673
q10	4006	3706	3833	3706
q11	593	503	483	483
q12	797	657	657	657
q13	16989	3178	3276	3178
q14	316	291	280	280
q15	536	491	496	491
q16	503	446	446	446
q17	1819	1504	1465	1465
q18	7643	7522	7404	7404
q19	1638	1547	1750	1547
q20	1995	1780	1802	1780
q21	4908	4824	4840	4824
q22	568	479	480	479
Total cold run time: 72120 ms
Total hot run time: 55433 ms

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on May 27, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-DS: Total hot run time: 171030 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2dbd9caa882851272c004e96b2232ce0d40de1f0, data reload: false

query1	923	394	369	369
query2	6426	2592	2360	2360
query3	6646	203	205	203
query4	19270	17230	17340	17230
query5	4189	412	407	407
query6	251	157	152	152
query7	4583	305	300	300
query8	232	179	180	179
query9	8483	2322	2311	2311
query10	462	275	261	261
query11	10669	10069	10100	10069
query12	139	95	89	89
query13	1642	371	366	366
query14	8474	5996	7980	5996
query15	215	176	165	165
query16	7881	271	259	259
query17	1751	518	514	514
query18	2004	267	263	263
query19	196	153	149	149
query20	93	90	86	86
query21	194	125	137	125
query22	4196	3923	3985	3923
query23	33357	33007	33070	33007
query24	6307	2926	2792	2792
query25	490	345	357	345
query26	693	157	156	156
query27	1865	315	312	312
query28	3769	2003	1994	1994
query29	881	614	590	590
query30	224	147	154	147
query31	935	757	754	754
query32	65	52	53	52
query33	506	277	260	260
query34	865	463	464	463
query35	684	604	609	604
query36	1022	924	930	924
query37	105	65	65	65
query38	2874	2802	2795	2795
query39	854	790	789	789
query40	192	126	126	126
query41	47	44	45	44
query42	102	92	97	92
query43	590	543	565	543
query44	1067	725	754	725
query45	179	169	165	165
query46	1061	730	707	707
query47	1847	1779	1775	1775
query48	395	296	302	296
query49	782	417	376	376
query50	765	371	387	371
query51	6792	6787	6755	6755
query52	105	86	90	86
query53	355	293	290	290
query54	521	426	420	420
query55	71	73	71	71
query56	262	233	240	233
query57	1110	1051	1046	1046
query58	228	213	202	202
query59	3504	3388	3437	3388
query60	273	270	260	260
query61	86	116	86	86
query62	552	443	438	438
query63	316	286	290	286
query64	3009	1715	1697	1697
query65	3185	3118	3101	3101
query66	790	332	330	330
query67	15251	14878	14721	14721
query68	4523	527	528	527
query69	450	328	271	271
query70	1189	1081	1088	1081
query71	367	264	266	264
query72	7801	5284	5517	5284
query73	722	322	318	318
query74	6010	5683	5640	5640
query75	3288	2648	2589	2589
query76	2155	1060	1014	1014
query77	382	263	264	263
query78	10313	9823	9710	9710
query79	2179	511	517	511
query80	1129	432	427	427
query81	529	222	228	222
query82	680	91	88	88
query83	256	168	167	167
query84	245	87	88	87
query85	1719	268	294	268
query86	497	291	274	274
query87	3360	3160	3213	3160
query88	4151	2418	2426	2418
query89	477	418	377	377
query90	2005	191	189	189
query91	122	97	93	93
query92	64	50	48	48
query93	3094	500	484	484
query94	1191	189	187	187
query95	405	303	311	303
query96	599	268	269	268
query97	3148	2990	3017	2990
query98	243	223	211	211
query99	1132	847	842	842
Total cold run time: 252277 ms
Total hot run time: 171030 ms

@doris-robot

ClickBench: Total hot run time: 30.64 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2dbd9caa882851272c004e96b2232ce0d40de1f0, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.04	0.04
query4	1.68	0.06	0.06
query5	0.48	0.50	0.49
query6	1.14	0.73	0.72
query7	0.01	0.01	0.01
query8	0.05	0.04	0.04
query9	0.53	0.49	0.50
query10	0.54	0.55	0.56
query11	0.15	0.11	0.12
query12	0.15	0.12	0.12
query13	0.59	0.58	0.59
query14	0.77	0.76	0.77
query15	0.83	0.81	0.82
query16	0.37	0.37	0.37
query17	0.95	1.02	0.94
query18	0.22	0.26	0.26
query19	1.79	1.70	1.71
query20	0.02	0.01	0.01
query21	15.73	0.65	0.64
query22	4.00	7.76	2.08
query23	18.31	1.34	1.24
query24	1.96	0.22	0.20
query25	0.14	0.08	0.08
query26	0.27	0.17	0.16
query27	0.07	0.08	0.07
query28	13.29	1.02	0.99
query29	13.34	3.36	3.26
query30	0.25	0.06	0.06
query31	2.86	0.39	0.39
query32	3.29	0.47	0.46
query33	2.88	2.91	2.87
query34	17.35	4.46	4.44
query35	4.52	4.51	4.55
query36	0.68	0.50	0.49
query37	0.17	0.15	0.15
query38	0.15	0.14	0.15
query39	0.05	0.04	0.03
query40	0.16	0.14	0.15
query41	0.09	0.04	0.04
query42	0.05	0.05	0.04
query43	0.04	0.03	0.04
Total cold run time: 110.27 s
Total hot run time: 30.64 s

@englefly merged commit 812eed4 into apache:master on May 28, 2024
yiguolei pushed a commit that referenced this pull request May 28, 2024
… cost estimation later (#35436)

Add an id-to-statistics map to the statement context for use in later cost
estimation. This increases the likelihood that a materialized view is used
when querying a single table with aggregation and many filters.
dataroaring pushed a commit that referenced this pull request May 28, 2024
… cost estimation later (#35436)

morrySnow pushed a commit that referenced this pull request Jun 3, 2024
…ats to mv scan plan based (#35749)

This issue was introduced by #35436.

The method `MaterializationContext#getPlanStatistics` returns the statistics
of the materialization context's original plan, but the
`expressionToColumnStats` in those statistics is based on the slots of the
original plan. We want the statistics of the original plan, but its
`expressionToColumnStats` should be based on the mv scan plan instead.
So this change adds the method
`MaterializationContext#normalizeStatisticsColumnExpression`, which should be
called after the plan statistics are generated in MaterializationContext.
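
As an illustration of the re-keying described above, here is a minimal sketch under simplifying assumptions, not the actual `normalizeStatisticsColumnExpression` implementation. Slots are plain strings, column statistics are plain numbers, and the class name, method name and slot-mapping parameter are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; slots are plain strings and column stats are plain numbers.
class NormalizeStatsSketch {
    // Re-keys column statistics from original-plan slots to mv scan plan slots.
    static Map<String, Double> rekeyToMvScanSlots(
            Map<String, Double> statsByOriginalSlot,
            Map<String, String> originalSlotToMvScanSlot) {
        Map<String, Double> statsByMvScanSlot = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : statsByOriginalSlot.entrySet()) {
            String mvScanSlot = originalSlotToMvScanSlot.get(entry.getKey());
            // Assumption: entries without a matching mv scan slot are dropped.
            if (mvScanSlot != null) {
                statsByMvScanSlot.put(mvScanSlot, entry.getValue());
            }
        }
        return statsByMvScanSlot;
    }

    public static void main(String[] args) {
        Map<String, Double> stats = new LinkedHashMap<>();
        stats.put("orders.o_totalprice", 1000.0);
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("orders.o_totalprice", "mv.total_price");
        System.out.println(rekeyToMvScanSlots(stats, mapping));
    }
}
```

The values are carried over unchanged; only the keys move from the original plan's slots to the mv scan plan's slots.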
dataroaring pushed a commit that referenced this pull request Jun 4, 2024
…ats to mv scan plan based (#35749)

seawinde added a commit to seawinde/doris that referenced this pull request Jun 5, 2024
…ats to mv scan plan based (apache#35749)

seawinde added a commit to seawinde/doris that referenced this pull request Jun 7, 2024
…ats to mv scan plan based (apache#35749)

morningman pushed a commit to seawinde/doris that referenced this pull request Jun 8, 2024
…ats to mv scan plan based (apache#35749)

seawinde added a commit to seawinde/doris that referenced this pull request Jun 14, 2024
…ats to mv scan plan based (apache#35749)


Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.4-merged, dev/3.0.0-merged, reviewed
