@seawinde
Contributor

Proposed changes

This problem was introduced by #35562.

When a materialized view (MV) is a partition rolled-up MV, that is, its partitions are rolled up from the base table partitions by date_trunc, and the base table gets a new partition,
a query that is successfully rewritten to use the partitioned MV loses the data in the new partition. This PR fixes the problem.
For example, given the following MV definition:

CREATE MATERIALIZED VIEW roll_up_mv
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
partition by (date_trunc(`col1`, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
select date_trunc(l_shipdate, 'day') as col1, l_shipdate, o_orderdate, l_partkey,
    l_suppkey, sum(o_totalprice) as sum_total
from lineitem
left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by
    col1,
    l_shipdate,
    o_orderdate,
    l_partkey,
    l_suppkey;

Then run this insert command:

insert into lineitem values
    (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-11-21', '2023-11-21', '2023-11-21', 'a', 'b', 'yyyyyyyyy');

Running the following query afterwards does not return the data in the 2023-11-21 partition:

select date_trunc(`l_shipdate`, 'day') as col1, l_shipdate, o_orderdate, l_partkey,
    l_suppkey, sum(o_totalprice) as sum_total
from lineitem
left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by
    col1,
    l_shipdate,
    o_orderdate,
    l_partkey,
    l_suppkey;
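
To make the failure mode concrete, here is a minimal Python sketch of how day-level base partitions roll up into month-level MV partitions. It is only an illustration, not the code in this PR: the helper names `date_trunc_month` and `mv_partitions_missing_data` are hypothetical, the October dates are made-up sample partitions, and it assumes the insert above creates a new 2023-11-21 day partition in the base table.

```python
from datetime import date


def date_trunc_month(d: date) -> date:
    """Mimic date_trunc(d, 'month'): snap a date to the first day of its month."""
    return d.replace(day=1)


def mv_partitions_missing_data(days_at_refresh, days_now):
    """Return the month partitions that contain base-table day partitions
    added after the MV was last refreshed (hypothetical helper)."""
    new_days = set(days_now) - set(days_at_refresh)
    return sorted({date_trunc_month(d) for d in new_days})


# Hypothetical base-table day partitions at the time of the last MV refresh.
days_at_refresh = [date(2023, 10, 17), date(2023, 10, 18), date(2023, 10, 19)]
# Assumption: the insert from the example adds a 2023-11-21 partition afterwards.
days_now = days_at_refresh + [date(2023, 11, 21)]

print(mv_partitions_missing_data(days_at_refresh, days_now))
```

In this sketch the month starting 2023-11-01 is reported as missing data, so a rewrite that reads only the MV would drop the 2023-11-21 rows unless that month is handled separately.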

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39926 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ca6ece723361110bdbde2de4718d7566078ea3e7, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17891	4500	4403	4403
q2	2682	193	193	193
q3	12371	1106	1150	1106
q4	10556	820	813	813
q5	7506	2727	2750	2727
q6	235	144	140	140
q7	962	615	598	598
q8	9218	2058	2087	2058
q9	8858	6511	6482	6482
q10	8941	3739	3714	3714
q11	448	245	238	238
q12	427	234	236	234
q13	17762	2973	2979	2973
q14	271	222	235	222
q15	521	483	485	483
q16	523	391	376	376
q17	957	649	732	649
q18	8085	7605	7291	7291
q19	2522	1481	1434	1434
q20	658	308	346	308
q21	4908	3148	3836	3148
q22	395	342	336	336
Total cold run time: 116697 ms
Total hot run time: 39926 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4392	4274	4249	4249
q2	377	266	268	266
q3	2952	2751	2695	2695
q4	1912	1630	1625	1625
q5	5256	5258	5276	5258
q6	212	128	129	128
q7	2095	1736	1736	1736
q8	3195	3326	3299	3299
q9	8306	8328	8295	8295
q10	3874	3684	3679	3679
q11	578	481	493	481
q12	782	619	579	579
q13	17147	2940	2983	2940
q14	287	270	257	257
q15	527	476	471	471
q16	454	407	419	407
q17	1736	1481	1475	1475
q18	7548	7541	7448	7448
q19	1719	1531	1616	1531
q20	1945	1763	1780	1763
q21	4896	4738	4753	4738
q22	600	534	518	518
Total cold run time: 70790 ms
Total hot run time: 53838 ms

@doris-robot

TPC-DS: Total hot run time: 171766 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ca6ece723361110bdbde2de4718d7566078ea3e7, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	919	379	377	377
query2	6456	2365	2371	2365
query3	6653	204	205	204
query4	19065	17348	17391	17348
query5	4173	459	468	459
query6	245	157	159	157
query7	4589	298	288	288
query8	322	315	298	298
query9	8694	2367	2355	2355
query10	614	312	278	278
query11	10606	10037	10124	10037
query12	137	84	90	84
query13	1627	360	364	360
query14	8827	6226	7056	6226
query15	245	184	188	184
query16	7799	265	256	256
query17	1573	537	543	537
query18	1896	270	268	268
query19	202	156	159	156
query20	90	83	82	82
query21	219	133	125	125
query22	4358	3990	3968	3968
query23	33711	33026	33433	33026
query24	11892	2821	2815	2815
query25	652	360	390	360
query26	1718	152	152	152
query27	3083	317	316	316
query28	7680	2018	2011	2011
query29	1049	626	597	597
query30	279	150	151	150
query31	951	737	767	737
query32	104	53	58	53
query33	772	281	275	275
query34	989	471	458	458
query35	730	641	639	639
query36	1118	937	930	930
query37	158	71	71	71
query38	2889	2764	2703	2703
query39	868	788	794	788
query40	285	129	128	128
query41	57	55	55	55
query42	121	100	104	100
query43	563	544	552	544
query44	1215	716	728	716
query45	203	173	171	171
query46	1104	732	725	725
query47	1900	1790	1762	1762
query48	383	293	303	293
query49	1155	457	399	399
query50	761	369	368	368
query51	6761	6760	6672	6672
query52	104	96	100	96
query53	368	296	282	282
query54	978	431	423	423
query55	74	70	72	70
query56	282	254	258	254
query57	1140	1047	1082	1047
query58	260	237	242	237
query59	3380	3202	3463	3202
query60	280	280	303	280
query61	91	87	86	86
query62	645	460	444	444
query63	312	291	282	282
query64	9889	2224	1766	1766
query65	3311	3129	3110	3110
query66	1374	323	334	323
query67	15516	14945	14826	14826
query68	4586	533	530	530
query69	453	304	303	303
query70	1200	1153	1170	1153
query71	384	274	264	264
query72	7138	5454	5869	5454
query73	744	318	318	318
query74	5964	5537	5563	5537
query75	3367	2669	2682	2669
query76	2709	926	891	891
query77	432	293	287	287
query78	10327	9917	9712	9712
query79	2153	503	495	495
query80	944	515	457	457
query81	579	221	219	219
query82	704	104	95	95
query83	267	170	165	165
query84	235	83	84	83
query85	1930	297	266	266
query86	495	328	295	295
query87	3287	3075	3106	3075
query88	4219	2342	2320	2320
query89	475	386	374	374
query90	1798	193	188	188
query91	129	99	98	98
query92	64	48	50	48
query93	2442	517	499	499
query94	1281	180	185	180
query95	399	305	324	305
query96	602	266	270	266
query97	3244	3066	3011	3011
query98	221	199	195	195
query99	1096	908	860	860
Total cold run time: 274332 ms
Total hot run time: 171766 ms

@doris-robot

ClickBench: Total hot run time: 30 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ca6ece723361110bdbde2de4718d7566078ea3e7, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.04
query2	0.08	0.03	0.04
query3	0.22	0.06	0.05
query4	1.67	0.09	0.10
query5	0.50	0.48	0.50
query6	1.13	0.73	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.54	0.50	0.48
query10	0.54	0.55	0.53
query11	0.16	0.11	0.12
query12	0.14	0.12	0.12
query13	0.59	0.59	0.61
query14	0.79	0.79	0.77
query15	0.86	0.82	0.81
query16	0.36	0.36	0.37
query17	0.97	1.01	0.99
query18	0.23	0.22	0.25
query19	1.82	1.71	1.70
query20	0.01	0.01	0.00
query21	15.43	0.66	0.67
query22	3.84	8.50	1.52
query23	18.27	1.39	1.22
query24	2.12	0.23	0.21
query25	0.15	0.08	0.08
query26	0.27	0.18	0.18
query27	0.07	0.08	0.08
query28	13.26	1.03	1.00
query29	12.62	3.26	3.25
query30	0.26	0.06	0.06
query31	2.89	0.38	0.39
query32	3.25	0.46	0.47
query33	2.90	2.87	2.93
query34	17.12	4.46	4.39
query35	4.50	4.57	4.47
query36	0.64	0.46	0.46
query37	0.18	0.14	0.14
query38	0.15	0.14	0.14
query39	0.04	0.03	0.03
query40	0.18	0.15	0.15
query41	0.10	0.05	0.04
query42	0.06	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.06 s
Total hot run time: 30 s


def mv_name = "mv_10086"
sql """DROP MATERIALIZED VIEW IF EXISTS ${mv_name}"""
sql """DROP TABLE IF EXISTS ${mv_name}"""
Contributor

There is no need to run DROP TABLE here.

Contributor

I think this is needed, since MV names and table names share the same namespace.

l_suppkey;
"""

def roll_up_all_partition_sql = """
Contributor

What is the difference between roll_up_mv_def_sql and roll_up_all_partition_sql?

Contributor Author

One is used for the MV definition and the other for the query; the SQL is the same.

"""

sql """DROP MATERIALIZED VIEW IF EXISTS ${mv_name}"""
sql """DROP TABLE IF EXISTS ${mv_name}"""
Contributor

Not needed.


explain {
sql("${roll_up_all_partition_sql}")
contains("${mv_name}(${mv_name})")
Contributor

Why does ${mv_name} appear twice?

Contributor Author

Because the keyword in the EXPLAIN plan is mv_name(mv_name) when the materialized view is used.
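
As a rough illustration of the check discussed in this thread, the Python sketch below looks for the doubled `mv_name(mv_name)` marker in a plan string. The function name `plan_uses_mv` is hypothetical, and both plan strings are made-up placeholders, not real EXPLAIN output.

```python
def plan_uses_mv(plan_text: str, mv_name: str) -> bool:
    """Return True if the plan text contains the mv_name(mv_name) marker."""
    marker = f"{mv_name}({mv_name})"
    return marker in plan_text


# Made-up plan fragments for illustration only; not real EXPLAIN output.
rewritten_plan = "... scan mv_10086(mv_10086) ..."
plain_plan = "... scan lineitem(lineitem) ..."

print(plan_uses_mv(rewritten_plan, "mv_10086"))
print(plan_uses_mv(plain_plan, "mv_10086"))
```

This mirrors what the `contains("${mv_name}(${mv_name})")` assertion in the test expresses: the rewrite counts as successful only if the plan text contains that marker.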

@seawinde seawinde requested a review from zddr June 20, 2024 06:27
@github-actions
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 21, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit 5e009b5 into apache:master Jun 21, 2024
iszhangpch pushed a commit to iszhangpch/doris-p that referenced this pull request Jun 21, 2024
… rewrite by partition rolled up mv (apache#36414)

dataroaring pushed a commit that referenced this pull request Jun 21, 2024
… rewrite by partition rolled up mv (#36414)

seawinde added a commit to seawinde/doris that referenced this pull request Jul 11, 2024
… rewrite by partition rolled up mv (apache#36414)

morrySnow pushed a commit that referenced this pull request Jul 12, 2024
cherry-pick from master
pr: #36318
commitId: c199947

pr: #36111
commitId: 35ebef6

pr: #36175
commitId: 4c8e66b

pr: #36414
commitId: 5e009b5

pr: #36770
commitId: 19e2126

pr: #36567
commitId: 3da8351

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.5-merged dev/3.0.0-merged reviewed


5 participants