[fix](mtmv) Fix data wrong if base table add new partition when query rewrite by partition rolled up mv #36414
… rewrite by partition rolled up mv
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 39926 ms
TPC-DS: Total hot run time: 171766 ms
ClickBench: Total hot run time: 30 s
def mv_name = "mv_10086"
sql """DROP MATERIALIZED VIEW IF EXISTS ${mv_name}"""
sql """DROP TABLE IF EXISTS ${mv_name}"""
There is no need to run DROP TABLE here.
I think this is needed, since MV names and table names share the same namespace.
l_suppkey;
"""
def roll_up_all_partition_sql = """
What is the difference between roll_up_mv_def_sql and roll_up_all_partition_sql?
One is used for the MV definition and the other for the query; the SQL is the same.
"""
sql """DROP MATERIALIZED VIEW IF EXISTS ${mv_name}"""
sql """DROP TABLE IF EXISTS ${mv_name}"""
Not needed.
explain {
    sql("${roll_up_all_partition_sql}")
    contains("${mv_name}(${mv_name})")
Why does ${mv_name} appear twice?
Because the keyword in the explain plan is mv_name(mv_name) when the materialized view is used.
PR approved by anyone and no changes requested.
PR approved by at least one committer and no changes requested.
… rewrite by partition rolled up mv (apache#36414)

This issue was introduced by apache#35562.

When the MV is a partition rolled-up MV (rolled up by date_trunc) and the base table adds a new partition, a query that is successfully rewritten by the partitioned MV loses the data of the new partition. This PR fixes the problem.

For example, given the following MV definition:

CREATE MATERIALIZED VIEW roll_up_mv
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
partition by (date_trunc(`col1`, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
select date_trunc(`l_shipdate`, 'day') as col1, l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) as sum_total
from lineitem
left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by col1, l_shipdate, o_orderdate, l_partkey, l_suppkey;

if you run the insert command:

insert into lineitem values (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-11-21', '2023-11-21', '2023-11-21', 'a', 'b', 'yyyyyyyyy');

and then run the following query, the result does not return the 2023-11-21 partition data:

select date_trunc(`l_shipdate`, 'day') as col1, l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) as sum_total
from lineitem
left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by col1, l_shipdate, o_orderdate, l_partkey, l_suppkey;
Proposed changes
This issue was introduced by #35562.
When the MV is a partition rolled-up MV (rolled up by date_trunc) and the base table adds a new partition, a query that is successfully rewritten by the partitioned MV loses the data of the new partition. This PR fixes the problem.
For example: create an MV partitioned by date_trunc(`col1`, 'month') over lineitem left join orders, insert a lineitem row dated 2023-11-21, then run the same query as the MV definition. The result does not return the 2023-11-21 partition data.
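The failure mode can be modeled outside Doris. The sketch below is a simplified illustration, not the Doris implementation or the code changed in this PR: it assumes daily base-table partitions and a monthly MV partition derived with a date_trunc-like function, and all function names (`month_trunc`, `stale_mv_partitions`, `base_partitions_to_union`) are hypothetical. The point it illustrates is that every current base partition, including one added after the last refresh, has to be mapped through the roll-up function so that the MV partition it belongs to is treated as stale and the base data is read directly.

```python
from datetime import date


def month_trunc(d: date) -> date:
    """Mimic date_trunc(d, 'month'): return the first day of d's month."""
    return d.replace(day=1)


def stale_mv_partitions(base_partitions, refreshed_base_partitions):
    """Return the monthly MV partitions that cover at least one base
    partition the MV has not been refreshed with (hypothetical helper)."""
    stale = set()
    for day in base_partitions:
        if day not in refreshed_base_partitions:
            # A new daily partition invalidates the month it rolls up into,
            # even if that monthly MV partition does not exist yet.
            stale.add(month_trunc(day))
    return stale


def base_partitions_to_union(base_partitions, stale_months):
    """Return the base partitions that must be read directly because
    their rolled-up MV partition is stale (hypothetical helper)."""
    return sorted(d for d in base_partitions if month_trunc(d) in stale_months)


# Base partitions the MV was last refreshed with (illustrative dates).
refreshed = {date(2023, 10, 17), date(2023, 10, 18)}
# The base table later gains the 2023-11-21 partition.
current = refreshed | {date(2023, 11, 21)}

stale = stale_mv_partitions(current, refreshed)
print(stale)
print(base_partitions_to_union(current, stale))
```

In this model, a check that only looks at the MV partitions that already exist would see two valid October partitions and miss the November data, which mirrors the symptom described above.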