[fix](mtmv) Fix high nest level materialized view can not be rewritten, because low level mv aggregate roll up #36567
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
run performance
TPC-H: Total hot run time: 39837 ms
TPC-DS: Total hot run time: 171735 ms
…cause low level mv aggregate roll up
566143d to
fe79381
ClickBench: Total hot run time: 30.42 s
run buildall
TPC-H: Total hot run time: 40247 ms
TPC-DS: Total hot run time: 172986 ms
ClickBench: Total hot run time: 30.9 s
Plan rewrittenPlan = MaterializedViewUtils.rewriteByRules(cascadesContext,
        childContext -> {
            Rewriter.getCteChildrenRewriter(childContext,
                    ImmutableList.of(Rewriter.topDown(new EliminateGroupByKey()))).execute();
Why does this rule need to be applied separately? Couldn't rewriting the whole MV plan tree eliminate the group by key?
Because the MV has five group by dimensions, but the query uses only four of them.
So we need to add a projection of the columns the query uses on top of the MV plan, and then run EliminateGroupByKey again.
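One way to read the reply above is sketched below in plain Java. This is a simplified model and not Doris's `EliminateGroupByKey` implementation: the class `GroupByKeyPruning`, the method `prune`, the `determinedBy` map, and the column names `d1`..`d5` are all hypothetical. The sketch assumes a group by key can be dropped only when another retained key determines it and no parent operator still needs it as output, which is why projecting down to the query's columns first could make a second elimination pass succeed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupByKeyPruning {
    /**
     * Returns the group by keys that remain after pruning.
     * determinedBy maps a column to a column that functionally determines it
     * (hypothetical input; a real optimizer would derive this from the plan).
     */
    static List<String> prune(List<String> groupKeys,
                              Set<String> requiredOutputs,
                              Map<String, String> determinedBy) {
        List<String> kept = new ArrayList<>();
        for (String key : groupKeys) {
            String determinant = determinedBy.get(key);
            boolean redundant = determinant != null
                    && groupKeys.contains(determinant)
                    // Only trust determinants that are not prunable themselves,
                    // so two mutually dependent keys are never both dropped.
                    && !determinedBy.containsKey(determinant)
                    // A key that is still needed as output must be kept.
                    && !requiredOutputs.contains(key);
            if (!redundant) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> mvKeys = Arrays.asList("d1", "d2", "d3", "d4", "d5");
        Map<String, String> determinedBy = Map.of("d5", "d4");

        // Without a projection, every MV key is still an output: nothing is pruned.
        Set<String> allOutputs = new HashSet<>(mvKeys);
        System.out.println(prune(mvKeys, allOutputs, determinedBy));

        // After projecting to the four columns the query uses, d5 can be pruned.
        Set<String> queryOutputs = new HashSet<>(Arrays.asList("d1", "d2", "d3", "d4"));
        System.out.println(prune(mvKeys, queryOutputs, determinedBy));
    }
}
```

Under these assumptions the first call keeps all five keys and the second keeps only `d1` through `d4`.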
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…n, because low level mv aggregate roll up (#36567)

The query is an aggregate query whose group by expressions are fewer than those of the materialized view. When the extra group by dimensions in the materialized view can be eliminated through functional dependencies, the query can be rewritten without a roll-up aggregate.

For example, the MV definition is:

    CREATE MATERIALIZED VIEW mv
    BUILD IMMEDIATE REFRESH AUTO ON MANUAL
    DISTRIBUTED BY RANDOM BUCKETS 2
    PROPERTIES ('replication_num' = '1')
    AS
    select
      l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey,
      cast(
        sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)
      ) as agg2
    from lineitem_1
      inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
      inner join partsupp_1 on l_partkey = partsupp_1.ps_partkey
        and l_suppkey = partsupp_1.ps_suppkey
    where partsupp_1.ps_suppkey > 1
    group by
      l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey;

The query is:

    select
      l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey,
      cast(
        sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)
      ) as agg2
    from lineitem_1
      inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
      inner join partsupp_1 on l_partkey = partsupp_1.ps_partkey
        and l_suppkey = partsupp_1.ps_suppkey
    where partsupp_1.ps_suppkey > 1
    group by
      l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey;

The query does not use `ps_partkey`, which is in the MV's group by expressions. Normally a roll-up aggregate is added on top of the materialized view when the MV has more group by dimensions than the query. In this scenario, however, the equi-join conditions (`l_partkey = ps_partkey` and `l_suppkey = ps_suppkey`) provide functional dependencies, so the roll-up aggregate is not needed in the rewritten plan. This improves performance and benefits nested materialized view rewrite.
Proposed changes
The query is an aggregate query whose group by expressions are fewer than those of the materialized view.
When the extra group by dimensions in the materialized view can be eliminated through functional dependencies,
the query can be rewritten without a roll-up aggregate.
For example, the MV groups by `l_orderkey`, `l_partkey`, `l_suppkey`, `o_orderkey`, `o_custkey`, and `ps_partkey`, while the query groups by the same columns except `ps_partkey`.
The query does not use `ps_partkey`, which is in the MV's group by expressions. Normally a roll-up aggregate is added on top of the materialized view when the MV has more group by dimensions than the query.
In this scenario, however, the equi-join conditions (`l_partkey = ps_partkey` and `l_suppkey = ps_suppkey`) provide functional dependencies, so the roll-up aggregate is not needed in the rewritten plan. This improves performance and benefits nested materialized view rewrite.
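The decision described above can be illustrated with a minimal Java sketch. It is not the code from this PR: the class `RollUpCheck` and the method `needsRollUp` are hypothetical, and the sketch assumes that the only functional dependencies considered are column equalities taken from inner-join conditions, and that the query's group by keys are a subset of the MV's. Columns that are equal are merged into equivalence classes with a small union-find; a roll-up is needed only if some MV key falls in a class that no query key covers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RollUpCheck {
    // Union-find lookup with path compression over column names.
    static String find(Map<String, String> parent, String column) {
        parent.putIfAbsent(column, column);
        String p = parent.get(column);
        if (p.equals(column)) {
            return column;
        }
        String root = find(parent, p);
        parent.put(column, root);
        return root;
    }

    /**
     * Returns true if a roll-up aggregate would still be needed, i.e. some MV
     * group by key is not equivalent to any query group by key.
     * Each equality is a two-element array {left, right} (hypothetical input).
     */
    static boolean needsRollUp(List<String> mvKeys,
                               List<String> queryKeys,
                               List<String[]> equalities) {
        Map<String, String> parent = new HashMap<>();
        for (String[] eq : equalities) {
            String a = find(parent, eq[0]);
            String b = find(parent, eq[1]);
            if (!a.equals(b)) {
                parent.put(a, b); // merge the two equivalence classes
            }
        }
        Set<String> covered = new HashSet<>();
        for (String key : queryKeys) {
            covered.add(find(parent, key));
        }
        for (String key : mvKeys) {
            if (!covered.contains(find(parent, key))) {
                return true; // an extra MV dimension really splits the groups
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> mvKeys = Arrays.asList(
                "l_orderkey", "l_partkey", "l_suppkey", "o_orderkey", "o_custkey", "ps_partkey");
        List<String> queryKeys = Arrays.asList(
                "l_orderkey", "l_partkey", "l_suppkey", "o_orderkey", "o_custkey");
        List<String[]> joinEqualities = Arrays.asList(
                new String[] {"l_orderkey", "o_orderkey"},
                new String[] {"l_partkey", "ps_partkey"},
                new String[] {"l_suppkey", "ps_suppkey"});
        System.out.println(needsRollUp(mvKeys, queryKeys, joinEqualities));
    }
}
```

With the join equalities from the example, `ps_partkey` lands in the same class as `l_partkey`, so the check reports that no roll-up is required; with an empty equality list it would report the opposite.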