[feature](mtmv) Support to use mv group dimension when query aggregate function is distinct #36318
…e function is distinct
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 39852 ms
TPC-DS: Total hot run time: 171638 ms
ClickBench: Total hot run time: 30.74 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
zfr9527
left a comment
LGTM
…e function is distinct (#36318)

## Proposed changes

This extends the ability to rewrite queries using materialized views.

For example, given the materialized view definition

> CREATE MATERIALIZED VIEW mv1
> BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES ('replication_num' = '1')
> AS
> select
> count(o_totalprice),
> o_shippriority,
> o_orderstatus,
> bin(o_orderkey)
> from orders
> group by
> o_orderstatus,
> o_shippriority,
> bin(o_orderkey);

the query below can be rewritten to use the materialized view. Although `sum(distinct o_shippriority)` does not appear in the mv output, the aggregate function is distinct and its argument is a group by dimension of the mv. In this case `sum(distinct o_shippriority)` can use the mv group dimension `o_shippriority` directly, and the result is correct.

The following distinct aggregate functions are currently supported; others will be supported in the future on demand.

- max(distinct arg)
- min(distinct arg)
- sum(distinct arg)
- avg(distinct arg)
- count(distinct arg)

> select
> count(o_totalprice),
> max(distinct o_shippriority),
> min(distinct o_shippriority),
> avg(distinct o_shippriority),
> sum(distinct o_shippriority) / count(distinct o_shippriority),
> o_orderstatus,
> bin(o_orderkey)
> from orders
> group by
> o_orderstatus,
> bin(o_orderkey);
…ys nullable (#52960)

### What problem does this PR solve?

Related PR: #36318

Problem Summary:

The materialized view definition is as follows:

create materialized view as select k1, k3, sum(k2), count(k4) from ${tblName} group by k1, k3;

Here `sum(k2)` is nullable. If the query is as follows, the rewrite would fail with the error 'query aggregate function roll up fail'. This PR fixes that.

select sum(distinct k1) from agg_use_key_direct
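The nullability point in the commit message above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module rather than Doris, and the table schema (including the `NOT NULL` constraints) is hypothetical. It only shows a general SQL fact: a `sum(distinct ...)` with no `GROUP BY` returns NULL over empty input even when its argument column is declared `NOT NULL`, so the output of such a scalar aggregate is nullable. Whether this is exactly the mismatch the fix addresses is an assumption; the commit message does not spell it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema; only the table name comes from the commit message.
cur.execute(
    "CREATE TABLE agg_use_key_direct ("
    "k1 INTEGER NOT NULL, k2 INTEGER, k3 INTEGER NOT NULL, k4 INTEGER)"
)

# Scalar aggregate (no GROUP BY) over an empty table: one row containing NULL.
empty_scalar = cur.execute(
    "SELECT sum(DISTINCT k1) FROM agg_use_key_direct"
).fetchone()[0]

# Grouped aggregate over an empty table: no rows at all.
empty_grouped = cur.execute(
    "SELECT sum(DISTINCT k1) FROM agg_use_key_direct GROUP BY k3"
).fetchall()

cur.executemany(
    "INSERT INTO agg_use_key_direct VALUES (?, ?, ?, ?)",
    [(1, 10, 1, 1), (1, 20, 2, 1), (2, 30, 1, 1)],
)

# With data present, the distinct k1 values are {1, 2}.
filled_scalar = cur.execute(
    "SELECT sum(DISTINCT k1) FROM agg_use_key_direct"
).fetchone()[0]

print(empty_scalar, empty_grouped, filled_scalar)
```

The scalar form always produces exactly one row, which is why its result has to be able to hold NULL regardless of the nullability of `k1`.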
Proposed changes
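This extends the ability to rewrite queries using materialized views.
For example, take the materialized view `mv1`, which groups `orders` by `o_orderstatus`, `o_shippriority`, and `bin(o_orderkey)` and outputs `count(o_totalprice)` together with those three group by columns. A query that groups only by `o_orderstatus` and `bin(o_orderkey)` and computes distinct aggregates over `o_shippriority` can be rewritten to use the materialized view successfully.
Although `sum(distinct o_shippriority)` in the query does not appear in the mv output, the query aggregate function is distinct and its argument is a group by dimension of the mv. In this case `sum(distinct o_shippriority)` can use the mv group dimension `o_shippriority` directly, and the result is correct.
The following distinct aggregate functions are currently supported; others will be supported in the future on demand: `max(distinct arg)`, `min(distinct arg)`, `sum(distinct arg)`, `avg(distinct arg)`, `count(distinct arg)`.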
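The reasoning above can be checked with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for Doris, so several things are assumptions: `mv1` is a plain table created with `CREATE TABLE ... AS` rather than a real materialized view, `bin(o_orderkey)` is replaced by `o_orderkey` because SQLite has no `bin` function, and the sample rows are made up. The sketch does not perform any query rewrite itself; it only shows that the distinct aggregates give the same answer whether computed from the base table or from the pre-grouped data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "CREATE TABLE orders ("
    "o_orderkey INTEGER, o_orderstatus TEXT, "
    "o_shippriority INTEGER, o_totalprice REAL)"
)
# Made-up sample rows with repeated o_shippriority values per group.
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "o", 1, 10.0),
        (1, "o", 1, 20.0),
        (1, "o", 2, 30.0),
        (2, "f", 2, 40.0),
        (2, "f", 2, 50.0),
        (2, "f", 5, 60.0),
        (3, "o", 1, 70.0),
    ],
)

# Stand-in for mv1: grouped by the same dimensions (bin() omitted).
cur.execute(
    "CREATE TABLE mv1 AS "
    "SELECT count(o_totalprice) AS cnt, o_shippriority, o_orderstatus, o_orderkey "
    "FROM orders GROUP BY o_orderstatus, o_shippriority, o_orderkey"
)

# The same distinct-aggregate query, run against either source.
QUERY = """
SELECT o_orderstatus, o_orderkey,
       max(DISTINCT o_shippriority),
       min(DISTINCT o_shippriority),
       sum(DISTINCT o_shippriority),
       avg(DISTINCT o_shippriority),
       count(DISTINCT o_shippriority)
FROM {source}
GROUP BY o_orderstatus, o_orderkey
ORDER BY o_orderstatus, o_orderkey
"""

from_base = cur.execute(QUERY.format(source="orders")).fetchall()
from_mv = cur.execute(QUERY.format(source="mv1")).fetchall()

print(from_base == from_mv)
```

The results match because grouping by `o_shippriority` in the stand-in table keeps every distinct value of that column within each `(o_orderstatus, o_orderkey)` group, and a distinct aggregate ignores how many times each value occurs. A non-distinct aggregate such as `count(o_totalprice)` would instead have to be rolled up from the stored counts.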