[improvement](mtmv) Support grouping_sets rewrite when query rewrite by materialized view #36056
Merged
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
Contributor
Author
run buildall
Contributor
Author
run buildall
Contributor
Author
run buildall
TPC-H: Total hot run time: 43926 ms
TPC-DS: Total hot run time: 169574 ms
ClickBench: Total hot run time: 31.73 s
Contributor
Author
run buildall
TPC-H: Total hot run time: 39865 ms
TPC-DS: Total hot run time: 173884 ms
ClickBench: Total hot run time: 31.12 s
morrySnow
approved these changes
Jun 12, 2024
Contributor
PR approved by at least one committer and no changes requested.
starocean999
approved these changes
Jun 12, 2024
Contributor
PR approved by anyone and no changes requested.
dataroaring
pushed a commit
that referenced
this pull request
Jun 13, 2024
…rialized view (#36056)

Support rewriting grouping_sets, cube, and rollup queries by materialized view, if the mv group by fields contain all the group by fields in the query.

For example, given the following mv definition:

    CREATE MATERIALIZED VIEW mv_1
    BUILD IMMEDIATE REFRESH AUTO ON MANUAL
    DISTRIBUTED BY RANDOM BUCKETS 2
    PROPERTIES ('replication_num' = '1')
    AS
    select o_orderstatus, o_orderdate, o_orderpriority,
           sum(o_totalprice) as sum_total,
           max(o_totalprice) as max_total,
           min(o_totalprice) as min_total,
           count(*) as count_all,
           bitmap_union(
             to_bitmap(
               case when o_shippriority > 1 and o_orderkey IN (1, 3)
                    then o_custkey else null end
             )
           ) as bitmap_union_basic
    from orders
    group by o_orderstatus, o_orderdate, o_orderpriority;

The following query can be rewritten successfully by the mv above:

    select o_orderstatus, o_orderdate, o_orderpriority,
           grouping_id(o_orderstatus, o_orderdate, o_orderpriority),
           grouping_id(o_orderstatus, o_orderdate),
           grouping(o_orderdate),
           sum(o_totalprice),
           max(o_totalprice),
           min(o_totalprice),
           count(*),
           count(distinct case when o_shippriority > 1 and o_orderkey IN (1, 3)
                               then o_custkey else null end)
    from orders
    group by GROUPING SETS ((o_orderstatus, o_orderdate), (o_orderpriority), (o_orderstatus), ());

If the query group by fields are a subset of the mv group by fields, and the query aggregate functions extend `RollupTrait`, the query can also be rewritten successfully, as in the following example. This is also applicable to `CUBE` and `ROLLUP`.

    select o_orderstatus, o_orderdate,
           grouping_id(o_orderstatus, o_orderdate),
           grouping(o_orderdate),
           sum(o_totalprice),
           max(o_totalprice),
           min(o_totalprice),
           count(*),
           count(distinct case when o_shippriority > 1 and o_orderkey IN (1, 3)
                               then o_custkey else null end)
    from orders
    group by GROUPING SETS ((o_orderstatus, o_orderdate), (o_orderdate), ());
seawinde
added a commit
to seawinde/doris
that referenced
this pull request
Jun 20, 2024
…rialized view (apache#36056)
seawinde
added a commit
to seawinde/doris
that referenced
this pull request
Jul 8, 2024
…rialized view (apache#36056)
morrySnow
pushed a commit
that referenced
this pull request
Jul 8, 2024
starocean999
pushed a commit
that referenced
this pull request
Sep 20, 2024
…terialized view (#40803)

## Proposed changes

This is brought by #36056.

Not every query that is rewritten successfully can be compensated with union all. For example, the mv definition SQL is as follows, and the partition column is `a`:

```sql
select a, b, count(*) from t1 group by a, b
```

The query is as follows:

```sql
select count(*) from t1
```

After the query is rewritten by the materialized view successfully, the result is:

+----------+
| count(*) |
+----------+
| 24       |
+----------+

If some partitions of the mv are invalid, union all cannot be used to compensate, because the result after the union all compensation is wrong:

+----------+
| count(*) |
+----------+
| 24       |
| 3        |
+----------+

This PR fixes this.
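A minimal Python sketch of why the naive union all goes wrong for this query shape. It is not Doris code: the table contents, the helper names, and the split into valid and invalid partitions are all assumptions, chosen only so that the numbers mirror the 24 and 3 shown above.

```python
from collections import Counter

# Hypothetical base table t1 as (a, b) rows; "a" is the partition column.
# Partitions a=1 and a=2 hold 24 rows in total, partition a=3 holds 3 rows.
T1 = [(1, "x")] * 10 + [(2, "y")] * 14 + [(3, "z")] * 3


def mv_rows(table, valid_partitions):
    """Rows of `select a, b, count(*) ... group by a, b`, valid partitions only."""
    counts = Counter((a, b) for a, b in table if a in valid_partitions)
    return [(a, b, n) for (a, b), n in counts.items()]


def count_from_mv(rows):
    """`select count(*)` answered by summing the mv's partial counts."""
    return sum(n for _, _, n in rows)


def count_from_base(table, partitions):
    """`select count(*)` answered from the base table for the given partitions."""
    return sum(1 for a, _ in table if a in partitions)


def naive_union_all(table, valid, invalid):
    """Each branch aggregates on its own, so the union yields one row per branch."""
    return [count_from_mv(mv_rows(table, valid)), count_from_base(table, invalid)]


print(naive_union_all(T1, {1, 2}, {3}))   # two rows instead of one
print(count_from_base(T1, {1, 2, 3}))     # the single row the query should return
```

Because the query has no group by on the partition column, each side of the union all produces its own total, and the two partial rows are never merged into one.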
gavinchou
pushed a commit
that referenced
this pull request
Sep 25, 2024
…terialized view (#40803)
seawinde
added a commit
to seawinde/doris
that referenced
this pull request
Oct 17, 2024
…terialized view (apache#40803)
morrySnow
pushed a commit
that referenced
this pull request
Nov 3, 2025
…r above scan (#57343)

### What problem does this PR solve?

Related PR: #36056

Problem Summary:

If the mv is defined as follows:

    CREATE MATERIALIZED VIEW mv_11
    BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
    DISTRIBUTED BY RANDOM BUCKETS 2
    PROPERTIES ('replication_num' = '1')
    AS
    select o_orderstatus, o_orderdate, o_orderpriority,
           sum(o_totalprice) as sum_total,
           max(o_totalprice) as max_total,
           min(o_totalprice) as min_total,
           count(*) as count_all,
           bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3)
                                       then o_custkey else null end)) as bitmap_union_basic
    from orders
    where o_custkey > 1
    group by o_orderstatus, o_orderdate, o_orderpriority;

The mv contains the filter `where o_custkey > 1`. The following query should be rewritten successfully but fails, because the filter `o_custkey > 1` is lost during comparison and cannot be compensated. This PR fixes this.

    select o_orderstatus, o_orderpriority,
           grouping_id(o_orderstatus, o_orderpriority),
           grouping_id(o_orderstatus),
           grouping(o_orderstatus),
           sum(o_totalprice),
           max(o_totalprice),
           min(o_totalprice),
           count(*),
           count(distinct case when o_shippriority > 1 and o_orderkey IN (1, 3)
                               then o_custkey else null end)
    from orders
    where o_custkey > 1
    group by ROLLUP (o_orderstatus, o_orderpriority);
github-actions bot
pushed a commit
that referenced
this pull request
Nov 3, 2025
…r above scan (#57343)
github-actions bot
pushed a commit
that referenced
this pull request
Nov 3, 2025
…r above scan (#57343)
seawinde
added a commit
to seawinde/doris
that referenced
this pull request
Nov 4, 2025
…r above scan (apache#57343)
Labels
approved
Indicates a PR has been approved by one committer.
dev/2.1.5-merged
dev/3.0.0-merged
reviewed
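Proposed changes
Support rewriting `GROUPING SETS`, `CUBE`, and `ROLLUP` queries by materialized view, if the mv group by fields contain all the group by fields in the query.
For example, the mv definition `mv_1` quoted in the commit message above groups by `o_orderstatus, o_orderdate, o_orderpriority`.
A query that groups by `GROUPING SETS ((o_orderstatus, o_orderdate), (o_orderpriority), (o_orderstatus), ())` can be rewritten successfully by that mv.
If the query group by fields are a subset of the mv group by fields, and the query aggregate functions extend `RollupTrait`, the query can also be rewritten successfully, for example one that groups by `GROUPING SETS ((o_orderstatus, o_orderdate), (o_orderdate), ())`.
This is also applicable to `CUBE` and `ROLLUP`.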
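The idea behind the rewrite can be sketched in Python: each grouping set is answered by re-aggregating the mv's finer-grained rows rather than scanning `orders`. This is only an illustration under stated assumptions, not the Doris optimizer's implementation. The row values are made up, `roll_up` and `grouping_sets` are hypothetical helper names, and a Python set stands in for the `bitmap_union_basic` column.

```python
# Columns the hypothetical mv groups by, in order.
MV_GROUP_BY = ("o_orderstatus", "o_orderdate", "o_orderpriority")

# Made-up mv rows: group keys, then sum_total, max_total, min_total,
# count_all, and a set of customer keys standing in for bitmap_union_basic.
MV_ROWS = [
    ("o", "2023-12-08", "1", 30.0, 20.0, 10.0, 2, {1, 2}),
    ("o", "2023-12-08", "2", 5.0, 5.0, 5.0, 1, {2}),
    ("f", "2023-12-09", "1", 7.0, 7.0, 7.0, 1, {3}),
]


def roll_up(rows, grouping_set):
    """Re-aggregate mv rows to the coarser keys of one grouping set."""
    key_idx = [MV_GROUP_BY.index(col) for col in grouping_set]
    acc = {}
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        total, highest, lowest, count, keys = row[3:]
        if key in acc:
            t, h, l, c, k = acc[key]
            # sum of sums, max of maxes, min of mins, sum of counts, union of sets
            acc[key] = (t + total, max(h, highest), min(l, lowest), c + count, k | keys)
        else:
            acc[key] = (total, highest, lowest, count, set(keys))
    # count(distinct ...) becomes the size of the merged set.
    return {k: (t, h, l, c, len(s)) for k, (t, h, l, c, s) in acc.items()}


def grouping_sets(rows, sets):
    """Evaluate every grouping set against the same mv rows."""
    return {gs: roll_up(rows, gs) for gs in sets}


result = grouping_sets(
    MV_ROWS,
    [("o_orderstatus", "o_orderdate"), ("o_orderpriority",), ("o_orderstatus",), ()],
)
print(result[()])
```

Every aggregate in the sketch can be combined from partial results, which is the property the description attributes to functions that extend `RollupTrait`. The distinct count works only because the mv keeps the merged key set rather than a plain number.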