@seawinde
Contributor

Proposed changes

This extends the ability to rewrite queries by materialized view.
For example, given the following materialized view definition:

        CREATE MATERIALIZED VIEW mv1
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1')
        AS
        select
            count(o_totalprice),
            o_shippriority,
            o_orderstatus,
            bin(o_orderkey)
        from orders
        group by
            o_orderstatus,
            o_shippriority,
            bin(o_orderkey);

The query shown after the list of supported functions can be rewritten by the materialized view successfully.
Although sum(distinct o_shippriority) in the query does not appear in the mv output, the query aggregate function is distinct and its
argument is a group by dimension of the mv. In this case, sum(distinct o_shippriority) can use the mv group dimension o_shippriority
directly, and the result is correct.

The following distinct aggregate functions are supported currently; others will be supported in the future on demand:

  • max(distinct arg)
  • min(distinct arg)
  • sum(distinct arg)
  • avg(distinct arg)
  • count(distinct arg)

        select
            count(o_totalprice),
            max(distinct o_shippriority),
            min(distinct o_shippriority),
            avg(distinct o_shippriority),
            sum(distinct o_shippriority) / count(distinct o_shippriority),
            o_orderstatus,
            bin(o_orderkey)
        from orders
        group by
            o_orderstatus,
            bin(o_orderkey);
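
The reasoning above can be checked on a toy dataset. The sketch below is not Doris code: it assumes Python's built-in sqlite3 module, a made-up `orders` table, and a plain table named `mv1_rows` (hypothetical) standing in for the stored output of mv1. SQLite has no `bin()`, so `hex()` is used in its place. It only illustrates why distinct aggregates over a column that is an mv group by dimension give the same answer when computed from the mv rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        o_orderkey INTEGER, o_orderstatus TEXT,
        o_shippriority INTEGER, o_totalprice REAL
    );
    INSERT INTO orders VALUES
        (1, 'o', 1, 10.0), (1, 'o', 1, 20.0), (1, 'o', 2, 30.0),
        (2, 'f', 3, 40.0), (2, 'f', 3, 50.0), (2, 'o', 1, 60.0);
    -- Stand-in for the materialized view: same grouping as mv1.
    CREATE TABLE mv1_rows AS
        SELECT count(o_totalprice) AS cnt, o_shippriority, o_orderstatus,
               hex(o_orderkey) AS key_bin
        FROM orders
        GROUP BY o_orderstatus, o_shippriority, hex(o_orderkey);
""")

# Distinct aggregates whose argument is a group by dimension of the mv.
DISTINCT_AGGS = """
    max(DISTINCT o_shippriority), min(DISTINCT o_shippriority),
    sum(DISTINCT o_shippriority), avg(DISTINCT o_shippriority),
    count(DISTINCT o_shippriority)
"""

def query_base():
    """Run the query against the base table."""
    return conn.execute(f"""
        SELECT count(o_totalprice), {DISTINCT_AGGS}, o_orderstatus, hex(o_orderkey)
        FROM orders GROUP BY o_orderstatus, hex(o_orderkey)
        ORDER BY o_orderstatus, hex(o_orderkey)
    """).fetchall()

def query_mv():
    """Run the rewritten query against the mv rows.

    count(o_totalprice) rolls up as sum(cnt); the distinct aggregates
    read the o_shippriority dimension directly.
    """
    return conn.execute(f"""
        SELECT sum(cnt), {DISTINCT_AGGS}, o_orderstatus, key_bin
        FROM mv1_rows GROUP BY o_orderstatus, key_bin
        ORDER BY o_orderstatus, key_bin
    """).fetchall()

print(query_base() == query_mv())
```

Grouping the mv rows again cannot change the set of distinct o_shippriority values within each (o_orderstatus, bin(o_orderkey)) group, which is why the distinct aggregates agree.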

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39852 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 170be7e07522f0c3b29bf0df991a4e8751fd3350, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17877	4417	4353	4353
q2	2663	200	206	200
q3	11652	1097	1083	1083
q4	10309	745	828	745
q5	7725	2742	2667	2667
q6	223	142	135	135
q7	966	616	608	608
q8	9246	2068	2071	2068
q9	8826	6489	6473	6473
q10	8974	3694	3696	3694
q11	450	236	245	236
q12	450	233	224	224
q13	17777	2974	2989	2974
q14	274	229	219	219
q15	509	491	491	491
q16	533	377	376	376
q17	962	641	661	641
q18	7952	7498	7343	7343
q19	2999	1528	1483	1483
q20	660	309	341	309
q21	4956	3302	3196	3196
q22	393	337	334	334
Total cold run time: 116376 ms
Total hot run time: 39852 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4332	4379	4205	4205
q2	369	272	263	263
q3	2954	2704	2687	2687
q4	1854	1586	1643	1586
q5	5253	5291	5442	5291
q6	211	124	124	124
q7	2162	1735	1740	1735
q8	3137	3324	3281	3281
q9	8320	8303	8316	8303
q10	3888	3667	3629	3629
q11	584	476	464	464
q12	752	600	594	594
q13	17425	2959	2987	2959
q14	292	261	261	261
q15	518	472	471	471
q16	474	410	420	410
q17	1728	1498	1448	1448
q18	7519	7570	7389	7389
q19	1696	1464	1497	1464
q20	1978	1805	1759	1759
q21	4809	4715	4752	4715
q22	614	523	529	523
Total cold run time: 70869 ms
Total hot run time: 53561 ms

@doris-robot

TPC-DS: Total hot run time: 171638 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 170be7e07522f0c3b29bf0df991a4e8751fd3350, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	935	373	373	373
query2	6473	2443	2298	2298
query3	6644	204	204	204
query4	19841	17391	17236	17236
query5	4153	464	486	464
query6	245	160	171	160
query7	4603	298	285	285
query8	309	290	290	290
query9	8414	2390	2345	2345
query10	586	304	271	271
query11	10566	10010	9999	9999
query12	139	86	83	83
query13	1640	371	358	358
query14	9356	6669	7347	6669
query15	232	199	185	185
query16	7776	255	257	255
query17	1896	528	515	515
query18	1962	267	268	267
query19	190	189	148	148
query20	91	80	78	78
query21	204	129	124	124
query22	4452	4002	4010	4002
query23	33600	33053	32750	32750
query24	11805	2883	2857	2857
query25	660	350	348	348
query26	1846	151	152	151
query27	2965	300	306	300
query28	7575	2036	2011	2011
query29	1147	608	616	608
query30	284	149	152	149
query31	959	745	756	745
query32	88	51	56	51
query33	778	293	276	276
query34	959	474	460	460
query35	741	627	605	605
query36	1084	934	923	923
query37	193	69	68	68
query38	2847	2725	2746	2725
query39	850	792	790	790
query40	275	124	128	124
query41	56	54	52	52
query42	123	97	103	97
query43	564	529	549	529
query44	1253	721	727	721
query45	192	164	168	164
query46	1080	741	709	709
query47	1874	1799	1807	1799
query48	366	282	296	282
query49	1175	407	395	395
query50	764	378	380	378
query51	6792	6868	6638	6638
query52	99	100	93	93
query53	361	288	283	283
query54	886	423	432	423
query55	74	72	73	72
query56	273	251	255	251
query57	1129	1039	1049	1039
query58	246	227	266	227
query59	3507	3034	3284	3034
query60	295	268	300	268
query61	93	92	92	92
query62	648	448	443	443
query63	315	284	282	282
query64	9821	2258	1750	1750
query65	3137	3153	3109	3109
query66	1323	347	392	347
query67	15613	15043	15071	15043
query68	4610	525	534	525
query69	465	291	309	291
query70	1200	1141	1131	1131
query71	390	261	266	261
query72	7184	5521	5424	5424
query73	740	325	315	315
query74	5961	5528	5547	5528
query75	3440	2658	2653	2653
query76	2625	962	892	892
query77	446	322	290	290
query78	10277	9786	9778	9778
query79	2776	513	501	501
query80	1118	449	452	449
query81	583	220	231	220
query82	955	110	101	101
query83	227	167	169	167
query84	246	87	87	87
query85	1644	288	272	272
query86	502	329	337	329
query87	3263	3085	3045	3045
query88	4410	2346	2332	2332
query89	480	363	378	363
query90	1791	186	185	185
query91	127	97	98	97
query92	65	47	51	47
query93	1828	513	495	495
query94	1204	182	180	180
query95	404	301	311	301
query96	573	272	266	266
query97	3230	3061	3045	3045
query98	218	205	202	202
query99	1115	840	796	796
Total cold run time: 275519 ms
Total hot run time: 171638 ms

@doris-robot

ClickBench: Total hot run time: 30.74 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 170be7e07522f0c3b29bf0df991a4e8751fd3350, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.03	0.04
query3	0.23	0.06	0.06
query4	1.66	0.07	0.09
query5	0.50	0.49	0.49
query6	1.14	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.54	0.50	0.49
query10	0.55	0.57	0.55
query11	0.15	0.11	0.12
query12	0.15	0.12	0.13
query13	0.59	0.60	0.58
query14	0.76	0.76	0.78
query15	0.83	0.82	0.81
query16	0.36	0.38	0.37
query17	0.95	0.98	0.95
query18	0.22	0.24	0.23
query19	1.81	1.73	1.72
query20	0.01	0.01	0.00
query21	15.45	0.66	0.66
query22	4.35	6.68	2.13
query23	18.28	1.40	1.27
query24	2.12	0.23	0.22
query25	0.15	0.09	0.08
query26	0.26	0.18	0.18
query27	0.08	0.09	0.08
query28	13.23	1.02	1.00
query29	12.63	3.32	3.29
query30	0.26	0.07	0.06
query31	2.86	0.38	0.38
query32	3.29	0.47	0.47
query33	2.88	2.92	2.90
query34	17.10	4.41	4.34
query35	4.44	4.48	4.45
query36	0.65	0.48	0.48
query37	0.19	0.15	0.15
query38	0.15	0.14	0.15
query39	0.04	0.03	0.03
query40	0.18	0.14	0.14
query41	0.10	0.05	0.05
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.42 s
Total hot run time: 30.74 s

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Jun 17, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@zfr9527 zfr9527 left a comment

LGTM

@starocean999 starocean999 merged commit c199947 into apache:master Jun 17, 2024
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
…e function is distinct (#36318)

seawinde added a commit to seawinde/doris that referenced this pull request Jul 11, 2024
…e function is distinct (apache#36318)

morrySnow pushed a commit that referenced this pull request Jul 12, 2024
cherry-pick from master
pr: #36318
commitId: c199947

pr: #36111
commitId: 35ebef6

pr: #36175
commitId: 4c8e66b

pr: #36414
commitId: 5e009b5

pr: #36770
commitId: 19e2126

pr: #36567
commitId: 3da8351
morrySnow pushed a commit that referenced this pull request Aug 13, 2025
…ys nullable (#52960)

### What problem does this PR solve?

Related PR: #36318

Problem Summary:

The materialized view definition is as follows:

create materialized view as
select k1, k3, sum(k2), count(k4) from ${tblName} group by k1, k3;

`sum(k2)` is nullable.
A query such as the following would fail to be rewritten, with the error
info 'query aggregate function roll up fail'. This PR fixes that.

select sum(distinct k1) from agg_use_key_direct
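
The expected outcome of a successful rewrite can be sketched on toy data. This is an illustration only, assuming Python's built-in sqlite3 module and a plain table named `mv_rows` (hypothetical) that holds the materialized view's output. It does not reproduce the Doris rewrite failure itself; it only shows that `sum(distinct k1)` read from the mv rows matches the base table, even when `sum(k2)` is NULL for some group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agg_use_key_direct (k1 INTEGER, k2 INTEGER, k3 INTEGER, k4 INTEGER);
    INSERT INTO agg_use_key_direct VALUES
        (1, 10, 1, 100), (1, NULL, 1, 101), (1, 30, 2, NULL),
        (2, 40, 1, 103), (3, NULL, 3, 104);
    -- Stand-in for the materialized view output (hypothetical table name).
    CREATE TABLE mv_rows AS
        SELECT k1, k3, sum(k2) AS sum_k2, count(k4) AS count_k4
        FROM agg_use_key_direct GROUP BY k1, k3;
""")

# k1 is a group by key of the mv, so the distinct aggregate can read it directly.
from_base = conn.execute(
    "SELECT sum(DISTINCT k1) FROM agg_use_key_direct").fetchone()[0]
from_mv = conn.execute(
    "SELECT sum(DISTINCT k1) FROM mv_rows").fetchone()[0]

# sum(k2) is NULL for the group whose k2 values are all NULL.
null_sum_groups = conn.execute(
    "SELECT count(*) FROM mv_rows WHERE sum_k2 IS NULL").fetchone()[0]

print(from_base, from_mv, null_sum_groups)
```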
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
…ys nullable (#52960)

github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
…ys nullable (#52960)

github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
…ys nullable (#52960)

seawinde added a commit to seawinde/doris that referenced this pull request Aug 14, 2025
…ys nullable (apache#52960)

seawinde added a commit to seawinde/doris that referenced this pull request Aug 14, 2025
…ys nullable (apache#52960)

seawinde added a commit to seawinde/doris that referenced this pull request Aug 14, 2025
…ys nullable (apache#52960)

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.5-merged dev/3.0.0-merged reviewed

6 participants