@seawinde
Contributor

Proposed changes

Support rewriting a query with a materialized view (MV) when the query contains an aggregate and the materialized view does not.
This may improve query speed, because expression results already stored in the materialized view can be reused instead of being evaluated again.

This also supports single-table rewrite.

For example, given the following MV definition:

        CREATE MATERIALIZED VIEW mv1
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1')
        AS
        select case when o_shippriority > 1 and o_orderkey IN (4, 5) then o_custkey else o_shippriority end,
        o_orderstatus,
        bin(o_orderkey),
        l_suppkey,
        l_linenumber
        from orders
        left join lineitem on o_orderkey = l_orderkey;

The following query can be rewritten successfully to use the MV:

        select
        count(case when o_shippriority > 1 and o_orderkey IN (4, 5) then o_custkey else o_shippriority end) as count_case,
        o_orderstatus,
        bin(o_orderkey)
        from orders
        left join lineitem on o_orderkey = l_orderkey
        where l_linenumber = 4
        group by
        o_orderstatus,
        bin(o_orderkey);
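
To make the equivalence behind this rewrite concrete, here is a minimal sketch that checks it on a toy dataset. It is not how Doris performs the rewrite (the optimizer does that transparently); it only shows that aggregating over the stored MV rows gives the same answer as aggregating over the base tables. Everything beyond the two statements above is an assumption for illustration: SQLite stands in for Doris, `mv1` is materialized as a plain table, the column aliases `case_col` and `bin_key` and the sample rows are made up, and `bin()` is a stand-in registered from Python because SQLite has no such function.

```python
import sqlite3

# The shared expression that the MV precomputes and the query aggregates.
CASE_EXPR = (
    "case when o_shippriority > 1 and o_orderkey in (4, 5) "
    "then o_custkey else o_shippriority end"
)

conn = sqlite3.connect(":memory:")
# Stand-in for bin(): binary string of a non-negative integer (assumption).
conn.create_function("bin", 1, lambda n: None if n is None else format(n, "b"))

# Hypothetical sample data; only the columns used by the example are modeled.
conn.executescript("""
    create table orders (o_orderkey int, o_custkey int,
                         o_orderstatus text, o_shippriority int);
    create table lineitem (l_orderkey int, l_suppkey int, l_linenumber int);
    insert into orders values
        (4, 10, 'o', 2), (5, 11, 'o', 0), (6, 12, 'f', 3), (7, 13, 'f', 1);
    insert into lineitem values
        (4, 100, 4), (4, 101, 4), (5, 102, 4), (6, 103, 1), (6, 104, 4);
""")

# "Materialize" the non-aggregate MV as an ordinary table.
conn.execute(f"""
    create table mv1 as
    select {CASE_EXPR} as case_col,
           o_orderstatus as o_orderstatus,
           bin(o_orderkey) as bin_key,
           l_suppkey as l_suppkey,
           l_linenumber as l_linenumber
    from orders left join lineitem on o_orderkey = l_orderkey
""")

# The aggregate query against the base tables.
direct = sorted(conn.execute(f"""
    select count({CASE_EXPR}) as count_case, o_orderstatus, bin(o_orderkey)
    from orders left join lineitem on o_orderkey = l_orderkey
    where l_linenumber = 4
    group by o_orderstatus, bin(o_orderkey)
""").fetchall())

# The same aggregate expressed over the MV: the filter, the group-by keys
# and the count argument all read stored columns, so neither the CASE
# expression nor bin() is evaluated again.
rewritten = sorted(conn.execute("""
    select count(case_col) as count_case, o_orderstatus, bin_key
    from mv1
    where l_linenumber = 4
    group by o_orderstatus, bin_key
""").fetchall())

print(direct == rewritten)
```

The sketch also shows why the MV has to expose `l_linenumber`: the query's filter can only be applied on top of the MV if the filtered column is stored in it.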

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39747 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8eec5d10f6e0b1bdff1f2a38404559864e87a6a1, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	17635	4311	4231	4231
q2	2033	198	191	191
q3	10465	1173	1112	1112
q4	10190	727	800	727
q5	7483	2676	2614	2614
q6	220	138	136	136
q7	953	608	605	605
q8	9228	2051	2081	2051
q9	9014	6489	6499	6489
q10	9059	3723	3698	3698
q11	449	237	243	237
q12	500	228	227	227
q13	17773	3004	3043	3004
q14	270	225	227	225
q15	504	474	465	465
q16	527	380	377	377
q17	966	678	773	678
q18	7900	7371	7320	7320
q19	8027	1515	1501	1501
q20	668	310	331	310
q21	4890	3219	3907	3219
q22	385	337	330	330
Total cold run time: 119139 ms
Total hot run time: 39747 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	4338	4277	4249	4249
q2	356	270	298	270
q3	3077	2957	2946	2946
q4	1953	1682	1753	1682
q5	5521	5491	5449	5449
q6	240	134	135	134
q7	2251	1829	1862	1829
q8	3257	3403	3388	3388
q9	8670	8819	8695	8695
q10	4206	3710	3845	3710
q11	604	479	490	479
q12	833	688	649	649
q13	17420	3172	3176	3172
q14	301	292	276	276
q15	535	505	479	479
q16	488	435	438	435
q17	1820	1535	1505	1505
q18	8265	8097	7723	7723
q19	1867	1738	1745	1738
q20	3149	1925	1858	1858
q21	5884	4868	5080	4868
q22	626	547	569	547
Total cold run time: 75661 ms
Total hot run time: 56081 ms

@doris-robot

TPC-DS: Total hot run time: 173488 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8eec5d10f6e0b1bdff1f2a38404559864e87a6a1, data reload: false

query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
query1	916	386	384	384
query2	6367	2376	2315	2315
query3	6629	205	205	205
query4	19024	17339	17408	17339
query5	3711	484	459	459
query6	239	163	162	162
query7	4577	297	293	293
query8	332	286	292	286
query9	8651	2444	2464	2444
query10	556	300	286	286
query11	10550	10092	10030	10030
query12	120	90	87	87
query13	1635	379	364	364
query14	10070	6945	7649	6945
query15	229	190	196	190
query16	7688	269	268	268
query17	1398	563	523	523
query18	1959	273	289	273
query19	220	159	155	155
query20	92	84	85	84
query21	203	129	132	129
query22	4395	4104	3884	3884
query23	33814	33793	33662	33662
query24	11059	2906	2904	2904
query25	615	416	404	404
query26	739	164	157	157
query27	2342	335	342	335
query28	6068	2122	2127	2122
query29	914	648	674	648
query30	256	158	164	158
query31	952	780	759	759
query32	89	55	56	55
query33	653	300	301	300
query34	896	499	493	493
query35	746	648	639	639
query36	1172	998	1004	998
query37	146	71	74	71
query38	2947	2868	2831	2831
query39	886	846	842	842
query40	208	133	137	133
query41	56	55	55	55
query42	112	107	112	107
query43	602	533	539	533
query44	1111	715	732	715
query45	198	171	164	164
query46	1065	732	723	723
query47	1861	1744	1780	1744
query48	382	296	311	296
query49	831	417	418	417
query50	762	400	403	400
query51	6746	6690	6679	6679
query52	108	90	95	90
query53	362	298	299	298
query54	876	452	468	452
query55	77	72	75	72
query56	284	263	261	261
query57	1117	1059	1050	1050
query58	267	255	250	250
query59	3314	3141	3000	3000
query60	320	286	286	286
query61	93	88	89	88
query62	605	447	437	437
query63	319	300	296	296
query64	8541	2309	1747	1747
query65	3152	3115	3133	3115
query66	742	336	325	325
query67	15532	14840	14754	14754
query68	6198	525	537	525
query69	646	464	413	413
query70	1162	1041	1109	1041
query71	463	279	272	272
query72	7188	5518	5248	5248
query73	788	331	327	327
query74	5868	5464	5576	5464
query75	3736	2628	2677	2628
query76	3151	977	930	930
query77	638	302	305	302
query78	10226	9872	9627	9627
query79	2270	527	526	526
query80	1449	539	479	479
query81	531	223	220	220
query82	770	104	109	104
query83	199	168	163	163
query84	270	85	88	85
query85	1150	284	284	284
query86	446	343	300	300
query87	3290	3092	3073	3073
query88	3833	2440	2444	2440
query89	473	404	392	392
query90	1711	193	193	193
query91	131	100	99	99
query92	64	52	50	50
query93	2469	511	504	504
query94	986	190	191	190
query95	418	326	326	326
query96	601	266	269	266
query97	3226	3039	3029	3029
query98	215	202	252	202
query99	1197	856	816	816
Total cold run time: 269261 ms
Total hot run time: 173488 ms

@doris-robot

ClickBench: Total hot run time: 30.1 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8eec5d10f6e0b1bdff1f2a38404559864e87a6a1, data reload: false

query	cold_s	hot1_s	hot2_s
query1	0.04	0.03	0.03
query2	0.09	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.10	0.08
query5	0.52	0.47	0.48
query6	1.13	0.73	0.73
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.54	0.48	0.49
query10	0.53	0.55	0.53
query11	0.16	0.12	0.12
query12	0.15	0.12	0.12
query13	0.60	0.59	0.61
query14	0.76	0.81	0.79
query15	0.84	0.82	0.81
query16	0.36	0.37	0.36
query17	1.01	0.97	1.02
query18	0.23	0.23	0.27
query19	1.80	1.73	1.69
query20	0.02	0.01	0.01
query21	15.41	0.66	0.64
query22	4.39	7.12	1.42
query23	18.24	1.38	1.34
query24	2.36	0.22	0.22
query25	0.15	0.11	0.09
query26	0.26	0.19	0.17
query27	0.09	0.09	0.08
query28	13.26	1.01	1.01
query29	12.66	3.24	3.26
query30	0.27	0.06	0.06
query31	2.87	0.39	0.40
query32	3.24	0.47	0.47
query33	2.89	2.93	2.89
query34	16.94	4.41	4.45
query35	4.47	4.48	4.45
query36	0.66	0.48	0.48
query37	0.17	0.16	0.16
query38	0.15	0.14	0.15
query39	0.04	0.03	0.04
query40	0.17	0.13	0.14
query41	0.10	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.05	0.04
Total cold run time: 109.65 s
Total hot run time: 30.1 s

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Jun 14, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@starocean999 merged commit 649f9bc into apache:master on Jun 14, 2024
dataroaring pushed a commit that referenced this pull request Jun 17, 2024
…y is aggregate and materialized view has no aggregate (#36278)

seawinde added a commit to seawinde/doris that referenced this pull request Jun 20, 2024
…y is aggregate and materialized view has no aggregate (apache#36278)

seawinde added a commit to seawinde/doris that referenced this pull request Jul 8, 2024
…y is aggregate and materialized view has no aggregate (apache#36278)

seawinde added a commit to seawinde/doris that referenced this pull request Jul 10, 2024
…y is aggregate and materialized view has no aggregate (apache#36278)

morrySnow pushed a commit that referenced this pull request Jul 11, 2024
…y is aggregate and materialized view has no aggregate (#36278) (#37497)

cherry-pick from master
pr: #36278
commitId: 649f9bc
Labels: approved (Indicates a PR has been approved by one committer.), dev/2.1.5-merged, dev/3.0.0-merged, reviewed
