@seawinde
Contributor

What problem does this PR solve?

cherry-pick: #48222
commitId: 3806b97

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…lid (apache#48222)

### What problem does this PR solve?
Fix nested MTMV rewrite failure when the cache of a bottom-level MTMV is invalid.

For example, suppose the bottom-level MVs are mv_1 and mv_2, with mv_3 built on top of mv_2, defined as follows:

**mv_1**

```sql
select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1,
sum(o_totalprice) as sum_total,
max(o_totalprice) as max_total,
min(o_totalprice) as min_total,
count(*) as count_all,
bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1,
bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
from lineitem_1
inner join orders_1
on lineitem_1.l_orderkey = orders_1.o_orderkey
where lineitem_1.l_shipdate >= "2023-10-17"
group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
```

**mv_2**

```sql
select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey,
t.agg1 as agg1,
t.sum_total as agg3,
t.max_total as agg4,
t.min_total as agg5,
t.count_all as agg6,
cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
from ${mv_1} as t
inner join partsupp_1
on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
where partsupp_1.ps_suppkey > 1
group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
```

**mv_3**

```sql
select t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
from ${mv_2} as t1
left join ${mv_2} as t2
on t1.l_orderkey = t2.l_orderkey
where t1.l_orderkey > 1
group by t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
```

The following query would fail to be rewritten if the mtmvCache of mv_1 and mv_2 is invalid. This PR fixes that.

```sql
select t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
        from (
            select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey,
            t.agg1 as agg1,
            t.sum_total as agg3,
            t.max_total as agg4,
            t.min_total as agg5,
            t.count_all as agg6,
            cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
            from (
                select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1,
                sum(o_totalprice) as sum_total,
                max(o_totalprice) as max_total,
                min(o_totalprice) as min_total,
                count(*) as count_all,
                bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1,
                bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
                from lineitem_1
                inner join orders_1
                on lineitem_1.l_orderkey = orders_1.o_orderkey
                where lineitem_1.l_shipdate >= "2023-10-17"
                group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
            ) as t
            inner join partsupp_1
            on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
            where partsupp_1.ps_suppkey > 1
            group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
        ) as t1
        left join (
            select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey,
            t.agg1 as agg1,
            t.sum_total as agg3,
            t.max_total as agg4,
            t.min_total as agg5,
            t.count_all as agg6,
            cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
            from (
                select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1,
                sum(o_totalprice) as sum_total,
                max(o_totalprice) as max_total,
                min(o_totalprice) as min_total,
                count(*) as count_all,
                bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1,
                bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
                from lineitem_1
                inner join orders_1
                on lineitem_1.l_orderkey = orders_1.o_orderkey
                where lineitem_1.l_shipdate >= "2023-10-17"
                group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
            ) as t
            inner join partsupp_1
            on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
            where partsupp_1.ps_suppkey > 1
            group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
        ) as t2
        on t1.l_orderkey = t2.l_orderkey
        where t1.l_orderkey > 1
        group by t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
```

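To try the scenario locally, the nested views first have to exist as asynchronous materialized views. The sketch below is a minimal illustration, not the regression test added by this PR: it creates a reduced mv_1, builds a reduced mv_2 on top of it, and then asks the planner how it would handle a query written against the base tables. The refresh policy, bucket count, `replication_num` property, the use of the `enable_materialized_view_nest_rewrite` session variable, and the trimmed column lists are assumptions chosen for brevity; the table names come from the definitions in this description.

```sql
-- Assumption: nested MV rewrite is switched on through this session variable.
SET enable_materialized_view_nest_rewrite = true;

-- Bottom-level MV: a reduced form of mv_1 that keeps a single aggregate.
CREATE MATERIALIZED VIEW mv_1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey,
       sum(o_totalprice) AS sum_total
FROM lineitem_1
INNER JOIN orders_1
  ON lineitem_1.l_orderkey = orders_1.o_orderkey
WHERE lineitem_1.l_shipdate >= "2023-10-17"
GROUP BY l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey;

-- Second-level MV: a reduced form of mv_2 that reads from mv_1.
CREATE MATERIALIZED VIEW mv_2
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey,
       ps_partkey, ps_suppkey,
       t.sum_total AS agg3
FROM mv_1 AS t
INNER JOIN partsupp_1
  ON t.l_partkey = partsupp_1.ps_partkey AND t.l_suppkey = partsupp_1.ps_suppkey
WHERE partsupp_1.ps_suppkey > 1
GROUP BY l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey,
         ps_partkey, ps_suppkey, agg3;

-- Query against the base tables with the same shape as the reduced mv_2;
-- EXPLAIN reports whether the planner rewrote it to use the nested MVs.
EXPLAIN
SELECT l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey,
       ps_partkey, ps_suppkey,
       t.sum_total AS agg3
FROM (
    SELECT l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey,
           sum(o_totalprice) AS sum_total
    FROM lineitem_1
    INNER JOIN orders_1
      ON lineitem_1.l_orderkey = orders_1.o_orderkey
    WHERE lineitem_1.l_shipdate >= "2023-10-17"
    GROUP BY l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
) AS t
INNER JOIN partsupp_1
  ON t.l_partkey = partsupp_1.ps_partkey AND t.l_suppkey = partsupp_1.ps_suppkey
WHERE partsupp_1.ps_suppkey > 1
GROUP BY l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey,
         ps_partkey, ps_suppkey, agg3;
```

Because the refresh runs as a background task, both views should finish refreshing before the `EXPLAIN` is run. If the rewrite succeeds, the materialized-view section of the plan output should list mv_2 among the views that were rewritten and chosen; the exact wording of that section may differ between versions.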
@seawinde requested a review from dataroaring as a code owner on March 17, 2025 02:43
@Thearas
Contributor

Thearas commented Mar 17, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40070 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 254c6e5ae9a1857b8f82cfed964f3cbff27e22fa, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17577	6939	6675	6675
q2	2063	168	159	159
q3	10603	1100	1121	1100
q4	10566	730	846	730
q5	7768	2904	2908	2904
q6	224	135	140	135
q7	956	639	604	604
q8	9362	1937	2033	1937
q9	6615	6420	6377	6377
q10	7031	2219	2319	2219
q11	467	280	261	261
q12	400	206	204	204
q13	17791	2999	2992	2992
q14	241	213	214	213
q15	513	480	459	459
q16	666	600	600	600
q17	1007	564	522	522
q18	7256	6732	6623	6623
q19	1424	1049	1078	1049
q20	465	202	199	199
q21	3931	3137	3096	3096
q22	1084	1023	1012	1012
Total cold run time: 108010 ms
Total hot run time: 40070 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	6601	6537	6552	6537
q2	331	233	238	233
q3	2877	2750	2887	2750
q4	2005	1802	1798	1798
q5	5748	5753	5765	5753
q6	210	127	127	127
q7	2243	1863	1806	1806
q8	3397	3566	3496	3496
q9	8887	8943	8938	8938
q10	3583	3526	3505	3505
q11	588	510	497	497
q12	819	600	608	600
q13	9053	3302	3116	3116
q14	312	275	286	275
q15	524	466	469	466
q16	695	660	667	660
q17	1821	1620	1625	1620
q18	8362	7795	7692	7692
q19	1658	1446	1426	1426
q20	2157	1869	1897	1869
q21	5518	5260	5303	5260
q22	1169	1064	1012	1012
Total cold run time: 68558 ms
Total hot run time: 59436 ms

@doris-robot

TPC-DS: Total hot run time: 197439 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 254c6e5ae9a1857b8f82cfed964f3cbff27e22fa, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1325	934	920	920
query2	6229	2036	2013	2013
query3	10953	4558	4344	4344
query4	66512	28902	23269	23269
query5	4923	461	470	461
query6	410	164	175	164
query7	5679	305	306	305
query8	305	219	222	219
query9	9278	2629	2596	2596
query10	467	275	256	256
query11	17409	15415	15902	15415
query12	166	106	107	106
query13	1558	440	443	440
query14	9724	7773	6766	6766
query15	215	191	177	177
query16	7151	431	530	431
query17	1117	594	563	563
query18	1882	303	303	303
query19	236	155	155	155
query20	117	111	109	109
query21	210	105	105	105
query22	4820	4617	4578	4578
query23	34504	34139	34092	34092
query24	6075	2903	2871	2871
query25	533	396	410	396
query26	664	164	167	164
query27	1814	358	367	358
query28	4189	2432	2454	2432
query29	728	472	451	451
query30	243	168	168	168
query31	1018	855	810	810
query32	67	57	57	57
query33	424	319	311	311
query34	902	504	511	504
query35	832	725	740	725
query36	1104	979	966	966
query37	118	79	69	69
query38	4100	4026	4002	4002
query39	1562	1471	1435	1435
query40	206	101	103	101
query41	52	51	50	50
query42	116	101	101	101
query43	532	505	495	495
query44	1196	835	854	835
query45	188	175	171	171
query46	1149	739	745	739
query47	2073	2015	2003	2003
query48	474	395	405	395
query49	744	406	406	406
query50	832	427	425	425
query51	7426	7195	7179	7179
query52	101	91	96	91
query53	261	184	187	184
query54	559	452	449	449
query55	76	79	85	79
query56	268	253	255	253
query57	1259	1141	1125	1125
query58	229	211	218	211
query59	3313	3122	2987	2987
query60	287	256	262	256
query61	107	116	108	108
query62	790	669	667	667
query63	224	185	187	185
query64	1401	678	649	649
query65	3277	3204	3199	3199
query66	731	299	289	289
query67	15854	15524	15633	15524
query68	4033	581	575	575
query69	437	271	264	264
query70	1177	1180	1078	1078
query71	361	265	264	264
query72	6392	4027	4032	4027
query73	745	353	363	353
query74	10212	9181	9288	9181
query75	3363	2675	2663	2663
query76	1928	1083	1029	1029
query77	453	306	287	287
query78	10648	9701	9616	9616
query79	1290	604	600	600
query80	844	415	422	415
query81	504	241	237	237
query82	1291	87	83	83
query83	240	140	148	140
query84	278	83	76	76
query85	875	313	290	290
query86	345	304	299	299
query87	4515	4256	4430	4256
query88	3726	2389	2347	2347
query89	421	302	291	291
query90	2022	184	189	184
query91	176	149	168	149
query92	67	48	50	48
query93	1709	557	562	557
query94	880	289	291	289
query95	358	259	249	249
query96	607	278	278	278
query97	3311	3156	3240	3156
query98	211	205	197	197
query99	1591	1308	1299	1299
Total cold run time: 318298 ms
Total hot run time: 197439 ms

@doris-robot

ClickBench: Total hot run time: 32.22 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 254c6e5ae9a1857b8f82cfed964f3cbff27e22fa, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.02	0.03
query2	0.06	0.03	0.03
query3	0.24	0.06	0.06
query4	1.62	0.11	0.11
query5	0.51	0.51	0.50
query6	1.13	0.72	0.72
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.56	0.50	0.49
query10	0.55	0.54	0.54
query11	0.15	0.11	0.11
query12	0.14	0.11	0.11
query13	0.60	0.62	0.60
query14	2.73	2.76	2.89
query15	0.89	0.82	0.82
query16	0.42	0.42	0.38
query17	1.05	1.05	1.10
query18	0.24	0.22	0.22
query19	1.96	1.93	2.04
query20	0.01	0.01	0.01
query21	15.36	0.60	0.60
query22	2.57	2.86	1.59
query23	16.97	1.00	0.83
query24	3.52	1.37	1.07
query25	0.29	0.11	0.04
query26	0.57	0.14	0.14
query27	0.05	0.04	0.04
query28	9.71	0.47	0.46
query29	12.59	3.25	3.22
query30	0.24	0.06	0.06
query31	2.87	0.38	0.38
query32	3.27	0.47	0.46
query33	2.98	2.96	3.06
query34	17.07	4.56	4.53
query35	4.62	4.54	4.53
query36	0.67	0.48	0.50
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.04	0.02	0.02
query40	0.16	0.13	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 106.78 s
Total hot run time: 32.22 s

@morrySnow changed the title from "[fix](mtmv) Fix nest mtmv rewrite fail when bottom mtmv cache is invalid (#48222)" to "branch-3.0: [fix](mtmv) Fix nest mtmv rewrite fail when bottom mtmv cache is invalid #48222" on Mar 18, 2025
Contributor

@dataroaring left a comment

LGTM

@dataroaring merged commit 21d66ca into apache:branch-3.0 on Mar 19, 2025
22 checks passed
