@zddr zddr commented Jun 20, 2025

zddr and others added 21 commits June 19, 2025 17:27
PaimonUtil
PaimonPartitionInfo
PaimonSchemaCacheValue
PaimonExternalTable
use latest
Previously, an MTMV built on a Paimon table could not detect changes to the
partition list or the data, so the only option was to force a full refresh with
`refresh materialized view mv1 complete`.

This PR obtains the partition list of Paimon, the last update time of
the partition, and the latest snapshotId of the table.

As a result, an MTMV can be partitioned by a Paimon table, detect data changes,
and refresh the affected partitions automatically.
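
The change detection described above can be pictured with a minimal sketch. It assumes the MTMV remembers the last update time it saw for each base-table partition; the class and method names are hypothetical and are not taken from the Doris code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from the Doris code base.
class PartitionRefreshSketch {

    /** Returns the base-table partitions that are new or updated since the last refresh. */
    static Set<String> partitionsToRefresh(Map<String, Long> lastSynced,
                                           Map<String, Long> current) {
        Set<String> stale = new TreeSet<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long syncedAt = lastSynced.get(e.getKey());
            // Unknown partition, or its last update time moved forward.
            if (syncedAt == null || syncedAt < e.getValue()) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    /** Returns partitions that existed at the last refresh but are gone now. */
    static Set<String> partitionsToDrop(Map<String, Long> lastSynced,
                                        Map<String, Long> current) {
        Set<String> dropped = new TreeSet<>(lastSynced.keySet());
        dropped.removeAll(current.keySet());
        return dropped;
    }

    public static void main(String[] args) {
        Map<String, Long> lastSynced = new LinkedHashMap<>();
        lastSynced.put("dt=2025-06-01", 100L);
        lastSynced.put("dt=2025-06-02", 200L);

        Map<String, Long> current = new LinkedHashMap<>();
        current.put("dt=2025-06-02", 250L); // updated after the last refresh
        current.put("dt=2025-06-03", 300L); // newly added partition

        System.out.println("refresh: " + partitionsToRefresh(lastSynced, current));
        System.out.println("drop:    " + partitionsToDrop(lastSynced, current));
    }
}
```

For an unpartitioned base table, the same idea would presumably apply to the table-level snapshotId instead of per-partition timestamps.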

mtmv support paimon partition refresh
apache#44419)

When an MVCC table is used to fetch partition snapshots or perform related
operations, the snapshotId parameter must be passed along.
…ad of partitionId (apache#44415)

Partition IDs are meaningless for external data sources, and some data sources
have only partition names, so partition pruning now returns partition names
instead of IDs.
…44567)

Previously, partition pruning for external tables only supported Hive. Supporting
another table type required understanding the internal processing logic of
partition pruning.

This PR abstracts the partition pruning logic, so other table types can support
it by overriding a few methods of ExternalTable.
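
A rough sketch of this kind of abstraction follows. It is an illustration only: `ExternalTableSketch`, `supportsPartitionPruning`, and `partitionValuesByName` are hypothetical names, not the actual ExternalTable methods. The shared pruning logic lives in the base class and returns partition names, and a new table type overrides two small hooks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: class and method names are illustrative only.
class PruneSketch {

    /** Stand-in for a shared external-table base class. */
    abstract static class ExternalTableSketch {
        /** Table types opt in to pruning by overriding this. */
        boolean supportsPartitionPruning() {
            return false;
        }

        /** Partition name -> partition column value. */
        Map<String, String> partitionValuesByName() {
            return Collections.emptyMap();
        }

        /**
         * Shared pruning logic. Returns the names of the partitions to scan,
         * or an empty Optional when the table does not support pruning.
         */
        final Optional<List<String>> prune(Predicate<String> filter) {
            if (!supportsPartitionPruning()) {
                return Optional.empty();
            }
            List<String> selected = new ArrayList<>();
            for (Map.Entry<String, String> e : partitionValuesByName().entrySet()) {
                if (filter.test(e.getValue())) {
                    selected.add(e.getKey());
                }
            }
            return Optional.of(selected);
        }
    }

    /** A new table type only overrides the two hooks above. */
    static final class DemoTable extends ExternalTableSketch {
        @Override
        boolean supportsPartitionPruning() {
            return true;
        }

        @Override
        Map<String, String> partitionValuesByName() {
            Map<String, String> m = new LinkedHashMap<>();
            m.put("dt=2025-06-01", "2025-06-01");
            m.put("dt=2025-06-02", "2025-06-02");
            m.put("dt=2025-06-03", "2025-06-03");
            return m;
        }
    }

    public static void main(String[] args) {
        ExternalTableSketch table = new DemoTable();
        // Keep partitions whose value is on or after 2025-06-02.
        System.out.println(table.prune(v -> v.compareTo("2025-06-02") >= 0));
    }
}
```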

[opt](planner) Unified external partition prune interface
…resh and partition pruning (apache#44673)

- Add `MvccTable` to represent a table that supports querying a specified
version of its data
- Add the `MvccSnapshot` interface to store the snapshot information of an MVCC
table at a given point in time
- Add an `MvccSnapshot` parameter to the methods of the `MTMVRelatedTableIf`
interface so that they can retrieve data of a specified version
- Pass the `MvccSnapshot` parameter to the partition pruning methods so that
they return partition information for a specified version
- Load the snapshot information of each `MvccTable` at the beginning of query
planning and store it in StatementContext
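
The snapshot pinning described in the list above can be sketched as follows. Only the names `MvccTable`, `MvccSnapshot`, and StatementContext come from the PR text; every method signature here is an assumption made for illustration and may differ from the real interfaces.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: method signatures are illustrative, not the Doris API.
class MvccSketch {

    interface MvccSnapshot {
        long snapshotId();
    }

    interface MvccTable {
        String name();

        /** Reads the table's current snapshot from the metadata source. */
        MvccSnapshot loadSnapshot();
    }

    /** Per-statement state: pins one snapshot per table for the whole query. */
    static final class StatementContextSketch {
        private final Map<String, MvccSnapshot> snapshots = new HashMap<>();

        /** Called once, at the start of planning. */
        void loadSnapshots(Iterable<MvccTable> tables) {
            for (MvccTable t : tables) {
                snapshots.computeIfAbsent(t.name(), k -> t.loadSnapshot());
            }
        }

        /** Later planning steps read the pinned snapshot instead of reloading. */
        MvccSnapshot getSnapshot(MvccTable table) {
            return snapshots.get(table.name());
        }
    }

    /** Toy table whose latest snapshot id can be advanced by a writer. */
    static final class DemoTable implements MvccTable {
        final AtomicLong latest = new AtomicLong(1);

        @Override
        public String name() {
            return "demo";
        }

        @Override
        public MvccSnapshot loadSnapshot() {
            long id = latest.get();
            return () -> id;
        }
    }

    public static void main(String[] args) {
        DemoTable table = new DemoTable();
        StatementContextSketch ctx = new StatementContextSketch();
        ctx.loadSnapshots(Arrays.<MvccTable>asList(table));

        table.latest.incrementAndGet(); // a concurrent write lands mid-query
        // The statement still sees the snapshot it pinned at the start.
        System.out.println(ctx.getSnapshot(table).snapshotId());
    }
}
```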

Unified external table interface supporting partition refresh and
partition pruning
Previously, transparent rewriting of a query on an external table was
all-or-nothing: either the whole query was rewritten or none of it was.

It now supports partial rewriting by partition, with the remaining partitions
read directly from the base table.
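
One way to picture partial partition rewriting is as a split of the queried partitions into two sets. The sketch below is an assumption about the idea, not the planner's actual code, and all names are hypothetical: partitions that are still valid in the materialized view are served from it, and the rest fall back to the base table.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not taken from the planner.
class PartialRewriteSketch {

    /** Result of splitting the queried partitions by data source. */
    static final class Split {
        final Set<String> fromMaterializedView;
        final Set<String> fromBaseTable;

        Split(Set<String> fromMaterializedView, Set<String> fromBaseTable) {
            this.fromMaterializedView = fromMaterializedView;
            this.fromBaseTable = fromBaseTable;
        }
    }

    static Split split(Set<String> queried, Set<String> validInMv) {
        // Partitions the MV can answer: queried and still up to date.
        Set<String> mv = new LinkedHashSet<>(queried);
        mv.retainAll(validInMv);
        // Everything else must be read from the base table.
        Set<String> base = new LinkedHashSet<>(queried);
        base.removeAll(validInMv);
        return new Split(mv, base);
    }

    public static void main(String[] args) {
        Set<String> queried = new LinkedHashSet<>(Arrays.asList("p1", "p2", "p3"));
        Set<String> validInMv = new LinkedHashSet<>(Arrays.asList("p1", "p2"));
        Split s = split(queried, validInMv);
        System.out.println("from MV:   " + s.fromMaterializedView);
        System.out.println("from base: " + s.fromBaseTable);
    }
}
```

The rewritten query would then combine both sources, for example with a union of the two scans.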

mtmv partition rewrite support external table
The previous PR obtained a snapshot of the table and stored it in the
StatementContext at the beginning of the query.
This PR ensures that the same metadata is used throughout the query: when a
relevant interface is called, the snapshot is fetched from the StatementContext
and passed to the method as a parameter.

Related PR: apache#44911 apache#44673
…the latest data (apache#44911)

Problem Summary:
- Add `PaimonMetadataCacheMgr` to `ExternalMetaCacheMgr` to manage the
snapshot cache of Paimon tables
- Move paimonSchemaCache to PaimonMetadataCacheMgr, and add schemaId as
part of the key
- PaimonExternalTable overrides the methods in ExternalTable and
supports partition pruning
- PaimonExternalTable implements the MvccTable interface, so queries read
snapshot data from the cache; this prevents a cache refresh from causing
different versions of metadata to be used within a single query
- MTMVTask retrieves the snapshot data of each MvccTable before the task starts,
which prevents a cache refresh from causing different versions of metadata to be
used within a single refresh task

Paimon queries the data in the cache instead of querying the latest data

Behavior changes when querying a Paimon table:
- Right after FE starts, queries read the latest data
- After Paimon data changes, Doris still reads the previous data
- After the snapshot cache expires, Doris reads the latest data
- `desc paimon;` displays the schema that corresponds to the snapshotId in the
snapshot cache
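
The behavior listed above matches what a time-based cache would produce. The sketch below is a simplified model under that assumption: it uses a fixed time-to-live and an injected clock so it can be tested, and none of the names come from `PaimonMetadataCacheMgr`. The real cache's expiry policy may differ.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a snapshot cache with a fixed time-to-live.
class SnapshotCacheSketch {
    private final LongSupplier clockMillis;      // injected clock, for testability
    private final LongSupplier latestSnapshotId; // stands in for asking Paimon
    private final long ttlMillis;

    private boolean loaded;
    private long cachedId;
    private long loadedAtMillis;

    SnapshotCacheSketch(LongSupplier clockMillis, LongSupplier latestSnapshotId, long ttlMillis) {
        this.clockMillis = clockMillis;
        this.latestSnapshotId = latestSnapshotId;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached snapshot id, reloading it only on first use or after expiry. */
    synchronized long snapshotId() {
        long now = clockMillis.getAsLong();
        if (!loaded || now - loadedAtMillis >= ttlMillis) {
            cachedId = latestSnapshotId.getAsLong();
            loadedAtMillis = now;
            loaded = true;
        }
        return cachedId;
    }

    public static void main(String[] args) {
        long[] now = {0};
        long[] latest = {1};
        SnapshotCacheSketch cache = new SnapshotCacheSketch(() -> now[0], () -> latest[0], 100);

        System.out.println(cache.snapshotId()); // first access loads the latest snapshot
        latest[0] = 2;                          // the Paimon table changes
        now[0] = 50;
        System.out.println(cache.snapshotId()); // still within the TTL: previous snapshot
        now[0] = 100;
        System.out.println(cache.snapshotId()); // entry expired: latest snapshot again
    }
}
```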
Previously, an MTMV built on an Iceberg table could not detect changes to the
partition list or the data, so the only option was to force a full refresh with
`refresh materialized view mv1 complete`.

This PR obtains the partition list of Iceberg, the last update time of
the partition, and the latest snapshotId of the table.

As a result, an MTMV can be partitioned by an Iceberg table, detect data
changes, and refresh the affected partitions automatically.

For now, only tables with a single partition column are supported, and the
partition transform must be one of hour, day, month, or year.
Support for the identity transform is planned.
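
To illustrate why these four transforms map naturally onto range partitions: Iceberg stores the result of an hour, day, month, or year transform as the number of whole units since 1970-01-01. The sketch below turns such a value into a time range. The class name and the choice of an inclusive lower bound with an exclusive upper bound are assumptions for illustration, not the IcebergExternalTable implementation.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch: converts an Iceberg time-transform value to a range.
class IcebergRangeSketch {

    enum Transform {
        HOUR(ChronoUnit.HOURS),
        DAY(ChronoUnit.DAYS),
        MONTH(ChronoUnit.MONTHS),
        YEAR(ChronoUnit.YEARS);

        final ChronoUnit unit;

        Transform(ChronoUnit unit) {
            this.unit = unit;
        }
    }

    static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    /** Returns {lower, upper}: lower is inclusive, upper is exclusive (assumed convention). */
    static LocalDateTime[] toRange(Transform transform, long value) {
        LocalDateTime lower = EPOCH.plus(value, transform.unit);
        LocalDateTime upper = lower.plus(1, transform.unit);
        return new LocalDateTime[] {lower, upper};
    }

    public static void main(String[] args) {
        for (Transform t : Transform.values()) {
            LocalDateTime[] r = toRange(t, 55);
            System.out.println(t + "(55) -> [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```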

…he#45652)

- Allow MTMV on Paimon tables that have multiple partition keys
- Add a test case
1. Implement the MvccTable interface for IcebergExternalTable
2. IcebergExternalTable overrides the methods in ExternalTable and
supports partition pruning
3. Add a snapshot cache to IcebergMetadataCache to store the partition
information of IcebergExternalTable.

…more test case for iceberg mtmv. (apache#46257)

### What problem does this PR solve?

Support showing the partitions of an Iceberg external table.
IcebergExternalTable converts Iceberg partitions to Doris range partitions.
This PR adds a show-partitions function for IcebergExternalTable, which makes
it possible to add regression tests.

…e more readable (apache#47166)

- Previously, Paimon and Iceberg stored the snapshotId in MTMVVersionSnapshot;
they now use MTMVSnapshotIdSnapshot
- `compatiblePartitions` only considers OlapTable, because other table types
have no historical data
- Delete the constructors of MTMVVersionSnapshot that take no id, to avoid misuse
…efore run async mv task (apache#48172)

Problem Summary:

Before this PR, external catalog metadata was synced whenever an async MV based
on an external table was refreshed.
After this PR, the metadata sync is removed, but the data in the async MV stays
consistent with what a Doris query on the external table returns.

metadata cache of external table no longer be refreshed before run async mv task
…resh feature for Hudi external tables. (apache#49956)

Problem Summary:

Support asynchronous materialized view partition refresh feature for
Hudi external tables.
…pache#50979)

Problem Summary:

related pr: apache#48172

apache#48172 changed the logic of the method `beforeMTMVRefresh`, but
apache#49956 added the old code back, so this PR deletes that code again.
@zddr zddr requested a review from morrySnow as a code owner June 20, 2025 06:21

Thearas commented Jun 20, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

zddr commented Jun 20, 2025

run buildall

…mon table as an unpartitioned table (apache#46641)

When a Paimon 0.9 value of type Date is read from the system table, it comes
back as an integer and cannot be converted to the Date type.

This issue has been fixed in Paimon's latest code.

This PR degrades gracefully in this situation so that user data queries are not affected.

zddr commented Jun 20, 2025

run buildall

zddr commented Jun 20, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 39924 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2f0af605f77b2db009daa9d8c7c9b93b3f558ebe, data reload: false

------ Round 1 ----------------------------------
q1	17621	6843	6647	6647
q2	2068	171	184	171
q3	10575	1083	1178	1083
q4	10576	774	719	719
q5	7800	2886	2858	2858
q6	222	136	140	136
q7	979	612	594	594
q8	9593	1983	2070	1983
q9	7211	6390	6433	6390
q10	6971	2265	2329	2265
q11	460	267	254	254
q12	404	220	222	220
q13	17796	2970	2969	2969
q14	236	216	207	207
q15	527	470	491	470
q16	485	390	375	375
q17	1014	590	564	564
q18	7280	6691	6667	6667
q19	1388	1134	1026	1026
q20	496	207	202	202
q21	3974	3243	3142	3142
q22	1104	1001	982	982
Total cold run time: 108780 ms
Total hot run time: 39924 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6594	6610	6594	6594
q2	324	233	236	233
q3	2922	2795	2966	2795
q4	1990	1771	1805	1771
q5	5759	5790	5825	5790
q6	206	130	127	127
q7	2208	1840	1792	1792
q8	3436	3574	3525	3525
q9	8960	8937	9004	8937
q10	3592	3539	3551	3539
q11	596	492	536	492
q12	809	589	587	587
q13	7599	3131	3186	3131
q14	299	262	271	262
q15	522	469	458	458
q16	490	455	465	455
q17	1858	1638	1593	1593
q18	8219	7818	7824	7818
q19	1731	1641	1639	1639
q20	2077	1795	1824	1795
q21	5416	4958	5065	4958
q22	1116	1069	1058	1058
Total cold run time: 66723 ms
Total hot run time: 59349 ms

@doris-robot

TPC-DS: Total hot run time: 197124 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2f0af605f77b2db009daa9d8c7c9b93b3f558ebe, data reload: false

query1	1284	911	906	906
query2	6379	1965	1988	1965
query3	10963	4443	4438	4438
query4	61381	29463	23478	23478
query5	5296	449	442	442
query6	410	202	196	196
query7	5444	322	310	310
query8	309	219	225	219
query9	8721	2575	2566	2566
query10	496	277	266	266
query11	17930	15165	15759	15165
query12	170	114	105	105
query13	1465	425	429	425
query14	9886	7408	7247	7247
query15	201	186	180	180
query16	7147	490	495	490
query17	1154	580	578	578
query18	1860	331	320	320
query19	211	160	160	160
query20	121	111	112	111
query21	201	107	111	107
query22	4766	4418	4658	4418
query23	34799	34075	34028	34028
query24	6196	2956	2922	2922
query25	527	426	431	426
query26	660	176	166	166
query27	1829	354	362	354
query28	3728	2127	2128	2127
query29	685	452	443	443
query30	237	152	155	152
query31	961	826	836	826
query32	70	64	58	58
query33	477	300	301	300
query34	936	507	531	507
query35	835	728	721	721
query36	1068	966	986	966
query37	111	69	66	66
query38	4052	3961	3999	3961
query39	1516	1479	1485	1479
query40	202	107	104	104
query41	51	50	50	50
query42	124	106	102	102
query43	531	509	510	509
query44	1202	817	810	810
query45	188	169	167	167
query46	1133	749	728	728
query47	1989	1931	1897	1897
query48	495	373	388	373
query49	728	378	394	378
query50	847	462	436	436
query51	7493	7184	7130	7130
query52	103	93	91	91
query53	262	193	194	193
query54	577	479	462	462
query55	80	80	79	79
query56	276	248	287	248
query57	1332	1219	1207	1207
query58	225	216	224	216
query59	3226	3133	3067	3067
query60	283	260	254	254
query61	110	106	104	104
query62	782	675	682	675
query63	210	186	182	182
query64	1459	668	642	642
query65	3267	3210	3203	3203
query66	704	289	301	289
query67	15905	15541	15606	15541
query68	4318	587	579	579
query69	447	267	270	267
query70	1170	1121	1086	1086
query71	347	264	259	259
query72	6392	3982	3962	3962
query73	757	350	354	350
query74	10261	8963	9279	8963
query75	3243	2666	2665	2665
query76	2098	1112	1047	1047
query77	476	281	298	281
query78	10709	9532	9521	9521
query79	1836	599	614	599
query80	1343	425	430	425
query81	522	223	225	223
query82	1219	92	87	87
query83	270	138	149	138
query84	278	79	76	76
query85	1041	317	285	285
query86	363	266	266	266
query87	4430	4252	4221	4221
query88	3951	2401	2375	2375
query89	409	295	297	295
query90	2048	188	189	188
query91	186	146	154	146
query92	65	50	53	50
query93	2331	545	548	545
query94	803	294	291	291
query95	365	256	260	256
query96	631	290	283	283
query97	3301	3176	3201	3176
query98	216	195	202	195
query99	1623	1293	1319	1293
Total cold run time: 315715 ms
Total hot run time: 197124 ms

zddr commented Jun 23, 2025

run buildall

zddr commented Jun 23, 2025

run buildall

zddr commented Jun 23, 2025

run buildall

zddr commented Jun 23, 2025

run p0

zddr commented Jun 23, 2025

run cloud_p0

zddr commented Jun 23, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 39673 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 327e49961cfd8f6619c31c3f040ec0d79c524061, data reload: false

------ Round 1 ----------------------------------
q1	17572	6775	6602	6602
q2	2077	169	172	169
q3	10588	1116	1168	1116
q4	10573	753	785	753
q5	7760	2828	2831	2828
q6	215	135	135	135
q7	968	609	609	609
q8	9368	1956	2020	1956
q9	6633	6369	6398	6369
q10	6969	2276	2281	2276
q11	463	263	255	255
q12	395	211	208	208
q13	17790	2976	3007	2976
q14	225	221	212	212
q15	508	448	473	448
q16	443	384	376	376
q17	942	583	527	527
q18	7249	6772	6686	6686
q19	1326	1089	956	956
q20	468	202	200	200
q21	3892	3055	3112	3055
q22	1114	1035	961	961
Total cold run time: 107538 ms
Total hot run time: 39673 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6585	6585	6536	6536
q2	331	241	232	232
q3	2955	2740	2810	2740
q4	2022	1819	1766	1766
q5	5720	5732	5688	5688
q6	213	128	129	128
q7	2133	1817	1826	1817
q8	3376	3540	3554	3540
q9	8934	8779	8953	8779
q10	3591	3539	3557	3539
q11	594	493	483	483
q12	806	598	628	598
q13	9574	3112	3194	3112
q14	286	290	265	265
q15	515	462	478	462
q16	492	435	447	435
q17	1833	1632	1593	1593
q18	8142	7671	7740	7671
q19	1681	1466	1601	1466
q20	2157	1873	1856	1856
q21	5098	5135	5042	5042
q22	1134	1060	1066	1060
Total cold run time: 68172 ms
Total hot run time: 58808 ms

@doris-robot

TPC-DS: Total hot run time: 196106 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 327e49961cfd8f6619c31c3f040ec0d79c524061, data reload: false

query1	1298	906	894	894
query2	6368	1998	1934	1934
query3	10993	4448	4530	4448
query4	61867	29181	23433	23433
query5	5154	449	438	438
query6	415	193	169	169
query7	5423	305	321	305
query8	319	226	233	226
query9	8487	2581	2575	2575
query10	457	289	258	258
query11	17955	15224	15772	15224
query12	160	105	109	105
query13	1427	441	421	421
query14	10423	7247	6685	6685
query15	203	190	186	186
query16	7193	469	516	469
query17	1175	594	596	594
query18	1838	319	327	319
query19	209	160	169	160
query20	122	105	117	105
query21	210	103	102	102
query22	4668	4544	4507	4507
query23	34303	34166	33801	33801
query24	6104	2998	2922	2922
query25	549	426	429	426
query26	687	182	172	172
query27	2222	355	364	355
query28	3890	2157	2163	2157
query29	721	481	475	475
query30	230	160	155	155
query31	996	826	823	823
query32	68	55	58	55
query33	453	302	295	295
query34	910	506	512	506
query35	854	718	732	718
query36	1054	959	991	959
query37	122	75	68	68
query38	3988	3939	3950	3939
query39	1530	1480	1463	1463
query40	213	99	99	99
query41	47	46	46	46
query42	115	105	119	105
query43	529	488	507	488
query44	1200	831	817	817
query45	187	170	176	170
query46	1147	721	720	720
query47	2057	1939	1952	1939
query48	436	343	340	340
query49	741	409	396	396
query50	857	436	440	436
query51	7420	7293	7391	7293
query52	105	92	92	92
query53	265	187	188	187
query54	580	471	464	464
query55	80	80	82	80
query56	255	258	259	258
query57	1305	1240	1198	1198
query58	212	209	221	209
query59	3315	3102	3003	3003
query60	277	257	248	248
query61	113	107	109	107
query62	772	692	687	687
query63	213	182	187	182
query64	1368	649	625	625
query65	3257	3218	3175	3175
query66	704	288	287	287
query67	16084	15551	15537	15537
query68	4253	621	590	590
query69	422	261	253	253
query70	1160	1096	1125	1096
query71	357	255	258	255
query72	6315	4002	4007	4002
query73	745	349	363	349
query74	10363	8838	9333	8838
query75	3348	2595	2657	2595
query76	2152	1107	1040	1040
query77	498	272	268	268
query78	10632	9628	9520	9520
query79	2017	612	609	609
query80	916	421	415	415
query81	524	218	217	217
query82	174	90	86	86
query83	167	139	139	139
query84	286	75	79	75
query85	913	308	290	290
query86	387	301	292	292
query87	4413	4233	4228	4228
query88	4525	2354	2339	2339
query89	407	286	286	286
query90	1944	187	185	185
query91	138	108	107	107
query92	56	49	50	49
query93	2118	552	562	552
query94	787	269	294	269
query95	367	263	258	258
query96	615	280	280	280
query97	3306	3119	3175	3119
query98	206	206	199	199
query99	1571	1272	1284	1272
Total cold run time: 314981 ms
Total hot run time: 196106 ms

@doris-robot

ClickBench: Total hot run time: 31.36 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 327e49961cfd8f6619c31c3f040ec0d79c524061, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.06
query4	1.62	0.10	0.10
query5	0.53	0.50	0.52
query6	1.15	0.72	0.75
query7	0.02	0.02	0.01
query8	0.05	0.03	0.03
query9	0.58	0.50	0.50
query10	0.57	0.58	0.56
query11	0.15	0.11	0.11
query12	0.14	0.11	0.10
query13	0.61	0.60	0.59
query14	0.77	0.81	0.80
query15	0.84	0.82	0.82
query16	0.37	0.37	0.40
query17	1.05	1.06	1.07
query18	0.24	0.22	0.21
query19	2.01	1.87	1.79
query20	0.02	0.01	0.01
query21	15.42	0.60	0.59
query22	2.33	2.16	2.11
query23	16.98	0.93	0.76
query24	2.76	1.77	1.76
query25	0.23	0.11	0.18
query26	0.44	0.13	0.13
query27	0.04	0.05	0.04
query28	9.42	0.49	0.49
query29	12.60	3.20	3.18
query30	0.25	0.06	0.06
query31	2.86	0.40	0.39
query32	3.24	0.46	0.45
query33	2.99	2.97	3.01
query34	17.09	4.55	4.56
query35	4.55	4.58	4.64
query36	0.67	0.48	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.13	0.12
query41	0.07	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.41 s
Total hot run time: 31.36 s

zddr commented Jun 23, 2025

run cloud_p0

@morrySnow morrySnow merged commit 4234a27 into apache:branch-3.1 Jun 23, 2025
20 of 21 checks passed