branch-3.1:[enhance](mtmv)MTMV support iceberg/hudi/paimon #52040
Merged
PaimonUtil PaimonPartitionInfo PaimonSchemaCacheValue PaimonExternalTable use latest
Previously, when an MTMV was created on a Paimon table, it could not detect changes to the partition list or to the data, so `refresh materialized view mv1 complete` had to be used to force a full refresh. This PR obtains the Paimon partition list, the last update time of each partition, and the latest snapshotId of the table. As a result, an MTMV can be partitioned based on a Paimon table, detect data changes, and refresh the affected partitions automatically.
mtmv support paimon partition refresh
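The refresh decision described above can be pictured with a small Java sketch. It assumes the MTMV records each base partition's last update time (and, for an unpartitioned table, the latest snapshotId) at refresh time; `PartitionStalenessSketch` and its methods are hypothetical names, not Doris APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Doris code: decides which MTMV partitions need a refresh
// by comparing what was recorded at the last refresh with the current Paimon metadata.
final class PartitionStalenessSketch {

    /** Partitions that are new, or whose last update time changed since the last refresh. */
    static Set<String> stalePartitions(Map<String, Long> recordedUpdateTimes,
                                       Map<String, Long> currentUpdateTimes) {
        Set<String> stale = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Long> e : currentUpdateTimes.entrySet()) {
            Long recorded = recordedUpdateTimes.get(e.getKey());
            if (recorded == null || !recorded.equals(e.getValue())) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    /** For an unpartitioned table, any change of the latest snapshotId means the data changed. */
    static boolean tableChanged(long recordedSnapshotId, long currentSnapshotId) {
        return recordedSnapshotId != currentSnapshotId;
    }

    public static void main(String[] args) {
        Map<String, Long> recorded = new HashMap<>();
        recorded.put("dt=2024-01-01", 100L);
        recorded.put("dt=2024-01-02", 200L);

        Map<String, Long> current = new HashMap<>(recorded);
        current.put("dt=2024-01-02", 250L); // partition rewritten after the last refresh
        current.put("dt=2024-01-03", 300L); // partition added after the last refresh

        // prints [dt=2024-01-02, dt=2024-01-03]
        System.out.println(stalePartitions(recorded, current));
        System.out.println(tableChanged(7L, 8L)); // prints true
    }
}
```

Dropped partitions are not handled here; a real implementation would also need to remove the corresponding MTMV partitions.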
apache#44419) When an MVCC table is used to obtain partition snapshots and to perform related operations, the snapshotId parameter must be included.
…ad of partitionId (apache#44415) Partition IDs of external data sources are meaningless, and some data sources only have partition names, so partition pruning now returns partition names instead of IDs.
…44567) Previously, external partition pruning only supported Hive, and supporting another table type required understanding the internal logic of partition pruning. This PR abstracts the partition pruning logic, so other tables can support it by simply overriding a few methods of `ExternalTable`.
[opt](planner) Unified external partition prune interface
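A hypothetical Java sketch of this pattern: the shared pruning loop lives in a base class, each table type overrides a couple of hooks, and the result is a list of partition names rather than IDs. `PrunableExternalTableSketch`, `DemoExternalTable`, and both hook methods are illustrative names; the real `ExternalTable` methods differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not the Doris API: the shared pruning logic lives in the base class,
// and each external table type only overrides two small hooks.
abstract class PrunableExternalTableSketch {

    /** Hook: whether this table type supports partition pruning. */
    abstract boolean supportsPartitionPruning();

    /** Hook: partition name -> partition value (one partition column, for simplicity). */
    abstract Map<String, String> nameToPartitionValue();

    /** Shared logic: returns the selected partition names, never internal IDs. */
    final List<String> prune(Predicate<String> valueFilter) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> e : nameToPartitionValue().entrySet()) {
            // Without pruning support, every partition is kept.
            if (!supportsPartitionPruning() || valueFilter.test(e.getValue())) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }
}

/** Example subclass standing in for a Paimon- or Iceberg-backed table. */
final class DemoExternalTable extends PrunableExternalTableSketch {
    private final boolean pruningSupported;

    DemoExternalTable(boolean pruningSupported) {
        this.pruningSupported = pruningSupported;
    }

    @Override
    boolean supportsPartitionPruning() {
        return pruningSupported;
    }

    @Override
    Map<String, String> nameToPartitionValue() {
        Map<String, String> m = new LinkedHashMap<>(); // keeps a stable order
        m.put("dt=2024-01-01", "2024-01-01");
        m.put("dt=2024-01-02", "2024-01-02");
        return m;
    }
}
```

For example, `new DemoExternalTable(true).prune(v -> v.equals("2024-01-02"))` selects only `dt=2024-01-02`.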
…resh and partition pruning (apache#44673)
- Add `MvccTable` to represent a table that supports querying data of a specified version.
- Add the `MvccSnapshot` interface to store the snapshot information of an MVCC table at a given point in time.
- Add an `MvccSnapshot` parameter to the methods of the `MTMVRelatedTableIf` interface so that data of a specified version can be retrieved.
- Partition pruning methods take an `MvccSnapshot` parameter so that partition information is obtained for the specified version.
- Load the snapshot information of each `MvccTable` at the beginning of query planning and store it in `StatementContext`.
Unified external table interface supporting partition refresh and partition pruning
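The idea of loading each table's snapshot once per statement can be sketched in Java as follows. The `*Sketch` types are simplified, hypothetical stand-ins for `MvccTable`, `MvccSnapshot`, and `StatementContext`; the real signatures are not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified shapes; the real Doris MvccTable/MvccSnapshot interfaces differ.
interface MvccSnapshotSketch { }

/** A snapshot identified only by an external snapshot id (as for Paimon or Iceberg). */
final class SnapshotIdSnapshotSketch implements MvccSnapshotSketch {
    final long snapshotId;

    SnapshotIdSnapshotSketch(long snapshotId) {
        this.snapshotId = snapshotId;
    }
}

interface MvccTableSketch {
    String qualifiedName();

    /** Loads the table's current snapshot, e.g. from a metadata cache. */
    MvccSnapshotSketch loadSnapshot();
}

/** Per-statement holder: each table's snapshot is loaded once, when planning begins. */
final class StatementContextSketch {
    private final Map<String, MvccSnapshotSketch> snapshots = new HashMap<>();

    void loadSnapshots(Iterable<? extends MvccTableSketch> tables) {
        for (MvccTableSketch t : tables) {
            // computeIfAbsent: the loader runs only the first time a table is seen.
            snapshots.computeIfAbsent(t.qualifiedName(), k -> t.loadSnapshot());
        }
    }

    /** Later planning steps (partition pruning, MTMV checks) read the pinned snapshot. */
    Optional<MvccSnapshotSketch> getSnapshot(MvccTableSketch t) {
        return Optional.ofNullable(snapshots.get(t.qualifiedName()));
    }
}
```

Using `computeIfAbsent` means a table referenced several times in one statement is resolved against a single snapshot, which is the consistency property the following commits rely on.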
Previously, transparent rewriting over an external table was all or nothing: either the whole query was rewritten or it was not rewritten at all. Partial partition rewriting is now supported: some partitions are read from the MTMV, and the remaining partitions are looked up directly in the base table.
mtmv partition rewrite support external table
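The partition split behind partial rewriting might look like this hypothetical Java sketch, assuming the set of MTMV partitions that are still fresh is already known; `PartialRewriteSketch` is not a Doris class.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the partition split behind partial rewriting; not Doris code.
final class PartialRewriteSketch {

    /** Which queried partitions are served by the MTMV and which by the base table. */
    static final class Split {
        final Set<String> fromMv = new TreeSet<>();
        final Set<String> fromBaseTable = new TreeSet<>();
    }

    static Split split(Set<String> queriedPartitions, Set<String> freshMvPartitions) {
        Split s = new Split();
        for (String p : queriedPartitions) {
            if (freshMvPartitions.contains(p)) {
                s.fromMv.add(p);        // MTMV data for p is up to date
            } else {
                s.fromBaseTable.add(p); // stale or not yet materialized: read the base table
            }
        }
        return s;
    }
}
```

Conceptually, the rewritten plan is a UNION ALL of an MTMV scan over `fromMv` and a base-table scan over `fromBaseTable`. When `fromBaseTable` is empty this reduces to a whole-query rewrite; when `fromMv` is empty, no rewrite applies.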
In the previous PR, a snapshot of the table was obtained at the beginning of the query and stored in the `StatementContext`. This PR ensures that the same metadata is used throughout the query: when the relevant interfaces are called, the snapshot is taken from the `StatementContext` and passed to the relevant methods as a parameter. Related PR: apache#44911 apache#44673
…the latest data (apache#44911)
Problem Summary:
- Add `PaimonMetadataCacheMgr` to `ExternalMetaCacheMgr` to manage the snapshot cache of Paimon tables.
- Move `paimonSchemaCache` to `PaimonMetadataCacheMgr`, and add `schemaId` as part of the cache key.
- `PaimonExternalTable` overrides the methods in `ExternalTable` and supports partition pruning.
- `PaimonExternalTable` implements the `MvccTable` interface, so snapshot data is retrieved from the cache during a query. This prevents a cache refresh from causing different metadata versions to be used within a single query.
- `MTMVTask` retrieves the snapshot data of each `MvccTable` before the task starts, which prevents a cache refresh from causing different metadata versions to be used within a single refresh task.
Paimon queries the data in the cache instead of querying the latest data
Behavior changes when querying a Paimon table:
- Right after FE starts, queries read the latest data.
- If the Paimon data changes, Doris still queries the previously cached data.
- After the snapshot cache expires, Doris queries the latest data.
- `desc paimon;` displays the schema corresponding to the snapshotId in the snapshot cache.
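The query behavior listed above (cached snapshot until expiry, latest data afterwards) can be reproduced with a tiny time-based cache. This is a hypothetical Java sketch with an injectable clock; it is not how `PaimonMetadataCacheMgr` is implemented, and `SchemaCacheKey` only illustrates why `schemaId` belongs in the schema cache key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch, not Doris code: serves a cached value until it expires,
// then reloads the latest one.
final class ExpiringSnapshotCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAtMillis;

        Entry(T value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;    // injectable for testing
    private final Function<K, V> loader; // e.g. "read the latest snapshotId from Paimon"

    ExpiringSnapshotCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAtMillis >= ttlMillis) {
            e = new Entry<>(loader.apply(key), now); // first access or expired: reload
            entries.put(key, e);
        }
        return e.value; // otherwise keep serving the cached value, even if the source changed
    }
}

/** Schema cache key that includes schemaId, so each snapshot resolves to its own schema. */
final class SchemaCacheKey {
    final String tableName;
    final long schemaId;

    SchemaCacheKey(String tableName, long schemaId) {
        this.tableName = tableName;
        this.schemaId = schemaId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SchemaCacheKey)) return false;
        SchemaCacheKey k = (SchemaCacheKey) o;
        return schemaId == k.schemaId && tableName.equals(k.tableName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tableName, schemaId);
    }
}
```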
Previously, when an MTMV was created on an Iceberg table, it could not detect changes to the partition list or to the data, so `refresh materialized view mv1 complete` had to be used to force a full refresh. This PR obtains the Iceberg partition list, the last update time of each partition, and the latest snapshotId of the table. As a result, an MTMV can be partitioned based on an Iceberg table, detect data changes, and refresh the affected partitions automatically. For now, only tables with a single partition column are supported, and the partition transform must be one of hour, day, month, or year. Support for the identity transform is planned.
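The mapping from an Iceberg time transform to a range partition can be sketched as below. It relies on the Iceberg spec, where year, month, day, and hour transform values count years, months, days, or hours since 1970-01-01 00:00. `IcebergTransformRangeSketch` is a hypothetical name, and the real conversion in `IcebergExternalTable` may differ in detail (for example, in time zone handling).

```java
import java.time.LocalDateTime;

// Hypothetical sketch, not Doris code. Per the Iceberg spec, the year/month/day/hour
// transforms store the number of years/months/days/hours since 1970-01-01 00:00.
final class IcebergTransformRangeSketch {

    enum Transform { YEAR, MONTH, DAY, HOUR }

    private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    /** Maps one transformed partition value to a [lower, upper) range. */
    static LocalDateTime[] toRange(Transform transform, int value) {
        LocalDateTime lower;
        LocalDateTime upper;
        switch (transform) {
            case YEAR:
                lower = EPOCH.plusYears(value);
                upper = lower.plusYears(1);
                break;
            case MONTH:
                lower = EPOCH.plusMonths(value);
                upper = lower.plusMonths(1);
                break;
            case DAY:
                lower = EPOCH.plusDays(value);
                upper = lower.plusDays(1);
                break;
            case HOUR:
                lower = EPOCH.plusHours(value);
                upper = lower.plusHours(1);
                break;
            default:
                throw new IllegalArgumentException("unsupported transform: " + transform);
        }
        return new LocalDateTime[] {lower, upper};
    }
}
```

For example, a day-transform value of 19723 maps to the range [2024-01-01 00:00, 2024-01-02 00:00).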
…he#45652)
- Allow an MTMV to be based on a Paimon table with multiple partition keys.
- Add test cases.
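With several partition keys, one MTMV partition can correspond to multiple base-table partitions. The hypothetical Java sketch below assumes Hive-style partition names such as `dt=2024-01-01/region=us` and an MTMV partitioned by just one of the keys; all names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Doris code; assumes Hive-style names such as "dt=2024-01-01/region=us".
final class MultiKeyPartitionSketch {

    /** Parses "k1=v1/k2=v2" into an ordered key -> value map. */
    static Map<String, String> parse(String partitionName) {
        Map<String, String> kv = new LinkedHashMap<>();
        for (String segment : partitionName.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("bad partition segment: " + segment);
            }
            kv.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return kv;
    }

    /** Groups base-table partitions by the value of the column the MTMV is partitioned on. */
    static Map<String, List<String>> groupBy(List<String> partitionNames, String column) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : partitionNames) {
            String value = parse(name).get(column);
            if (value == null) {
                throw new IllegalArgumentException("partition " + name + " has no column " + column);
            }
            groups.computeIfAbsent(value, v -> new ArrayList<>()).add(name);
        }
        return groups;
    }
}
```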
1. Implement the `MvccTable` interface for `IcebergExternalTable`.
2. `IcebergExternalTable` overrides the methods in `ExternalTable` and supports partition pruning.
3. Add a snapshot cache in `IcebergMetadataCache` to store the partition info of `IcebergExternalTable`.
…more test case for iceberg mtmv. (apache#46257) Support showing the partitions of Iceberg external tables. `IcebergExternalTable` converts Iceberg partitions to Doris range partitions; this PR adds a show-partition function for `IcebergExternalTable`, which makes it possible to add regression tests.
Add more test cases for Iceberg MTMV.
…e more readable (apache#47166)
- Previously, Paimon and Iceberg stored the snapshotId in `MTMVVersionSnapshot`; it is now stored in `MTMVSnapshotIdSnapshot`.
- `compatiblePartitions` only considers `OlapTable`, because other table types have no history data.
- Delete the `MTMVVersionSnapshot` constructors that take no id, to avoid misuse.
…efore run async mv task (apache#48172) Problem Summary: Before this PR, the external catalog metadata was synced when refreshing an async MV based on an external table. This PR removes that sync action; the data in the async MV remains consistent with the result of querying the external table in Doris.
The metadata cache of an external table is no longer refreshed before running an async MV task
…resh feature for Hudi external tables. (apache#49956) Problem Summary: Support the asynchronous materialized view partition refresh feature for Hudi external tables.
…pache#50979) Problem Summary: Related PR: apache#48172. PR apache#48172 changed the logic of the `beforeMTMVRefresh` method, but PR apache#49956 added the removed code back, so this PR deletes it again.
Author: run buildall
…mon table as an unpartitioned table (apache#46641) In Paimon 0.9, a value of Paimon type Date read from the system table comes back as an integer and cannot be converted to a Date. This has been fixed in Paimon's latest code. This PR degrades gracefully in this case by treating the table as unpartitioned, so user data queries are not affected.
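The fallback described above amounts to: try to convert every partition value, and if any conversion fails, report the table as unpartitioned instead of failing the query. This is a hypothetical Java illustration; `PartitionFallbackSketch` is not Doris code, and the integer input merely mimics the Paimon 0.9 behavior described above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Doris code: if any partition value cannot be converted,
// drop partition info and treat the table as unpartitioned instead of failing.
final class PartitionFallbackSketch {

    /** Returns the parsed DATE values, or Optional.empty() meaning "treat as unpartitioned". */
    static Optional<List<LocalDate>> parsePartitionDates(List<String> rawValues) {
        List<LocalDate> parsed = new ArrayList<>();
        for (String raw : rawValues) {
            try {
                parsed.add(LocalDate.parse(raw)); // expects ISO yyyy-MM-dd
            } catch (DateTimeParseException e) {
                // e.g. an integer such as "19723" returned instead of a date string
                return Optional.empty();
            }
        }
        return Optional.of(parsed);
    }
}
```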
Author: run buildall
Author: run buildall
TPC-H: Total hot run time: 39924 ms
TPC-DS: Total hot run time: 197124 ms
Author: run buildall
Author: run buildall
Author: run buildall
Author: run p0
Author: run cloud_p0
Author: run buildall
TPC-H: Total hot run time: 39673 ms
TPC-DS: Total hot run time: 196106 ms
ClickBench: Total hot run time: 31.36 s
Author: run cloud_p0
morrySnow approved these changes on Jun 23, 2025
pick:
#43959
#44419
#44415
#44567
#44673
#44998
#45273
#44911
#44726
#45652
#45659
#46257
#46641
#47026
#47166
#48172
#49956
#50979