
Conversation

@LsomeYeah (Contributor)

Purpose

Linked issue: close #xxx

In PRs #4639 and #4878, we added support for migrating Iceberg tables managed by hadoop-catalog or hive-catalog to Paimon. This PR aims to support migrating Iceberg tables that have undergone schema evolution one or more times.

Paimon stores a schema id in each DataFileMeta so that data files written under an earlier schema can still be read. We therefore extract the schema id used by each Iceberg data file and record it in the corresponding Paimon DataFileMeta, which lets Paimon handle the schema evolution case.
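The mapping described above can be sketched as follows. This is an illustration only, not the PR's actual code: the record types and the migrate method are hypothetical stand-ins for Iceberg manifest entries and Paimon's DataFileMeta, reduced to the one field this PR adds (the schema id).

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaIdCarryOver {

    /** Hypothetical stand-in for an Iceberg manifest entry: file path + schema id. */
    public record IcebergFile(String path, long schemaId) {}

    /** Hypothetical stand-in for Paimon's DataFileMeta carrying a schema id. */
    public record PaimonFileMeta(String path, long schemaId) {}

    /**
     * For every Iceberg data file, record the schema id it was written with in
     * the corresponding Paimon file meta, so readers can later look up the
     * matching migrated Paimon schema.
     */
    public static List<PaimonFileMeta> migrate(List<IcebergFile> files) {
        List<PaimonFileMeta> metas = new ArrayList<>();
        for (IcebergFile f : files) {
            metas.add(new PaimonFileMeta(f.path(), f.schemaId()));
        }
        return metas;
    }
}
```

The key point is that the schema id is carried per file, not per table, so a table whose files span several schema versions remains readable after migration.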

Tests

IcebergMigrateTest#testDeleteColumn
IcebergMigrateTest#testRenameColumn
IcebergMigrateTest#testAddColumn
IcebergMigrateTest#testReorderColumn
IcebergMigrateTest#testUpdateColumn
IcebergMigrateTest#testMigrateWithRandomIcebergEvolution

API and Format

Documentation

…g to paimon

# Conflicts:
#	paimon-core/src/main/java/org/apache/paimon/iceberg/metadata/IcebergDataField.java
…mon table to source table if migrating success

[core] alter access permission of getDataTypeFromType()
# Conflicts:
#	paimon-flink/paimon-flink-common/src/main/resources/META-INF/services/org.apache.paimon.factories.Factory

private final InternalMap lowerBounds;
private final InternalMap upperBounds;

// only used for iceberg migrate
Contributor

I think it is better: "only used for migrating iceberg table to paimon with schema evolution".

Contributor Author

Before migration, we do not check whether the Iceberg table has undergone schema evolution. schemaId will be used in all cases for iceberg migration.

// get iceberg current schema
IcebergSchema icebergSchema =
icebergMetadata.schemas().get(icebergMetadata.currentSchemaId());
public List<TableSchema> icebergSchemasToPaimonSchemas(IcebergMetadata icebergMetadata) {
Contributor

private

}
}

public long getSchemaIdFromIcebergManifestFile(Path manifestPath, FileIO fileIO) {
Contributor

private
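A sketch of what getSchemaIdFromIcebergManifestFile needs to do, under an assumption worth stating: Iceberg manifest files are Avro files whose key/value metadata includes a "schema-id" entry (per the Iceberg table spec; older v1 manifests may lack it). To stay self-contained this helper takes the decoded metadata map directly rather than reading the Avro file through FileIO, so it shows only the parsing step, not the PR's actual implementation.

```java
import java.util.Map;

public class ManifestSchemaId {

    /**
     * Extract the schema id from an Iceberg manifest's Avro key/value
     * metadata. Returns -1 when the entry is absent (e.g. old v1 manifests),
     * so the caller can fall back to the table's current schema.
     */
    public static long schemaIdOf(Map<String, String> avroMetadata) {
        String value = avroMetadata.get("schema-id");
        return value == null ? -1L : Long.parseLong(value);
    }
}
```

In the real method, the metadata map would come from the Avro file reader opened on manifestPath via fileIO.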

String targetDatabase,
String targetTableName,
Integer parallelism,
Map<String, String> options,
Contributor

no need for the options arg

ParameterUtils.parseCommaSeparatedKeyValues(properties))
.executeMigrate();
ParameterUtils.parseCommaSeparatedKeyValues(properties));
LOG.info("create migrator success.");
Contributor

I think it is better: "create migrator success and begin executeMigrate."

Map<String, String> catalogConfig,
String icebergProperties,
String tableProperties,
Integer parallelism) {
Contributor

add @Nullable before parallelism

@@ -0,0 +1,90 @@
/*
Contributor

Should we also modify the docs of this action and procedure in this PR?

Contributor Author

Sorry, the content about hive-catalog and the procedure is contained in #4878. This PR should not contain commits that have already been merged, so I'll close this PR and create a new one. The docs for Iceberg migration will be submitted later in a separate PR.

public class IcebergMigrateHiveMetadataFactory implements IcebergMigrateMetadataFactory {
@Override
public String identifier() {
return IcebergOptions.StorageType.HIVE_CATALOG.toString() + "_migrate";
Contributor

no need to call toString()
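The reviewer's point is a general Java rule: the string concatenation operator already converts non-String operands via String.valueOf, which invokes toString(), so an explicit call is redundant. A standalone sketch (the StorageType enum here is a stand-in; the real IcebergOptions.StorageType may override toString(), so the actual identifier string can differ):

```java
public class EnumConcat {

    // Hypothetical stand-in for IcebergOptions.StorageType.
    enum StorageType { HIVE_CATALOG }

    public static String identifier() {
        // Equivalent to StorageType.HIVE_CATALOG.toString() + "_migrate":
        // concatenation implicitly calls toString() on the enum constant.
        return StorageType.HIVE_CATALOG + "_migrate";
    }
}
```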


/** Factory to create {@link IcebergMigrateHiveMetadata}. */
public class IcebergMigrateHiveMetadataFactory implements IcebergMigrateMetadataFactory {
@Override
Contributor

add a blank line

public class IcebergMigrateHiveMetadata implements IcebergMigrateMetadata {
private static final Logger LOG = LoggerFactory.getLogger(IcebergMigrateHiveMetadata.class);

public static final String TABLE_TYPE_PROP = "table_type";
Contributor

private

@LsomeYeah closed this on Feb 14, 2025