Conversation

@JingsongLi
Contributor

Fixes #1392

  1. Create table: only identity partitioning is supported for now (hidden partitioning has to wait for Flink SQL support), but all existing Iceberg tables can be loaded.
  2. Alter table:
    • Currently, Flink SQL only supports altering table properties, not the schema or the partition spec.
    • The location, current-snapshot-id and cherry-pick-snapshot-id properties are also supported, like Spark (see the sketch below).
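A minimal sketch of the DDL shapes this covers, assuming a TableEnvironment named tEnv with this Iceberg catalog registered and selected as the current catalog (table names and property values are illustrative):

// Create: only identity partition columns until Flink SQL supports hidden partitioning.
tEnv.executeSql("CREATE TABLE tl (id BIGINT, data STRING, dt STRING) PARTITIONED BY (dt)");

// Alter: Flink SQL currently only alters table properties; 'location',
// 'current-snapshot-id' and 'cherry-pick-snapshot-id' get special handling, like Spark.
tEnv.executeSql("ALTER TABLE tl SET ('write.format.default'='avro')");
tEnv.executeSql("ALTER TABLE tl SET ('location'='/tmp/location')");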


Schema icebergSchema = FlinkSchemaUtil.convert(table.getSchema());
PartitionSpec spec = toPartitionSpec(((CatalogTable) table).getPartitionKeys(), icebergSchema);
Map<String, String> options = Maps.newHashMap(table.getOptions());
Member

nit: better to use ImmutableMap ?
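A sketch of that suggestion, assuming Guava's ImmutableMap (Iceberg code usually uses a relocated Guava package, so the exact import may differ):

import java.util.Map;
import com.google.common.collect.ImmutableMap;

// Immutable defensive copy of the Flink table options instead of a mutable HashMap.
Map<String, String> options = ImmutableMap.copyOf(table.getOptions());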

Table icebergTable = getIcebergTable(tablePath);
CatalogTable table = toCatalogTable(icebergTable);

// Currently, Flink SQL only support altering table properties.
Member

What's the reason that we could not support adding/removing/renaming columns?

Contributor Author

There is no Flink DDL to add/remove/rename columns...

Contributor

I should also note that support for adding/removing/renaming columns cannot be done by comparing CatalogTable instances, unless the Flink schema contains Iceberg column IDs.

The problem is clear when you consider a simple example:

  • Iceberg table schema: id bigint, a float, b float
  • Flink table schema: id bigint, x float, y float

There are two ways to get the Flink schema: rename a -> x and b -> y, or drop a, drop b, add x, add y. Guessing which one was intended by the user is not okay because it would corrupt data. If the values from a are read when projecting x after a was actually dropped, then this is a serious correctness bug.

Also note that there are some transformations that can't be detected. For example, drop a then add a. The result should be that all values of column a are discarded. This happens when the wrong data was written to a column but the column is still needed for newer data.
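To make the ambiguity concrete, a sketch of the two Iceberg schema updates that both end at id bigint, x float, y float, assuming a loaded Table named table; only one of them matches the user's intent:

import org.apache.iceberg.Table;
import org.apache.iceberg.types.Types;

// Interpretation 1: the columns were renamed, so existing values stay visible under x and y.
table.updateSchema()
    .renameColumn("a", "x")
    .renameColumn("b", "y")
    .commit();

// Interpretation 2: a and b were dropped and new columns were added, so existing values are gone.
// (An alternative to the update above, not applied on top of it.)
table.updateSchema()
    .deleteColumn("a")
    .deleteColumn("b")
    .addColumn("x", Types.FloatType.get())
    .addColumn("y", Types.FloatType.get())
    .commit();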

Contributor Author
@JingsongLi Aug 31, 2020

Good point.
If there is only one operation per call, it seems feasible, and in practice each SQL DDL statement only carries a single kind of change.
But yes, this API is unclear; if we look at it from the API level alone, there are too many possibilities...
I'll add comments in the code.


oldOptions.keySet().forEach(k -> {
if (!newTable.getOptions().containsKey(k)) {
setProperties.put(k, null);
Member

Q: Does this align with the Flink SQL semantics?
I saw the document say: "Set one or more properties in the specified table. If a particular property is already set in the table, override the old value with the new one."

ALTER TABLE [catalog_name.][db_name.]table_name SET (key1=val1, key2=val2, ...)

For the existing key-values (in the old table) which don't appear in the new table, should we remove them from the old table? (The document did not describe this case clearly, just for confirmation.)

Contributor Author

You can take a look at the tests.
The existing key-values will be kept. newTable.getOptions() is not just the options from the ALTER DDL; it has already been merged with the old options.
Actually, there is no DDL to delete a key-value either...

Member

It is already merged with old options.

If the new options are already merged with the old options, then shouldn't every key in the old options always be present in the new options? If so, there seems to be no reason for this branch.

Contributor Author

What do you mean?
The new options are the complete set of new options.
For example:
old: {'j' = 'am', 'p' = 'an'}
alter: ALTER TABLE t UNSET TBLPROPERTIES ('j')
newTable.getOptions() will be: {'p' = 'an'}
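A sketch of how a diff like this can be applied through Iceberg's UpdateProperties, using a null value to mark a removed key as in the hunk above (the method and variable names are illustrative; the PR applies the same idea inside a transaction):

import java.util.Map;
import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateProperties;

static void applyPropertyDiff(Table icebergTable, Map<String, String> setProperties) {
  UpdateProperties update = icebergTable.updateProperties();
  setProperties.forEach((k, v) -> {
    if (v == null) {
      update.remove(k);   // key was in the old options but not in the new ones
    } else {
      update.set(k, v);   // new or changed value
    }
  });
  update.commit();
}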

Contributor Author

I'll add a test for unsetting PROPERTIES.

Contributor

I think it is okay to diff the property sets like this, but it seems like it would be easier not to. Right now, Flink has to apply the changes, then this code diffs the property sets, then Iceberg will re-apply the changes. In addition, this model doesn't work for schema updates, as I noted above.

Contributor Author
@JingsongLi Aug 31, 2020

But property updates do not have column IDs; as long as the final state is the same, the result is the same.
Sorry, I don't get your point.

Member

setProperties.put(k, null)? The javadoc from Map says:

     * @throws NullPointerException if the specified key or value is null
     *         and this map does not permit null keys or values
     * @throws IllegalArgumentException if some property of the specified key
     *         or value prevents it from being stored in this map
     */
    V put(K key, V value);

Contributor

My point is that there isn't a correctness problem so this is okay. But, this causes Flink to do much more work because it has to apply changes from SQL, then recover those changes by comparing property maps, and pass the changes to Iceberg so that Iceberg can apply the changes. It is easier to pass the changes directly to Iceberg if the Flink API can be updated to support it.

Contributor Author

Got it, I think we can give it a try in Flink.

TableSchema schema = FlinkSchemaUtil.toSchema(FlinkSchemaUtil.convert(table.schema()));
List<String> partitionKeys = toPartitionKeys(table.spec(), table.schema());

// NOTE: We can not create a IcebergCatalogTable, because Flink optimizer may use CatalogTableImpl to copy a new
Member

nit: we may need to change this comment? Actually, I did not find any class named IcebergCatalogTable, or did I misunderstand something?

Contributor Author

I mean we cannot create an IcebergCatalogTable class that extends CatalogTable to carry the Iceberg Table object.

Member

Got it, maybe we could write this comment more clearly.

// NOTE: We can not create a IcebergCatalogTable, because Flink optimizer may use CatalogTableImpl to copy a new
// catalog table.
// Let's re-loading table from Iceberg catalog when creating source/sink operators.
return new CatalogTableImpl(schema, partitionKeys, table.properties(), null);
Contributor

What is null? Could you add a comment?
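For reference, a sketch of the call with the argument labeled, assuming Flink 1.11's CatalogTableImpl, whose last constructor argument is the table comment:

// CatalogTableImpl(TableSchema tableSchema, List<String> partitionKeys,
//                  Map<String, String> properties, String comment)
return new CatalogTableImpl(schema, partitionKeys, table.properties(), null /* table comment */);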

// catalog table.
// Let's re-loading table from Iceberg catalog when creating source/sink operators.
return new CatalogTableImpl(tableSchema, table.properties(), null);
private Table getIcebergTable(ObjectPath tablePath) throws TableNotExistException {
Contributor

get is not a very specific verb. I usually prefer load for cases like this because it more accurately describes what is happening.

Contributor

I agree about get, but it does align with the interface's getTable method, which this is used in. However, it also calls out to load, so the argument that the Iceberg-specific code should stick with load rather than get is still valid.

Contributor Author
@JingsongLi Aug 31, 2020

OK, since it is an Iceberg table, I'll change it to load.

}

private static void validateFlinkTable(CatalogBaseTable table) {
Preconditions.checkArgument(table instanceof CatalogTable, "The Table should be a CatalogTable.");
Contributor

Is there a case where CatalogBaseTable doesn't implement CatalogTable?

Contributor

CatalogTable is a subinterface that inherits from CatalogBaseTable. So definitely, yes.

See the java docs on the current CatalogBaseTable in Flink:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/catalog/CatalogBaseTable.html

Contributor Author

It could be a CatalogView. The Iceberg catalog does not support views, so if a view shows up here, it should be a bug...

throw new UnsupportedOperationException("Creating table with watermark specs is not supported yet.");
}

if (schema.getPrimaryKey().isPresent()) {
Contributor

Is this something we should add to Iceberg for Flink use cases? What does Flink use the primary key for?

Contributor

I found this which might answer your question: https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API

In particular, here are the proposed changes:

Proposed Changes
We suggest to introduce the concept of primary key constraint as a hint for FLINK to leverage for optimizations.

Primary key constraints tell that a column or a set of columns of a table or a view are unique and they do not contain null.
Neither of columns in a primary can be nullable.
Primary key therefore uniquely identify a row in a table.

So it sounds just like an RDBMS primary key.

Note however, that even in the FLIP (which is just the proposal and not necessarily the finished product), it does state that there's no planned enforcement on the PK. It's up to the user to ensure that the PK is non-null and unique.

Primary key validity checks
SQL standard specifies that a constraint can either be ENFORCED or NOT ENFORCED.
This controls if the constraint checks are performed on the incoming/outgoing data.
Flink does not own the data therefore the only mode we want to support is the NOT ENFORCED mode.
Its up to the user to ensure that the query enforces key integrity.

So I agree here that throwing might be the most useful option and that there's likely nothing on the iceberg side to be added to enforce this as Flink doesn't enforce it either. In an entirely streaming setting, ensuring unique keys would be rather difficult and so to me it somewhat sounds like the PK is just more metadata that could very well be in TBLPROPERTIES.

But a more experienced Flink SQL user than myself might have more to say on the matter. I've never attempted to enforce a PK when using Flink SQL. Sounds like the work to do so would involve custom operators etc.

TLDR: The Primary Key is just a constraint, which is currently part of Flink's Table spec but goes unenforced and is up to the user. It does not appear as though the PK info is supported in any UpsertSinks etc, though that may be discussed / planned in the future. Support in the DDL for Primary Key constraints is relatively new (Flink 1.11 / current, with support in the API coming in at Flink 1.10).
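For context, a sketch of the FLIP-87 style DDL that declares such a constraint (Flink only accepts NOT ENFORCED), assuming the same TableEnvironment tEnv as above; the table is illustrative, and this catalog currently rejects the statement with the "primary key is not supported yet" error:

tEnv.executeSql(
    "CREATE TABLE orders (" +
    "  order_id BIGINT NOT NULL," +
    "  amount DOUBLE," +
    "  PRIMARY KEY (order_id) NOT ENFORCED" +
    ")");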

Contributor Author
@JingsongLi Aug 31, 2020

Thanks @kbendick.
At present, in Flink, the PK is mainly used to process CDC streams.

  • For example, if a user reads a Kafka source that carries a CDC stream, the user can define a primary key. The downstream can then perform efficient dynamic table / static table conversion (restoring the original static table) according to that primary key.
  • For example, when the stream data (CDC) is written into a JDBC database, the user can define a primary key. Flink can then insert the data into the database using upsert writes.

I think if Iceberg supports native CDC processing in the future, we may be able to use it.

// Not created by Flink SQL.
// For compatibility with iceberg tables, return empty.
// TODO modify this after Flink support partition transform.
return Collections.emptyList();
Contributor

All tables with any partition transform other than identity appear to be unpartitioned? Why not return all of the identity fields at least?

Contributor

To me, it seems like adding all of the identity fields (but not the transformed fields) would likely be incorrect. Although returning an empty list when the table is partitioned seems like a possible correctness bug to me too.

Should we consider throwing an exception in this case instead until such a time that Flink supports partition transforms?

Contributor Author

+1, this is likely to be incorrect.
But I don't want to throw an exception, because then we could not read existing Iceberg tables.
Actually, the returned partition keys are useless, except that Flink can show users the meta information of the table.

All partition operations are delegated directly to the specific source/sink, so Flink does not need to see the partition information.
I tend to keep this behavior so that we can read existing Iceberg tables.

Contributor

I was thinking that since operations are delegated to Iceberg, correctness is not an issue. It would be nice to show users which columns are partition columns so they can see which ones are good candidates for query predicates. I don't think this is a blocker, though.
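A hypothetical sketch of what returning only the identity fields could look like (the helper name and the toString-based identity check are illustrative, not the code in this PR):

import java.util.ArrayList;
import java.util.List;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;

private static List<String> identityPartitionKeys(PartitionSpec spec, Schema schema) {
  List<String> partitionKeys = new ArrayList<>();
  for (PartitionField field : spec.fields()) {
    if ("identity".equals(field.transform().toString())) {
      // expose identity partition columns to Flink by their source column name
      partitionKeys.add(schema.findColumnName(field.sourceId()));
    }
    // bucket/truncate/time transforms are skipped until Flink SQL supports partition transforms
  }
  return partitionKeys;
}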

Contributor Author

I see, maybe we can expose this information in the table properties.

table.manageSnapshots().cherrypick(newSnapshotId).commit();
}

Transaction transaction = table.newTransaction();
Contributor

Should we do the operations above this point in the transaction as well? That seems reasonable to me. I'm not sure why we don't in other places.

Contributor Author

Looks like manageSnapshots is unsupported on a TransactionTable.

Contributor

That explains it. Thanks!

Assume.assumeFalse("HadoopCatalog does not support relocate table", isHadoopCatalog);

tEnv.executeSql("CREATE TABLE tl(id BIGINT)");
tEnv.executeSql("ALTER TABLE tl SET('location'='/tmp/location')");
Contributor

Does Flink support the LOCATION clause, or just the table property?

Contributor
@kbendick Aug 30, 2020

EDIT: In the CREATE TABLE documentation, I cannot find a reference to LOCATION outside of table properties: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/create.html#create-table

Contributor

However, looking ahead at the Flink 1.12 prerelease docs, it appears that there's added support for a Hive dialect which supports the LOCATION clause.

https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_dialect.html#create-1

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [column_constraint] [COMMENT col_comment], ... [table_constraint])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [
    [ROW FORMAT row_format]
    [STORED AS file_format]
  ]
  [LOCATION fs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]

Contributor Author

Yes, just the table property...
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/filesystem.html
The Flink filesystem connector also supports a location, via the path property.
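For comparison, a sketch of the Flink 1.11 filesystem connector declaring its location through the path option (the path and format values are illustrative):

tEnv.executeSql(
    "CREATE TABLE fs_table (id BIGINT, data STRING) WITH (" +
    "  'connector'='filesystem'," +
    "  'path'='file:///tmp/fs_table'," +
    "  'format'='csv'" +
    ")");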

Contributor Author

The Hive dialect should only work for Hive tables. There are lots of tricky things...

@rdblue
Contributor

rdblue commented Aug 28, 2020

This looks mostly good to me. I just have a few questions.

if ("location".equalsIgnoreCase(entry.getKey())) {
location = entry.getValue();
} else {
properties.put(entry.getKey(), entry.getValue());
Contributor

Should location still be placed in the table properties or will that cause some kind of conflict / error?

Contributor Author

I think there is no conflict/error, but it is good to avoid the duplicate storage, since Iceberg already saves this information.

Contributor

I'd prefer not to duplicate it in table properties. Then we would have to worry about keeping the two in sync.

// don't allow setting the snapshot and picking a commit at the same time because order is ambiguous and choosing
// one order leads to different results
Preconditions.checkArgument(setSnapshotId == null || pickSnapshotId == null,
"Cannot set the current the current snapshot ID and cherry-pick snapshot changes");
Contributor

Nit: Duplication of the words the current in the Preconditions string. It currently reads Cannot set the current the current.
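A sketch with the duplicated words removed, plus how the two settings could be applied, assuming Long values setSnapshotId and pickSnapshotId parsed from the table properties as in the hunk above:

// don't allow setting the snapshot and cherry-picking a commit at the same time, because the
// order is ambiguous and choosing one order leads to different results
Preconditions.checkArgument(setSnapshotId == null || pickSnapshotId == null,
    "Cannot set the current snapshot ID and cherry-pick snapshot changes at the same time");

if (setSnapshotId != null) {
  table.manageSnapshots().setCurrentSnapshot(setSnapshotId).commit();
} else if (pickSnapshotId != null) {
  // cherry-pick applies the changes from the given snapshot as a new commit on the current state
  table.manageSnapshots().cherrypick(pickSnapshotId).commit();
}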

@JingsongLi
Contributor Author

JingsongLi commented Aug 31, 2020

Thanks a lot for your review~ @rdblue and @kbendick I'll take a look~

}

private CatalogTable toCatalogTable(Table table) {
TableSchema schema = FlinkSchemaUtil.toSchema(FlinkSchemaUtil.convert(table.schema()));
Member

nit: how about making the Iceberg schema -> Flink TableSchema conversion a static method inside FlinkSchemaUtil? The table sink PR #1348 will also depend on this static method (https://github.com/apache/iceberg/pull/1348/files#diff-0ad7dfff9cfa32fbb760796d976fd650R50).

Contributor Author

static looks good to me, but I think we can keep it in this class.
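A sketch of the static helper being discussed (the name and its final home, FlinkSchemaUtil or this class, are up for debate); it simply wraps the two existing calls:

import org.apache.flink.table.api.TableSchema;
import org.apache.iceberg.Schema;
import org.apache.iceberg.flink.FlinkSchemaUtil;

static TableSchema toTableSchema(Schema icebergSchema) {
  // Iceberg Schema -> Flink RowType -> Flink TableSchema, the same conversion used in toCatalogTable above
  return FlinkSchemaUtil.toSchema(FlinkSchemaUtil.convert(icebergSchema));
}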


throw new UnsupportedOperationException("Creating table with primary key is not supported yet.");
}
}

Member

Do we need to add a TODO indicating that Flink only supports identity partitioning now but will support hidden partitioning in the future?

Contributor Author

There is already a TODO in toPartitionKeys.


if (!setProperties.isEmpty()) {
UpdateProperties updateProperties = transaction.updateProperties();
setProperties.forEach((k, v) -> {
if (v == null) {
Member

The v should never be null in a HashMap?

Contributor Author

HashMap allows nulls:

 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)

@rdblue merged commit d4dee8e into apache:master Sep 2, 2020
@rdblue
Contributor

rdblue commented Sep 2, 2020

+1

Thanks @JingsongLi, I merged this to master. And thanks to @openinx and @kbendick for reviewing!

@JingsongLi deleted the ddl branch November 5, 2020 09:40