Core: Implement Catalogs.createTable and Catalogs.dropTable #1481
Conversation
CC: @rdblue, @massdosage - another piece of Hive integration
core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
 * @param location a path URI (e.g. hdfs:///warehouse/my_table)
 * @return true if the table was dropped
 */
public boolean dropTable(String location) {
I think this wasn't implemented before because it is not part of the Tables API, but now that this is the only implementation, maybe we should consider just deprecating the Tables API and making HadoopTables a stand-alone class.
Maybe this would merit another discussion, and another PR.
Agreed.
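For reference, a minimal usage sketch of the location-based drop discussed in this thread, assuming the dropTable(String) method from the diff above ends up on HadoopTables:

// Minimal sketch, assuming dropTable(String) lives on HadoopTables as in this PR.
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopTables;

Configuration conf = new Configuration();
HadoopTables tables = new HadoopTables(conf);

// Drop a path-based table by its location URI; per the Javadoc above,
// returns true if a table existed at the location and was dropped.
boolean dropped = tables.dropTable("hdfs:///warehouse/my_table");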
public static Table createTable(Configuration conf, Properties props) {
  String schemaString = props.getProperty(InputFormatConfig.TABLE_SCHEMA);
  Preconditions.checkNotNull(schemaString, "Table schema not set");
  Schema schema = SchemaParser.fromJson(props.getProperty(InputFormatConfig.TABLE_SCHEMA));
It looks like this is the reason why the examples specify a table property. Can we instead use Hive schema DDL and convert it to Iceberg? Similarly, can we get the identity partition fields that way to create a spec?
I think we should keep the serialized schema for the Catalogs interface. Other systems like Impala, Presto, etc. might want to use it as well.
I would like to tackle the Hive schema DDL in another PR. The data is available in HiveIcebergSerDe.initialize in a somewhat convoluted way. I would like to get it there and convert it to the Iceberg Schema string. From there I would only push the Iceberg related stuff down further.
What do you think?
I think it's fine to do this in a separate PR. I just really don't want to require setting properties with JSON schema or spec representations as the way to use Iceberg. It's okay for a way to customize if there isn't syntax, but normal cases should just use DDL.
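To make the property-based flow concrete, a rough sketch of how a caller could hand Catalogs.createTable a serialized schema (InputFormatConfig.TABLE_SCHEMA and the createTable signature follow the diff above; the schema and remaining properties are illustrative):

// Sketch of the serialized-schema path discussed in this thread.
Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get()));

Properties props = new Properties();
props.setProperty(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(schema));
// ...plus whatever table identifier/location properties the configured catalog expects

Table table = Catalogs.createTable(conf, props);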
  }
}

Optional<Catalog> catalog = loadCatalog(conf);
Somewhat out of scope: We might want to build a Catalog for this logic so that this class can avoid loading and checking the catalog in every method. The catalog would get created with the configuration and handle this delegation internally.
HiveCatalog has a cache, but this might be useful for other Catalogs as well.
This seems like a good idea to pursue, but I do not promise anything here, as I have too much on my plate currently.
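A rough sketch of the delegation idea, with hypothetical names (DelegatingCatalog and the loadCatalog call shape are assumptions, not part of this PR):

// Hypothetical sketch: resolve the configured catalog once at construction time,
// then delegate, so callers do not re-load and re-check it in every method.
public class DelegatingCatalog {
  private final Catalog delegate;

  public DelegatingCatalog(Configuration conf) {
    this.delegate = loadCatalog(conf)
        .orElseThrow(() -> new IllegalArgumentException("No catalog configured"));
  }

  public Table createTable(TableIdentifier identifier, Schema schema, PartitionSpec spec) {
    return delegate.createTable(identifier, schema, spec);
  }

  public boolean dropTable(TableIdentifier identifier) {
    return delegate.dropTable(identifier);
  }
}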
Thanks, @pvary! Mostly looks good, but I'd like to fix the …
…ion. Javadoc is revisited
      lastMetadata = ops.current();
    }
  } else {
    throw new NoSuchTableException("Table does not exist at location: %s, so it can not be dropped", location);
I should have caught this yesterday, but shouldn't this return false instead of throwing the exception? That's what all the other drop methods do. If the table doesn't exist, it isn't an exceptional case. It just returns false to signal that nothing needed to be done.
> I should have caught this yesterday, but shouldn't this return false instead of throwing the exception? That's what all the other drop methods do. If the table doesn't exist, it isn't an exceptional case. It just returns false to signal that nothing needed to be done.
Good point! Finding it now is better than finding it after pushing the PR 😄
Fixed.
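For reference, a sketch of the agreed behavior (variable names follow the diff above; the surrounding control flow is illustrative):

// Sketch of the fix: a missing table is not exceptional, so signal it with
// a false return instead of throwing NoSuchTableException.
if (ops.current() != null) {
  lastMetadata = ops.current();
} else {
  return false;  // no table at this location, nothing to drop
}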
Thanks for the fix! I merged this.
If tools want to implement drop and create table operations using the Catalogs, they need a common interface for accessing the functionality. For example, these methods will be used by HiveMetaHook to allow Iceberg table creation through Hive SQL commands, as sketched below.
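As a sketch of that integration point (the hook below is illustrative only: HiveMetaHook is Hive's metastore hook interface, and the mapping from HMS table metadata to Properties, as well as the exact Catalogs.dropTable signature, are assumptions, not the actual hook from this work):

import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.HiveMetaHook;
import org.apache.hadoop.hive.metastore.api.Table;

// Illustrative sketch only: a metastore hook delegating table lifecycle events
// to the new Catalogs methods.
public class IcebergMetaHookSketch implements HiveMetaHook {
  private final Configuration conf;

  IcebergMetaHookSketch(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public void preCreateTable(Table hmsTable) {
    Properties props = new Properties();
    // translate the HMS table definition (schema, location, ...) into properties
    Catalogs.createTable(conf, props);
  }

  @Override
  public void commitDropTable(Table hmsTable, boolean deleteData) {
    Properties props = new Properties();
    // translate the HMS table reference into properties
    Catalogs.dropTable(conf, props);
  }

  @Override public void rollbackCreateTable(Table hmsTable) { }
  @Override public void commitCreateTable(Table hmsTable) { }
  @Override public void preDropTable(Table hmsTable) { }
  @Override public void rollbackDropTable(Table hmsTable) { }
}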
The patch includes 2 commits: