-
Notifications
You must be signed in to change notification settings - Fork 3k
Register existing tables in Iceberg HiveCatalog #253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| anotherTable.newAppend().appendFile(anotherFile).commit(); | ||
|
|
||
| // verify that both tables continue to function independently | ||
| Assert.assertNotEquals(table.currentSnapshot().manifests(), anotherTable.currentSnapshot().manifests()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to say that the registered table won't have previous_metadata_location in its table properties. This doesn't seem like an issue to me.
| * @param metadataFileLocation the location of a metadata file | ||
| * @return a Table instance | ||
| */ | ||
| Table registerTable(TableIdentifier identifier, String metadataFileLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, import would be a better name.
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java
Outdated
Show resolved
Hide resolved
5413464 to
afd9de1
Compare
afd9de1 to
f1595b0
Compare
|
@aokolnychyi, this looks ready. But before we commit it, I want to suggest an alternative for you to consider. What about instead of adding the This would primarily be an administrator change, so we could expose it through a method in |
|
I also thought about registering tables by giving a pointer to the metadata location and not a specific metadata file. However, my assumption was that a location can be shared by multiple Iceberg tables (which is possible because we have random UUIDs in metadata version file names). So, we need to somehow determine what is the last metadata file for a particular table if the location is shared (which can be done using Table UUID we added recently, but someone has to find that UUID anyway). As always, I am open to considering any alternatives. To avoid changes to the |
That isn't quite what I'm suggesting. I'm suggesting that to do the same thing that I think the UUID would be fine. You'd just have to reload the table from the |
|
I think I am getting it now but there are a couple of questions I want to clarify before updating the PR. Of course, having a way to roll back the entire table state and minimizing changes to the I am not sure we need to create a table and then replace its metadata to simply register a table from a metadata file. If we do so, someone might actually query the table before we swap the metadata location. This snippet creates a table and sets the location correctly from the beginning: In theory, this can be executed directly without the catalog and will be sufficient to create a table form an existing metadata file. As suggested, we can extend |
Why is this? Wouldn't the table use the client pool from the Catalog instance? I think To avoid the issue of a table being available before the metadata is replaced, we should add the create/replace transaction methods to |
|
@rdblue You are right, we can fetch Then table state can be rolled back using these lines: Having said that, I think we can close this PR. What do you think? |
|
@aokolnychyi, sounds good to me if that works for you! I didn't think about using |
This PR allows us to register existing tables in Iceberg
HiveCatalogand resolves #251.