-
Notifications
You must be signed in to change notification settings - Fork 3k
Catalogs: Add support for unique table locations via catalog property #12892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@RussellSpitzer @kbendick @rdblue could you take a look? What do you think? |
|
+1 to having a catalog property for unique table locations |
|
I think this makes a lot of sense but I'm not sure if this should be a client side decision. I'd like us to explore the idea of "owned locations" for tables and talk more about catalog responsibilities. I think as a nice "best effort" feature this is a good thing to do, but I really think the Catalog needs to own/manage where tables are allowed to be located. |
|
Brief notes from the Catalog sync: Common sentiment was that this is good to get in, REST Catalogs can still do what they want (including ignoring client generated unique paths). More reviewers should be incoming as well |
|
Really looking forward having this unique table locations feature in Iceberg. |
|
Is there anything I should add to the PR? |
|
@nastra, I see you contributed a lot to catalogs, could you please review the PR? |
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/SparkCatalogConfig.java
Outdated
Show resolved
Hide resolved
|
Thank you for the review! I'll commit all the changes separately and rebase them into a single commit when it's ready because I have some questions. |
3364136 to
ea47451
Compare
|
@nastra do you have more comments? |
aws/src/test/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbCatalog.java
Outdated
Show resolved
Hide resolved
aws/src/test/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/inmemory/InMemoryCatalog.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/SparkCatalogConfig.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestUniqueLocation.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestUniqueLocation.java
Show resolved
Hide resolved
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestUniqueLocation.java
Outdated
Show resolved
Hide resolved
|
This PR is quite big, I might be able to review it next week. Thanks, Peter |
aws/src/integration/java/org/apache/iceberg/aws/glue/GlueTestBase.java
Outdated
Show resolved
Hide resolved
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java
Outdated
Show resolved
Hide resolved
04ca57f to
3584048
Compare
|
Thank you for the review. I made changes for all suggestions and comments, added |
aws/src/main/java/org/apache/iceberg/aws/dynamodb/DynamoDbCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
| CatalogProperties.UNIQUE_TABLE_LOCATION, | ||
| "true")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we hijack all of the tests to use unique location? This seems problematic for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is. I don’t see an easy way to avoid this. To make UNIQUE_TABLE_LOCATION configurable, I would either have to stop using TestBaseWithCatalog or update all of its implementations, which feels disproportionate for this change.
Longer term, I think it would make sense for UNIQUE_TABLE_LOCATION to default to true, similar to NessieCatalog, because non-unique table locations feel more like an issue than a rarely used feature to me.
For now, I’ve disabled the tests related to unique table locations for REST catalogs.
|
Sorry for the late review @davseitsev. I had other tasks to do |
|
@davseitsev: What’s the current status of this PR? I noticed it on my to-do list, but it’s been quite a while since I have seen this. 😢 |
|
Hi @pvary, thanks for checking in. I also dropped the REST catalog tests for unique-table-location after your review, as RESTServerExtension is not easily configurable. This PR doesn’t wire unique-table-location through the REST catalog yet, and I think that deserves a separate discussion and follow-up PR, so excluding those tests for now seemed the least confusing option. So as long as you don’t have any further comments, I consider this ready for another look / for merging. |
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java
Outdated
Show resolved
Hide resolved
| return false; | ||
| } | ||
|
|
||
| protected boolean supportsTableRename() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this flag needed? seems like an unrelated change to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s related to how the base CatalogTests behave for catalogs that don't support rename, like TestBigQueryCatalog.
Once I added my tests into CatalogTests, implementations that don't support table rename started failing them. The supportsTableRename() flag is the hook lets such catalogs opt out of the rename-related tests.
The base class wraps those tests in assumeThat(supportsTableRename()), and TestBigQueryCatalog overrides supportsTableRename() to return false instead of overriding and disabling each test with @Disabled("BigQuery Metastore does not support rename tables").
That's also consistent with the new UNIQUE_TABLE_LOCATION property being defined as "only relevant for catalogs which support rename".
If you’d prefer to keep this PR strictly focused, I can drop supportsTableRename() from CatalogTests for now and reintroduce the explicit @Disabled("BigQuery Metastore does not support rename tables") methods in TestBigQueryCatalog, and override+disable my tests in it.
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java
Outdated
Show resolved
Hide resolved
|
Resolved conflicts with the latest master |
|
Merged latest master |
It's revival and extension of #2850, regards to @sshkvar.
This PR introduces a new catalog property,
unique-table-location, which enables generating unique table locations for catalogs that support table rename operations. The feature is disabled by default to preserve current behavior.When enabled, a unique suffix is added to the table path, ensuring that each table has its own dedicated storage location including scenarios involving table renames. This addresses a key issue where, after renaming a table and creating a new one with the original name, both tables would otherwise share the same location. Such overlap can lead to:
DeleteOrphanFilesSparkAction, which may inadvertently delete files belonging to other tables in the shared location.NessieCatalog already supports it, but it's not configurable:
iceberg/nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java
Line 258 in ee2ffb4
Such feature was added to Trino a while ago trinodb/trino#6063 and related discussion trinodb/trino#5632 (comment)