@davseitsev

This is a revival and extension of #2850, with credit to @sshkvar.

This PR introduces a new catalog property, unique-table-location, which enables generating unique table locations for catalogs that support table rename operations. The feature is disabled by default to preserve current behavior.

When enabled, a unique suffix is added to the table path, ensuring that each table has its own dedicated storage location, including in scenarios involving table renames. This addresses a key issue: after renaming a table and creating a new one with the original name, both tables would otherwise share the same location. Such overlap can lead to:

  • Data loss during the DeleteOrphanFilesSparkAction, which may inadvertently delete files belonging to other tables in the shared location.
  • Difficulties in analyzing table-specific storage costs, as storage usage cannot be cleanly attributed to individual tables.
  • Inability to apply path-based rules such as S3 Intelligent-Tiering, Lifecycle Rules, or fine-grained permissions, which depend on isolated storage paths.
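To make the overlap concrete, here is a minimal sketch of the rename-then-recreate scenario. It does not use any Iceberg API: the in-memory map standing in for a catalog, the class name, and the `warehouse/name` location layout are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not Iceberg code: a "catalog" is a map of table name -> location.
class SharedLocationDemo {

  // Assumed default layout: the location depends only on the warehouse path and table name.
  static String defaultLocation(String warehouse, String tableName) {
    return warehouse + "/" + tableName;
  }

  // Create a table, rename it, then create a new table under the original name.
  static Map<String, String> renameThenRecreate(String warehouse, String name, String renamedTo) {
    Map<String, String> catalog = new HashMap<>();
    catalog.put(name, defaultLocation(warehouse, name));
    // A rename only changes the catalog entry; the files stay where they were written.
    catalog.put(renamedTo, catalog.remove(name));
    // The new table gets the same name-derived location as the renamed one.
    catalog.put(name, defaultLocation(warehouse, name));
    return catalog;
  }

  public static void main(String[] args) {
    Map<String, String> catalog =
        renameThenRecreate("s3://bucket/warehouse/db", "events", "events_old");
    // Both tables now point at the same directory.
    System.out.println(catalog.get("events_old").equals(catalog.get("events"))); // prints "true"
  }
}
```

In this toy model, any cleanup that treats everything under one table's location as belonging to that table would also see the other table's files.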

NessieCatalog already does this, but it is not configurable:

return location + "_" + UUID.randomUUID();
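As a rough sketch of how a property could gate that behavior: the property name `unique-table-location` comes from this PR, but the helper below, its signature, and the choice to reuse the Nessie-style `_<UUID>` suffix are assumptions for illustration, not the actual implementation in this change.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, not the PR's code: shows a flag-gated unique suffix.
class UniqueLocationSketch {

  // Property name as described in this PR; it is off unless explicitly enabled.
  static final String UNIQUE_TABLE_LOCATION = "unique-table-location";

  static String tableLocation(Map<String, String> catalogProps, String warehouse, String name) {
    // Assumed default layout: warehouse path plus table name.
    String location = warehouse + "/" + name;
    boolean unique =
        Boolean.parseBoolean(catalogProps.getOrDefault(UNIQUE_TABLE_LOCATION, "false"));
    // Suffix format borrowed from the NessieCatalog line quoted above (an assumption here).
    return unique ? location + "_" + UUID.randomUUID() : location;
  }

  public static void main(String[] args) {
    String warehouse = "s3://bucket/warehouse/db";
    // Disabled: the location is derived from the name alone, as before.
    System.out.println(tableLocation(Map.of(), warehouse, "events"));
    // Enabled: each call yields a different location for the same name.
    Map<String, String> props = Map.of(UNIQUE_TABLE_LOCATION, "true");
    System.out.println(tableLocation(props, warehouse, "events"));
    System.out.println(tableLocation(props, warehouse, "events"));
  }
}
```

With the flag enabled, a table created under a previously used name no longer lands in the directory of the renamed table, because the random suffix differs per creation.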

A similar feature was added to Trino a while ago in trinodb/trino#6063; see also the related discussion in trinodb/trino#5632 (comment).

@davseitsev
Author

@RussellSpitzer @kbendick @rdblue could you take a look? What do you think?

@mrcnc
Contributor

mrcnc commented May 6, 2025

+1 to having a catalog property for unique table locations

@RussellSpitzer
Member

I think this makes a lot of sense but I'm not sure if this should be a client side decision. I'd like us to explore the idea of "owned locations" for tables and talk more about catalog responsibilities. I think as a nice "best effort" feature this is a good thing to do, but I really think the Catalog needs to own/manage where tables are allowed to be located.

@RussellSpitzer
Member

Brief notes from the Catalog sync: the common sentiment was that this is good to get in; REST catalogs can still do what they want (including ignoring client-generated unique paths). More reviewers should be incoming as well.

@kongul

kongul commented May 19, 2025

Really looking forward to having this unique table locations feature in Iceberg.
Business analysts in our company have gotten used to renaming tables a lot.

@davseitsev
Author

Is there anything I should add to the PR?

@davseitsev
Author

@nastra, I see you contributed a lot to catalogs, could you please review the PR?

@davseitsev
Author

Thank you for the review! Because I have some questions, I'll commit all the changes separately and squash them into a single commit once the PR is ready.

@davseitsev davseitsev force-pushed the main branch 2 times, most recently from 3364136 to ea47451 Compare August 10, 2025 18:04
@github-actions github-actions bot added the GCP label Aug 10, 2025
@davseitsev
Author

@nastra do you have more comments?
Also, I'm wondering if it makes sense to restrict table creation in an existing directory if the user specifies a custom location. Maybe it makes sense to validate it if unique-table-location=true.

@pvary
Contributor

pvary commented Aug 12, 2025

This PR is quite big; I might be able to review it next week. Thanks, Peter

@davseitsev davseitsev force-pushed the main branch 2 times, most recently from 04ca57f to 3584048 Compare August 30, 2025 20:07
@davseitsev
Author

Thank you for the review. I made changes for all suggestions and comments, and added the testDropDoesntCorruptTable test to CatalogTests.

Comment on lines 73 to 74
CatalogProperties.UNIQUE_TABLE_LOCATION,
"true"));
Contributor

Don't we hijack all of the tests to use unique locations? This seems problematic to me.

Author

Yes, we do. I don’t see an easy way to avoid this. To make UNIQUE_TABLE_LOCATION configurable, I would either have to stop using TestBaseWithCatalog or update all of its implementations, which feels disproportionate for this change.

Longer term, I think it would make sense for UNIQUE_TABLE_LOCATION to default to true, similar to NessieCatalog, because non-unique table locations feel more like an issue than a rarely used feature to me.

For now, I’ve disabled the tests related to unique table locations for REST catalogs.

@pvary
Contributor

pvary commented Oct 14, 2025

Sorry for the late review, @davseitsev. I had other tasks to do.

@pvary
Contributor

pvary commented Nov 26, 2025

@davseitsev: What’s the current status of this PR? I noticed it on my to-do list, but it’s been quite a while since I have seen this. 😢

@davseitsev
Author

Hi @pvary, thanks for checking in.
I’ve just pushed the last small changes, and from my side the PR is now up to date. The production changes are relatively small; most of the diff is in tests (CatalogTests and Spark tests), and I’ve addressed all comments from you and @nastra.

I also dropped the REST catalog tests for unique-table-location after your review, as RESTServerExtension is not easily configurable. This PR doesn’t wire unique-table-location through the REST catalog yet, and I think that deserves a separate discussion and follow-up PR, so excluding those tests for now seemed the least confusing option.

So as long as you don’t have any further comments, I consider this ready for another look / for merging.

return false;
}

protected boolean supportsTableRename() {
Contributor

Why is this flag needed? It seems like an unrelated change to me.

Author

It’s related to how the base CatalogTests behave for catalogs that don't support rename, like TestBigQueryCatalog.

Once I added my tests to CatalogTests, implementations that don't support table rename started failing them. The supportsTableRename() flag is the hook that lets such catalogs opt out of the rename-related tests.
The base class wraps those tests in assumeThat(supportsTableRename()), and TestBigQueryCatalog overrides supportsTableRename() to return false instead of overriding and disabling each test with @Disabled("BigQuery Metastore does not support rename tables").

That's also consistent with the new UNIQUE_TABLE_LOCATION property being defined as "only relevant for catalogs which support rename".

If you’d prefer to keep this PR strictly focused, I can drop supportsTableRename() from CatalogTests for now, reintroduce the explicit @Disabled("BigQuery Metastore does not support rename tables") methods in TestBigQueryCatalog, and override and disable my tests there.

@davseitsev
Author

Resolved conflicts with the latest master.

@davseitsev
Author

Merged latest master
