@ajantha-bhat
Member

@ajantha-bhat ajantha-bhat commented Sep 7, 2022

Iceberg GC operations such as remove_orphan_files, expire_snapshots, and drop_table (with purge=true) are not aware of Nessie's branches and tags.
So, if a user accidentally calls them on a Nessie-managed table, they will delete files that are still referenced from other branches or tags. This PR therefore blocks all Iceberg GC operations on Nessie-managed tables by default.

Nessie will provide engine-agnostic, reference-aware GC functionality (which does not call these Iceberg GC operations internally), with CLI support, for cleaning up the expired or unreferenced files of a Nessie-managed table.
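
To illustrate the problem, here is a minimal sketch in Python. It is a toy model only, not Iceberg or Nessie code: the file names, the `refs` mapping, and both helper functions are hypothetical. It compares a cleanup that looks at a single reference with one that considers every branch and tag.

```python
# Toy model (hypothetical): each Nessie reference (branch or tag) maps to
# the set of files that the table state on that reference still uses.
refs = {
    "main": {"data-1.parquet", "data-2.parquet"},
    "etl-branch": {"data-1.parquet", "data-3.parquet"},
    "release-tag": {"data-0.parquet", "data-1.parquet"},
}

# Everything in the table location, including one truly unreferenced file.
all_files = set().union(*refs.values()) | {"orphan.parquet"}


def single_ref_orphans(all_files, refs, current_ref):
    """Files a reference-unaware cleanup would delete: it only sees one ref."""
    return all_files - refs[current_ref]


def reference_aware_orphans(all_files, refs):
    """Files that are safe to delete: not used by any branch or tag."""
    live = set().union(*refs.values())
    return all_files - live


unsafe = single_ref_orphans(all_files, refs, "main")
safe = reference_aware_orphans(all_files, refs)
print("reference-unaware cleanup would delete:", sorted(unsafe))
print("reference-aware cleanup would delete:", sorted(safe))
```

In this toy model the reference-unaware cleanup run from `main` would also delete `data-0.parquet` and `data-3.parquet`, which `release-tag` and `etl-branch` still need, while the reference-aware variant removes only `orphan.parquet`.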

@ajantha-bhat
Member Author

Can one of the committers please help merge this PR?
Also, please include it in the 1.0 milestone or the immediate next release milestone (I would like this fix to be available from the next version onward).

cc: @rdblue, @pvary, @Fokko, @RussellSpitzer, @snazy

@Fokko Fokko added this to the Iceberg 1.0.0 Release milestone Sep 8, 2022
@Fokko Fokko merged commit 3adb883 into apache:master Sep 8, 2022
@Fokko
Contributor

Fokko commented Sep 8, 2022

Thanks @ajantha-bhat 🙌🏻

@nastra nastra removed this from the Iceberg 1.0.0 Release milestone Sep 28, 2022
@nastra
Contributor

nastra commented Sep 28, 2022

Just FYI: I removed this from the Iceberg 1.0.0 Release milestone because 1.0.0 is supposed to be 0.14.1 + spotless + deprecation cleanups only.

@ajantha-bhat
Member Author

@nastra: Let us discuss the scope of 1.0.0 in today's sync. I believe it can accommodate a few issues as well, like this one and #5754.
