
Conversation

@jordepic
Contributor

@jordepic jordepic commented Nov 4, 2025

As of now, HadoopFileIO uses the Java delete
API, which always bypasses any configured trash
directory. If the table's Hadoop configuration
has trash enabled, we should use it.
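
As a rough illustration of the proposal, here is a minimal sketch of trash-aware deletion (the class and method names are made up for the example; this is not the actual HadoopFileIO code):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.Trash;

    // Illustrative only: prefer the configured Hadoop trash when it is enabled,
    // otherwise fall back to a plain FileSystem.delete().
    class TrashAwareDelete {
      static void deletePath(FileSystem fs, Path toDelete, boolean recursive, Configuration conf)
          throws IOException {
        Trash trash = new Trash(fs, conf);
        // moveToTrash() returns true only when the path was actually relocated;
        // when trash is disabled we delete directly, as HadoopFileIO does today.
        if (trash.isEnabled() && trash.moveToTrash(toDelete)) {
          return;
        }
        fs.delete(toDelete, recursive);
      }
    }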

@github-actions github-actions bot added the core label Nov 4, 2025
@jordepic jordepic force-pushed the HADOOP_FILE_IO_CHANGE branch from b0ba8b9 to 8d07d49 Compare November 4, 2025 16:50
@jordepic jordepic force-pushed the HADOOP_FILE_IO_CHANGE branch from 8d07d49 to 5cb16cf Compare November 5, 2025 16:14
Contributor

@anuragmantri anuragmantri left a comment


Thanks for the PR @jordepic. This is very useful.

However, it seems like a behavior change (even if trash was enabled previously, Iceberg was not honoring it). IMO, we should make this configurable using a property to avoid surprises (unexpected storage consumption).

}

private void deletePath(FileSystem fs, Path toDelete, boolean recursive) throws IOException {
Trash trash = new Trash(fs, getConf());
Contributor


I'm concerned about the number of Trash objects we create. Does the Hadoop API ensure we can reuse the Trash object for a given (fs, conf)?
I couldn't tell from https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/Trash.html#Trash-org.apache.hadoop.fs.FileSystem-org.apache.hadoop.conf.Configuration-

Contributor Author


It's a good call. I've added a new toggle that can be put in the Hadoop configuration to determine whether we want to use the trash for Iceberg, following Russell Spitzer's example in other HadoopFileIO changes.

I've taken a look at object reuse. The Trash can change due to lots of configuration settings (meaning I'd have to create a cache based on 5+ configuration values which are susceptible to change in the future), unlike the FileSystem, whose cache key doesn't actually rely on the conf, just on the URI and user group information. With that being said, the change I made to check the Hadoop configuration first means we don't create the Trash object unless someone specifically opts in. I hope that this is good enough for now - an Iceberg user will now have to opt into this change to experience any possible object churn.
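
A minimal sketch of the opt-in check described here, assuming a hypothetical property name "iceberg.hadoop.use-trash" (the property used in the actual change may differ, and later in this thread the toggle is dropped in favor of relying only on the standard Hadoop trash settings):

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical toggle: only when the user explicitly opts in do we go on to
    // construct a Trash instance, so the default path creates no extra objects.
    class TrashOptIn {
      static boolean useTrash(Configuration conf) {
        return conf.getBoolean("iceberg.hadoop.use-trash", false);
      }
    }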

@jordepic jordepic force-pushed the HADOOP_FILE_IO_CHANGE branch from 5cb16cf to efc6a8f Compare November 6, 2025 15:25
@danielcweeks danielcweeks self-requested a review November 10, 2025 19:24
return;
}
Trash trash = new Trash(fs, getConf());
if (!trash.isEnabled()) {
Contributor


If you can configure the trash at the Hadoop Conf level, why are we adding a separate configuration? Shouldn't we just do this when the HadoopFileIO is initialized and rely on the Hadoop trash conf? It feels like we're adding two separate configs to enable trash.

Contributor Author


Fair enough. A previous commenter here wanted to do this in order to make the trash "opt-in", not "opt-out", for those who had already configured it.

I agree though, if trash is configured the normal way, we should use it.

Contributor


Sorry, my understanding was that the current Java API for delete always "ignores" the trash config even if it is set server-side, in which case no longer ignoring it could be unexpected. But I guess that is the right thing to do (honor the trash config if it is set via the Hadoop conf). I'm OK with just using the Hadoop conf. Thanks!

Contributor


I'm not particularly familiar with the way the Hadoop Trash works, but it's quite odd to me. The Trash isn't natively part of the FileSystem (like versioned objects are native to cloud storage). This leaves an awkward situation where, if we respect the core-site.xml config, then we turn it on everywhere like you said @anuragmantri (assuming the deleter has access to the trash?).

I could see this going both ways in that if you configure the core-site.xml and didn't realize there was a second property to set, you would lose data. You also potentially have the issue where deleted data goes to the trash configured for whoever deleted it (so two separate deletes could end up in completely different locations).

I'm not sure what's right here and open to suggestions.

Contributor Author


I could see this going both ways in that if you configure the core-site.xml and didn't realize there was a second property to set, you would lose data.

To be fair, this is the status quo

You also potentially have the issue where deleted data goes to the trash configured for whoever deleted it (so two separate deletes could end up in completely different locations).

I suppose that would be the case for HDFS trash in general - even if writing parquet files without iceberg, each writer might potentially use a different trash based on configuration.

I think these are just limitations of the existing system, but let me know if you disagree.

@danielcweeks
Contributor

@jordepic I'm a little concerned about the utility of this. If we're just relocating arbitrary files into the trash location, how do you know which table it was associated with? In isolation it seems like it would make sense, but across a warehouse, this feels like it would be really difficult to reconstruct anything.

@jordepic jordepic force-pushed the HADOOP_FILE_IO_CHANGE branch from efc6a8f to ba28083 Compare November 12, 2025 21:03
@jordepic
Contributor Author

@jordepic I'm a little concerned about the utility of this. If we're just relocating arbitrary files into the trash location, how do you know which table it was associated with? In isolation it seems like it would make sense, but across a warehouse, this feels like it would be really difficult to reconstruct anything.

@danielcweeks a file at path /iceberg/tablename/data/.... is relocated to /.Trash/current/iceberg/tablename/data/...

It doesn't go to a completely arbitrary path!

@jordepic jordepic force-pushed the HADOOP_FILE_IO_CHANGE branch from ba28083 to d0d62e8 Compare November 12, 2025 21:23
@anuragmantri
Contributor

anuragmantri commented Nov 14, 2025

In isolation it seems like it would make sense, but across a warehouse, this feels like it would be really difficult to reconstruct anything.

I agree that this may not be very useful in isolation, but can we still let the client use the trash in HadoopFileIO (if configured) so that users have the ability to restore the table state? We have had examples where users were able to restore accidentally deleted files via cloud providers' object lifecycle policies but could not do so in Hadoop environments because the client was not using the trash.

@ludlows
Contributor

ludlows commented Nov 15, 2025

It would be nice to provide a table-level parameter to control this behavior.

@jordepic
Contributor Author

It would be nice to provide a table-level parameter to control this behavior.

@ludlows That was basically the first iteration of my change. I think that @danielcweeks felt this level of control was unnecessary, and that those who configure their Hadoop to use the trash should use it.

Open to more discussion here.

@danielcweeks
Contributor

@jordepic and @ludlows After looking a little more into the way trash works, I don't think this is something we want to turn on at a table level (especially considering how this implementation works).

The Trash feature in Hadoop/HDFS is quite strange, as it's a client, config, and cluster-level feature where all the pieces depend on each other. For example, the client has to respect the config, initialize the Trash, and perform a move operation, otherwise it's ignored. The config has to be set and configured properly to a location the user has access to. Finally, if you don't apply the configuration to both the client and the NameNode, then cleanup won't be performed properly.

Given all of that, this feels very much like an administrator-level feature that needs to be configured (this appears to be the case for Cloudera already, though I don't know if engines like Hive/Impala respect the trash settings).

It could be potentially dangerous to allow users to configure this on a per-table basis because cleanup may not be configured, which may result in data that should be deleted persisting in the file system. There's also nothing that appears to prevent the configuration from being applied to other file-system implementations (like S3A), which would be bad (data copy, no cleanup), but I feel like we should discourage that. @jordepic Is there anything we can do to prevent this?

I'm not a huge fan of this approach, but it seems like what we have to work with.

@jordepic
Contributor Author

jordepic commented Nov 18, 2025

It could be potentially dangerous to allow users to configure this on a per-table basis because cleanup may not be configured, which may result in data that should be deleted persisting in the file system.

When you call trash.isEnabled(), it checks TrashPolicy.isEnabled(), and in TrashPolicyDefault, isEnabled() ensures that the deletion interval is > 0. So I think this may be a non-issue. If people override their trash class with something else, it could be an issue.

For reference:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java#L62

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicy.java#L142

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java#L126

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java#L130
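
In other words, a rough sketch of what the default policy's isEnabled() boils down to (not the exact Hadoop source): trash is considered on only when fs.trash.interval resolves to a positive number of minutes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.CommonConfigurationKeysPublic;

    // Approximation of the TrashPolicyDefault check referenced above.
    class TrashEnabledCheck {
      static boolean trashEffectivelyEnabled(Configuration conf) {
        float intervalMinutes =
            conf.getFloat(
                CommonConfigurationKeysPublic.FS_TRASH_INTERVAL_KEY, // "fs.trash.interval"
                CommonConfigurationKeysPublic.FS_TRASH_INTERVAL_DEFAULT); // defaults to 0
        return intervalMinutes > 0;
      }
    }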

Hive also seems to employ the HDFS trash:
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java#L80

There's also nothing that appears to prevent the configuration from being applied to other file-system implementations (like S3A), which would be bad (data copy, no cleanup), but I feel like we should discourage that. @jordepic Is there anything we can do to prevent this?

I'm less sure what you mean on this one. We aren't making this change in the s3 file IO, but I'm less familiar with the differences between that and s3a.

@danielcweeks
Contributor

danielcweeks commented Nov 18, 2025

When you call trash.isEnabled(), it checks TrashPolicy.isEnabled(), and in TrashPolicyDefault, isEnabled() ensures that the deletion interval is > 0. So I think this may be a non-issue. If people override their trash class with something else, it could be an issue.

The issue is that the config can be different for the client than for the NameNode. So if a client configures interval > 0, but the NameNode does not have that config, then a client will move data files, but they will never be cleaned up.

I'm less sure what you mean on this one. We aren't making this change in the s3 file IO, but I'm less familiar with the differences between that and s3a.

HadoopFileIO is an abstraction for all Hadoop FileSystem implementations (DistributedFileSystem, S3AFileSystem, GCSFileSystem, etc.). That means that if I enable this in core-site.xml and use an s3 mapped scheme, I would trigger the move behavior, which I don't think we want for non-HDFS file systems. The config (fs.trash.interval) is not specific to a scheme, so it appears to be global for all file system implementations.

@jordepic
Contributor Author

The issue is that the config can be different for the client than for the NameNode. So if a client configures interval > 0, but the NameNode does not have that config, then a client will move data files, but they will never be cleaned up.

Good point. Though, at the end of the day, I'm not sure that I see this differently from any other misconfiguration that an Iceberg user might have that would adversely impact them. For example, we misconfigured a table location and then removed an entire Hadoop directory thinking its contents were orphan files, haha!

HadoopFileIO is an abstraction for all Hadoop FileSystem implementations (DistributedFileSystem, S3AFileSystem, GCSFileSystem, etc.). That means that if I enable this in core-site.xml and use an s3 mapped scheme, I would trigger the move behavior, which I don't think we want for non-HDFS file systems. The config (fs.trash.interval) is not specific to a scheme, so it appears to be global for all file system implementations.

Also a fair point. I think that I could resolve this one pretty safely using some instanceof checks on the FileSystem object. Are you at all opposed to that?
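
A minimal sketch of the instanceof guard being proposed (illustrative class and method names, not the merged Iceberg code):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    // Only consider the trash for file systems where a rename into a trash
    // directory is cheap and sensible; object-store schemes (s3a, abfs, gs)
    // keep the plain delete path.
    class TrashFileSystemGuard {
      static boolean supportsTrash(FileSystem fs) {
        return fs instanceof LocalFileSystem || fs instanceof DistributedFileSystem;
      }
    }

As raised later in the thread, a direct reference to DistributedFileSystem requires the HDFS client jars on the classpath, which is part of the motivation for the scheme-list alternative discussed below.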

@jordepic jordepic force-pushed the HADOOP_FILE_IO_CHANGE branch from d0d62e8 to e490367 Compare November 18, 2025 22:26
As of now, the HadoopFileIO uses the Java delete
API, which always skips using a configured trash
directory. If the table's hadoop configuration
has trash enabled, we should use it.

We should also only do this for implementations
where trashing files is acceptable. In our case,
this is the LocalFileSystem and the
DistributedFileSystem.
@jordepic jordepic force-pushed the HADOOP_FILE_IO_CHANGE branch from e490367 to 416c041 Compare November 18, 2025 23:01
Contributor

@danielcweeks danielcweeks left a comment


@jordepic I think this looks good. When we get closer to the 1.11 release, please reach out to the release manager to highlight this in the release notes, as it could have an impact on people running HDFS.

Thanks!

@danielcweeks danielcweeks merged commit 06c1e0a into apache:main Nov 20, 2025
44 checks passed

private void deletePath(FileSystem fs, Path toDelete, boolean recursive) throws IOException {
Trash trash = new Trash(fs, getConf());
if ((fs instanceof LocalFileSystem || fs instanceof DistributedFileSystem)
Member


How about ViewFileSystem?

@steveloughran
Contributor

@jordepic @danielcweeks joining in very late here.

That trash API really exists to stop users doing destructive things on the command line, so hadoop fs -rm -rf / doesn't wipe everything (on s3a:// we don't let you delete root as an alternative).
Database operations tend not to go through trash, on the basis that databases can do their own thing and/or you need disaster recovery mechanisms at this point.

I do see Hive has it as a safety check; presumably someone did a DROP TABLE and changed their mind. I suspect it is not used on every file deletion though, more for whole-table operations, because one aspect of trash is that it likes to be atomic: moving a whole table in there gives you that.

S3AFS doesn't like trash, as the PoV there is "S3 versioning may not be atomic but it's a lot faster than renaming".

We've discussed having a plugin policy here where each fs could have its own: HADOOP-18013 (ABFS: add cloud trash policy with per-schema policy selection), superseded by something with active development, apache/hadoop#8063.

I'll see about getting that in.

Regarding this patch:

  • it's going to cause problems in HDInsight, as Microsoft doesn't put the HDFS jars on the classpath, and this has an explicit reference to the classes.
  • it doesn't let people turn on trash on Azure storage or elsewhere if they want it.

What about just a configuration option "iceberg.hadoop.trash.schemas" to take a list of filesystem schemas "hdfs, viewfs, file, abfs" for which trash is enabled?
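
A sketch of how such an option could be consulted, assuming the property name proposed above (the helper itself is illustrative, not shipped Iceberg code):

    import java.util.HashSet;
    import java.util.Locale;
    import java.util.Set;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Enable trash only for file system schemes listed in the (proposed)
    // "iceberg.hadoop.trash.schemas" property, so no HDFS classes are referenced.
    class TrashSchemeFilter {
      static boolean trashEnabledForScheme(FileSystem fs, Configuration conf) {
        Set<String> schemes = new HashSet<>();
        for (String s : conf.getTrimmedStrings("iceberg.hadoop.trash.schemas")) {
          schemes.add(s.toLowerCase(Locale.ROOT));
        }
        // Compare against the scheme of the concrete file system (hdfs, viewfs, file, abfs, ...).
        String scheme = fs.getUri().getScheme();
        return scheme != null && schemes.contains(scheme.toLowerCase(Locale.ROOT));
      }
    }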

@manuzhang
Member

What about just a configuration option "iceberg.hadoop.trash.schemas" to take a list of filesystem schemas "hdfs, viewfs, file, abfs" for which trash is enabled?

I like this idea as viewfs was not handled in this PR. @steveloughran Do you plan to open a follow-up PR?

@steveloughran
Contributor

@manuzhang Yes, I also want to get the bulk delete API calls in for cloud delete performance; the changes here are complicating that. I can do this change first. Then, when Ozone adds its own trash policy, it'll be easy to support.

@jordepic
Contributor Author

Hi @steveloughran! I'm sorry for the very late response on my end here. I'm happy to review or take care of the follow-up change - let me know what you prefer.

@steveloughran
Contributor

I'll have a go at the change

thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026
As of now, the HadoopFileIO uses the Java delete
API, which always skips using a configured trash
directory. If the table's hadoop configuration
has trash enabled, we should use it.

We should also only do this for implementations
where trashing files is acceptable. In our case,
this is the LocalFileSystem and the
DistributedFileSystem.

Co-authored-by: Jordan Epstein <jordan.epstein@imc.com>
@steveloughran
Contributor

Started the new PR. Also discovered a regression here: the moveToTrash() code raises an FNFE if there's no file/dir at the end of the path.

Caused by: java.io.FileNotFoundException: File file:/var/folders/4n/w4cjr_d95kg9bxkl6sz3n3ym0000gr/T/junit-2971716127417174714/missing does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:980)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1301)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:970)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
	at org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:139)
	at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:140)
	at org.apache.iceberg.hadoop.HadoopFileIO.deletePath(HadoopFileIO.java:247)
	at org.apache.iceberg.hadoop.HadoopFileIO.deleteFile(HadoopFileIO.java:109)
	... 4 more

This means that deleting a file which has already been deleted fails, whereas FileSystem.delete() is just a no-op.

Going to catch the FNFE when moving a file.
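
A minimal sketch of that fix (illustrative names, not the exact code in the follow-up PR):

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.Trash;

    // Swallow the FileNotFoundException from moveToTrash() so that deleting an
    // already-missing path stays a no-op, matching FileSystem.delete() behavior.
    class SafeTrashMove {
      static void moveToTrashIfPresent(FileSystem fs, Path path, Trash trash, boolean recursive)
          throws IOException {
        try {
          if (!trash.moveToTrash(path)) {
            fs.delete(path, recursive);
          }
        } catch (FileNotFoundException e) {
          // The path is already gone; treat the delete as successful.
        }
      }
    }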

@steveloughran
Contributor

The Trash feature in Hadoop/HDFS is quite strange, as it's a client, config, and cluster-level feature where all the pieces depend on each other. For example, the client has to respect the config, initialize the Trash, and perform a move operation, otherwise it's ignored. The config has to be set and configured properly to a location the user has access to.

For HDFS, the client actually asks the service what the policy is:

    public FsServerDefaults getServerDefaults() throws IOException {
        return this.dfs.getServerDefaults();
    }

ViewFS does this for the resolved FS of a path, so it will get it for HDFS there.

PR #15111 uses this info, so the entire trash settings should be picked up from the store. Even if a client config has trash off when working with the local fs, S3, ABFS, etc., when it interacts with HDFS it'll still get the settings from that cluster.
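
A sketch of what picking the setting up from the store can look like on the client side, using the public FileSystem/FsServerDefaults API (illustrative helper, not the PR's exact code):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsServerDefaults;
    import org.apache.hadoop.fs.Path;

    // FsServerDefaults carries the cluster's fs.trash.interval, so an HDFS client
    // can honor the NameNode's setting even if its local conf differs.
    class ServerSideTrash {
      static boolean serverSideTrashEnabled(FileSystem fs, Path path) throws IOException {
        FsServerDefaults defaults = fs.getServerDefaults(path);
        return defaults != null && defaults.getTrashInterval() > 0;
      }
    }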
