In Netflix/iceberg#107 it was discussed that InputFile and OutputFile instances should be pluggable. We agreed that providing InputFile and OutputFile instances should be handled by the TableOperations API. However, the Spark data source in particular only uses HadoopInputFile#fromPath for reading and HadoopOutputFile#fromPath for writing. Using TableOperations#newInputFile and TableOperations#newOutputFile would also be difficult, because calling these methods on the executors would require TableOperations instances to be Serializable.
We propose having the TableOperations API provide a FileIO module that handles the narrow role of reading, creating / writing, and deleting files:
interface FileIO extends Serializable {
  InputFile newInputFile(String path);
  OutputFile newOutputFile(String path);
  void deleteFile(String path);
}
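For illustration, here is a minimal sketch of what a Hadoop-backed implementation of this interface might look like, reusing the existing HadoopInputFile#fromPath and HadoopOutputFile#fromPath factories. The SerializableConfiguration wrapper is an assumption (a small holder that lets the Hadoop Configuration survive Java serialization to the executors), and imports are elided as in the interfaces above.

class HadoopFileIO implements FileIO {
  // Hadoop's Configuration is not Serializable, so wrap it in a
  // (hypothetical) SerializableConfiguration holder.
  private final SerializableConfiguration conf;

  HadoopFileIO(SerializableConfiguration conf) {
    this.conf = conf;
  }

  @Override
  public InputFile newInputFile(String path) {
    return HadoopInputFile.fromPath(new Path(path), conf.get());
  }

  @Override
  public OutputFile newOutputFile(String path) {
    return HadoopOutputFile.fromPath(new Path(path), conf.get());
  }

  @Override
  public void deleteFile(String path) {
    Path toDelete = new Path(path);
    try {
      toDelete.getFileSystem(conf.get()).delete(toDelete, false /* not recursive */);
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to delete " + path, e);
    }
  }
}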
Then the following methods would be added to TableOperations, and TableOperations#newInputFile and TableOperations#newMetadataFile would be removed.
interface TableOperations {
  FileIO fileIo();
  String resolveNewMetadataPath(String metadataFilename);
}
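Because FileIO is Serializable, the Spark data source could capture fileIo() on the driver and ship it to executors instead of calling HadoopOutputFile#fromPath directly. The sketch below is illustrative only; WriterFactory and newDataFile are made-up names, not existing Iceberg or Spark APIs.

class WriterFactory implements Serializable {
  // Captured on the driver, serialized to executors along with the write task.
  private final FileIO io;

  WriterFactory(FileIO io) {
    this.io = io;
  }

  OutputFile newDataFile(String dataPath) {
    // Executors can create OutputFile instances without holding a
    // TableOperations instance, because FileIO itself is Serializable.
    return io.newOutputFile(dataPath);
  }
}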
resolveNewMetadataPath is needed because the new FileIO abstraction treats all locations as full paths, whereas the old TableOperations#newMetadataFile assumed its argument was a file name, not a full path. Callers that used to call TableOperations#newMetadataFile should therefore first resolve the full metadata path and then pass it to FileIO#newOutputFile. For convenience, we could add a default helper method like so:
interface TableOperations {
  FileIO fileIo();
  String resolveNewMetadataPath(String metadataFilename);

  default OutputFile newMetadataFile(String fileName) {
    return fileIo().newOutputFile(resolveNewMetadataPath(fileName));
  }
}
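For a rough sense of how a caller that previously used TableOperations#newMetadataFile might write a new table metadata file under this proposal, see the sketch below. The writeNewMetadataFile helper and the version-based file name are assumptions for illustration; TableMetadataParser#write is the existing metadata writer.

OutputFile writeNewMetadataFile(TableOperations ops, TableMetadata metadata, int nextVersion) {
  // The default helper resolves the file name to a full path and obtains an
  // OutputFile through the table's FileIO.
  OutputFile newMetadataFile = ops.newMetadataFile(
      String.format("v%d.metadata.json", nextVersion));
  TableMetadataParser.write(metadata, newMetadataFile);
  return newMetadataFile;
}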