Closed
Description
Currently, with the default value of FileSystemFactoryOptions::exclude_invalid_files, unsupported files are silently ignored when creating a FileSystemSource. A file can be unsupported for several reasons: an IO error, an invalid format, corruption, a missing compression codec, etc.
We should change this behavior so that the Inspect/Finish calls propagate an error by default, and let the user toggle exclude_invalid_files to opt back into skipping such files. The error should contain at least the file path and, if possible, a readable description of the underlying failure.
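The proposed semantics can be sketched in a few lines. This is only an illustration in plain Python, not Arrow's implementation or API: `InvalidFileError`, `FactoryOptions`, `discover_files`, and `toy_check` are hypothetical names, and the validity check is a toy stand-in for actually opening a file with a given format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


class InvalidFileError(Exception):
    """Hypothetical error that carries the offending path and a readable reason."""

    def __init__(self, path: str, reason: str):
        super().__init__(f"Could not open '{path}' as a dataset file: {reason}")
        self.path = path
        self.reason = reason


@dataclass
class FactoryOptions:
    # Hypothetical stand-in for FileSystemFactoryOptions. False models the
    # proposed default: report invalid files instead of skipping them.
    exclude_invalid_files: bool = False


def discover_files(
    paths: Iterable[str],
    check: Callable[[str], None],
    options: Optional[FactoryOptions] = None,
) -> List[str]:
    """Return the paths accepted by `check`, which raises on an invalid file."""
    options = options or FactoryOptions()
    valid = []
    for path in paths:
        try:
            check(path)
        except Exception as exc:
            if options.exclude_invalid_files:
                continue  # opt-in: skip the file silently
            # default: surface the path together with the underlying error
            raise InvalidFileError(path, str(exc)) from exc
        valid.append(path)
    return valid


def toy_check(path: str) -> None:
    # Toy validity check (assumption): only the extension is inspected.
    if not path.endswith(".parquet"):
        raise ValueError("not a Parquet file")
```

With `exclude_invalid_files=True` the invalid path is dropped from the result, as it is today. With the default options, the first invalid file raises an error whose message names that file.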
Reporter: Francois Saint-Jacques / @fsaintjacques
Assignee: Francois Saint-Jacques / @fsaintjacques
Related issues:
- [Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset (is related to)
- [C++][Python][Dataset] Provide an option to toggle validation and schema inference in FileSystemDatasetFactoryOptions (is related to)
Note: This issue was originally created as ARROW-7673. Please see the migration documentation for further details.