Closed
Description
Currently, format-specific scan options are embedded as members of the corresponding subclass of FileFormat. Extracting these into an options struct would provide better separation of concerns. At present, the only way to scan a Parquet-formatted dataset with different options is to reconstruct the dataset from its component files using a format configured with those options.
CsvFileFormat could retain ParseOptions as a member, since (for example) tab-separated vs comma-separated values can justifiably be considered different formats.
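To illustrate the proposed separation, here is a minimal sketch. Every type and function in it (`ParquetScanOptions`, `ParquetFormat`, `CsvFormat`, `Dataset`, `ScanPlan`, `PlanScan`) and every field name is a hypothetical stand-in invented for this example; none of it is the actual Arrow Dataset API, and the final design may well look different. It only shows that once scan options are a separate struct, the same dataset can be scanned with different options without being rebuilt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the Arrow Dataset API.

// Options that only affect how a scan is executed.
struct ParquetScanOptions {
  bool use_buffered_stream = false;
  std::int64_t buffer_size = 8192;
};

// The format object carries defaults but no longer fixes the per-scan choice.
struct ParquetFormat {
  ParquetScanOptions default_scan_options;
};

// Parse options stay on the CSV format: a different delimiter is arguably
// a different format.
struct CsvFormat {
  char delimiter = ',';
};

struct Dataset {
  ParquetFormat format;
  std::vector<std::string> files;
};

// Summary of what a scan would do, returned so the effect of the options
// is observable in this sketch.
struct ScanPlan {
  std::size_t num_files;
  bool use_buffered_stream;
  std::int64_t buffer_size;
};

// Plan a scan. Passing `overrides` changes scan behavior without rebuilding
// the dataset; passing nullptr falls back to the format's defaults.
ScanPlan PlanScan(const Dataset& dataset,
                  const ParquetScanOptions* overrides = nullptr) {
  const ParquetScanOptions& opts =
      overrides ? *overrides : dataset.format.default_scan_options;
  assert(opts.buffer_size > 0);  // sanity check on the chosen options
  return ScanPlan{dataset.files.size(), opts.use_buffered_stream,
                  opts.buffer_size};
}
```

In this sketch, calling `PlanScan(ds)` uses the defaults stored on the format, while `PlanScan(ds, &custom)` applies `custom` to the same `ds` object. `CsvFormat` is included only to show a parse option remaining on the format itself.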
Reporter: Ben Kietzman / @bkietz
Assignee: David Li / @lidavidm
Related issues:
- [C++][Dataset] Add ConvertOptions and ReadOptions to CsvFileFormat (relates to)
- [GLib] Add CsvFragmentScanOption support (is related to)
- [R] Accept format-specific scan options in collect() (is related to)
- [C++][Dataset] Extract IpcFragmentScanOptions, ParquetFragmentScanOptions (is related to)
Note: This issue was originally created as ARROW-9749. Please see the migration documentation for further details.