Description
Currently, a fragment is a product of a scan: it is a lazy collection of scan tasks corresponding to a data source that is logically singular (e.g. a single file, a single row group, etc.). It would be more useful if a fragment were instead the direct object of a scan, so that one scans a fragment (or a collection of fragments):
- Remove ScanOptions from Fragment's properties and move it into the Fragment::Scan parameters.
- Remove ScanOptions from Dataset::GetFragments. We can provide an overload to support predicate pushdown in FileSystemDataset and UnionDataset: Dataset::GetFragments(std::shared_ptr<Expression> predicate).
- Expose a lazy accessor, Fragment::physical_schema().
- Consolidate ScanOptions and ScanContext.
This will lessen the cognitive dissonance between fragments and files since fragments will no longer include references to scan properties.
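To make the proposed shape concrete, here is a minimal, self-contained sketch. It does not use the real Arrow headers: Schema, Expression, ScanOptions, ScanTask, Fragment and Dataset below are hypothetical, heavily simplified stand-ins that only mirror the names used in the issue. The predicate is assumed to be a toy "partition key equals value" test, and the physical schema is faked rather than read from a file. The point is only the signatures: options become a Scan argument, GetFragments takes at most a predicate, and physical_schema() is computed lazily.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; NOT the real arrow::dataset declarations.
struct Schema {
  std::vector<std::string> field_names;
};

// Toy predicate (assumption): "fragment partition key == value".
struct Expression {
  std::string value;
};

// Item 4: ScanOptions and ScanContext folded into a single struct.
struct ScanOptions {
  std::vector<std::string> projection;
  std::shared_ptr<Expression> filter;
  bool use_threads = false;  // previously lived in ScanContext
};

struct ScanTask {
  std::string path;
  std::size_t num_projected_columns;
};

class Fragment {
 public:
  Fragment(std::string path, std::string partition_key)
      : path_(std::move(path)), partition_key_(std::move(partition_key)) {}

  // Item 1: options are an argument to Scan, not a property of the fragment,
  // so the same fragment can be scanned with different options.
  std::vector<ScanTask> Scan(const ScanOptions& options) const {
    return {ScanTask{path_, options.projection.size()}};
  }

  // Item 3: lazy accessor. The schema is "inspected" on the first call only
  // and cached afterwards. Not thread-safe; real code would need locking.
  std::shared_ptr<Schema> physical_schema() const {
    if (!physical_schema_) {
      ++schema_reads_;
      auto schema = std::make_shared<Schema>();
      schema->field_names = {"a", "b"};  // placeholder for reading file metadata
      physical_schema_ = schema;
    }
    return physical_schema_;
  }

  const std::string& partition_key() const { return partition_key_; }
  int schema_reads() const { return schema_reads_; }

 private:
  std::string path_;
  std::string partition_key_;
  mutable std::shared_ptr<Schema> physical_schema_;
  mutable int schema_reads_ = 0;
};

class Dataset {
 public:
  explicit Dataset(std::vector<std::shared_ptr<Fragment>> fragments)
      : fragments_(std::move(fragments)) {
    for (const auto& f : fragments_) assert(f != nullptr);
  }

  // Item 2: GetFragments no longer takes ScanOptions...
  std::vector<std::shared_ptr<Fragment>> GetFragments() const {
    return fragments_;
  }

  // ...and an overload takes only a predicate, enabling pushdown. This toy
  // version keeps fragments whose partition key equals the predicate value.
  std::vector<std::shared_ptr<Fragment>> GetFragments(
      std::shared_ptr<Expression> predicate) const {
    if (predicate == nullptr) return fragments_;
    std::vector<std::shared_ptr<Fragment>> out;
    for (const auto& f : fragments_) {
      if (f->partition_key() == predicate->value) out.push_back(f);
    }
    return out;
  }

 private:
  std::vector<std::shared_ptr<Fragment>> fragments_;
};
```

With this shape, a caller first selects fragments (optionally pruned by a predicate) and only then decides how to scan each one, which is the separation between "what data" and "how to read it" that the issue argues for.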
Reporter: Francois Saint-Jacques / @fsaintjacques
Assignee: Francois Saint-Jacques / @fsaintjacques
Related issues:
- [C++/Python][Dataset] Support schema evolution for integer columns (blocks)
- [C++][Dataset] Dataset should instantiate Fragment (blocks)
Note: This issue was originally created as ARROW-8065. Please see the migration documentation for further details.