[C++][Dataset] Untangle Dataset, Fragment and ScanOptions #24278

@asfimport

Description

Currently, a fragment is a product of a scan: a lazy collection of scan tasks corresponding to a logically singular data source (such as a single file or a single row group). It would be more useful if a fragment were instead the direct object of a scan, so that one scans a fragment (or a collection of fragments):

  1. Remove ScanOptions from Fragment's properties and pass it as a parameter to Fragment::Scan instead.

  2. Remove ScanOptions from Dataset::GetFragments. To support predicate pushdown in FileSystemDataset and UnionDataset, we can provide an overload: Dataset::GetFragments(std::shared_ptr<Expression> predicate).

  3. Expose a lazy accessor, Fragment::physical_schema().

  4. Consolidate ScanOptions and ScanContext.

This will lessen the cognitive dissonance between fragments and files, since fragments will no longer include references to scan properties.
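
To make the proposed shape concrete, here is a minimal, self-contained sketch of the four points above. It is not the Arrow API: every type here is a hypothetical stand-in, rows are plain `int`s, the predicate is a `std::function` over an invented partition key rather than a `std::shared_ptr<Expression>`, and the `batch_size` and `use_threads` fields are assumed for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are the real Arrow classes.
struct Schema {
  std::vector<std::string> field_names;
};

// Item 4: one options object instead of separate ScanOptions and ScanContext.
// Both fields are assumptions made for this sketch.
struct ScanOptions {
  int64_t batch_size = 2;
  bool use_threads = false;
};

// Toy predicate over a partition key (stands in for an Expression).
using Predicate = std::function<bool(int partition_key)>;

class Fragment {
 public:
  Fragment(int partition_key, std::vector<int> rows)
      : partition_key_(partition_key), rows_(std::move(rows)) {}

  // Item 1: options are a parameter of Scan, not a property of the fragment.
  std::vector<std::vector<int>> Scan(const ScanOptions& options) const {
    assert(options.batch_size > 0);
    const std::size_t step = static_cast<std::size_t>(options.batch_size);
    std::vector<std::vector<int>> batches;
    for (std::size_t begin = 0; begin < rows_.size(); begin += step) {
      const std::size_t end = std::min(begin + step, rows_.size());
      batches.emplace_back(rows_.begin() + begin, rows_.begin() + end);
    }
    return batches;
  }

  // Item 3: the physical schema is computed on first access, then cached.
  const Schema& physical_schema() const {
    if (!schema_loaded_) {
      schema_.field_names = {"value"};  // pretend this reads file metadata
      schema_loaded_ = true;
      ++schema_loads_;
    }
    return schema_;
  }

  int partition_key() const { return partition_key_; }
  int schema_loads() const { return schema_loads_; }

 private:
  int partition_key_;
  std::vector<int> rows_;
  mutable bool schema_loaded_ = false;
  mutable Schema schema_;
  mutable int schema_loads_ = 0;
};

class Dataset {
 public:
  explicit Dataset(std::vector<std::shared_ptr<Fragment>> fragments)
      : fragments_(std::move(fragments)) {}

  // Item 2: no ScanOptions needed to enumerate fragments.
  std::vector<std::shared_ptr<Fragment>> GetFragments() const {
    return fragments_;
  }

  // Item 2: predicate overload, pruning fragments before any scan happens.
  std::vector<std::shared_ptr<Fragment>> GetFragments(
      const Predicate& predicate) const {
    std::vector<std::shared_ptr<Fragment>> selected;
    for (const auto& fragment : fragments_) {
      if (predicate(fragment->partition_key())) selected.push_back(fragment);
    }
    return selected;
  }

 private:
  std::vector<std::shared_ptr<Fragment>> fragments_;
};

// Builds a two-fragment dataset keyed by a made-up "year" partition.
Dataset MakeExampleDataset() {
  std::vector<std::shared_ptr<Fragment>> fragments;
  fragments.push_back(
      std::make_shared<Fragment>(2019, std::vector<int>{1, 2, 3}));
  fragments.push_back(
      std::make_shared<Fragment>(2020, std::vector<int>{4, 5, 6, 7, 8}));
  return Dataset(std::move(fragments));
}
```

With this shape, a caller first selects fragments (optionally with a predicate) and only then decides how to scan them, so the same fragment can be scanned repeatedly with different options.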

Reporter: Francois Saint-Jacques / @fsaintjacques
Assignee: Francois Saint-Jacques / @fsaintjacques

Note: This issue was originally created as ARROW-8065. Please see the migration documentation for further details.
