[C++][Python][Dataset] Provide an option to toggle validation and schema inference in FileSystemDatasetFactoryOptions #24271

@asfimport

Description

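Validation and schema inference can be costly and are not always necessary, so FileSystemDatasetFactoryOptions should provide an option to toggle them.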

At the same time, we could move file validation into the scan tasks. Currently all files are inspected as the dataset is constructed, which can be expensive if the filesystem is slow. Validation would then be performed multiple times, but each check would be cheap, since at scan time we'll be reading the file into memory anyway.
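
To make the proposal concrete, here is a minimal, standard-library-only sketch of the two behaviors. It is not Arrow's API: `FactoryOptions`, `validate_on_discovery`, `Dataset`, `is_valid`, and the 4-byte `MAGIC` signature are all hypothetical names chosen for illustration. With the toggle on, every file is opened while the dataset is constructed; with it off, construction touches no files and the check runs inside each scan task, on bytes that are already in memory.

```python
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical 4-byte signature standing in for a real format's magic number.
MAGIC = b"DATA"


@dataclass
class FactoryOptions:
    # Hypothetical toggle: when False, construction does not open any file.
    validate_on_discovery: bool = True


def is_valid(payload: bytes) -> bool:
    """Cheap format check: the payload must start with the signature."""
    return payload.startswith(MAGIC)


class Dataset:
    def __init__(self, paths: List[str], options: FactoryOptions) -> None:
        # Counts files opened during construction, to make the cost visible.
        self.inspections = 0
        paths = sorted(paths)
        if options.validate_on_discovery:
            kept = []
            for path in paths:
                self.inspections += 1
                with open(path, "rb") as f:
                    if is_valid(f.read(len(MAGIC))):
                        kept.append(path)
            paths = kept  # invalid files are silently excluded up front
        self.paths = paths

    def scan(self) -> Iterator[bytes]:
        """One 'scan task' per file; validation happens on the bytes just read."""
        for path in self.paths:
            with open(path, "rb") as f:
                payload = f.read()
            # Re-checking here is cheap: the file is already in memory.
            if not is_valid(payload):
                raise ValueError(f"invalid file: {path}")
            yield payload[len(MAGIC):]
```

One design consequence this sketch makes visible: with validation deferred, an invalid file is no longer filtered out during discovery and instead surfaces as an error at scan time, so callers would need to decide whether to skip or fail on it.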

Reporter: Ben Kietzman / @bkietz
Assignee: Francois Saint-Jacques / @fsaintjacques

Note: This issue was originally created as ARROW-8058. Please see the migration documentation for further details.
