
[C++][Dataset] Let datasets be viewable with non-identical schema #24366

@asfimport

Description


It would be useful to allow some schema unification after discovery has completed. For example, if a FileSystemDataset is being wrapped into a UnionDataset along with another dataset and their schemas are unifiable, there is no reason we can't create the UnionDataset (rather than emitting an error because the schemas are not identical).
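To make "unifiable" concrete, here is a minimal sketch of unifying two schemas before wrapping them in a union. It is not Arrow's code: `Field`, `Schema`, `FindField` and `UnifySchemas` are hypothetical stand-ins, and the rule is an assumption chosen for illustration. Fields are matched by name, shared fields must have identical types, and a field present on only one side becomes nullable, since the other side has no values for it.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for arrow::Field / arrow::Schema. Types are compared
// by name string only, a simplification of real DataType equality.
struct Field {
  std::string name;
  std::string type;  // e.g. "int64", "utf8"
  bool nullable = true;
};

using Schema = std::vector<Field>;

// Returns a pointer to the field called `name`, or nullptr if absent.
const Field* FindField(const Schema& schema, const std::string& name) {
  for (const auto& field : schema) {
    if (field.name == name) return &field;
  }
  return nullptr;
}

// Assumed unification rule: fields of `a` first, then fields only in `b`.
// Shared fields must have identical types; a field missing from either side
// becomes nullable because that side has no values for it.
// Returns std::nullopt if the schemas cannot be unified.
std::optional<Schema> UnifySchemas(const Schema& a, const Schema& b) {
  Schema unified;
  for (const auto& fa : a) {
    const Field* fb = FindField(b, fa.name);
    if (fb == nullptr) {
      unified.push_back({fa.name, fa.type, true});
    } else if (fb->type != fa.type) {
      return std::nullopt;  // same name, conflicting types
    } else {
      unified.push_back({fa.name, fa.type, fa.nullable || fb->nullable});
    }
  }
  for (const auto& fb : b) {
    if (FindField(a, fb.name) == nullptr) {
      unified.push_back({fb.name, fb.type, true});
    }
  }
  return unified;
}
```

Under this rule, `{x: int64, y: utf8}` and `{x: int64, z: double}` unify to `{x, y, z}`, while two schemas that both have `x` but with different types are rejected, which is the one case where an error still seems warranted.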

I think this behavior would be most naturally expressed in C++ as a pure virtual method on Dataset:

virtual Result<std::shared_ptr<Dataset>> ReplaceSchema(std::shared_ptr<Schema> schema) const = 0;

which will raise an error if the provided schema is not unifiable with the current dataset schema.
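The following is a rough sketch of how such a method might validate the replacement schema. Every name in it is a hypothetical stand-in that only mimics the shape of Arrow's classes: `Field`, `Schema`, `Status`, `Result`, `CheckReplaceable`, and a concrete `Dataset` that carries nothing but a schema. The compatibility rules are also assumptions for illustration: shared fields keep their type, a newly added field must be nullable (it would read as all nulls), and nullability may be relaxed but not tightened.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these only mimic the shape of Arrow's classes.
struct Field {
  std::string name;
  std::string type;  // e.g. "int64", "utf8"
  bool nullable = true;
};
using Schema = std::vector<Field>;

// An empty message means success.
struct Status {
  std::string message;
  bool ok() const { return message.empty(); }
};

// Either a value or an error status.
template <typename T>
struct Result {
  T value{};
  Status status;
  bool ok() const { return status.ok(); }
};

// Assumed rules for viewing data described by `current` through `proposed`.
Status CheckReplaceable(const Schema& current, const Schema& proposed) {
  for (const auto& to : proposed) {
    const Field* from = nullptr;
    for (const auto& field : current) {
      if (field.name == to.name) { from = &field; break; }
    }
    if (from == nullptr) {
      // An absent column would be read as all nulls, so it must be nullable.
      if (!to.nullable) return {"field '" + to.name + "' is absent and not nullable"};
      continue;
    }
    if (from->type != to.type) return {"field '" + to.name + "' changes type"};
    // Relaxing nullability is fine; tightening it is not.
    if (from->nullable && !to.nullable) {
      return {"field '" + to.name + "' cannot become non-nullable"};
    }
  }
  return {};
}

// Hypothetical concrete dataset that only carries a schema.
class Dataset {
 public:
  explicit Dataset(Schema schema) : schema_(std::move(schema)) {}
  const Schema& schema() const { return schema_; }

  // Returns a new Dataset viewed through `schema`, or an error status.
  Result<std::shared_ptr<Dataset>> ReplaceSchema(Schema schema) const {
    Status status = CheckReplaceable(schema_, schema);
    if (!status.ok()) return {nullptr, status};
    return {std::make_shared<Dataset>(std::move(schema)), Status{}};
  }

 private:
  Schema schema_;
};
```

Returning a new object rather than mutating in place keeps the original dataset usable, which matches the `const` qualifier on the proposed method.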

If this needs to be extended to non-trivial projections, it will probably warrant a separate class (ProjectedDataset or similar). That is definitely follow-up material, if desired.

Reporter: Ben Kietzman / @bkietz
Assignee: Ben Kietzman / @bkietz

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-8164. Please see the migration documentation for further details.
