
[Python] Refine higher level dataset API #24184

@asfimport

Description

Provide a more intuitive way to construct a nested dataset:

1. instead of using the confusing factory function:
   dataset([
       factory("s3://old-taxi-data", format="parquet"),
       factory("local/path/to/new/data", format="csv")
   ])

2. let the user construct a new dataset directly from dataset objects:
   dataset([
       dataset("s3://old-taxi-data", format="parquet"),
       dataset("local/path/to/new/data", format="csv")
   ])

In the future we might want to introduce a new Dataset class that wraps the functionality of both the dataset factory and the materialized dataset, enabling optimizations such as avoiding rediscovery of datasets that have already been materialized.

Reporter: Krisztian Szucs / @kszucs
Assignee: Krisztian Szucs / @kszucs

PRs and other links:

Note: This issue was originally created as ARROW-7965. Please see the migration documentation for further details.
