Skip to content

[Python][Rust] Create extension point in python for Dataset/Scanner #33986

@changhiskhan

Description

@changhiskhan

Describe the enhancement requested

As the Arrow ecosystem grows ever richer, desire paths emerge :)

Integrating Arrow based projects written in Rust works great across the C data interface. But it doesn't allow lazy execution or pushdowns in the same way that pyarrow Dataset/Scanner's do.

My proposal here is to expose Dataset/Scanner python abc's with s.t. rust libraries can extend via pyo3+python so higher level tooling (like duckdb for example, can query these without having to transfer the whole Table into memory first).

In keeping with the same principles as the C data interface, I think it would be sufficient for this python interface to be very minimal: Dataset with schema and scanner methods, Scanner with projected_schema and to_reader methods. The to_reader should return a RecordBatchReader which would then link pyo3 datasets into the Arrow C data interface.

Thanks for your consideration!

Component(s)

Python

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions