-
Notifications
You must be signed in to change notification settings - Fork 4k
Description
Describe the enhancement requested
As the Arrow ecosystem grows ever richer, desire paths emerge :)
Integrating Arrow based projects written in Rust works great across the C data interface. But it doesn't allow lazy execution or pushdowns in the same way that pyarrow Dataset/Scanner's do.
My proposal here is to expose Dataset/Scanner python abc's with s.t. rust libraries can extend via pyo3+python so higher level tooling (like duckdb for example, can query these without having to transfer the whole Table into memory first).
In keeping with the same principles as the C data interface, I think it would be sufficient for this python interface to be very minimal: Dataset with schema and scanner methods, Scanner with projected_schema and to_reader methods. The to_reader should return a RecordBatchReader which would then link pyo3 datasets into the Arrow C data interface.
Thanks for your consideration!
Component(s)
Python