Description
At the moment, from Python you can write a dataset with ds.write_dataset starting from, for example, a list of record batches.
But this currently needs to be an actual list (or gets converted to one), so an iterator or generator is fully consumed (potentially bringing all the record batches into memory) before writing starts.
We should also be able to use the Python iterator itself to back a RecordBatchIterator-like object that can be consumed while writing the batches.
We already have an arrow::py::PyRecordBatchReader that might be useful here.
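A minimal sketch of the desired usage, assuming the streaming behavior requested here: a generator of record batches is wrapped in a pa.RecordBatchReader (via RecordBatchReader.from_batches, which attaches the schema), and that reader is passed to ds.write_dataset so batches are pulled one at a time during the write rather than collected into a list first. The output directory "out_dir" and the batch sizes are illustrative.

```python
import pyarrow as pa
import pyarrow.dataset as ds

schema = pa.schema([("x", pa.int64())])

def batch_generator(n_batches=10):
    # Yield record batches one at a time; nothing is kept in memory
    # beyond the batch currently being produced.
    for i in range(n_batches):
        yield pa.record_batch([pa.array([i] * 1000)], schema=schema)

# Wrap the generator in a RecordBatchReader so the batches can be
# consumed lazily while writing, instead of being materialized as a
# list up front (the behavior this issue asks for).
reader = pa.RecordBatchReader.from_batches(schema, batch_generator())
ds.write_dataset(reader, "out_dir", format="parquet")
```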
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: David Li / @lidavidm
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-10882. Please see the migration documentation for further details.