
[Python][Dataset] Writing dataset from python iterator of record batches #26817


Description


At the moment, you can write a dataset from Python with ds.write_dataset, for example starting from a list of record batches.

However, this currently needs to be an actual list (or gets converted to one), so an iterator or generator is fully consumed (potentially bringing all record batches into memory) before writing starts.

We should also be able to use the Python iterator itself to back a RecordBatchIterator-like object that can be consumed while the batches are written.

We already have an arrow::py::PyRecordBatchReader that might be useful here.

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: David Li / @lidavidm

Note: This issue was originally created as ARROW-10882. Please see the migration documentation for further details.
