[C++][Dataset] Support for file-like objects (buffers) in FileSystemDataset? #24286

The current `pyarrow.parquet.read_table`/`ParquetFile` accept buffer and reader objects (file-like objects, `pyarrow.Buffer`, `pyarrow.BufferReader`) as input when reading a single file. pandas and kartothek, for example, rely on this functionality, and our own tests use it extensively.

While we could keep the old implementation to handle single files (which is different from the ParquetDataset logic), there are also advantages to being able to handle this case in the Datasets API.  
For example, this would make the filtering functionality of the Datasets API available for the single-file buffer use case as well, which would be a nice enhancement (currently, `read_table` does not support `filters` for single files, which is e.g. why kartothek implements this itself).

Would this be possible to support?

The `arrow::dataset::FileSource` already has PATH and BUFFER enum types (https://github.com/apache/arrow/blob/08f8bff05af37921ff1e5a2b630ce1e7ec1c0ede/cpp/src/arrow/dataset/file_base.h#L46-L49), so it seems in principle possible to create a FileSource (for a FileSystemDataset / FileFragment) from a buffer instead of from a path?
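To make the idea concrete, here is a minimal, self-contained sketch of a source type that is either a path or an in-memory buffer and can be opened the same way in both cases. `SketchSource`, `FromPath`, `FromBuffer`, `Open` and `ReadAll` are hypothetical names invented for this illustration; this is not the actual `arrow::dataset::FileSource` API and it uses only the C++ standard library.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for illustration only; NOT arrow::dataset::FileSource.
class SketchSource {
 public:
  enum SourceType { PATH, BUFFER };

  // Build a source that refers to a file on disk.
  static SketchSource FromPath(std::string path) {
    return SketchSource(PATH, std::move(path), nullptr);
  }

  // Build a source that wraps bytes already held in memory.
  static SketchSource FromBuffer(std::shared_ptr<const std::string> buffer) {
    assert(buffer != nullptr);
    return SketchSource(BUFFER, "", std::move(buffer));
  }

  SourceType type() const { return type_; }

  // Open a readable stream regardless of where the bytes live.
  // Returns nullptr if a PATH source cannot be opened.
  std::unique_ptr<std::istream> Open() const {
    if (type_ == BUFFER) {
      // Note: std::istringstream copies the bytes; a real implementation
      // would more likely read from the buffer without copying.
      return std::unique_ptr<std::istream>(
          new std::istringstream(*buffer_, std::ios::binary));
    }
    std::unique_ptr<std::ifstream> file(
        new std::ifstream(path_, std::ios::binary));
    if (!file->is_open()) return nullptr;
    return std::unique_ptr<std::istream>(file.release());
  }

 private:
  SketchSource(SourceType type, std::string path,
               std::shared_ptr<const std::string> buffer)
      : type_(type), path_(std::move(path)), buffer_(std::move(buffer)) {}

  SourceType type_;
  std::string path_;
  std::shared_ptr<const std::string> buffer_;
};

// Consumer code that never needs to know which kind of source it was given.
// Returns an empty string if the source cannot be opened.
std::string ReadAll(const SketchSource& source) {
  std::unique_ptr<std::istream> in = source.Open();
  if (!in) return "";
  std::ostringstream out;
  out << in->rdbuf();
  return out.str();
}
```

In this sketch, anything written against `Open()` (such as `ReadAll`) works unchanged for `SketchSource::FromBuffer(...)` and `SketchSource::FromPath(...)`. The question above is whether a FileSystemDataset / FileFragment could be built on a buffer-backed source in the same way.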

**Reporter**: [Joris Van den Bossche](https://issues.apache.org/jira/browse/ARROW-8074) / @jorisvandenbossche
**Assignee**: [Ben Kietzman](https://issues.apache.org/jira/browse/ARROW-8074) / @bkietz
#### PRs and other links:
- [GitHub Pull Request #6645](https://github.com/apache/arrow/pull/6645)
- [GitHub Pull Request #7156](https://github.com/apache/arrow/pull/7156)

<sub>**Note**: *This issue was originally created as [ARROW-8074](https://issues.apache.org/jira/browse/ARROW-8074). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

	class ARROW_DS_EXPORT FileSource {
	 public:
	  // NOTE(kszucs): it'd be better to separate the BufferSource from FileSource
	  enum DatasetType { PATH, BUFFER };
