[C++][Dataset] Projection pushdown in IPC (feather) format

The datasets API uses the `RecordBatchFileReader` to read feather files. This reader always "reads" the entire file. If the file is memory-mapped, this may not be a true read; however, the datasets API never uses memory-mapped files.

This large read from RAM (or worse, disk) becomes a bottleneck for simple queries that load only a few columns from the dataset.
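To make the cost concrete, here is a minimal C++ sketch that compares the bytes touched by a full read with the bytes a projected read would need. It assumes a simplified layout in which each column owns a few body buffers (for example a validity bitmap and a values buffer) and ignores message metadata and alignment padding. `ColumnBuffers`, `BytesForFullRead`, and `BytesForProjectedRead` are hypothetical names for illustration, not Arrow APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one record batch in a feather (IPC) file:
// each column contributes one or more body buffers of a given byte length.
struct ColumnBuffers {
  std::vector<int64_t> buffer_lengths;  // e.g. validity bitmap, values
};

// Bytes touched when every column's buffers are loaded, which is what
// happens when the reader has no projection pushdown.
int64_t BytesForFullRead(const std::vector<ColumnBuffers>& columns) {
  int64_t total = 0;
  for (const auto& column : columns) {
    for (int64_t length : column.buffer_lengths) total += length;
  }
  return total;
}

// Bytes touched if only the buffers of the projected columns were loaded.
int64_t BytesForProjectedRead(const std::vector<ColumnBuffers>& columns,
                              const std::vector<int>& projected) {
  int64_t total = 0;
  for (int index : projected) {
    assert(index >= 0 && index < static_cast<int>(columns.size()));
    for (int64_t length : columns[index].buffer_lengths) total += length;
  }
  return total;
}
```

For example, with 100 columns of 1,000 int64 values, each with a 125-byte validity bitmap and 8,000 bytes of values, a full read touches 812,500 bytes, while a query projecting three of those columns needs only 24,375 bytes (3% of the data).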

One fix may be to modify the reader to seek out and read only the needed data. Another may be to modify the datasets API to use memory-mapped files when possible, although the former approach seems more generally applicable.
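The first approach amounts to planning targeted reads instead of one large read. The following is a minimal sketch, assuming the reader already knows each column's buffer offsets and lengths (in a real IPC file these would come from the record batch metadata). It collects the byte ranges of the projected columns, sorts them, and merges ranges separated by at most `hole_size_limit` bytes, which is the kind of read coalescing the related issue below asks for. `ByteRange`, `ColumnLayout`, and `PlanProjectedReads` are hypothetical names, not Arrow's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A contiguous byte range within the file (hypothetical helper type).
struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical per-column metadata: where each of the column's body
// buffers lives in the file.
struct ColumnLayout {
  std::vector<ByteRange> buffers;
};

// Collect the ranges of the projected columns, then merge ranges whose gap
// is at most hole_size_limit bytes, trading some wasted bytes for fewer reads.
std::vector<ByteRange> PlanProjectedReads(const std::vector<ColumnLayout>& layout,
                                          const std::vector<int>& projected,
                                          int64_t hole_size_limit) {
  std::vector<ByteRange> ranges;
  for (int index : projected) {
    assert(index >= 0 && index < static_cast<int>(layout.size()));
    for (const ByteRange& range : layout[index].buffers) {
      if (range.length > 0) ranges.push_back(range);
    }
  }
  std::sort(ranges.begin(), ranges.end(),
            [](const ByteRange& a, const ByteRange& b) { return a.offset < b.offset; });

  std::vector<ByteRange> merged;
  for (const ByteRange& range : ranges) {
    if (!merged.empty()) {
      ByteRange& last = merged.back();
      int64_t last_end = last.offset + last.length;
      if (range.offset - last_end <= hole_size_limit) {
        // Extend the previous read to cover this range and the gap before it.
        last.length = std::max(last_end, range.offset + range.length) - last.offset;
        continue;
      }
    }
    merged.push_back(range);
  }
  return merged;
}
```

For example, with four columns stored back to back in 80-byte blocks, projecting columns 0 and 2 with a hole limit of 0 yields two 80-byte reads at offsets 0 and 160. Raising the limit to 100 merges them into a single 240-byte read that also fetches column 1's unneeded bytes. The limit is the knob that balances the number of I/O calls against wasted bytes, which matters more on high-latency storage than on RAM.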

This is related to ARROW-8250, but that issue seems more focused on row filtering, while this issue is about column filtering.



**Reporter**: [Weston Pace](https://issues.apache.org/jira/browse/ARROW-14503) / @westonpace
#### Related issues:
- [[C++] Enable fine-grained I/O (coalescing) in IPC reader](https://github.com/apache/arrow/issues/28430) (duplicates)

<sub>**Note**: *This issue was originally created as [ARROW-14503](https://issues.apache.org/jira/browse/ARROW-14503). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

[C++][Dataset] Projection pushdown in IPC (feather) format #30060

Related issues:

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[C++][Dataset] Projection pushdown in IPC (feather) format #30060

Description

Related issues:

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions