[C++] ReadRangeCache should not retain data after read

I've added a unit test of the issue here: https://github.com/westonpace/arrow/tree/experiment/read-range-cache-retention

We use the ReadRangeCache for pre-buffering IPC and Parquet files. Sometimes those files are quite large (gigabytes). The usage is roughly:

for X in num_row_groups:
  CacheAllThePiecesWeNeedForRowGroupX
  WaitForPiecesToArriveForRowGroupX
  ReadThePiecesWeNeedForRowGroupX

However, once we've read row group X and passed it on to Acero, etc., we do not release the data for row group X. The read range cache's entries vector still holds a pointer to the buffer. The data is not released until the file reader itself is destroyed, which only happens once we have finished processing the entire file.
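The retention can be reproduced with a toy model. The sketch below is not Arrow code: `ToyCache`, `Buffer`, and `DemoRetention` are hypothetical names, and the only assumption is that the cache stores reference-counted handles in a vector, as the entries vector described above does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a data buffer; this is not arrow::Buffer.
using Buffer = std::vector<uint8_t>;

// Toy cache that keeps a shared_ptr to every buffer it has loaded and never
// drops it, mirroring the behaviour reported in this issue.
class ToyCache {
 public:
  // Simulates caching a range of `size` bytes and returns the entry index.
  std::size_t Cache(std::size_t size) {
    entries_.push_back(std::make_shared<Buffer>(size));
    return entries_.size() - 1;
  }

  // Returns a reference-counted handle; the cache keeps its own copy.
  std::shared_ptr<Buffer> Read(std::size_t index) const {
    assert(index < entries_.size());
    return entries_[index];
  }

 private:
  std::vector<std::shared_ptr<Buffer>> entries_;
};

// Returns true if the buffer is still alive after the consumer has dropped
// its handle, i.e. the cache alone keeps the memory allocated.
bool DemoRetention() {
  ToyCache cache;
  const std::size_t index = cache.Cache(1024);
  std::weak_ptr<Buffer> observer;
  {
    std::shared_ptr<Buffer> row_group = cache.Read(index);
    observer = row_group;
  }  // The consumer is finished with the row group here.
  return !observer.expired();
}
```

In this model `DemoRetention()` returns `true`: the buffer outlives its only consumer and is freed only when `cache` itself is destroyed, which corresponds to the file reader being destroyed.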

This leads to excessive memory usage when pre-buffering is enabled.

This could be a little difficult to implement because a single cache entry could be shared by multiple read ranges. We will need some kind of reference counting to know when we have fully finished with an entry and can release it.
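One possible shape for that reference counting is sketched below. It is a toy model under stated assumptions, not the approach taken in the linked pull request: `Range`, `Buffer`, `RefCountedCache`, and `pending_reads` are hypothetical names, I/O is simulated by allocating a buffer, and each requested range is assumed to be read exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical types for illustration; not Arrow's ReadRange or Buffer.
struct Range {
  std::size_t offset;
  std::size_t length;
};
using Buffer = std::vector<uint8_t>;

class RefCountedCache {
 public:
  // Coalesces touching or overlapping ranges into one entry each and records
  // how many requested ranges every entry still has to serve.
  void Cache(std::vector<Range> ranges) {
    std::sort(ranges.begin(), ranges.end(),
              [](const Range& a, const Range& b) { return a.offset < b.offset; });
    const std::size_t first_new = entries_.size();
    for (const Range& r : ranges) {
      const std::size_t end = r.offset + r.length;
      if (entries_.size() > first_new && r.offset <= entries_.back().end) {
        Entry& last = entries_.back();
        last.end = std::max(last.end, end);
        ++last.pending_reads;
      } else {
        entries_.push_back(Entry{r.offset, end, 1, nullptr});
      }
    }
    // Simulated I/O: allocate one buffer per coalesced entry.
    for (std::size_t i = first_new; i < entries_.size(); ++i) {
      entries_[i].data =
          std::make_shared<Buffer>(entries_[i].end - entries_[i].offset);
    }
  }

  // Returns the buffer of the entry covering `r` (a real cache would return
  // a slice) and drops the entry once its last pending range has been read.
  // Returns nullptr if no entry covers `r`.
  std::shared_ptr<Buffer> Read(const Range& r) {
    for (auto it = entries_.begin(); it != entries_.end(); ++it) {
      if (it->offset <= r.offset && r.offset + r.length <= it->end) {
        std::shared_ptr<Buffer> data = it->data;
        assert(it->pending_reads > 0);
        if (--it->pending_reads == 0) entries_.erase(it);
        return data;
      }
    }
    return nullptr;
  }

  std::size_t num_entries() const { return entries_.size(); }

 private:
  struct Entry {
    std::size_t offset;
    std::size_t end;            // one past the last byte covered
    std::size_t pending_reads;  // requested ranges not yet read
    std::shared_ptr<Buffer> data;
  };
  std::vector<Entry> entries_;
};
```

With this scheme the cache gives up its reference as soon as the last range sharing an entry has been read, so the memory is freed when the consumer drops its own handle. The sketch does not handle a range being read more than once, which would need either an explicit release call or a per-range flag.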

**Reporter**: [Weston Pace](https://issues.apache.org/jira/browse/ARROW-17599) / @westonpace
**Assignee**: [Percy Camilo Triveño Aucahuasi](https://issues.apache.org/jira/browse/ARROW-17599) / @aucahuasi
**Watchers**: [Rok Mihevc](https://issues.apache.org/jira/browse/ARROW-17599) / @rok
#### Related issues:
- [[C++] Implement a read range process without caching](https://github.com/apache/arrow/issues/33311) (is related to)
- [Lower memory usage with filters](https://github.com/apache/arrow/issues/32838) (is related to)
#### PRs and other links:
- [GitHub Pull Request #14226](https://github.com/apache/arrow/pull/14226)

<sub>**Note**: *This issue was originally created as [ARROW-17599](https://issues.apache.org/jira/browse/ARROW-17599). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

[C++] ReadRangeCache should not retain data after read #32846
