
[Python] DataSet uses too much memory when filtering #7338

@lnicola

Description


I'm running this query over a 14 GB Arrow IPC file:

>>> import pyarrow.dataset as dataset
>>> ds = dataset.dataset("foo.ipc", format="ipc")
>>> t = ds.to_table(filter=dataset.field('ID') <= 1000).to_pandas()
>>> t
[snip]
[914 rows x 617 columns]

If I'm reading the documentation correctly, the dataset should scan the file and collect the matching rows without loading the whole file into memory. However, the process RSS grows to about 14 GB while the query runs, then drops back down.
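For comparison, here is a minimal sketch (not from the original report) that consumes the same filtered scan through the Scanner batch interface instead of `to_table()`. The file name "foo.ipc" and the `ID` column are just the reporter's examples; whether this actually lowers peak RSS for an IPC file depends on how the IPC reader handles the file, so treat it as a diagnostic rather than a fix.

```python
import pyarrow as pa
import pyarrow.dataset as dataset

# Same dataset and filter as in the report above (names assumed from it).
ds = dataset.dataset("foo.ipc", format="ipc")
scanner = ds.scanner(filter=dataset.field("ID") <= 1000)

# Stream record batches that survive the filter instead of materializing
# one large Table up front; peak memory should then track the matching
# rows plus the in-flight batches, if the scan itself is lazy.
batches = list(scanner.to_batches())
table = pa.Table.from_batches(batches, schema=scanner.projected_schema)
print(table.num_rows)
```

If the RSS still climbs to roughly the file size with this variant, the memory growth is presumably happening inside the scan itself rather than in the final `Table` materialization.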
