I'm running this query over a 14 GB Arrow IPC file:
>>> from pyarrow import dataset
>>> ds = dataset.dataset("foo.ipc", format="ipc")
>>> t = ds.to_table(filter=dataset.field('ID') <= 1000).to_pandas()
>>> t
[snip]
[914 rows x 617 columns]
If I'm reading the documentation correctly, this should scan the file and collect only the matching rows, not load the whole file into memory. However, RSS grows to about 14 GB while the query runs, then drops back down afterwards.
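For what it's worth, my mental model of "scanning" is something like the batch-wise sketch below, using the Scanner API with the same filter. I haven't verified that the filter is actually applied batch by batch for IPC files, so this is just what I would have expected to keep memory bounded:

>>> scanner = ds.scanner(filter=dataset.field('ID') <= 1000)
>>> for batch in scanner.to_batches():
...     ...  # process one RecordBatch at a time; RSS should only grow by roughly one batch

Is the full-file RSS growth expected behavior for to_table() on an IPC dataset, or am I misunderstanding what the scan does?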