Perf: limit pushdown — avoid materializing full partitions for LIMIT queries #131

@alxmrs

Description

Problem

The partition factory always materializes the full partition regardless of the query:

df = pivot(ds.isel(block))  # always loads all 38.4M rows

For queries like SELECT * FROM ds LIMIT 100, DataFusion applies the limit only after the factory has produced its batch. The factory receives no signal to stop early.

Currently limit is passed from PrunableStreamingTable::scan() down to DataFusion's StreamingTable, which applies it at the execution-plan level — but only after the factory has already materialized the full partition.
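Schematically (a plain-Python illustration, not the actual API), the current flow materializes everything first and slices the limit off afterwards:

```python
# Plain-Python illustration (not the real API): today the factory
# materializes every row, and the LIMIT is sliced off afterwards
# at the execution-plan level.
def factory_all_rows(total_rows):
    # stands in for pivot(ds.isel(block)): builds the whole partition
    return list(range(total_rows))

rows = factory_all_rows(38_400)   # row count scaled down for illustration
limited = rows[:100]              # limit applied too late to save work
assert len(limited) == 100
```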

Proposed fix

Pass an optional row limit into the factory (similar to projection pushdown). The scan() method can thread the limit through PyArrowStreamPartition so the factory can stop generating rows once the limit is reached.

def make_stream(projection, limit=None) -> pa.RecordBatchReader:
    # `limit or total_rows` would mishandle limit=0; test against None instead
    rows_needed = total_rows if limit is None else limit
    # only materialize up to rows_needed rows

Impact

For ARCO-ERA5, LIMIT 10 currently reads 38.4M rows from the first matching partition. With this fix it would read at most one batch (batch_size rows).

Parent: #126
