Commit 8ef53b7

refactor: defer morsel decoder build to Morsel::into_stream

The previous `build_stream` built every morsel's `RowFilter`, `ParquetPushDecoder`, `AsyncFileReader`, and `Projector` eagerly in a single loop inside the file planner, before any morsel was scheduled. That loop ran on the scheduler thread and was visible as a 10–15% regression vs. main on ClickBench-partitioned queries that have many row-group morsels per file (e.g. Q15, Q16 at pushdown=off).

Replace `ParquetStreamMorsel` (which held a pre-built `BoxStream`) with `ParquetLazyMorsel`, which holds only the per-chunk `ParquetAccessPlan` plus an `Arc<LazyMorselShared>` of the file-level state. The decoder and reader are constructed inside `Morsel::into_stream`, so each morsel pays its setup cost only when the scheduler actually picks it up, and the work is distributed across worker threads instead of serialised on the planner.

`FilePruner` is `!Clone` and drives whole-file early-stop via `EarlyStoppingStream`, so it still lives on chunk 0's morsel only.

The warm `async_file_reader` from metadata / page-index / bloom-filter load is dropped at the end of `build_stream`; every morsel mints a fresh reader via the factory at `into_stream` time. For both built-in factories (`DefaultParquetFileReaderFactory`, `CachedParquetFileReaderFactory`) the "warm cache" benefit of reusing a reader is negligible because the underlying `Arc<dyn ObjectStore>` / `Arc<dyn FileMetadataCache>` is already shared across readers, so the simplification is free.

Local ClickBench-partitioned, 10 iterations, pushdown=off (M-series):

| Query | main | eager (before) | lazy (this commit) |
|-------|------:|---------------:|-------------------:|
| Q14 | 325 | 335 | 313 ms |
| Q15 | 309 | 358 | 302 ms |
| Q16 | 911 | 1049 | 786 ms |
| Q24 | 48 | 55 | 56 ms |
| Q26 | 41 | 45 | 45 ms |

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
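The lazy-morsel shape described in the message can be illustrated with a small standalone sketch. It is not the code from `opener.rs`: every type here (`SharedFileState`, `AccessPlan`, `Decoder`, `LazyMorsel`, `plan_file`) is a hypothetical stand-in, a synchronous `Iterator` takes the place of the async stream, and a counter stands in for the expensive decoder and reader setup so that the deferral is observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the file-level state shared by all morsels
/// of one file (the role `LazyMorselShared` plays in the commit).
struct SharedFileState {
    path: String,
    /// Counts decoder builds so the deferral can be observed.
    decoders_built: AtomicUsize,
}

/// Hypothetical stand-in for the per-chunk `ParquetAccessPlan`.
struct AccessPlan {
    row_groups: Vec<usize>,
}

/// Stand-in for the expensive per-morsel state (decoder, reader, filter).
struct Decoder {
    path: String,
    row_groups: std::vec::IntoIter<usize>,
}

impl Decoder {
    fn build(shared: &SharedFileState, plan: AccessPlan) -> Self {
        // In the real code this is where the setup cost would be paid.
        shared.decoders_built.fetch_add(1, Ordering::SeqCst);
        Decoder {
            path: shared.path.clone(),
            row_groups: plan.row_groups.into_iter(),
        }
    }
}

impl Iterator for Decoder {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Emit one label per row group instead of decoded record batches.
        self.row_groups
            .next()
            .map(|rg| format!("{}#rg{}", self.path, rg))
    }
}

/// Cheap to create: a plan plus a shared pointer, no decoder yet.
struct LazyMorsel {
    plan: AccessPlan,
    shared: Arc<SharedFileState>,
}

impl LazyMorsel {
    /// The decoder is built here, by whichever worker picks the morsel up.
    fn into_stream(self) -> Decoder {
        Decoder::build(&self.shared, self.plan)
    }
}

/// Planner side: one morsel per chunk of row groups, no decoder work.
fn plan_file(shared: &Arc<SharedFileState>, chunks: Vec<Vec<usize>>) -> Vec<LazyMorsel> {
    chunks
        .into_iter()
        .map(|row_groups| LazyMorsel {
            plan: AccessPlan { row_groups },
            shared: Arc::clone(shared),
        })
        .collect()
}
```

In this sketch, `plan_file` returns its morsels without building a decoder; the build happens only when a worker thread calls `into_stream` on the morsel it was handed, which is how the setup cost moves off the planning thread.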

1 parent ff805cf commit 8ef53b7

1 file changed

datafusion/datasource-parquet/src
- opener.rs
