
feat: split Parquet files into row-group-sized morsels#59

Draft
adriangb wants to merge 12 commits into main from row-group-morsel-split


@adriangb
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

Follow-up to PR apache#21351 (Dynamic work scheduling in FileStream), which explicitly called out "Splitting files into smaller units (e.g. across row groups)" as a deferred next step. Today each Parquet file becomes exactly one morsel — a single ParquetPushDecoder over the entire pruned ParquetAccessPlan. That coarse granularity:

  • keeps the follow-on "steal row-group work across sibling FileStreams" effort blocked — there's no row-group-sized unit for the SharedWorkSource to hand out;
  • prevents pipelining row-group decode with downstream operator work inside a single partition;
  • makes LIMIT and dynamic-filter early-stop wait for whole-file granularity.

The Morselizer / MorselPlanner abstraction already supports returning Vec<Box<dyn Morsel>> from a single MorselPlan, and ScanState already drains multi-morsel plans FIFO. This PR takes advantage of that: RowGroupsPrunedParquetOpen::build_stream now emits N streams, one per row-group chunk.
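
To illustrate why returning several morsels from one plan needs no new scheduling machinery, here is a minimal sketch of FIFO draining. `Morsel`, `ChunkMorsel` and `ScanQueue` are hypothetical stand-ins rather than DataFusion's actual `Morsel` / `ScanState` types, and the real morsels wrap decoder streams, not bare row-group ranges.

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// Hypothetical stand-in for a morsel: one independently decodable unit.
trait Morsel {
    /// Row groups (by index) this morsel will decode.
    fn row_groups(&self) -> Range<usize>;
}

/// Hypothetical morsel covering one row-group chunk of a file.
struct ChunkMorsel {
    row_groups: Range<usize>,
}

impl Morsel for ChunkMorsel {
    fn row_groups(&self) -> Range<usize> {
        self.row_groups.clone()
    }
}

/// Hypothetical scan queue: one plan may yield many morsels, which are
/// handed out in the order they were planned (first in, first out).
#[derive(Default)]
struct ScanQueue {
    pending: VecDeque<Box<dyn Morsel>>,
}

impl ScanQueue {
    /// Enqueue every morsel produced by a single plan.
    fn push_plan(&mut self, morsels: Vec<Box<dyn Morsel>>) {
        self.pending.extend(morsels);
    }

    /// Take the oldest pending morsel, if any.
    fn next_morsel(&mut self) -> Option<Box<dyn Morsel>> {
        self.pending.pop_front()
    }
}
```

With one file split into three chunks, the queue yields them in file order, so output order is unchanged relative to a single whole-file morsel.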

What changes are included in this PR?

  • ParquetAccessPlan::split_into_chunks (new, pub(crate)). Packs consecutive surviving row groups into chunks bounded by a row budget (default 100k rows) and a compressed-byte budget (default 64 MiB). A single oversized row group still becomes its own chunk; no sub-row-group splitting is introduced in this PR. Skip entries are carried by the currently open chunk without forcing a boundary.
  • RowGroupsPrunedParquetOpen::build_stream now returns Vec<BoxStream> instead of a single stream. Per chunk: prepare the access plan slice, build a fresh ParquetPushDecoder, mint a fresh AsyncFileReader via ParquetFileReaderFactory::create_reader (the first chunk reuses the reader that loaded metadata / page index / bloom filters so its warm cache state is preserved). Row filter is rebuilt per chunk because RowFilter is not Clone.
  • ParquetOpenState::Ready now holds Vec<BoxStream>. ParquetMorselPlanner::plan maps each stream into a ParquetStreamMorsel. Empty Ready (file fully pruned) terminates the planner via Ok(None) instead of emitting an empty morsel plan.
  • EarlyStoppingStream attaches to the first chunk only. FilePruner is !Clone and stateful; keeping it on chunk 0 preserves whole-file early-stop on dynamic-filter narrowing.
  • Row-group reversal for sort pushdown is applied per-chunk via the existing PreparedAccessPlan::reverse, and the chunks Vec is reversed so the first emitted morsel corresponds to what was originally the file's last row groups.
  • Chunk budgets are fields on ParquetMorselizer, defaulting to the new module-level constants DEFAULT_MORSEL_MAX_ROWS / DEFAULT_MORSEL_MAX_COMPRESSED_BYTES. Wiring through ParquetSource for user configuration is deliberately deferred to a follow-up — this PR keeps the public surface unchanged.
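
The chunking rule described above can be sketched as a single greedy pass over the row groups. This is a minimal sketch under stated assumptions, not the PR's `ParquetAccessPlan::split_into_chunks`: `RowGroupInfo` and `Access` are hypothetical simplifications (no partial `Selection` entries), the two budget constants are assumed to be `usize`, and leading skipped row groups are simply left out of every chunk.

```rust
use std::ops::Range;

// Budgets from the PR description (100k rows, 64 MiB). Declaring them as
// `usize` is an assumption of this sketch.
const DEFAULT_MORSEL_MAX_ROWS: usize = 100_000;
const DEFAULT_MORSEL_MAX_COMPRESSED_BYTES: usize = 64 * 1024 * 1024;

/// Hypothetical, simplified per-row-group access decision.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Scan,
    Skip,
}

/// Hypothetical per-row-group metadata used for budgeting.
#[derive(Clone, Copy, Debug)]
struct RowGroupInfo {
    num_rows: usize,
    compressed_bytes: usize,
    access: Access,
}

/// Greedily packs consecutive scanned row groups into chunks bounded by
/// `max_rows` and `max_bytes`, returning half-open row-group index ranges.
/// Skipped row groups never force a boundary: they stay in whichever chunk
/// is open when they are encountered.
fn split_into_chunks(
    groups: &[RowGroupInfo],
    max_rows: usize,
    max_bytes: usize,
) -> Vec<Range<usize>> {
    let mut chunks = Vec::new();
    // Open chunk as (start index, rows so far, compressed bytes so far).
    let mut open: Option<(usize, usize, usize)> = None;

    for (i, rg) in groups.iter().enumerate() {
        if rg.access == Access::Skip {
            continue; // absorbed by the open chunk when it is closed
        }
        if let Some((start, rows, bytes)) = open {
            let fits = rows + rg.num_rows <= max_rows
                && bytes + rg.compressed_bytes <= max_bytes;
            if fits {
                open = Some((start, rows + rg.num_rows, bytes + rg.compressed_bytes));
                continue;
            }
            // Budget exceeded: close the open chunk just before this row group.
            chunks.push(start..i);
        }
        // Start a new chunk. An oversized row group still becomes its own
        // chunk; it is never split further.
        open = Some((i, rg.num_rows, rg.compressed_bytes));
    }
    if let Some((start, _, _)) = open {
        // Trailing skips are carried by the last chunk.
        chunks.push(start..groups.len());
    }
    chunks
}
```

For example, three 3-row row groups with a row budget of 3 yield `[0..1, 1..2, 2..3]`, while a budget of 6 yields `[0..2, 2..3]`, matching the shape of the integration tests listed below.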

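The ordering rule for sort pushdown can be checked on plain index lists: reversing each chunk and then reversing the chunk list gives the same overall order as reversing the whole file. `reverse_chunks` is a hypothetical helper for illustration only; the PR applies `PreparedAccessPlan::reverse` to each chunk's access plan rather than reversing raw indices.

```rust
/// Reverses the row-group order inside every chunk, then reverses the
/// chunk list itself, so the first chunk covers the file's last row groups.
fn reverse_chunks(mut chunks: Vec<Vec<usize>>) -> Vec<Vec<usize>> {
    for chunk in &mut chunks {
        chunk.reverse(); // per-chunk reversal
    }
    chunks.reverse(); // emit the originally-last chunk first
    chunks
}
```

Flattening the result of `reverse_chunks(vec![vec![0, 1], vec![2, 3], vec![4]])` gives `[4, 3, 2, 1, 0]`, the same sequence a single whole-file reversal would produce.
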
Are these changes tested?

Yes.

  • Unit tests on ParquetAccessPlan::split_into_chunks cover: empty plan, all-skip, one-chunk-per-row-group when budget is tight, packing when budget allows, oversized single row group, byte-bounded splits, Skip preserved inside a chunk, Skip between chunks, Selection preserved verbatim in its chunk.
  • Integration tests in opener.rs:
    • test_row_group_split_produces_multiple_morsels — 3 row groups × 3 rows with a row budget of 3 → 3 morsels; concatenated output matches the single-morsel reference.
    • test_row_group_split_packs_within_budget — budget of 6 rows packs 2 row groups into the first morsel, leaving the third in its own.
    • test_row_group_split_honors_user_skip — a user-supplied ParquetAccessPlan with Skip in the middle round-trips correctly.
    • test_row_group_split_with_reverse: reverse_row_groups=true plus a split budget emits morsels with the originally-last row group first.
  • All 111 existing datasource-parquet library tests still pass (including the full test_reverse_scan_* suite). datafusion core lib (403) and core integration (931) also pass unchanged.

Are there any user-facing changes?

No observable semantic change — scans still produce the same rows in the same order. Runtime behavior changes: a multi-row-group Parquet scan now produces multiple morsels per file, each with its own ParquetPushDecoder + AsyncFileReader. For very small row groups this introduces per-morsel setup overhead that's amortized by the packing budget; for queries that hit LIMIT or dynamic-filter prunes mid-file, it generally lets the scan stop sooner.

🤖 Generated with Claude Code

@adriangb
Member Author

run benchmarks

comphead and others added 10 commits April 21, 2026 00:27
…pache#21708)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…nputs (apache#21704)

## Which issue does this PR close?

- Closes apache#21702.

## Rationale for this change

`array_concat` hit an internal cast error when given a mix of `List` and
`LargeList` (or `FixedSizeList` and `LargeList`) arguments:

```sql
> select array_concat(make_array(1, 2), arrow_cast([3, 4], 'LargeList(Int64)'));
DataFusion error: Internal error: could not cast array of type List(Int64) to arrow_array::array::list_array::GenericListArray<i64>.
```

`ArrayConcat::coerce_types` was coercing only the base element type,
leaving the outer container alone. When the resolved return type is
`LargeList`, `array_concat_inner` later tries to downcast each arg to
`GenericListArray<i64>`, which fails for any `List` argument that
slipped through.

## What changes are included in this PR?

In `ArrayConcat::coerce_types`, after coercing the base type, also
promote each input's outermost `List` to `LargeList` when the return
type is a `LargeList`. `FixedSizeList` inputs already go through
`FixedSizedListToList` first and then get promoted too. Per-arg
dimensionality is preserved, so nested cases keep working with
`align_array_dimensions`.
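
A minimal sketch of the outer-container promotion, using a hypothetical simplified `Ty` enum instead of Arrow's `DataType`. The real `ArrayConcat::coerce_types` works on `DataType` and also coerces the base element type, which this sketch leaves out.

```rust
/// Hypothetical, simplified stand-in for Arrow's nested list types.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int64,
    List(Box<Ty>),
    LargeList(Box<Ty>),
    FixedSizeList(Box<Ty>, i32),
}

/// Promotes only the outermost container of one argument so it matches a
/// `LargeList` return type. Inner levels are untouched, which preserves the
/// argument's dimensionality.
fn coerce_outer(arg: Ty, return_type: &Ty) -> Ty {
    // A FixedSizeList is first relaxed to a List.
    let arg = match arg {
        Ty::FixedSizeList(inner, _) => Ty::List(inner),
        other => other,
    };
    match (arg, return_type) {
        // Promote List -> LargeList only when the result is a LargeList.
        (Ty::List(inner), Ty::LargeList(_)) => Ty::LargeList(inner),
        (other, _) => other,
    }
}
```

Because only the outermost level changes, a `List(List(Int64))` argument becomes `LargeList(List(Int64))` and can still be aligned by `align_array_dimensions`.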

## Are these changes tested?

Yes, added sqllogictests in `array_concat.slt` covering:
- `List` + `LargeList`
- `LargeList` + `List`
- `FixedSizeList` + `LargeList`
- Three-way mix `List`, `LargeList`, `List`

Each one also asserts `arrow_typeof(...) = LargeList(Int64)`.

## Are there any user-facing changes?

Queries that previously returned an internal cast error now return the
concatenated `LargeList` as expected. No API changes.
…messages (apache#20387)

## Which issue does this PR close?
- Closes apache#20386.

## Rationale for this change
`memory_limit` (`RuntimeEnvBuilder::new().with_memory_limit()`) uses the
`greedy` memory pool by default. However, if `memory_pool`
(`RuntimeEnvBuilder::new().with_memory_pool()`) is set, it overrides that
default with the requested pool, such as `fair`. If neither
`memory_limit` nor `memory_pool` is set, the `unbounded` memory pool is
used. It is therefore useful to expose the pool that was ultimately
selected in the `ResourcesExhausted` error message, so that end users
know which pool is in use and can switch to another one (`greedy`,
`fair`, `unbounded`) if needed.
- Also, [this comparison
table](lance-format/lance#3601 (comment))
is an example use case comparing the runtime behavior of the `greedy` and
`fair` memory pools; exposing the pool in use in native logs makes this
kind of comparison easier.

The following examples use `datafusion-cli`:
**Case 1**: `datafusion-cli` output when `memory-limit` and
`top-memory-consumers > 0` are set:
```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --command 'select * from generate_series(1,500000) as t1(v1) order by v1;' --top-memory-consumers 3

DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as:
  ExternalSorterMerge[0]#2(can spill: false) consumed 10.0 MB, peak 10.0 MB,
  DataFusion-Cli#0(can spill: false) consumed 0.0 B, peak 0.0 B,
  ExternalSorter[0]#1(can spill: true) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)
```
**Case 2**: `datafusion-cli` output when `memory-limit` and
`top-memory-consumers = 0` (which disables top-memory-consumer logging)
are set:
```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --command 'select * from generate_series(1,500000) as t1(v1) order by v1;' --top-memory-consumers 0

DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)
```
**Case 3**: `datafusion-cli` output when `memory-limit`, `memory-pool`
(here `--mem-pool-type fair`), and `top-memory-consumers > 0` are set:
```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --mem-pool-type fair --top-memory-consumers 3 --command 'select * from generate_series(1,500000) as t1(v1) order by v1;'

DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as:
  ExternalSorterMerge[0]#2(can spill: false) consumed 10.0 MB, peak 10.0 MB,
  ExternalSorter[0]#1(can spill: true) consumed 0.0 B, peak 0.0 B,
  DataFusion-Cli#0(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: fair(pool_size: 10.0 MB)
```

## What changes are included in this PR?
- Add a name property to `MemoryPool` instances
- Expose the `MemoryPool` in use in `ResourcesExhausted` error messages
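
To illustrate the idea, here is a minimal sketch that renders the pool suffix seen in the CLI output above (`greedy(used: 10.0 MB, pool_size: 10.0 MB)` and `fair(pool_size: 10.0 MB)`). The `NamedPool` trait, the `Greedy` / `Fair` structs, and `human_size` are hypothetical; DataFusion's real `MemoryPool` trait and its size formatting may differ.

```rust
/// Hypothetical, minimal view of a memory pool for error reporting.
trait NamedPool {
    /// Short pool name, e.g. "greedy" or "fair".
    fn name(&self) -> &'static str;
    /// Total pool size in bytes.
    fn pool_size(&self) -> usize;
    /// Bytes currently reserved, if the pool reports it.
    fn used(&self) -> Option<usize>;
}

/// Hypothetical greedy pool that tracks its total usage.
struct Greedy {
    used: usize,
    size: usize,
}

/// Hypothetical fair pool that only reports its size.
struct Fair {
    size: usize,
}

impl NamedPool for Greedy {
    fn name(&self) -> &'static str { "greedy" }
    fn pool_size(&self) -> usize { self.size }
    fn used(&self) -> Option<usize> { Some(self.used) }
}

impl NamedPool for Fair {
    fn name(&self) -> &'static str { "fair" }
    fn pool_size(&self) -> usize { self.size }
    fn used(&self) -> Option<usize> { None }
}

/// Formats a byte count with 1024-based units and one decimal place.
fn human_size(bytes: usize) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    format!("{:.1} {}", value, UNITS[unit])
}

/// Renders the pool suffix appended to a resources-exhausted message.
fn describe_pool(pool: &dyn NamedPool) -> String {
    let size = human_size(pool.pool_size());
    match pool.used() {
        Some(used) => format!("{}(used: {}, pool_size: {})", pool.name(), human_size(used), size),
        None => format!("{}(pool_size: {})", pool.name(), size),
    }
}
```

Keeping the description on the pool itself means the error path does not need to know which concrete pool was selected by the runtime configuration.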

## Are these changes tested?
Yes, and existing test cases have been updated.

## Are there any user-facing changes?
Yes, `ResourcesExhausted` error messages now include the memory pool in use.
…pache#21749)

## Which issue does this PR close?

- Closes apache#21751.

## Rationale for this change

Profiling the planner suggests that a surprising amount of time was
being spent doing tree rewriting in the logical optimizer. One culprit
is `TreeNodeContainer::map_elements()` for `Box<C>` and `Arc<C>`, which
do the following:

* Fetch the inner `C` value from the `Box`/`Arc`
* Pass the inner value to the closure
* Wrap the return value of the closure in a newly allocated `Box` /
`Arc`, respectively

This allocates a fresh `Box` or `Arc` for every node visited while
walking an expression or logical plan, even if the tree rewrite we're
doing didn't modify the expression/plan node.

Instead, we can reuse the current `Box<C>` or `Arc<C>`: use
`std::mem::take()` to swap the inner value with `C::default()`, pass the
inner value to the closure, and put the result back in the original
container. Swapping the inner value with `C::default()` means the
container always has a valid value, which is important if the closure
panics.

For `Arc<C>`, we need to use `Arc::make_mut()`, which only clones if the
`Arc` is not unique.
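
A minimal sketch of the in-place technique on plain `Box<C>` / `Arc<C>` values. The free functions `map_box` and `map_arc` are hypothetical; the real `TreeNodeContainer::map_elements` takes a fallible closure and returns `Result<Transformed<Self>>`, which this sketch omits.

```rust
use std::sync::Arc;

/// Rewrites the value inside a `Box` without allocating a new `Box`.
fn map_box<C: Default>(mut boxed: Box<C>, f: impl FnOnce(C) -> C) -> Box<C> {
    // Leave `C::default()` behind so the box stays valid even if `f` panics.
    let inner = std::mem::take(&mut *boxed);
    *boxed = f(inner);
    boxed
}

/// Rewrites the value inside an `Arc`, reusing the allocation when the
/// `Arc` is unique; `Arc::make_mut` clones the value only if it is shared.
fn map_arc<C: Default + Clone>(mut shared: Arc<C>, f: impl FnOnce(C) -> C) -> Arc<C> {
    let slot = Arc::make_mut(&mut shared);
    let inner = std::mem::take(slot);
    *slot = f(inner);
    shared
}
```

Leaving `C::default()` in the container (rather than moving the value out unsafely) is what keeps the container valid if the closure panics, and it is also why `C: Default` becomes a new bound.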

This reduces the bytes allocated to plan TPC-H Q13 by ~22% (988 kB ->
765 kB), and reduces allocated blocks by 8.5% (210k -> 192k).

## What changes are included in this PR?

* Optimize `Box<C>::map_elements()` and `Arc<C>::map_elements()` as
described above
* Change `map_children()` for `Expr::Alias` to use `map_elements()`,
rather than invoking `f(*expr)` directly; this ensures that it can take
advantage of this optimization
* Make `LogicalPlan::default()` use a shared `DFSchema`, rather than
allocating a fresh `DFSchema` for every call. Because `default()` is now
in the hot path for tree rewriting, it is important that it is cheap
* Add unit tests for new `map_elements()` behavior
* Add note to migration guide for breaking API change
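
One way to keep a `Default` impl cheap is to hand out clones of a lazily initialized, shared `Arc`. `Schema` and `Plan` below are hypothetical stand-ins for `DFSchema` and `LogicalPlan`, and `std::sync::OnceLock` is this sketch's choice; it is not necessarily how the PR implements the shared schema.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a schema that is costly to build.
#[derive(Debug, Default)]
struct Schema {
    fields: Vec<String>,
}

/// Hypothetical plan node whose `Default` must be cheap, because a tree
/// rewrite may call it once per visited node.
#[derive(Debug)]
struct Plan {
    schema: Arc<Schema>,
}

impl Default for Plan {
    fn default() -> Self {
        // Built once, then shared: each call only bumps a reference count.
        static EMPTY_SCHEMA: OnceLock<Arc<Schema>> = OnceLock::new();
        Plan {
            schema: Arc::clone(EMPTY_SCHEMA.get_or_init(|| Arc::new(Schema::default()))),
        }
    }
}
```

Every `Plan::default()` then points at the same schema allocation instead of allocating a new one.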

## Are these changes tested?

Yes, plus new unit tests added.

## Are there any user-facing changes?

Yes: `TreeNodeContainer` impls for `Box<C>` and `Arc<C>` now require `C:
Default`. This is a breaking API change for third-party code that
implements `TreeNodeContainer` for a custom type. The fix is usually
straightforward.
…nts (apache#20904)

## Which issue does this PR close?
Does not close, but is part of
apache#20766.


## Rationale for this change
Details are in apache#20766. The main idea is to use existing distinct-count
information to optimize joins, similar to what Spark and Trino do.

## What changes are included in this PR?
This PR extends cardinality estimation for semi/anti joins using
distinct counts
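
For intuition, a common textbook estimate (assuming uniformly distributed keys and that the smaller set of distinct key values is contained in the larger) is that a left-semi join keeps about `left_rows * min(1, right_ndv / left_ndv)` rows and a left-anti join keeps the rest. This is a sketch of that standard heuristic, not necessarily the exact formula this PR implements; `estimate_semi_anti` is a hypothetical helper.

```rust
/// Estimated (semi, anti) output rows for a left-semi / left-anti join on a
/// single key, under the uniformity and containment assumptions above.
fn estimate_semi_anti(left_rows: u64, left_ndv: u64, right_ndv: u64) -> (u64, u64) {
    if left_ndv == 0 {
        // No distinct left keys: nothing can match.
        return (0, left_rows);
    }
    // Fraction of distinct left keys expected to find a match on the right.
    let matched = left_ndv.min(right_ndv) as f64 / left_ndv as f64;
    let semi = (left_rows as f64 * matched).round() as u64;
    (semi, left_rows - semi)
}
```

For example, 1,000 left rows with 100 distinct keys against 25 distinct right keys gives an estimate of 250 semi-join rows and 750 anti-join rows.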

## Are these changes tested?
I've added test cases, but I'm not sure whether benchmarks should also be added for this.

## Are there any user-facing changes?
No

---------

Co-authored-by: Alessandro Solimando <alessandro.solimando@gmail.com>
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
8.0.0 to 8.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v8.1.0 🌈 New input <code>no-project</code></h2>
<h2>Changes</h2>
<p>This adds a new boolean input <code>no-project</code>.
It only makes sense to use in combination with
<code>activate-environment: true</code> and will append <code>--no-project</code>
to the <code>uv venv</code> call. This is, for example,
useful <a
href="https://redirect.github.com/astral-sh/setup-uv/issues/854">if you
have a pyproject.toml file with parts unparseable by uv</a></p>
<h2>🚀 Enhancements</h2>
<ul>
<li>Add input no-project in combination with activate-environment <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/856">#856</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>fix: grant contents:write to validate-release job <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/860">#860</a>)</li>
<li>Add a release-gate step to the release workflow <a
href="https://github.com/zanieb"><code>@​zanieb</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/859">#859</a>)</li>
<li>Draft commitish releases <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/858">#858</a>)</li>
<li>Add action-types.yml to instructions <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/857">#857</a>)</li>
<li>chore: update known checksums for 0.11.7 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/853">#853</a>)</li>
<li>Refactor version resolving <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/852">#852</a>)</li>
<li>chore: update known checksums for 0.11.6 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/850">#850</a>)</li>
<li>chore: update known checksums for 0.11.5 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/845">#845</a>)</li>
<li>chore: update known checksums for 0.11.4 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/843">#843</a>)</li>
<li>Add a release workflow <a
href="https://github.com/zanieb"><code>@​zanieb</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/839">#839</a>)</li>
<li>chore: update known checksums for 0.11.3 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/836">#836</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Update ignore-nothing-to-cache documentation <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/833">#833</a>)</li>
<li>Pin setup-uv docs to v8 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/829">#829</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>chore(deps): bump release-drafter/release-drafter from 7.1.1 to
7.2.0 @<a href="https://github.com/apps/dependabot">dependabot[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/855">#855</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/08807647e7069bb48b6ef5acd8ec9567f424441b"><code>0880764</code></a>
fix: grant contents:write to validate-release job (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/860">#860</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/717d6aba0f15312f509f5c4999e34d71ecbab8a9"><code>717d6ab</code></a>
Add a release-gate step to the release workflow (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/859">#859</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/5a911eb3a3983b5e650f2dad95c1ce698ca94378"><code>5a911eb</code></a>
Draft commitish releases (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/858">#858</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/080c31e04cd7155b0ca676d08c7bc260a4476a23"><code>080c31e</code></a>
Add action-types.yml to instructions (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/857">#857</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/b3e97d2ba1a1eed7e9d1f8456dd06c3b725bc3a6"><code>b3e97d2</code></a>
Add input no-project in combination with activate-environment (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/856">#856</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/7dd591db9557f680290587fcc578372813b9ff64"><code>7dd591d</code></a>
chore(deps): bump release-drafter/release-drafter from 7.1.1 to 7.2.0
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/855">#855</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/1541b7762698877904805605192ecd63d0e4787a"><code>1541b77</code></a>
chore: update known checksums for 0.11.7 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/853">#853</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/cdfb2ee6dde255817c739680168ad81e184c4bfb"><code>cdfb2ee</code></a>
Refactor version resolving (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/852">#852</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/cb84d12dc6a0d495b82fcae14fa4559b90698660"><code>cb84d12</code></a>
chore: update known checksums for 0.11.6 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/850">#850</a>)</li>
<li><a
href="https://github.com/astral-sh/setup-uv/commit/1912cc65f2e839707d7a16f2372f30b57d35fd80"><code>1912cc6</code></a>
chore: update known checksums for 0.11.5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/845">#845</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/setup-uv/compare/cec208311dfd045dd5311c1add060b2062131d57...08807647e7069bb48b6ef5acd8ec9567f424441b">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=8.0.0&new-version=8.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?


- Closes #.

## Rationale for this change

One test case in the `datafusion-cli` crate fails locally when all tests
are run with `cargo nextest run`, but passes with `cargo test`:

```
        FAIL [   0.375s] datafusion-cli::cli_integration cli_explain_environment_overrides
```

The reason is that `nextest` triggers a different build graph, which
enables a feature flag on the `serde_json` dependency.

This PR enables this feature in the `dev-dependencies` of the
`datafusion-cli` crate, so the test is deterministic across different
test setups.

apache#21502 fixed a similar issue and also explains why the feature is
not enabled in the global dependencies in `Cargo.toml`.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?


## Are there any user-facing changes?

…argo-deps group (apache#21760)

Bumps the all-other-cargo-deps group with 1 update:
[aws-config](https://github.com/smithy-lang/smithy-rs).

Updates `aws-config` from 1.8.15 to 1.8.16
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/smithy-lang/smithy-rs/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=aws-config&package-manager=cargo&previous-version=1.8.15&new-version=1.8.16)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…21758)

Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 4.35.1 to 4.35.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>v4.35.2</h2>
<ul>
<li>The undocumented TRAP cache cleanup feature that could be enabled
using the <code>CODEQL_ACTION_CLEANUP_TRAP_CACHES</code> environment
variable is deprecated and will be removed in May 2026. If you are
affected by this, we recommend disabling TRAP caching by passing the
<code>trap-caching: false</code> input to the <code>init</code> Action.
<a
href="https://redirect.github.com/github/codeql-action/pull/3795">#3795</a></li>
<li>The Git version 2.36.0 requirement for improved incremental analysis
now only applies to repositories that contain submodules. <a
href="https://redirect.github.com/github/codeql-action/pull/3789">#3789</a></li>
<li>Python analysis on GHES no longer extracts the standard library,
relying instead on models of the standard library. This should result in
significantly faster extraction and analysis times, while the effect on
alerts should be minimal. <a
href="https://redirect.github.com/github/codeql-action/pull/3794">#3794</a></li>
<li>Fixed a bug in the validation of OIDC configurations for private
registries that was added in CodeQL Action 4.33.0 / 3.33.0. <a
href="https://redirect.github.com/github/codeql-action/pull/3807">#3807</a></li>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.25.2">2.25.2</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3823">#3823</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's
changelog</a>.</em></p>
<blockquote>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>[UNRELEASED]</h2>
<p>No user facing changes.</p>
<h2>4.35.2 - 15 Apr 2026</h2>
<ul>
<li>The undocumented TRAP cache cleanup feature that could be enabled
using the <code>CODEQL_ACTION_CLEANUP_TRAP_CACHES</code> environment
variable is deprecated and will be removed in May 2026. If you are
affected by this, we recommend disabling TRAP caching by passing the
<code>trap-caching: false</code> input to the <code>init</code> Action.
<a
href="https://redirect.github.com/github/codeql-action/pull/3795">#3795</a></li>
<li>The Git version 2.36.0 requirement for improved incremental analysis
now only applies to repositories that contain submodules. <a
href="https://redirect.github.com/github/codeql-action/pull/3789">#3789</a></li>
<li>Python analysis on GHES no longer extracts the standard library,
relying instead on models of the standard library. This should result in
significantly faster extraction and analysis times, while the effect on
alerts should be minimal. <a
href="https://redirect.github.com/github/codeql-action/pull/3794">#3794</a></li>
<li>Fixed a bug in the validation of OIDC configurations for private
registries that was added in CodeQL Action 4.33.0 / 3.33.0. <a
href="https://redirect.github.com/github/codeql-action/pull/3807">#3807</a></li>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.25.2">2.25.2</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3823">#3823</a></li>
</ul>
<h2>4.35.1 - 27 Mar 2026</h2>
<ul>
<li>Fix incorrect minimum required Git version for <a
href="https://redirect.github.com/github/roadmap/issues/1158">improved
incremental analysis</a>: it should have been 2.36.0, not 2.11.0. <a
href="https://redirect.github.com/github/codeql-action/pull/3781">#3781</a></li>
</ul>
<h2>4.35.0 - 27 Mar 2026</h2>
<ul>
<li>Reduced the minimum Git version required for <a
href="https://redirect.github.com/github/roadmap/issues/1158">improved
incremental analysis</a> from 2.38.0 to 2.11.0. <a
href="https://redirect.github.com/github/codeql-action/pull/3767">#3767</a></li>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.25.1">2.25.1</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3773">#3773</a></li>
</ul>
<h2>4.34.1 - 20 Mar 2026</h2>
<ul>
<li>Downgrade default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.3">2.24.3</a>
due to issues with a small percentage of Actions and JavaScript
analyses. <a
href="https://redirect.github.com/github/codeql-action/pull/3762">#3762</a></li>
</ul>
<h2>4.34.0 - 20 Mar 2026</h2>
<ul>
<li>Added an experimental change which disables TRAP caching when <a
href="https://redirect.github.com/github/roadmap/issues/1158">improved
incremental analysis</a> is enabled, since improved incremental analysis
supersedes TRAP caching. This will improve performance and reduce
Actions cache usage. We expect to roll this change out to everyone in
March. <a
href="https://redirect.github.com/github/codeql-action/pull/3569">#3569</a></li>
<li>We are rolling out improved incremental analysis to C/C++ analyses
that use build mode <code>none</code>. We expect this rollout to be
complete by the end of April 2026. <a
href="https://redirect.github.com/github/codeql-action/pull/3584">#3584</a></li>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.25.0">2.25.0</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3585">#3585</a></li>
</ul>
<h2>4.33.0 - 16 Mar 2026</h2>
<ul>
<li>
<p>Upcoming change: Starting April 2026, the CodeQL Action will skip
collecting file coverage information on pull requests to improve
analysis performance. File coverage information will still be computed
on non-PR analyses. Pull request analyses will log a warning about this
upcoming change. <a
href="https://redirect.github.com/github/codeql-action/pull/3562">#3562</a></p>
<p>To opt out of this change:</p>
<ul>
<li><strong>Repositories owned by an organization:</strong> Create a
custom repository property with the name
<code>github-codeql-file-coverage-on-prs</code> and the type
&quot;True/false&quot;, then set this property to <code>true</code> in
the repository's settings. For more information, see <a
href="https://docs.github.com/en/organizations/managing-organization-settings/managing-custom-properties-for-repositories-in-your-organization">Managing
custom properties for repositories in your organization</a>.
Alternatively, if you are using an advanced setup workflow, you can set
the <code>CODEQL_ACTION_FILE_COVERAGE_ON_PRS</code> environment variable
to <code>true</code> in your workflow.</li>
<li><strong>User-owned repositories using default setup:</strong> Switch
to an advanced setup workflow and set the
<code>CODEQL_ACTION_FILE_COVERAGE_ON_PRS</code> environment variable to
<code>true</code> in your workflow.</li>
<li><strong>User-owned repositories using advanced setup:</strong> Set
the <code>CODEQL_ACTION_FILE_COVERAGE_ON_PRS</code> environment variable
to <code>true</code> in your workflow.</li>
</ul>
</li>
<li>
<p>Fixed <a
href="https://redirect.github.com/github/codeql-action/issues/3555">a
bug</a> which caused the CodeQL Action to fail loading repository
properties if a &quot;Multi select&quot; repository property was
configured for the repository. <a
href="https://redirect.github.com/github/codeql-action/pull/3557">#3557</a></p>
</li>
<li>
<p>The CodeQL Action now loads <a
href="https://docs.github.com/en/organizations/managing-organization-settings/managing-custom-properties-for-repositories-in-your-organization">custom
repository properties</a> on GitHub Enterprise Server, enabling the
customization of features such as
<code>github-codeql-disable-overlay</code> that was previously only
available on GitHub.com. <a
href="https://redirect.github.com/github/codeql-action/pull/3559">#3559</a></p>
</li>
<li>
<p>Once <a
href="https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries">private
package registries</a> can be configured with OIDC-based authentication
for organizations, the CodeQL Action will now be able to accept such
configurations. <a
href="https://redirect.github.com/github/codeql-action/pull/3563">#3563</a></p>
</li>
<li>
<p>Fixed the retry mechanism for database uploads. Previously this would
fail with the error &quot;Response body object should not be disturbed
or locked&quot;. <a
href="https://redirect.github.com/github/codeql-action/pull/3564">#3564</a></p>
</li>
<li>
<p>A warning is now emitted if the CodeQL Action detects a repository
property whose name suggests that it relates to the CodeQL Action, but
which is not one of the properties recognised by the current version of
the CodeQL Action. <a
href="https://redirect.github.com/github/codeql-action/pull/3570">#3570</a></p>
</li>
</ul>
<h2>4.32.6 - 05 Mar 2026</h2>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/github/codeql-action/commit/95e58e9a2cdfd71adc6e0353d5c52f41a045d225"><code>95e58e9</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3824">#3824</a>
from github/update-v4.35.2-d2e135a73</li>
<li><a
href="https://github.com/github/codeql-action/commit/6f31bfe060e817d81e938dbec767969d20031e25"><code>6f31bfe</code></a>
Update changelog for v4.35.2</li>
<li><a
href="https://github.com/github/codeql-action/commit/d2e135a73a39154e3a231aeb49163c4661c5b8b1"><code>d2e135a</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3823">#3823</a>
from github/update-bundle/codeql-bundle-v2.25.2</li>
<li><a
href="https://github.com/github/codeql-action/commit/60abb65df09fcf213c398e064c8a80db1f15cdaf"><code>60abb65</code></a>
Add changelog note</li>
<li><a
href="https://github.com/github/codeql-action/commit/5a0a562209255e956ad8aafcee303294e64eefa2"><code>5a0a562</code></a>
Update default bundle to codeql-bundle-v2.25.2</li>
<li><a
href="https://github.com/github/codeql-action/commit/65216971a11ded447a6b76263d5a144519e5eee1"><code>6521697</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3820">#3820</a>
from github/dependabot/github_actions/dot-github/wor...</li>
<li><a
href="https://github.com/github/codeql-action/commit/3c45af2dd258e1623af1898da5c86545b514e028"><code>3c45af2</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3821">#3821</a>
from github/dependabot/npm_and_yarn/npm-minor-345b93...</li>
<li><a
href="https://github.com/github/codeql-action/commit/f1c339364c12f922998186ed897e45e3b4ae8874"><code>f1c3393</code></a>
Rebuild</li>
<li><a
href="https://github.com/github/codeql-action/commit/1024fc496c87e944a93e98d8cf2c09e2c7602a30"><code>1024fc4</code></a>
Rebuild</li>
<li><a
href="https://github.com/github/codeql-action/commit/9dd4cfed96030ccdfe1af4daf7a7964322704fed"><code>9dd4cfe</code></a>
Bump the npm-minor group across 1 directory with 6 updates</li>
<li>Additional commits viewable in <a
href="https://github.com/github/codeql-action/compare/c10b8064de6f491fea524254123dbe5e09572f13...95e58e9a2cdfd71adc6e0353d5c52f41a045d225">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…che#21757)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.75.10 to 2.75.18.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's
releases</a>.</em></p>
<blockquote>
<h2>2.75.18</h2>
<ul>
<li>
<p>Update <code>vacuum@latest</code> to 0.26.1.</p>
</li>
<li>
<p>Update <code>wasm-tools@latest</code> to 1.247.0.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.16.</p>
</li>
<li>
<p>Update <code>espup@latest</code> to 0.17.1.</p>
</li>
<li>
<p>Update <code>trivy@latest</code> to 0.70.0.</p>
</li>
</ul>
<h2>2.75.17</h2>
<ul>
<li>
<p>Update <code>tombi@latest</code> to 0.9.18.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.15.</p>
</li>
</ul>
<h2>2.75.16</h2>
<ul>
<li>
<p>Update <code>uv@latest</code> to 0.11.7.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.14.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.25.9.</p>
</li>
<li>
<p>Update <code>cargo-machete@latest</code> to 0.9.2.</p>
</li>
<li>
<p>Update <code>cargo-deny@latest</code> to 0.19.4.</p>
</li>
</ul>
<h2>2.75.15</h2>
<ul>
<li>
<p>Update <code>cargo-nextest@latest</code> to 0.9.133.</p>
</li>
<li>
<p>Update <code>biome@latest</code> to 2.4.12.</p>
</li>
</ul>
<h2>2.75.14</h2>
<ul>
<li>
<p>Implement potential workaround for <a
href="https://redirect.github.com/actions/partner-runner-images/issues/169">windows-11-arm
runner bug</a> which sometimes causes installation failure.</p>
<p>The issue where this bug affected the startup of bash was addressed
in 2.71.2, but we received a report that the <a
href="https://redirect.github.com/taiki-e/install-action/pull/1657#issuecomment-4252717651">same
problem seems to occur when starting other commands as well</a>.</p>
</li>
<li>
<p>Update <code>cargo-deny@latest</code> to 0.19.2.</p>
</li>
</ul>
<h2>2.75.13</h2>
<ul>
<li>Update <code>zizmor@latest</code> to 1.24.1.</li>
</ul>
<h2>2.75.12</h2>
<ul>
<li>
<p>Update <code>typos@latest</code> to 1.45.1.</p>
</li>
<li>
<p>Update <code>cargo-xwin@latest</code> to 0.21.5.</p>
</li>
<li>
<p>Update <code>cargo-binstall@latest</code> to 1.18.1.</p>
</li>
</ul>
<h2>2.75.11</h2>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>This project adheres to <a href="https://semver.org">Semantic
Versioning</a>.</p>
<h2>[Unreleased]</h2>
<ul>
<li>
<p>Update <code>tombi@latest</code> to 0.9.20.</p>
</li>
<li>
<p>Update <code>martin@latest</code> to 1.6.0.</p>
</li>
<li>
<p>Update <code>just@latest</code> to 1.50.0.</p>
</li>
<li>
<p>Update <code>tombi@latest</code> to 0.9.19.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.18.</p>
</li>
<li>
<p>Update <code>rclone@latest</code> to 1.73.5.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.17.</p>
</li>
</ul>
<h2>[2.75.18] - 2026-04-19</h2>
<ul>
<li>
<p>Update <code>vacuum@latest</code> to 0.26.1.</p>
</li>
<li>
<p>Update <code>wasm-tools@latest</code> to 1.247.0.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.16.</p>
</li>
<li>
<p>Update <code>espup@latest</code> to 0.17.1.</p>
</li>
<li>
<p>Update <code>trivy@latest</code> to 0.70.0.</p>
</li>
</ul>
<h2>[2.75.17] - 2026-04-17</h2>
<ul>
<li>
<p>Update <code>tombi@latest</code> to 0.9.18.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.15.</p>
</li>
</ul>
<h2>[2.75.16] - 2026-04-17</h2>
<ul>
<li>
<p>Update <code>uv@latest</code> to 0.11.7.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.14.</p>
</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/taiki-e/install-action/commit/055f5df8c3f65ea01cd41e9dc855becd88953486"><code>055f5df</code></a>
Release 2.75.18</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/eabf60349346950549ed65f6beb018b4680f7968"><code>eabf603</code></a>
Add note about unset</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/4637b48a5ac188fd1395ec47093a2f53f6e1a2b3"><code>4637b48</code></a>
Early handle inputs</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/7a6306ece23f52d1c9356f8fe0d0dd0f791c7825"><code>7a6306e</code></a>
Update <code>vacuum@latest</code> to 0.26.1</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/cb13f5ef5263e03d2a7c5675b24ba8374dab72b4"><code>cb13f5e</code></a>
Update mise manifest</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/18cc1a4fb7bd8a9c7c6fc69fda6c5b6b6c477b3c"><code>18cc1a4</code></a>
Update <code>wasm-tools@latest</code> to 1.247.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/c7b05077fec4d0c69ebf2b84456491ae0e31295d"><code>c7b0507</code></a>
Update <code>mise@latest</code> to 2026.4.16</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/0ef4e7650f60cd0dce197648e865d433e0a15151"><code>0ef4e76</code></a>
Update <code>espup@latest</code> to 0.17.1</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/56ec35f1c0ea059ed79d67351a8376410b7a3c87"><code>56ec35f</code></a>
Update <code>trivy@latest</code> to 0.70.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/6874db14a159fb7865d830a7d60c4414d45c4031"><code>6874db1</code></a>
Update vacuum manifest</li>
<li>Additional commits viewable in <a
href="https://github.com/taiki-e/install-action/compare/85b24a67ef0c632dfefad70b9d5ce8fddb040754...055f5df8c3f65ea01cd41e9dc855becd88953486">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Each Parquet file previously produced a single morsel containing one
`ParquetPushDecoder` over the full pruned `ParquetAccessPlan`. Morselize
at row-group granularity instead: after all pruning work is done, pack
surviving row groups into chunks bounded by a per-morsel row budget and
compressed-byte budget (defaults: 100k rows, 64 MiB). Each chunk becomes
its own stream so the executor can interleave row-group decode work with
other operators and, in a follow-up, let sibling `FileStream`s steal
row-group-sized units of work across partitions.

A single oversized row group still becomes its own morsel; no
sub-row-group splitting is introduced.
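The packing step can be pictured as one greedy pass over the surviving row groups. This is a minimal sketch under stated assumptions, not the PR's `ParquetAccessPlan::split_into_chunks`: `RowGroupStats` and this free-standing `split_into_chunks` are hypothetical stand-ins, and the real code works on the access plan and reads row counts and compressed sizes from Parquet metadata rather than from a plain slice.

```rust
/// Hypothetical per-row-group statistics. In a real reader these would come
/// from the Parquet footer (row count and compressed size of each row group).
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    index: usize,
    num_rows: u64,
    compressed_bytes: u64,
}

/// Greedily packs consecutive surviving row groups into chunks whose total
/// row count and compressed size stay within the budgets. A row group that
/// exceeds a budget on its own still becomes a single chunk: row groups are
/// never split.
fn split_into_chunks(surviving: &[RowGroupStats], max_rows: u64, max_bytes: u64) -> Vec<Vec<usize>> {
    let mut chunks = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let (mut rows, mut bytes) = (0u64, 0u64);

    for rg in surviving {
        let would_overflow =
            rows + rg.num_rows > max_rows || bytes + rg.compressed_bytes > max_bytes;
        // Close the open chunk before it would exceed either budget.
        if would_overflow && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
            rows = 0;
            bytes = 0;
        }
        current.push(rg.index);
        rows += rg.num_rows;
        bytes += rg.compressed_bytes;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // Row group 4 was pruned, so it does not appear in the input.
    let surviving = [
        RowGroupStats { index: 0, num_rows: 40_000, compressed_bytes: 10 * MIB },
        RowGroupStats { index: 1, num_rows: 50_000, compressed_bytes: 10 * MIB },
        RowGroupStats { index: 2, num_rows: 30_000, compressed_bytes: 10 * MIB },
        RowGroupStats { index: 3, num_rows: 250_000, compressed_bytes: 20 * MIB },
        RowGroupStats { index: 5, num_rows: 10_000, compressed_bytes: 80 * MIB },
    ];
    // Budgets matching the defaults above: 100k rows, 64 MiB compressed.
    let chunks = split_into_chunks(&surviving, 100_000, 64 * MIB);
    println!("{chunks:?}"); // [[0, 1], [2], [3], [5]]
}
```

Row groups 3 and 5 each overflow a budget by themselves (250k rows and 80 MiB respectively), so each ends up alone in its own chunk.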

`EarlyStoppingStream` (which is driven by the non-Clone `FilePruner`) is
attached only to the first morsel's stream so the whole file can still
short-circuit on dynamic-filter narrowing. When row groups are read in
reverse, the reversal is applied per chunk on its `PreparedAccessPlan` and
the chunk list itself is reversed, so the overall reverse output order is
preserved.
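Both rules fit in a few lines: reversing the row groups inside every chunk and then reversing the chunk list yields the fully reversed row-group order, and a non-`Clone` value can be handed to exactly one morsel with `Option::take`. This is a sketch only: `FilePruner` and `MorselSpec` here are hypothetical placeholders rather than the real DataFusion types, row groups are plain indices instead of a `PreparedAccessPlan`, and the sketch assumes the pruner goes to whichever morsel ends up first in the emitted list.

```rust
/// Hypothetical stand-in for the non-`Clone` `FilePruner`.
struct FilePruner;

/// Hypothetical morsel description: row groups in decode order, plus the
/// pruner for the one morsel that drives whole-file early stopping.
struct MorselSpec {
    row_groups: Vec<usize>,
    pruner: Option<FilePruner>,
}

fn plan_morsels(mut chunks: Vec<Vec<usize>>, reverse: bool, pruner: FilePruner) -> Vec<MorselSpec> {
    if reverse {
        // Reverse inside each chunk, then reverse the chunk list; read in
        // sequence, the morsels now visit every row group in reverse order.
        for chunk in &mut chunks {
            chunk.reverse();
        }
        chunks.reverse();
    }
    // The pruner cannot be cloned, so `take()` moves it into the first morsel
    // and leaves `None` for all later ones.
    let mut pruner = Some(pruner);
    chunks
        .into_iter()
        .map(|row_groups| MorselSpec { row_groups, pruner: pruner.take() })
        .collect()
}

fn main() {
    let morsels = plan_morsels(vec![vec![0, 1], vec![2], vec![3, 4]], true, FilePruner);
    for m in &morsels {
        // Prints [4, 3], then [2], then [1, 0]; only the first has the pruner.
        println!("{:?} early-stop: {}", m.row_groups, m.pruner.is_some());
    }
}
```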

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
adriangb force-pushed the row-group-morsel-split branch from 5b0a69a to ff805cf on April 21, 2026 14:19
The previous `build_stream` built every morsel's `RowFilter`,
`ParquetPushDecoder`, `AsyncFileReader`, and `Projector` eagerly in a
single loop inside the file planner, before any morsel was scheduled.
That loop ran on the scheduler thread and was visible as a 10–15%
regression vs. main on ClickBench-partitioned queries that have many
row-group morsels per file (e.g. Q15, Q16 at pushdown=off).

Replace `ParquetStreamMorsel` (which held a pre-built `BoxStream`) with
`ParquetLazyMorsel`, which holds only the per-chunk `ParquetAccessPlan`
plus an `Arc<LazyMorselShared>` of the file-level state. The decoder
and reader are constructed inside `Morsel::into_stream`, so each
morsel pays its setup cost only when the scheduler actually picks it
up, and the work is distributed across worker threads instead of
serialised on the planner.
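The lazy pattern boils down to storing only cheap, shareable inputs in each morsel and doing the expensive construction inside `into_stream`. A minimal sketch, assuming a simplified synchronous `Morsel` trait whose `into_stream` returns an iterator instead of an async stream; `LazyMorselShared` and `ParquetLazyMorsel` borrow the names from the description, but their fields here are hypothetical, and a counter stands in for the reader/decoder setup cost.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// File-level state shared by every morsel of one file. The real struct would
/// hold schema, projection, predicate, reader factory, etc.; here a file name
/// and a setup counter stand in for that state.
struct LazyMorselShared {
    file_name: String,
    setups: AtomicUsize,
}

/// Simplified, synchronous version of a morsel trait.
trait Morsel {
    fn into_stream(self: Box<Self>) -> Box<dyn Iterator<Item = String>>;
}

/// Holds only the per-chunk plan (here: row-group indices) plus shared state.
struct ParquetLazyMorsel {
    row_groups: Vec<usize>,
    shared: Arc<LazyMorselShared>,
}

impl Morsel for ParquetLazyMorsel {
    fn into_stream(self: Box<Self>) -> Box<dyn Iterator<Item = String>> {
        let this = *self;
        // Stand-in for building the reader, row filter and decoder: this runs
        // only when a worker picks the morsel up, not while planning.
        this.shared.setups.fetch_add(1, Ordering::SeqCst);
        let file = this.shared.file_name.clone();
        Box::new(this.row_groups.into_iter().map(move |rg| format!("{file}#rg{rg}")))
    }
}

/// Planning only allocates lightweight morsels; no setup work happens here.
fn plan(shared: &Arc<LazyMorselShared>, chunks: Vec<Vec<usize>>) -> Vec<Box<dyn Morsel>> {
    chunks
        .into_iter()
        .map(|row_groups| {
            Box::new(ParquetLazyMorsel { row_groups, shared: Arc::clone(shared) }) as Box<dyn Morsel>
        })
        .collect()
}

fn main() {
    let shared = Arc::new(LazyMorselShared {
        file_name: "hits.parquet".to_string(),
        setups: AtomicUsize::new(0),
    });
    let mut morsels = plan(&shared, vec![vec![0, 1], vec![2]]);
    println!("after planning: {} setups", shared.setups.load(Ordering::SeqCst)); // 0
    let first: Vec<String> = morsels.remove(0).into_stream().collect();
    // ["hits.parquet#rg0", "hits.parquet#rg1"], setups = 1
    println!("{first:?}, setups = {}", shared.setups.load(Ordering::SeqCst));
}
```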

`FilePruner` is `!Clone` and drives whole-file early-stop via
`EarlyStoppingStream`, so it still lives on chunk 0's morsel only.
The warm `async_file_reader` from metadata / page-index / bloom-filter
load is dropped at the end of `build_stream`; every morsel mints a
fresh reader via the factory at `into_stream` time. For both built-in
factories (`DefaultParquetFileReaderFactory`,
`CachedParquetFileReaderFactory`) the "warm cache" benefit of reusing
a reader is negligible because the underlying `Arc<dyn ObjectStore>` /
`Arc<dyn FileMetadataCache>` is already shared across readers, so the
simplification is free.
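The argument that a fresh reader costs little can be illustrated with a factory whose readers all clone one `Arc` to a shared cache, so a later reader for the same file finds the metadata already cached. `MetadataCache`, `ReaderFactory` and `Reader` below are hypothetical simplifications: the real `ParquetFileReaderFactory::create_reader` has a different signature, the real readers are async, and the footer bytes here are a placeholder.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical cache shared by all readers, standing in for the shared
/// `Arc<dyn ObjectStore>` / `Arc<dyn FileMetadataCache>`.
#[derive(Default)]
struct MetadataCache {
    footers: Mutex<HashMap<String, Arc<Vec<u8>>>>,
    fetches: AtomicUsize, // how many times storage was actually hit
}

struct ReaderFactory {
    cache: Arc<MetadataCache>,
}

struct Reader {
    path: String,
    cache: Arc<MetadataCache>,
}

impl ReaderFactory {
    /// Minting a reader only clones an `Arc`; no I/O happens here.
    fn create_reader(&self, path: &str) -> Reader {
        Reader { path: path.to_string(), cache: Arc::clone(&self.cache) }
    }
}

impl Reader {
    /// Returns the file footer, fetching it only on a cache miss.
    fn footer(&self) -> Arc<Vec<u8>> {
        let mut footers = self.cache.footers.lock().unwrap();
        let entry = footers.entry(self.path.clone()).or_insert_with(|| {
            self.cache.fetches.fetch_add(1, Ordering::SeqCst);
            Arc::new(b"PAR1 footer bytes".to_vec())
        });
        Arc::clone(entry)
    }
}

fn main() {
    let factory = ReaderFactory { cache: Arc::new(MetadataCache::default()) };
    // One reader loads metadata during planning and is then dropped...
    let planning_reader = factory.create_reader("hits.parquet");
    planning_reader.footer();
    drop(planning_reader);
    // ...and each morsel later mints its own reader from the same factory.
    for _morsel in 0..3 {
        factory.create_reader("hits.parquet").footer();
    }
    println!("fetches = {}", factory.cache.fetches.load(Ordering::SeqCst)); // 1
}
```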

Local ClickBench-partitioned, 10 iterations, pushdown=off (M-series); all times in ms:

| Query | main | eager (before) | lazy (this commit) |
|-------|-----:|---------------:|-------------------:|
| Q14   |  325 |            335 |                313 |
| Q15   |  309 |            358 |                302 |
| Q16   |  911 |           1049 |                786 |
| Q24   |   48 |             55 |                 56 |
| Q26   |   41 |             45 |                 45 |
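To read the table as relative change against `main`, the percentages can be computed directly from the timings above; `pct_change` is just an illustrative helper, and positive values mean slower than `main`.

```rust
/// Percent change of `new` relative to `base`; positive means slower.
fn pct_change(base: f64, new: f64) -> f64 {
    (new - base) / base * 100.0
}

fn main() {
    // (query, main, eager, lazy) in ms, taken from the table above.
    let rows = [
        ("Q14", 325.0, 335.0, 313.0),
        ("Q15", 309.0, 358.0, 302.0),
        ("Q16", 911.0, 1049.0, 786.0),
        ("Q24", 48.0, 55.0, 56.0),
        ("Q26", 41.0, 45.0, 45.0),
    ];
    for (query, base, eager, lazy) in rows {
        println!(
            "{query}: eager {:+.1}%, lazy {:+.1}%",
            pct_change(base, eager),
            pct_change(base, lazy)
        );
    }
}
```

For example, Q16 moves from about +15.1% (eager) to about -13.7% (lazy) relative to `main`.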

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>