perf: Parallelize chunk_and_generate with par_chunks by nicole-graus · Pull Request #563 · yetanotherco/lambda_vm

nicole-graus · 2026-04-24T19:53:17Z

Optimization extracted from PR #545.

The 10 Phase 5 trace generators (cpu, memw, memw_aligned, memw_register, load, lt, shift, mul,
dvrm, branch) run serially inside chunk_and_generate, leaving the rayon pool idle.

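This PR replaces ops.chunks(max_rows).map(generate).collect() with par_chunks under #[cfg(feature = "parallel")], keeping the sequential version as the fallback when the feature is off. It also adds a T: Sync bound on the op type and Sync + Send bounds on the generator closure, which par_chunks requires.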
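
For readers without the diff open, here is a minimal sketch of the shape such a helper might take. The signature, the type parameters R and F, and the let-binding layout are assumptions made for illustration and are not taken from the PR. The parallel branch needs rayon as a dependency and is only compiled when the parallel feature is enabled; without it, the sequential fallback is what runs.

```rust
/// Hypothetical stand-in for the helper discussed in this PR: split `ops`
/// into chunks of at most `max_rows` items and run `generate` on each chunk,
/// returning one output per chunk in the original chunk order.
/// `max_rows` must be non-zero, because `chunks` panics on a chunk size of 0.
pub fn chunk_and_generate<T, R, F>(ops: &[T], max_rows: usize, generate: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&[T]) -> R + Sync + Send,
{
    // Parallel path: only compiled with the `parallel` feature (and rayon
    // available). `generate` is borrowed so the closure itself is not moved.
    #[cfg(feature = "parallel")]
    let out: Vec<R> = {
        use rayon::prelude::*;
        ops.par_chunks(max_rows).map(&generate).collect()
    };

    // Sequential fallback: same chunking, processed one chunk at a time.
    #[cfg(not(feature = "parallel"))]
    let out: Vec<R> = ops.chunks(max_rows).map(generate).collect();

    out
}
```

Called as chunk_and_generate(&ops, 2, |chunk| chunk.iter().sum::<u32>()) on the values 1 through 5, this produces one sum per chunk of two items, with the final chunk holding the single leftover item.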

github-actions · 2026-04-24T19:55:06Z

Codex Code Review

No issues found in the PR diff.

The change is limited to parallelizing chunked trace/table generation behind the existing parallel feature. I did not see unsafe code, shared mutable state, VM semantic changes, or meaningful ordering/logic regressions in the touched code.

Validation note: I attempted cargo check -p lambda-vm-prover --features parallel, but the sandbox could not fetch or use the requested Rust toolchain and dependencies due to read-only rustup state and blocked network access.

claude · 2026-04-24T19:55:57Z

+        #[cfg(feature = "parallel")]
+        {
+            use rayon::prelude::*;
+            ops.par_chunks(max_rows).map(&generate).collect()


This parallelizes within one chunk_and_generate call — across chunks of the same op type. But the 10 calls at the Phase 5 call site (cpu, memw, memw_aligned, memw_register, load, lt, shift, mul, dvrm, branch) are still sequential. When each op type has only one chunk (ops.len() ≤ max_rows), par_chunks yields a single item and the Rayon pool stays idle — no speedup at all for the common small-trace case.

The higher-leverage optimization would be running the 10 calls in parallel via rayon::scope or nested rayon::join, mirroring the pattern already used for pages/register/halt below. If that's deferred to PR #545, a note at the call site would help.
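
To make the suggestion concrete, here is a rough sketch of running independent generators concurrently. It uses std::thread::scope from the standard library as a stand-in for rayon::scope or rayon::join, so that it compiles without extra dependencies. The three generator functions and their u64 inputs are hypothetical placeholders, not the real trace builders.

```rust
use std::thread;

// Hypothetical per-table generators; the real ones would build trace tables.
fn gen_cpu(ops: &[u64]) -> Vec<u64> {
    ops.iter().map(|x| x + 1).collect()
}

fn gen_mul(ops: &[u64]) -> Vec<u64> {
    ops.iter().map(|x| x * 2).collect()
}

fn gen_lt(ops: &[u64]) -> Vec<u64> {
    ops.iter().map(|x| u64::from(*x < 3)).collect()
}

/// Runs the three generators concurrently instead of one after another.
/// Scoped threads may borrow the input slices because `thread::scope`
/// joins every spawned thread before it returns.
pub fn generate_tables(
    cpu: &[u64],
    mul: &[u64],
    lt: &[u64],
) -> (Vec<u64>, Vec<u64>, Vec<u64>) {
    thread::scope(|s| {
        let cpu_handle = s.spawn(|| gen_cpu(cpu));
        let mul_handle = s.spawn(|| gen_mul(mul));
        // The last generator runs on the calling thread to avoid an extra spawn.
        let lt_out = gen_lt(lt);
        (
            cpu_handle.join().expect("cpu generator panicked"),
            mul_handle.join().expect("mul generator panicked"),
            lt_out,
        )
    })
}
```

With rayon, the same structure would typically be written with nested rayon::join calls or a single rayon::scope, so the work lands on the existing pool rather than on freshly spawned OS threads.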

nicole-graus · 2026-04-24T19:56:10Z

/bench 10

claude · 2026-04-24T19:56:12Z

Review of perf: Parallelize chunk_and_generate with par_chunks

No security issues. One functional concern about the parallelism model.

What the PR does

Adds par_chunks under #[cfg(feature = "parallel")] inside chunk_and_generate, so chunks of the same op type are processed in parallel. Adds the required T: Sync and Fn + Sync + Send bounds.

Correctness

The implementation is correct:

par_chunks().map().collect() is order-preserving in Rayon, so chunk ordering is maintained.
Using &generate (borrow) in the parallel path is intentional and correct — par_chunks().map() requires F: Fn + Sync + Send, and borrowing an impl Fn + Sync satisfies that.
The empty-slice fast-path stays single-threaded, consistent with prior behavior.
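
The point about borrowing the generator can be checked in isolation, without rayon. In the sketch below, call_like_par_map is a hypothetical helper whose bounds mimic the Fn + Sync + Send requirement described above; it is not a rayon API. Passing &generate compiles because a shared reference to a closure that is Fn + Sync is itself Fn, Sync and Send.

```rust
/// Hypothetical helper whose bounds mirror what a parallel `map` asks of its
/// closure. It just calls the closure once on the calling thread.
fn call_like_par_map<T, R, G>(data: &[T], g: G) -> R
where
    G: Fn(&[T]) -> R + Sync + Send,
{
    g(data)
}

/// `generate` is only required to be `Fn + Sync` here. Passing `&generate`
/// still satisfies `Fn + Sync + Send`, and because the closure is borrowed
/// rather than moved, it can be reused for a second call.
pub fn apply_twice<T, R, F>(data: &[T], generate: F) -> (R, R)
where
    F: Fn(&[T]) -> R + Sync,
{
    let first = call_like_par_map(data, &generate);
    let second = call_like_par_map(data, &generate);
    (first, second)
}
```

If generate were passed by value to the first call, the second call would fail to compile with a use-after-move error, which is one reason to borrow in the parallel path.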

Performance concern

The 10 chunk_and_generate calls at the Phase 5 call site remain sequential. When any op type has only one chunk (ops.len() <= max_rows), par_chunks yields a single item and this change contributes zero speedup for that type — which is the common case for smaller programs.

The higher-impact opportunity is running the 10 calls concurrently via rayon::scope or nested rayon::join, mirroring the pattern already used for pages/register/halt. If that's deferred to PR #545, a comment above the Phase 5 block would help.

github-actions · 2026-04-24T20:07:07Z

Benchmark — fib_iterative_8M (median of 3)

Table parallelism: 1

Metric	main	PR	Δ
Peak heap	49533 MB	49533 MB	+0 MB (+0.0%) ⚪
Prove time	70.380s	70.092s	-0.288s (-0.4%) ⚪

✅ No significant change.

✅ Low variance (time: 0.5%, heap: 0.0%)

Commit: 4e6f5c0 · Baseline: built from main · Runner: self-hosted bench

nicole-graus · 2026-04-27T18:44:54Z

/bench k=1

Parallelize chunk_and_generate with par_chunks

874b017

claude Bot reviewed Apr 24, 2026


Merge branch 'main' into parallel-phase5-trace-builders

e7ec73a

diegokingston and others added 2 commits April 24, 2026 17:47

Merge branch 'main' into parallel-phase5-trace-builders

490ff9a

Merge branch 'main' into parallel-phase5-trace-builders

4e6f5c0

nicole-graus wants to merge 4 commits into main from parallel-phase5-trace-builders
