Skip to content

perf: Parallelize chunk_and_generate with par_chunks#563

Draft
nicole-graus wants to merge 4 commits into
mainfrom
parallel-phase5-trace-builders
Draft

perf: Parallelize chunk_and_generate with par_chunks#563
nicole-graus wants to merge 4 commits into
mainfrom
parallel-phase5-trace-builders

Conversation

@nicole-graus
Copy link
Copy Markdown
Collaborator

Optimization extracted from PR #545.

The 10 Phase 5 trace generators (cpu, memw, memw_aligned, memw_register, load, lt, shift, mul,
dvrm, branch) run serially inside chunk_and_generate, leaving the rayon pool idle.

This PR replaces ops.chunks(max_rows).map(generate).collect() with par_chunks under #[cfg(feature = "parallel")], with a sequential fallback. Adds T: Sync and Sync + Send bounds on the generator.

@github-actions
Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

The change is limited to parallelizing chunked trace/table generation behind the existing parallel feature. I did not see unsafe code, shared mutable state, VM semantic changes, or meaningful ordering/logic regressions in the touched code.

Validation note: I attempted cargo check -p lambda-vm-prover --features parallel, but the sandbox could not fetch/use the requested Rust toolchain/dependencies due read-only rustup state and blocked network.

#[cfg(feature = "parallel")]
{
use rayon::prelude::*;
ops.par_chunks(max_rows).map(&generate).collect()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parallelizes within one chunk_and_generate call — across chunks of the same op type. But the 10 calls at the Phase 5 call site (cpu, memw, memw_aligned, memw_register, load, lt, shift, mul, dvrm, branch) are still sequential. When each op type has only one chunk (ops.len() ≤ max_rows), par_chunks yields a single item and the Rayon pool stays idle — no speedup at all for the common small-trace case.

The higher-leverage optimization would be running the 10 calls in parallel via rayon::scope or nested rayon::join, mirroring the pattern already used for pages/register/halt below. If that's deferred to PR #545, a note at the call site would help.

@nicole-graus
Copy link
Copy Markdown
Collaborator Author

/bench 10

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Review of perf: Parallelize chunk_and_generate with par_chunks

No security issues. One functional concern about the parallelism model.

What the PR does

Adds par_chunks under #[cfg(feature = "parallel")] inside chunk_and_generate, so chunks of the same op type are processed in parallel. Adds the required T: Sync and Fn + Sync + Send bounds.

Correctness

The implementation is correct:

  • par_chunks().map().collect() is order-preserving in Rayon, so chunk ordering is maintained.
  • Using &generate (borrow) in the parallel path is intentional and correct — par_chunks().map() requires F: Fn + Sync + Send, and borrowing an impl Fn + Sync satisfies that.
  • The empty-slice fast-path stays single-threaded, consistent with prior behavior.

Performance concern

The 10 chunk_and_generate calls at the Phase 5 call site remain sequential. When any op type has only one chunk (ops.len() <= max_rows), par_chunks yields a single item and this change contributes zero speedup for that type — which is the common case for smaller programs.

The higher-impact opportunity is running the 10 calls concurrently via rayon::scope or nested rayon::join, mirroring the pattern already used for pages/register/halt. If that's deferred to PR #545, a comment above the Phase 5 block would help.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Benchmark — fib_iterative_8M (median of 3)

Table parallelism: 1

Metric main PR Δ
Peak heap 49533 MB 49533 MB +0 MB (+0.0%) ⚪
Prove time 70.380s 70.092s -0.288s (-0.4%) ⚪

✅ No significant change.

✅ Low variance (time: 0.5%, heap: 0.0%)

Commit: 4e6f5c0 · Baseline: built from main · Runner: self-hosted bench

@nicole-graus
Copy link
Copy Markdown
Collaborator Author

/bench k=1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants