perf: parallelize FRI fold with Rayon by MauroToscano · Pull Request #597 · yetanotherco/lambda_vm

MauroToscano · 2026-05-19T19:57:32Z

Summary

Parallelize fold_evaluations_in_place using par_chunks_exact(2) + par_iter for layers with >= 4096 elements
Falls back to sequential for small final layers where Rayon overhead dominates
Each FRI fold iteration is embarrassingly parallel — the output at position j depends only on inputs at 2j, 2j+1 and the j-th twiddle factor
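The dispatch described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: it uses a toy prime field over `u64` instead of the crate's `FieldElement` types, a commonly used fold formula that may differ from the real one by a constant factor, and `std::thread::scope` as a stand-in for Rayon's `par_chunks_exact(2)`. The names `FOLD_P`, `PAR_THRESHOLD`, `fold_pair`, `fold_sequential`, `fold_parallel`, and `fold` are hypothetical.

```rust
use std::thread;

/// Toy prime modulus (2^31 - 1); an assumption standing in for the real field.
const FOLD_P: u64 = 2_147_483_647;
/// Hypothetical cutoff mirroring the 4096-element threshold in the summary.
const PAR_THRESHOLD: usize = 4096;

fn add(a: u64, b: u64) -> u64 { (a + b) % FOLD_P }
fn sub(a: u64, b: u64) -> u64 { (a + FOLD_P - b) % FOLD_P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % FOLD_P }

/// Fold one adjacent pair: (lo + hi) + beta * inv_tw * (lo - hi).
/// All inputs are assumed to be already reduced modulo FOLD_P.
fn fold_pair(lo: u64, hi: u64, inv_tw: u64, beta: u64) -> u64 {
    add(add(lo, hi), mul(beta, mul(inv_tw, sub(lo, hi))))
}

/// Sequential path: output j reads inputs 2j, 2j+1 and twiddle j.
fn fold_sequential(evals: &[u64], inv_twiddles: &[u64], beta: u64) -> Vec<u64> {
    let half = evals.len() / 2;
    assert_eq!(inv_twiddles.len(), half);
    (0..half)
        .map(|j| fold_pair(evals[2 * j], evals[2 * j + 1], inv_twiddles[j], beta))
        .collect()
}

/// Parallel path: split the output into blocks and fold each block on its
/// own scoped thread, writing into a fresh buffer.
fn fold_parallel(evals: &[u64], inv_twiddles: &[u64], beta: u64, workers: usize) -> Vec<u64> {
    let half = evals.len() / 2;
    assert_eq!(inv_twiddles.len(), half);
    let evals = &evals[..2 * half];
    let mut out = vec![0u64; half];
    let block = half.div_ceil(workers.max(1)).max(1);
    thread::scope(|s| {
        let blocks = out
            .chunks_mut(block)
            .zip(evals.chunks(2 * block))
            .zip(inv_twiddles.chunks(block));
        for ((out_blk, ev_blk), tw_blk) in blocks {
            s.spawn(move || {
                let items = out_blk.iter_mut().zip(ev_blk.chunks_exact(2)).zip(tw_blk);
                for ((o, pair), tw) in items {
                    *o = fold_pair(pair[0], pair[1], *tw, beta);
                }
            });
        }
    });
    out
}

/// Pick the parallel path only for layers at or above the threshold.
fn fold(evals: &[u64], inv_twiddles: &[u64], beta: u64) -> Vec<u64> {
    if evals.len() >= PAR_THRESHOLD {
        fold_parallel(evals, inv_twiddles, beta, 4)
    } else {
        fold_sequential(evals, inv_twiddles, beta)
    }
}
```

Because each output depends only on its own pair and twiddle, the two paths must agree element for element; only the scheduling differs. The worker count of 4 is arbitrary here, whereas Rayon would size work from its thread pool.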

Benchmark results

fib_iterative_4M, PARALLEL_TABLES=1, 5 samples:

Metric	Before	After	Delta
Wall clock (median)	38.6s	36.8s	-4.7%
CV	2.3%	2.2%	—
R4 fri::commit_phase (instruments)	3.83s	2.92s	-23.8%
Heap	26,498 MB	26,242 MB	-256 MB

Verification: baseline verifier accepts the proof.

Test plan

cargo test --release -p stark (124/124 pass)
Proof verified by baseline verifier binary
/bench on CI runner

The FRI fold loop was fully sequential despite being embarrassingly parallel (each output element depends only on its input pair and twiddle factor). Parallelize with par_chunks_exact(2) for layers above 4096 elements, falling back to the sequential path for small final layers where Rayon overhead dominates.

github-actions · 2026-05-19T19:58:56Z

Codex Code Review

Findings:

Potential Bug: crypto/stark/src/fri/fri_functions.rs:39
The parallel branch uses .par_chunks_exact(2).zip(inv_twiddles.par_iter()), so if inv_twiddles.len() < evals.len() / 2, it silently computes fewer folded values and replaces evals with that shorter vector. The serial path indexes inv_twiddles[j] and would panic on the same invariant violation. Since commit_phase_from_evaluations is public and does not validate evals.len() == domain_size, this can turn a bad caller input into a silently malformed FRI commitment when parallel is enabled. Add an explicit invariant check before both branches, e.g. assert_eq!(inv_twiddles.len(), half), or return an error if this API should be fallible.
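The truncation behavior is easy to reproduce with standard-library iterators, which stop at the shorter input in the same way Rayon's indexed `zip` does. The sketch below is illustrative only: `fold_zip` and `fold_checked` are hypothetical names, and the arithmetic is a placeholder on plain `u64` values, not the field fold.

```rust
/// Zip-based fold shaped like the parallel branch: it stops at the shorter
/// input, so a short twiddle slice silently yields fewer outputs.
fn fold_zip(evals: &[u64], inv_twiddles: &[u64]) -> Vec<u64> {
    evals
        .chunks_exact(2)
        .zip(inv_twiddles.iter())
        .map(|(pair, tw)| (pair[0] + pair[1]) * tw)
        .collect()
}

/// Same fold with the length invariant checked up front, returning an error
/// instead of a shortened vector.
fn fold_checked(evals: &[u64], inv_twiddles: &[u64]) -> Result<Vec<u64>, String> {
    let half = evals.len() / 2;
    if inv_twiddles.len() != half {
        return Err(format!("expected {half} twiddles, got {}", inv_twiddles.len()));
    }
    Ok(fold_zip(evals, inv_twiddles))
}
```

With eight evaluations and only two twiddles, `fold_zip` returns two values where four are expected, while `fold_checked` reports the mismatch. Whether the real API should panic or return an error is a design decision for the crate.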

Verification: I attempted cargo test -p stark fri --features parallel, but rustup failed before compiling because /home/runner/.rustup/tmp is read-only in this sandbox. git diff --check passed.

claude · 2026-05-19T20:00:15Z

+) where
+    F: IsSubFieldOf<E> + Send + Sync,
+    E: IsField + Send + Sync,
+    FieldElement<E>: Send + Sync,
+    FieldElement<F>: Send + Sync,


Low – Send + Sync bounds leak into non-parallel builds

These four bounds are unconditional, so they apply even when the parallel feature is off. Any caller with a field element type that doesn't implement Send + Sync (e.g. a type containing Rc or Cell) will fail to compile, even in a single-threaded build. In practice, all concrete field element types in this crate are Send + Sync, but the bounds still unnecessarily restrict the API surface.

The cleanest fix without duplicating the function body is to gate them with a custom supertrait:

#[cfg(feature = "parallel")]
pub trait MaybeParallel: Send + Sync {}
#[cfg(feature = "parallel")]
impl<T: Send + Sync> MaybeParallel for T {}

#[cfg(not(feature = "parallel"))]
pub trait MaybeParallel {}
#[cfg(not(feature = "parallel"))]
impl<T> MaybeParallel for T {}

then replace Send + Sync with MaybeParallel throughout. Alternatively, if all callers satisfy the bounds unconditionally, document that assumption and leave it as-is — the important thing is the intent is clear.

claude · 2026-05-19T20:00:22Z

+            .par_chunks_exact(2)
+            .zip(inv_twiddles.par_iter())


Low – zip silently truncates if lengths diverge; sequential panics

.par_chunks_exact(2).zip(inv_twiddles.par_iter()) stops at the shorter iterator, so if inv_twiddles.len() < evals.len() / 2 the parallel path silently produces a shorter-than-expected vector and returns with a wrong result. The sequential path would catch that with an out-of-bounds index panic.

The caller (commit_phase_from_evaluations) always passes a correctly-sized inv_twiddles slice, so this won't trigger in practice. But a debug-mode assert would make the invariant explicit and catch misuse during development:

debug_assert_eq!(inv_twiddles.len(), evals.len() / 2);

claude · 2026-05-19T20:00:30Z

Review: perf: parallelize FRI fold with Rayon

Overall: looks correct and clean. The parallel fold formula matches the sequential one exactly, the new-buffer approach for the parallel path is the right call to avoid read/write aliasing, and the threshold of 4096 is a reasonable starting point backed by benchmarks.
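The aliasing point can be illustrated with a small sketch. It assumes a toy combine step on wrapping `u64` arithmetic in place of the field operations, and the names `combine`, `fold_in_place_seq`, and `fold_to_new_buffer` are hypothetical. A sequential fold can safely reuse the input buffer because it writes index j only after reading indices 2j and 2j+1, and every earlier write landed below j. Once iterations run concurrently that ordering is gone (the task writing index 2 could race with the task reading indices 2 and 3), so a parallel version writes to a separate buffer.

```rust
/// Toy combine step standing in for the real field arithmetic.
fn combine(lo: u64, hi: u64, tw: u64) -> u64 {
    lo.wrapping_add(hi)
        .wrapping_add(tw.wrapping_mul(lo.wrapping_sub(hi)))
}

/// Sequential fold that reuses the input buffer.
fn fold_in_place_seq(evals: &mut Vec<u64>, inv_twiddles: &[u64]) {
    let half = evals.len() / 2;
    assert_eq!(inv_twiddles.len(), half);
    for j in 0..half {
        // Reads 2j and 2j+1, then writes j. Earlier iterations wrote only
        // indices below j, so no input is overwritten before it is read.
        evals[j] = combine(evals[2 * j], evals[2 * j + 1], inv_twiddles[j]);
    }
    evals.truncate(half);
}

/// Out-of-place fold: inputs are never written, so the iterations could run
/// in any order (or concurrently) without affecting each other.
fn fold_to_new_buffer(evals: &[u64], inv_twiddles: &[u64]) -> Vec<u64> {
    let half = evals.len() / 2;
    assert_eq!(inv_twiddles.len(), half);
    evals
        .chunks_exact(2)
        .zip(inv_twiddles)
        .map(|(pair, tw)| combine(pair[0], pair[1], *tw))
        .collect()
}
```

Both functions produce the same folded layer; the out-of-place form trades one extra allocation per layer for independence between iterations.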

Two low-severity issues flagged inline:

#	Severity	Location	Issue
1	Low	`fri_functions.rs:27-31`	`Send + Sync` bounds are unconditional — apply even when `parallel` feature is off, unnecessarily restricting the API
2	Low	`fri_functions.rs:40-41`	`zip` silently truncates on length mismatch where sequential would panic; a `debug_assert_eq!(inv_twiddles.len(), evals.len() / 2)` would surface this during development

No correctness bugs, no security concerns, no unsafe code.

MauroToscano · 2026-05-21T14:13:16Z

/bench

github-actions · 2026-05-21T14:15:13Z

Benchmark — fib_iterative_8M (median of 5)

Table parallelism: auto (cores / 3)

Metric	main	PR	Δ
Peak heap	52516 MB	51894 MB	-622 MB (-1.2%) ⚪
Prove time	25.536s	25.593s	+0.057s (+0.2%) ⚪

✅ No significant change.

✅ Low variance (time: 2.6%, heap: 1.0%)

Commit: 225817d · Baseline: cached · Runner: self-hosted bench

MauroToscano · 2026-05-21T14:37:59Z

/bench 5 1

MauroToscano wants to merge 1 commit into main from opt/28-parallel-fri-fold
