14 changes: 11 additions & 3 deletions prover/src/tables/trace_builder.rs
@@ -1686,15 +1686,23 @@ struct CollectedOps {
 }

 /// Chunk raw ops and generate one trace table per chunk.
-fn chunk_and_generate<T>(
+fn chunk_and_generate<T: Sync>(
     ops: &[T],
     max_rows: usize,
-    generate: impl Fn(&[T]) -> TraceTable<GoldilocksField, GoldilocksExtension>,
+    generate: impl Fn(&[T]) -> TraceTable<GoldilocksField, GoldilocksExtension> + Sync + Send,
 ) -> Vec<TraceTable<GoldilocksField, GoldilocksExtension>> {
     if ops.is_empty() {
         vec![generate(&[])]
     } else {
-        ops.chunks(max_rows).map(generate).collect()
+        #[cfg(feature = "parallel")]
+        {
+            use rayon::prelude::*;
+            ops.par_chunks(max_rows).map(&generate).collect()

This parallelizes only within a single chunk_and_generate call, that is, across chunks of the same op type. The 10 calls at the Phase 5 call site (cpu, memw, memw_aligned, memw_register, load, lt, shift, mul, dvrm, branch) still run sequentially. When each op type fits in one chunk (ops.len() ≤ max_rows), par_chunks yields a single item, so that call runs on one thread while the rest of the Rayon pool stays idle: no speedup at all for the common small-trace case.

The higher-leverage optimization would be to run the 10 calls in parallel via rayon::scope or nested rayon::join, mirroring the pattern already used for pages/register/halt below. If that is deferred to PR #545, a note at the call site would help.
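
To make the suggestion concrete, here is a minimal sketch of running the per-op-type calls concurrently. It is not the PR's code: `Table`, `generate_table`, and `generate_all` are hypothetical stand-ins, only three of the ten op types are shown, and `std::thread::scope` is swapped in for `rayon::scope` / `rayon::join` so the snippet builds with the standard library alone. In the real call site the same shape would presumably be written with Rayon so the work lands on the existing pool.

```rust
use std::thread;

/// Hypothetical stand-in for a trace table: it only records its row count.
#[derive(Debug, PartialEq)]
struct Table {
    rows: usize,
}

/// Same shape as the non-parallel branch of `chunk_and_generate` in the diff.
fn chunk_and_generate<T>(
    ops: &[T],
    max_rows: usize,
    generate: impl Fn(&[T]) -> Table,
) -> Vec<Table> {
    if ops.is_empty() {
        vec![generate(&[])]
    } else {
        ops.chunks(max_rows).map(generate).collect()
    }
}

/// Hypothetical generator: one row per op in the chunk.
fn generate_table(chunk: &[u64]) -> Table {
    Table { rows: chunk.len() }
}

/// Run the per-op-type calls concurrently instead of one after another.
/// Assumption: the op slices are independent, so the calls can overlap.
fn generate_all(
    cpu: &[u64],
    load: &[u64],
    mul: &[u64],
    max_rows: usize,
) -> (Vec<Table>, Vec<Table>, Vec<Table>) {
    thread::scope(|s| {
        let cpu_handle = s.spawn(|| chunk_and_generate(cpu, max_rows, generate_table));
        let load_handle = s.spawn(|| chunk_and_generate(load, max_rows, generate_table));
        // The last call runs on the current thread while the others proceed.
        let mul_tables = chunk_and_generate(mul, max_rows, generate_table);
        (
            cpu_handle.join().expect("cpu worker panicked"),
            load_handle.join().expect("load worker panicked"),
            mul_tables,
        )
    })
}
```

With small traces every call produces exactly one chunk, so parallelism across calls is the only parallelism available; the per-chunk `par_chunks` path only helps once an op type exceeds `max_rows`.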

+        }
+        #[cfg(not(feature = "parallel"))]
+        {
+            ops.chunks(max_rows).map(generate).collect()
+        }
     }
 }
