perf: sparse gather for LogUp aux trace when multiplicity is selector… by diegokingston · Pull Request #565 · yetanotherco/lambda_vm

diegokingston · 2026-04-24T21:22:49Z

…-gated

Many bus interactions are gated by mutually-exclusive selector columns (e.g., AND/OR/XOR op flags, ADD/SUB/MUL op flags): when the selector is zero, the row's contribution to the term column is zero regardless of the fingerprint. We now detect Column/Sum/Sum3 multiplicities and, when fewer than 7/8 of a chunk's rows are active, compute fingerprints only at the active rows before batch-inverting. Inactive rows stay at zero from the initial vec![zero; trace_len] allocation.

Soundness: sparse_active_rows marks a row as active whenever any contributing column is non-zero, which is an over-approximation of m != 0 (cancellation can only cause us to process more rows, never fewer). Dense multiplicities (One, Negated, Diff, Linear) fall back to the unchanged contiguous path.

Applies to both single (compute_logup_term_column) and paired (compute_logup_batched_term_column) term-column builders. Verified via all 71 stark bus soundness tests and 265 prover lib tests.

…-gated Many bus interactions are gated by mutually-exclusive selector columns (e.g., AND/OR/XOR op flags, ADD/SUB/MUL op flags): when the selector is zero, the row's contribution to the term column is zero regardless of the fingerprint. We now detect Column/Sum/Sum3 multiplicities and, when fewer than 7/8 of a chunk's rows are active, compute fingerprints only at the active rows before batch-inverting. Inactive rows stay at zero from the initial `vec![zero; trace_len]` allocation. Soundness: `sparse_active_rows` marks a row as active whenever any contributing column is non-zero, which is an over-approximation of `m != 0` (cancellation can only cause us to process more rows, never fewer). Dense multiplicities (One, Negated, Diff, Linear) fall back to the unchanged contiguous path. Applies to both single (`compute_logup_term_column`) and paired (`compute_logup_batched_term_column`) term-column builders. Verified via all 71 stark bus soundness tests and 265 prover lib tests.

diegokingston · 2026-04-24T21:22:57Z

/bench 3

github-actions · 2026-04-24T21:24:47Z

Benchmark — fib_iterative_8M (median of 3)

_{Table parallelism: auto (cores / 3)}

Metric	main	PR	Δ
Peak heap	57694 MB	56843 MB	-851 MB (-1.5%) ⚪
Prove time	26.343s	25.760s	-0.583s (-2.2%) ⚪

✅ No significant change.

✅ Low variance (time: 0.8%, heap: 0.9%)

_{Commit: 7047d4d · Baseline: cached · Runner: self-hosted bench}

github-actions · 2026-04-24T21:25:14Z

Codex Code Review

Findings

Medium - Performance/OOM risk: crypto/stark/src/lookup.rs:1489
In default non-parallel builds, process_chunk(0, &mut result) makes chunk_len == trace_len. For dense Column/Sum/Sum3 multiplicities, sparse_active_rows first allocates and fills a Vec<usize> up to the full trace size, then discards it and reruns the dense path. On large traces and many interactions this can be a significant regression or memory spike. Consider bounded sequential chunks too, or early-abort sparse collection once the threshold is exceeded without retaining a full active list.

No security vulnerabilities or correctness bugs found in the PR diff.

I attempted cargo check, but this environment could not complete it: rustup/cargo needed writes under read-only home dirs and then network/git dependencies were unavailable.

claude · 2026-04-24T21:26:25Z

+            let active: Vec<usize> = (0..chunk_len)
+                .filter(|&i| c[chunk_start + i] != zero)
+                .collect();
+            collect_if_sparse(active)


Medium – Wasted allocation on the dense fallback path

The full active: Vec<usize> is built (up to chunk_len elements) before the threshold check. When more than 7/8 of the rows are active (the dense case), the Vec is immediately discarded by collect_if_sparse returning None. This allocation+drop happens on every chunk for every interaction that turns out to be dense — exactly the hot path you're trying to keep fast.

A cheaper approach: count active rows first, bail early, then collect only when it's known to be worthwhile:

Multiplicity::Column(col) => { let c = &main_segment_cols[*col]; let count = (0..chunk_len).filter(|&i| c[chunk_start + i] != zero).count(); if count * SPARSE_DEN > chunk_len * SPARSE_NUM { return None; } let active: Vec<usize> = (0..chunk_len) .filter(|&i| c[chunk_start + i] != zero) .collect(); Some(active) }

(Same pattern for Sum and Sum3.) The double pass over chunk_len elements is cheap compared to a heap allocation + copy, and chunk_len = 1024 fits in L1.

claude · 2026-04-24T21:26:32Z

+        let indices_a: Vec<usize> = match &active_a {
+            Some(v) => v.clone(),
+            None => (0..chunk_len).collect(),
+        };
+        let indices_b: Vec<usize> = match &active_b {
+            Some(v) => v.clone(),
+            None => (0..chunk_len).collect(),
        };


Medium – Unnecessary Vec clone

active_a and active_b are owned Option<Vec<usize>> values that are never used again after this point. Matching on &active_a forces a .clone() in the Some branch. Use unwrap_or_else to consume them directly:

Suggested change

let indices_a: Vec<usize> = match &active_a {

Some(v) => v.clone(),

None => (0..chunk_len).collect(),

};

let indices_b: Vec<usize> = match &active_b {

Some(v) => v.clone(),

None => (0..chunk_len).collect(),

};

let indices_a = active_a.unwrap_or_else(|| (0..chunk_len).collect());

let indices_b = active_b.unwrap_or_else(|| (0..chunk_len).collect());

This avoids a heap allocation + copy for every chunk in the half-sparse case (one sparse, one dense interaction).

claude · 2026-04-24T21:26:39Z

+/// is non-zero — which is an over-approximation of `m != 0` (we don't miss any
+/// non-zero rows, we just might process some m=0 rows due to cancellation).
+///
+/// The threshold check: if more than `SPARSE_ACTIVE_FRAC` of the chunk is active,


Low – Doc refers to a non-existent constant name

The comment says SPARSE_ACTIVE_FRAC but no such constant exists; the actual constants are SPARSE_NUM and SPARSE_DEN defined inside the function body. Update to match:

Suggested change

/// The threshold check: if more than `SPARSE_ACTIVE_FRAC` of the chunk is active,

/// The threshold check: if more than `SPARSE_NUM / SPARSE_DEN` (7/8) of the chunk is active,

claude · 2026-04-24T21:26:52Z

Review of perf/sparse-logup-aux-trace

Overview: Adds a sparse fast-path to the LogUp auxiliary trace builder. When a multiplicity is structurally identifiable as sparse (Column/Sum/Sum3 gated by selectors with <7/8 of rows active), it skips fingerprint computation for zero-multiplicity rows and batch-inverts only the active subset.

Soundness: The over-approximation invariant holds — sparse_active_rows always includes every row where m≠0, so no non-zero contribution is silently dropped. The dense fallback for One/Negated/Diff/Linear is correct.

Correctness: The scatter-then-add pattern in the half-sparse batched path relies on result_chunk starting at zero, which is guaranteed by the vec![FieldElement::<E>::zero(); trace_len] initialization at the call site. This implicit contract is safe as long as result is never reused.

Issues found (inline comments posted):

Severity	Location	Issue
Medium	`sparse_active_rows` (Column/Sum/Sum3 arms)	Full `Vec<usize>` is allocated before the density threshold check — wasted allocation on every dense chunk
Medium	batched sparse path, lines 1808-1815	`v.clone()` on owned `Option<Vec<usize>>` — use `unwrap_or_else` to consume directly
Low	function docstring, line 1459	References constant `SPARSE_ACTIVE_FRAC` which doesn't exist

No security or correctness bugs found. The soundness argument is well-reasoned and the existing test suite (71 bus + 265 prover tests) covers the behavioral contract.

One design note: the half-sparse case (one dense, one sparse interaction) still falls through to the "sparse path" which allocates a full (0..chunk_len) index Vec for the dense interaction. If this case is common, a dedicated (dense, sparse) branch would avoid that allocation entirely — but only worth it if profiling confirms it.

…debug_assertions - `cargo fmt` on sparse LogUp changes - Polynomial import in constraints/evaluator.rs is only used inside `#[cfg(all(debug_assertions, not(feature = "parallel")))]`; match the import cfg to the use site so release builds without `parallel` don't warn. (Pre-existing on origin/main but surfaced by CI lint here.)

jotabulacios · 2026-04-28T18:55:03Z

/bench

diegokingston · 2026-05-20T12:18:26Z

/bench 5

github-actions · 2026-05-20T12:20:24Z

Codex Code Review

Findings

Performance: crypto/stark/src/lookup.rs:1489 and crypto/stark/src/lookup.rs:1807 can allocate Vec<usize> proportional to the full trace length in non-parallel builds, because process_chunk(0, &mut result) uses the entire trace as one chunk. For dense Column/Sum/Sum3 multiplicities, sparse_active_rows first collects nearly all rows before deciding to fall back to dense, and the half-sparse batched path materializes (0..chunk_len).collect() for the dense side. On large STARK traces this is a real memory/time regression and can OOM. Prefer early-aborting collection once the sparse threshold is exceeded, and avoid materializing dense index ranges in the half-sparse path.

No security vulnerabilities or correctness bugs found in the changed logic.

I attempted to run cargo test -p stark --no-default-features lookup --lib, but the environment could not update rustup because /home/runner/.rustup is read-only.

claude · 2026-05-20T12:22:43Z

+        let indices_a: Vec<usize> = match &active_a {
+            Some(v) => v.clone(),
+            None => (0..chunk_len).collect(),
+        };
+        let indices_b: Vec<usize> = match &active_b {
+            Some(v) => v.clone(),
+            None => (0..chunk_len).collect(),
        };


Low: unnecessary clone + silent perf regression in the half-sparse case

active_a/active_b are not used after these bindings, so the .clone() is avoidable. More importantly, when exactly one side is None (e.g. A is dense, B is sparse), this path allocates a full (0..chunk_len).collect() and then processes the dense interaction with indirect indexing — which is strictly worse than the dedicated dense branch above that uses sequential access. The dense-path early-return only triggers when both are None; the half-sparse case silently regresses.

Suggested fixes:

Consume the Option to avoid the clone: active_a.unwrap_or_else(|| (0..chunk_len).collect()).

Add dedicated half-sparse branches (active_a.is_some() && active_b.is_none() and vice versa) that only scatter the sparse side, similar to how compute_logup_term_column handles its sparse path. The dense side can use the same sequential loop as the dense-path branch.

claude · 2026-05-20T12:22:54Z

Review: perf/sparse-logup-aux-trace

The optimization is sound. The over-approximation argument in the PR description is correct: marking a row active when any contributing column is non-zero guarantees every row with non-zero multiplicity is processed; field cancellation (e.g. col_a + col_b = 0 with both non-zero) just causes harmless extra work (0 * fp_inv = 0). The empty-fingerprint edge case (all rows inactive) is handled correctly by inplace_batch_inverse_sequential's early return. The evaluator.rs import gate change is correct — Polynomial is used only in the debug_assertions block.

Low: half-sparse inefficiency in compute_logup_batched_term_column — see inline comment. When exactly one interaction is dense (None) and the other is sparse (Some), the current code falls to the sparse path and allocates a full (0..chunk_len) index vec for the dense side, then processes it with indirect indexing. This is both an extra allocation and slightly slower than sequential access. The single-interaction function (compute_logup_term_column) doesn't have this issue since it only has one arm.

No correctness or security issues found.

claude Bot reviewed Apr 24, 2026

View reviewed changes

jotabulacios and others added 4 commits April 29, 2026 10:50

Merge branch 'main' into perf/sparse-logup-aux-trace

50460ad

Merge branch 'main' into perf/sparse-logup-aux-trace

d57e5fd

Merge branch 'main' into perf/sparse-logup-aux-trace

b6b9e72

Merge branch 'main' into perf/sparse-logup-aux-trace

b2ad2db

diegokingston marked this pull request as ready for review May 20, 2026 12:18

claude Bot reviewed May 20, 2026

View reviewed changes

diegokingston added 2 commits May 21, 2026 17:52

Merge branch 'main' into perf/sparse-logup-aux-trace

89d9e92

Merge branch 'main' into perf/sparse-logup-aux-trace

649c37b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: sparse gather for LogUp aux trace when multiplicity is selector…#565

perf: sparse gather for LogUp aux trace when multiplicity is selector…#565
diegokingston wants to merge 8 commits into
mainfrom
perf/sparse-logup-aux-trace

diegokingston commented Apr 24, 2026

Uh oh!

diegokingston commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

claude Bot Apr 24, 2026

Uh oh!

claude Bot Apr 24, 2026

Uh oh!

claude Bot Apr 24, 2026

Uh oh!

claude Bot commented Apr 24, 2026

Uh oh!

jotabulacios commented Apr 28, 2026

Uh oh!

diegokingston commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

claude Bot May 20, 2026

Uh oh!

claude Bot commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	/// The threshold check: if more than `SPARSE_ACTIVE_FRAC` of the chunk is active,
	/// The threshold check: if more than `SPARSE_NUM / SPARSE_DEN` (7/8) of the chunk is active,

Conversation

diegokingston commented Apr 24, 2026

Uh oh!

diegokingston commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark — fib_iterative_8M (median of 3)

Uh oh!

github-actions Bot commented Apr 24, 2026

Codex Code Review

Uh oh!

claude Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented Apr 24, 2026

Review of perf/sparse-logup-aux-trace

Uh oh!

jotabulacios commented Apr 28, 2026

Uh oh!

diegokingston commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Codex Code Review

Uh oh!

claude Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 20, 2026

Review: perf/sparse-logup-aux-trace

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Apr 24, 2026 •

edited

Loading