Skip to content

perf: monomorphic dispatch for LogUp constraints#593

Open
MauroToscano wants to merge 2 commits into
mainfrom
opt/24-enum-constraint-dispatch
Open

perf: monomorphic dispatch for LogUp constraints#593
MauroToscano wants to merge 2 commits into
mainfrom
opt/24-enum-constraint-dispatch

Conversation

@MauroToscano
Copy link
Copy Markdown
Contributor

Summary

  • Store concrete copies of LookupBatchedTermConstraint and LookupAccumulatedConstraint on AirWithBuses alongside the boxed dyn TransitionConstraintEvaluator versions
  • Override compute_transition_prover to dispatch LogUp constraints via concrete types, enabling the compiler to inline fingerprint computation
  • Base (domain) constraints keep dyn dispatch — they're heterogeneous user-defined types

Benchmark results

fib_iterative_4M, PARALLEL_TABLES=1, 5 samples:

Metric Before After Delta
Wall clock (median) 38.6s 35.3s -8.5%
CV 2.3% 0.8%
R2 evaluate (instruments) 13.07s 11.58s -11.4%
R2-4 total (instruments) 16.42s 14.64s -10.8%
Heap 26,498 MB 26,498 MB unchanged

Verification: baseline verifier accepts the proof.

Why it works

The prover's constraint evaluation hot loop calls evaluate_prover() through a dyn TransitionConstraintEvaluator vtable for every constraint at every LDE domain point. For LogUp constraints (which do expensive extension-field fingerprint computation), this prevents the compiler from inlining across the call boundary. Storing concrete types and calling them directly lets LLVM inline the fingerprint logic and optimize across constraint boundaries.

Test plan

  • cargo test --release -p stark (124/124 pass)
  • cargo test --release -p lambda-vm-prover (347/347 pass, 3 pre-existing failures unrelated)
  • Proof verified by baseline verifier binary
  • /bench on CI runner

Store concrete copies of LookupBatchedTermConstraint and
LookupAccumulatedConstraint alongside the boxed versions.
Override compute_transition_prover on AirWithBuses to call
the concrete types directly, enabling the compiler to inline
fingerprint computation and eliminate vtable overhead for the
expensive extension-field constraints.

Base (domain) constraints still use dyn dispatch since they are
heterogeneous user-defined types.
@github-actions
Copy link
Copy Markdown

Codex Code Review

No actionable issues found in the PR diff.

I reviewed the added monomorphic LogUp prover dispatch and the cloned direct constraint construction in crypto/stark/src/lookup.rs. I did not find concrete security vulnerabilities, correctness bugs, or significant performance regressions.

Verification note: I attempted cargo check -p stark, but the environment fails before compilation because rustup cannot create temp files under /home/runner/.rustup/tmp due to a read-only filesystem.

Comment on lines +1056 to +1064
// Phase 2: LogUp term constraints — monomorphic dispatch (concrete type)
for c in &self.logup_term_constraints_direct {
c.evaluate_verifier(evaluation_context, ext_evals);
}

// Phase 3: LogUp accumulated constraint — monomorphic dispatch (concrete type)
if let Some(c) = &self.logup_accumulated_direct {
c.evaluate_verifier(evaluation_context, ext_evals);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — evaluate_verifier called where evaluate_prover is expected

This calls evaluate_verifier directly on the concrete types, which works today only because the default evaluate_prover implementation for extension constraints is just a thin wrapper that delegates to evaluate_verifier. But this is fragile:

  1. It's semantically inconsistent — compute_transition_prover calls evaluate_verifier on Phase 2/3 while Phase 1 correctly calls evaluate_prover.
  2. If a future contributor adds a real prover-specific fast path by overriding evaluate_prover on either concrete type (e.g., skipping some verifier-only checks), it will be silently bypassed here.

The static/monomorphic dispatch benefit is preserved either way since these are concrete types. Prefer:

Suggested change
// Phase 2: LogUp term constraints — monomorphic dispatch (concrete type)
for c in &self.logup_term_constraints_direct {
c.evaluate_verifier(evaluation_context, ext_evals);
}
// Phase 3: LogUp accumulated constraint — monomorphic dispatch (concrete type)
if let Some(c) = &self.logup_accumulated_direct {
c.evaluate_verifier(evaluation_context, ext_evals);
}
// Phase 2: LogUp term constraints — monomorphic dispatch (concrete type)
for c in &self.logup_term_constraints_direct {
c.evaluate_prover(evaluation_context, base_evals, ext_evals);
}
// Phase 3: LogUp accumulated constraint — monomorphic dispatch (concrete type)
if let Some(c) = &self.logup_accumulated_direct {
c.evaluate_prover(evaluation_context, base_evals, ext_evals);
}

The compiler will inline evaluate_prover → evaluate_verifier in one pass, so there's no performance difference, but the call chain is now honest.

@claude
Copy link
Copy Markdown

claude Bot commented May 19, 2026

Review: perf: monomorphic dispatch for LogUp constraints

The optimization is sound and well-documented. The 8.5% wall-clock improvement is credible for this kind of devirtualization. One semantic issue worth fixing, one low-level maintenance note.

Medium — evaluate_verifier called instead of evaluate_prover (lines 1057–1064)

See inline comment. Phases 2 and 3 call c.evaluate_verifier() directly on the concrete types while Phase 1 correctly calls evaluate_prover(). It works today because the default evaluate_prover for extension constraints is a one-liner that delegates to evaluate_verifier, but it's a semantic landmine: any future evaluate_prover override on LookupBatchedTermConstraint or LookupAccumulatedConstraint would be silently skipped. Switching to c.evaluate_prover(evaluation_context, base_evals, ext_evals) costs nothing — the compiler inlines the extra hop away — and keeps the call chain honest.

Low — no invariant guard on the duplicated fields

logup_term_constraints_direct and logup_accumulated_direct are copies of entries already in transition_constraints. There's nothing preventing them from diverging if the struct is ever mutated after construction (not possible via the current public API, but not enforced). A debug_assert! at the top of compute_transition_prover would catch this cheaply:

debug_assert_eq!(
    self.logup_term_constraints_direct.len()
        + self.logup_accumulated_direct.is_some() as usize,
    self.transition_constraints.len() - self.num_base_constraints,
);

Otherwise the PR is clean — good comments, no unsafe code, #[derive(Clone)] additions are minimal, and the three-phase structure is easy to follow.

@diegokingston
Copy link
Copy Markdown
Collaborator

/bench 5

jotabulacios added a commit that referenced this pull request May 20, 2026
@diegokingston
Copy link
Copy Markdown
Collaborator

/bench 5

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

Benchmark — fib_iterative_8M (median of 7)

Table parallelism: auto (cores / 3)

Metric main PR Δ
Peak heap 52516 MB 52038 MB -478 MB (-0.9%) ⚪
Prove time 25.536s 25.516s -0.020s (-0.1%) ⚪

✅ No significant change.

✅ Low variance (time: 2.2%, heap: 0.5%)

Commit: 45152db · Baseline: cached · Runner: self-hosted bench

@MauroToscano
Copy link
Copy Markdown
Contributor Author

/bench 7 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants