Skip to content

perf: deduplicate Domain and LdeTwiddles across tables in multi_prove#551

Merged
MauroToscano merged 3 commits into
mainfrom
domain-twiddles-dedup
Apr 24, 2026
Merged

perf: deduplicate Domain and LdeTwiddles across tables in multi_prove#551
MauroToscano merged 3 commits into
mainfrom
domain-twiddles-dedup

Conversation

@nicole-graus
Copy link
Copy Markdown
Collaborator

Caches Domain and LdeTwiddles behind an Arc in multi_prove, keyed by (trace_length, blowup_factor, coset_offset), so tables sharing the same domain parameters avoid redundant ~56 MB allocations each. Reduces peak heap by ~2–4 GB on large workloads.

Optimization extracted from PR #518.

@nicole-graus
Copy link
Copy Markdown
Collaborator Author

/bench 10

@github-actions
Copy link
Copy Markdown

Codex Code Review

No findings in the PR diff.

The change is a straightforward dedup of Domain/LdeTwiddles behind Arc, and the cache key matches the values those constructors actually depend on (trace_length, blowup_factor, coset_offset). I don’t see a security issue, correctness regression, or significant performance problem in the modified code.

I could not run cargo check here because the environment’s Rust toolchain setup fails trying to write under /home/runner/.rustup/tmp (read-only filesystem).

Comment thread crypto/stark/src/prover.rs
@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Review: perf: deduplicate Domain and LdeTwiddles across tables in multi_prove

Overall: LGTM — clean, well-motivated optimization.

What the PR does

Wraps Domain and LdeTwiddles in Arc and deduplicates them by (trace_length, blowup_factor, coset_offset) in multi_prove. Tables sharing domain parameters share the same allocations instead of each creating their own ~56 MB copy. Claimed savings of 2–4 GB peak heap on large workloads.

Correctness

The cache key is complete and correct:

  • new_domain is purely a function of (trace_length, blowup_factor, coset_offset) — verified by reading domain.rs:83
  • LdeTwiddles::new(&d) depends only on the domain, so twiddle deduplication is also sound
  • blowup_factor as usize from u8 is lossless; coset_offset is already u64 in ProofOptions
  • All downstream call sites (prove_rounds_2_to_4, reconstruct_round1, validate_trace) take &Domain<Field> — Rust's deref coercion handles &Arc<Domain<Field>>&Domain<Field> transparently

Issues found

Low — domain_cache outlives its usefulness (inline comment left on line 1549)
After the loop the HashMap is no longer needed but stays alive (with one extra Arc ref per unique key) until the end of multi_prove. Doesn't affect correctness or actual memory usage, but an explicit drop(domain_cache) after the loop would clarify intent.

Nit — local type alias
type DomainEntry<F> = ... inside a function body is valid but unusual. Since it's only used in one spot, an inline tuple would be marginally cleaner. Minor.

No security issues, no bugs, no race conditions.

@github-actions
Copy link
Copy Markdown

Benchmark — fib_iterative_8M (median of 10)

Table parallelism: 32 (auto = cores / 3)

Metric main PR Δ
Peak heap 64397 MB 62050 MB -2347 MB (-3.6%) ⚪
Prove time 31.997s 31.275s -0.722s (-2.3%) ⚪

✅ No significant change.

⚠️ Heap spread: 5.6% (62050 MB / 62211 MB / 61090 MB / 61793 MB / 61216 MB / 63891 MB / 64584 MB / 62630 MB / 64191 MB / 61763 MB)
⚠️ Baseline heap spread: 8.9% (65425 MB / 67907 MB / 65716 MB / 64397 MB / 64849 MB / 64151 MB / 64163 MB / 63601 MB / 66208 MB / 62178 MB) — comparison may be less reliable
Consider re-running /bench

Commit: 045b562 · Baseline: built from main · Runner: self-hosted bench

@nicole-graus nicole-graus marked this pull request as ready for review April 24, 2026 15:11
@github-actions
Copy link
Copy Markdown

Codex Code Review

No findings in the PR diff.

I checked the new domain/twiddle cache in crypto/stark/src/prover.rs and the added coverage in crypto/stark/src/tests/prover_tests.rs. The cache key matches the inputs used by Domain::new and LdeTwiddles::new, and moving those values behind Arc does not introduce an obvious correctness, safety, or meaningful performance regression.

Testing gap: I could not run the targeted Rust tests in this environment because cargo/rustup attempted toolchain and dependency writes under read-only locations and network-restricted paths.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Code Review: perf: deduplicate Domain and LdeTwiddles across tables in multi_prove

Overall: Clean, well-motivated optimization. The Arc-based sharing is safe (both types are immutable), the explicit drop(domain_cache) before proving rounds is a nice touch, and the tests cover both the correctness (mixed coset offsets) and the deduplication behavior. No security or correctness issues found.


Medium — Cache key is silently coupled to new_domain's implementation

The key (trace_length, blowup_factor, coset_offset) is correct today but creates an invisible contract: if new_domain is ever extended to read additional fields from air.options() (e.g., any new precomputation that depends on grinding_factor), the cache will silently return wrong domains with no type error and no test failure (unless the new field is also varied in a test).

Consider a minimal guard: a short comment stating which fields new_domain reads, so a future diff to new_domain is a visible prompt to update the key:

// Key must mirror every air.options() field read by new_domain.
// Currently: blowup_factor, coset_offset (trace_length is the third axis).
let key = (trace_length, blowup, coset_offset);

Low — Two hash lookups in test-only stats path

let was_hit = domain_cache.contains_key(&key);  // lookup #1
let (domain, twiddles) = domain_cache
    .entry(key)                                  // lookup #2
    .or_insert_with(|| { ... })
    .clone();

Since this is #[cfg(test)] only there is no performance concern, but it can be unified in one lookup by matching on Entry::Occupied vs Entry::Vacant to record the hit/miss and insert in the same step.


Note — domain_cache_stats and thread reuse

The thread_local! counters are per-thread, which is correct for Rust's default test runner (each test gets its own thread). The reset() call at the top of the dedup test is a sufficient guard today. If a custom test runner ever reuses threads across tests, stats from a prior multi_prove call could bleed through before reset(). Not a real issue today, just worth knowing.


Summary: Approve with the suggestion to document the cache key coupling. The optimization is correct and the tests are solid.

Comment thread crypto/stark/src/prover.rs
Comment thread crypto/stark/src/prover.rs
@MauroToscano MauroToscano added this pull request to the merge queue Apr 24, 2026
Merged via the queue into main with commit 743426c Apr 24, 2026
13 checks passed
@MauroToscano MauroToscano deleted the domain-twiddles-dedup branch April 24, 2026 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants