Refactor/fiat shamir statement streamlined by MauroToscano · Pull Request #617 · yetanotherco/lambda_vm

MauroToscano · 2026-05-22T20:30:55Z

This is a follow up of Diego's #610

Notice while Fiat Shamir is better handled here, the single prover re feeds some of the meta data to the transcript, which is not a security problem but it's not really needed

Summary

Builds on #610. Same soundness commitment, smaller surface area. Five simplifications: drop the redundant outer keccak (and the tiny-keccak prod dep), drop the intermediate Vec<u8> encoder, drop FORMAT_VERSION (folded into the domain tag), drop the redundant ELF length prefix, and replace the opaque transcript_seed: &[u8] helper parameter with &mut DefaultTranscript<E>. The statement is now absorbed straight into the transcript by one function (absorb_statement) and the verify path absorbs once + clone()s instead of re-seeding twice.

What's preserved

All soundness properties from #610:

All five statement fields bound (ELF, public output, table counts, private-input page count, runtime page ranges)
Domain-separation tag (LAMBDAVM_STARK_STATEMENT_V1)
Length prefix on public_output (boundary collision resistance with the next field)
Length prefix on runtime_page_ranges count
ELF pre-digested to 32 bytes (recursion-friendly — a recursive verifier doesn't have to absorb MBs)
Three call sites agree (prove / verify / bus-balance replay)
Breaking proof-format change: pre-refactor(prover): bind the proven statement into the Fiat-Shamir transcript #610 proofs don't verify (intentional)

Simplifications

1. Drop the outer keccak, drop `tiny-keccak`

DefaultTranscript is itself a Keccak256 absorber. The original PR computed keccak256(encode_statement(...)) with tiny-keccak externally to get a 32-byte seed, then handed it to DefaultTranscript::new(&seed) — which hashed it again with its own sha3::Keccak256. Two crates, two keccak passes, no benefit.

This PR streams the canonical bytes directly into the transcript via append_bytes. The ELF digest uses sha3::Keccak256 (already a transitive prod dep via crypto). tiny-keccak returns to being a dev-only dep (still used in the in-VM keccak precompile cross-check).

2. Drop the intermediate `Vec<u8>`

encode_statement() -> Vec<u8> allocated a heap buffer on every prove / verify just to be hashed once. The streaming absorb_statement(&mut transcript, ...) writes the same bytes directly into the transcript hasher — no allocation.

3. Drop `FORMAT_VERSION`

Versioning lived in two places: DOMAIN_TAG = "..._V1" and FORMAT_VERSION: u32 = 1. Two sources of truth that have to be bumped together. Kept the tag suffix as the single version source — bump it (_V2, ...) on any encoding change.

4. Drop the ELF length prefix

elf_bytes.len().to_le_bytes() was followed immediately by keccak256(elf_bytes) — a fixed-width 32-byte digest. Length prefixes prevent boundary collisions between variable-length raw fields; a fixed-width digest can't shift into its neighbors. The prefix was dead weight.

5. Helper signatures: `transcript_seed: &[u8]` → `&mut DefaultTranscript<E>` `replay_transcript_phase_a` and `compute_expected_commit_bus_balance` previously took an opaque seed byte-slice and constructed the transcript internally. Internal tests bypassed binding by passing `&[]` — a magic sentinel.

Now the caller builds and seeds the transcript; helpers operate on it directly. Tests that want an unbound transcript just hand over a fresh empty DefaultTranscript::<E>::new(&[]). No sentinel.

Also folded a pre-existing duplication: both branches of if air.is_preprocessed() in replay_transcript_phase_a did transcript.append_bytes(&proof.lde_trace_main_merkle_root). Collapsed to one append after the conditional.

Verify path: absorb once + `clone()`, not seed twice

The original PR derived the seed once and passed it to two DefaultTranscript::new(...) calls (one for compute_expected_commit_bus_balance, one
for multi_verify). Each new() re-absorbed the seed bytes through the hasher.

This PR absorbs the statement once into one transcript, then clone()s the post-absorb hasher state to fork into the replay vs. verify transcripts.

Single source of truth, no risk of one site's field list drifting from the other's.

Out of scope (follow-ups)

Phase A is still replayed twice on the verify path. compute_expected_commit_bus_balance runs Phase A once to recover (z, α), then multi_verify runs it again. This is pre-existing — not introduced or worsened by the statement binding. Removing it requires changing multi_verify to take a closure that produces the bus balance offset given (z, α). Separate scope.
VmProof version discriminant. Old proofs currently reject by "challenges happen to differ." A version field on the proof bundle would let them reject at deserialize time. Worth a follow-up.

…script The VM prover and verifier built their transcript with `new(&[])` — the statement (program, public output, table layout) was never absorbed, so Fiat-Shamir challenges did not depend on it. Soundness was preserved only by out-of-band reconstruction checks in the verifier. Add a canonical, versioned, length-prefixed encoder (`prover/src/statement.rs`) that digests the full statement — ELF hash, public output, table counts, and page layout — into a 32-byte transcript seed. Seed the transcript with it on the prove path, the verify path, and the bus-balance replay (`replay_transcript_phase_a`), so all three derive identical challenges for an honest proof and diverge for any tampered statement. Closes the frozen-heart class of issue as defense-in-depth and is a prerequisite for recursion. Breaking proof-format change (challenges differ). `tiny-keccak` is promoted from a dev-dependency to a dependency for the statement digest.

Apply the simplifications to the statement-binding work: - Drop tiny-keccak as a prod dep; the DefaultTranscript is already a Keccak256 absorber, so the outer keccak256 in statement_seed() was redundant. ELF digest now uses sha3::Keccak256 (already a transitive prod dep via crypto). tiny-keccak stays as a dev-dep for the in-VM keccak precompile cross-check. - Replace encode_statement -> Vec<u8> + statement_seed -> [u8; 32] with a single absorb_statement(&mut transcript, ...) that streams bytes directly into the transcript. No intermediate allocation. - Drop FORMAT_VERSION constant — the version is in DOMAIN_TAG suffix (LAMBDAVM_STARK_STATEMENT_V1). Bump the suffix on any encoding change. - Drop the redundant 8-byte length prefix on the elf bytes: we hash to fixed 32-byte digest, so the prefix doesn't prevent any collision. - replay_transcript_phase_a / compute_expected_commit_bus_balance now take &mut DefaultTranscript<E> instead of an opaque transcript_seed: &[u8]. Callers decide what's in the transcript; internal tests just hand a fresh empty one (no &[] magic sentinel). - Fold the duplicated lde_trace_main_merkle_root append branches in replay_transcript_phase_a. - verify_with_options absorbs the statement ONCE, then clones the transcript so multi_verify and the bus-balance replay both start from the same post-absorb state — no closure, no field-list drift. - statement_tests.rs reframed around transcript.state(). 3 tests (state_is_deterministic, state_depends_on_every_field, public_output_length_prefix_prevents_collision) cover the same surface as the 8 byte-level encoder tests they replace, hitting the real absorb path. Drop the two-ELF e2e test that needed RISC-V artifacts and couldn't run in author env. - 6 internal test sites (decode_tests.rs, prove_elfs_tests.rs): pass a fresh DefaultTranscript instead of &[]. Soundness commitment unchanged: same five statement fields bound, same domain separation, same length prefixes where needed, same cross-site agreement (now explicitly tested via deterministic absorb). Follow-up: Phase A is still replayed twice on the verify path (once in compute_expected_commit_bus_balance, once in multi_verify). That's pre-existing, not from this PR; removing it would change multi_verify's public API to take a closure that produces the bus balance offset given (z, alpha). Separate scope.

github-actions · 2026-05-22T20:32:34Z

Codex Code Review

No issues found in the PR diff.

I reviewed the statement absorption changes, transcript replay path, prover/verifier transcript initialization, and updated tests. The changes appear consistent: prover, verifier, and commit bus replay all start from the same statement-bound transcript state, and tampered statement fields should cause verification failure.

Verification note: I attempted cargo test -p lambda-vm-prover statement_tests --no-default-features and cargo check -p lambda-vm-prover --no-default-features, but both were blocked by the environment because rustup cannot write to /home/runner/.rustup/tmp on this filesystem. git diff --check passed.

claude · 2026-05-22T20:36:21Z

Review: Fiat-Shamir statement binding

The security approach is sound. Binding the full statement into the transcript before Phase A prevents malleability attacks where a tampered ELF, public output, or table layout would otherwise produce valid-looking challenges.

Logic correctness:

The refactor of the phase-A replay is semantically equivalent to the original (the merkle root was always appended in both branches).
The transcript clone for the bus-balance replay is the correct pattern — both paths start from the same post-statement state.
Tests using the low-level prover directly are consistent (both prover and replay use empty transcripts at that level), and the security property is exercised end-to-end by the higher-level prove/verify tests.

One issue found:

Medium (soundness): The hardcoded TableCounts field array in absorb_statement silently excludes any field added to the struct in the future — see inline comment.

If a field is added to TableCounts or RuntimePageRange and not bound into the transcript, statement binding is silently broken for that field — a proof generated for one configuration could pass verification against a different one. The previous `[ table_counts.cpu, ... ]` array literal compiled fine even when a field was missing; the new field was simply skipped. Same for direct `.base` / `.count` access on RuntimePageRange. Replace both with exhaustive `&Struct { ... }` patterns. Adding a field now produces: error[E0027]: pattern does not mention field `<name>` --> prover/src/statement.rs:.. The compile error fires at the encoding site itself — on every build, not just on test runs — pointing the dev directly at the code that needs updating. No separate test needed; the safety is in the encoding function. Encoded transcript bytes are unchanged: same field order, same widths. Pure source-level guard.

diegokingston and others added 3 commits May 22, 2026 11:42

Merge branch 'main' into harden/fiat-shamir-statement-binding

28f2776

claude Bot reviewed May 22, 2026

View reviewed changes

Comment thread prover/src/statement.rs

MauroToscano mentioned this pull request May 26, 2026

refactor(prover): bind the proven statement into the Fiat-Shamir transcript #610

Closed

3 tasks

MauroToscano and others added 2 commits May 26, 2026 12:24

Merge branch 'main' into refactor/fiat-shamir-statement-streamlined

3347cce

diegokingston approved these changes May 26, 2026

View reviewed changes

diegokingston added this pull request to the merge queue May 26, 2026

Merged via the queue into main with commit 78d7b99 May 26, 2026
12 checks passed

diegokingston deleted the refactor/fiat-shamir-statement-streamlined branch May 26, 2026 17:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor/fiat shamir statement streamlined#617

Refactor/fiat shamir statement streamlined#617
diegokingston merged 5 commits into
mainfrom
refactor/fiat-shamir-statement-streamlined

MauroToscano commented May 22, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

Uh oh!

claude Bot commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MauroToscano commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's preserved

Simplifications

1. Drop the outer keccak, drop tiny-keccak

2. Drop the intermediate Vec<u8>

3. Drop FORMAT_VERSION

4. Drop the ELF length prefix

5. Helper signatures: transcript_seed: &[u8] → &mut DefaultTranscript<E> replay_transcript_phase_a and compute_expected_commit_bus_balance previously took an opaque seed byte-slice and constructed the transcript internally. Internal tests bypassed binding by passing &[] — a magic sentinel.

Verify path: absorb once + clone(), not seed twice

Out of scope (follow-ups)

Uh oh!

github-actions Bot commented May 22, 2026

Codex Code Review

Uh oh!

Uh oh!

claude Bot commented May 22, 2026

Review: Fiat-Shamir statement binding

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MauroToscano commented May 22, 2026 •

edited

Loading

1. Drop the outer keccak, drop `tiny-keccak`

2. Drop the intermediate `Vec<u8>`

3. Drop `FORMAT_VERSION`

Verify path: absorb once + `clone()`, not seed twice