Skip to content

Refactor/fiat shamir statement streamlined#617

Merged
diegokingston merged 5 commits into
mainfrom
refactor/fiat-shamir-statement-streamlined
May 26, 2026
Merged

Refactor/fiat shamir statement streamlined#617
diegokingston merged 5 commits into
mainfrom
refactor/fiat-shamir-statement-streamlined

Conversation

@MauroToscano
Copy link
Copy Markdown
Contributor

@MauroToscano MauroToscano commented May 22, 2026

This is a follow up of Diego's #610

Notice while Fiat Shamir is better handled here, the single prover re feeds some of the meta data to the transcript, which is not a security problem but it's not really needed

Summary

Builds on #610. Same soundness commitment, smaller surface area. Five simplifications: drop the redundant outer keccak (and the tiny-keccak prod dep), drop the intermediate Vec<u8> encoder, drop FORMAT_VERSION (folded into the domain tag), drop the redundant ELF length prefix, and replace the opaque transcript_seed: &[u8] helper parameter with &mut DefaultTranscript<E>. The statement is now absorbed straight into the transcript by one function (absorb_statement) and the verify path absorbs once + clone()s instead of re-seeding twice.

What's preserved

All soundness properties from #610:

  • All five statement fields bound (ELF, public output, table counts, private-input page count, runtime page ranges)
  • Domain-separation tag (LAMBDAVM_STARK_STATEMENT_V1)
  • Length prefix on public_output (boundary collision resistance with the next field)
  • Length prefix on runtime_page_ranges count
  • ELF pre-digested to 32 bytes (recursion-friendly — a recursive verifier doesn't have to absorb MBs)
  • Three call sites agree (prove / verify / bus-balance replay)
  • Breaking proof-format change: pre-refactor(prover): bind the proven statement into the Fiat-Shamir transcript #610 proofs don't verify (intentional)

Simplifications

1. Drop the outer keccak, drop tiny-keccak

DefaultTranscript is itself a Keccak256 absorber. The original PR computed keccak256(encode_statement(...)) with tiny-keccak externally to get a 32-byte seed, then handed it to DefaultTranscript::new(&seed) — which hashed it again with its own sha3::Keccak256. Two crates, two keccak passes, no benefit.

This PR streams the canonical bytes directly into the transcript via append_bytes. The ELF digest uses sha3::Keccak256 (already a transitive prod dep via crypto). tiny-keccak returns to being a dev-only dep (still used in the in-VM keccak precompile cross-check).

2. Drop the intermediate Vec<u8>

encode_statement() -> Vec<u8> allocated a heap buffer on every prove / verify just to be hashed once. The streaming absorb_statement(&mut transcript, ...) writes the same bytes directly into the transcript hasher — no allocation.

3. Drop FORMAT_VERSION

Versioning lived in two places: DOMAIN_TAG = "..._V1" and FORMAT_VERSION: u32 = 1. Two sources of truth that have to be bumped together. Kept the tag suffix as the single version source — bump it (_V2, ...) on any encoding change.

4. Drop the ELF length prefix

elf_bytes.len().to_le_bytes() was followed immediately by keccak256(elf_bytes) — a fixed-width 32-byte digest. Length prefixes prevent boundary collisions between variable-length raw fields; a fixed-width digest can't shift into its neighbors. The prefix was dead weight.

5. Helper signatures: transcript_seed: &[u8]&mut DefaultTranscript<E> replay_transcript_phase_a and compute_expected_commit_bus_balance previously took an opaque seed byte-slice and constructed the transcript internally. Internal tests bypassed binding by passing &[] — a magic sentinel.

Now the caller builds and seeds the transcript; helpers operate on it directly. Tests that want an unbound transcript just hand over a fresh empty DefaultTranscript::<E>::new(&[]). No sentinel.

Also folded a pre-existing duplication: both branches of if air.is_preprocessed() in replay_transcript_phase_a did transcript.append_bytes(&proof.lde_trace_main_merkle_root). Collapsed to one append after the conditional.

Verify path: absorb once + clone(), not seed twice

The original PR derived the seed once and passed it to two DefaultTranscript::new(...) calls (one for compute_expected_commit_bus_balance, one
for multi_verify). Each new() re-absorbed the seed bytes through the hasher.

This PR absorbs the statement once into one transcript, then clone()s the post-absorb hasher state to fork into the replay vs. verify transcripts.

Single source of truth, no risk of one site's field list drifting from the other's.

Out of scope (follow-ups)

  • Phase A is still replayed twice on the verify path. compute_expected_commit_bus_balance runs Phase A once to recover (z, α), then multi_verify runs it again. This is pre-existing — not introduced or worsened by the statement binding. Removing it requires changing multi_verify to take a closure that produces the bus balance offset given (z, α). Separate scope.
  • VmProof version discriminant. Old proofs currently reject by "challenges happen to differ." A version field on the proof bundle would let them reject at deserialize time. Worth a follow-up.

diegokingston and others added 3 commits May 22, 2026 11:42
…script

The VM prover and verifier built their transcript with `new(&[])` — the
statement (program, public output, table layout) was never absorbed, so
Fiat-Shamir challenges did not depend on it. Soundness was preserved only
by out-of-band reconstruction checks in the verifier.

Add a canonical, versioned, length-prefixed encoder (`prover/src/statement.rs`)
that digests the full statement — ELF hash, public output, table counts, and
page layout — into a 32-byte transcript seed. Seed the transcript with it on
the prove path, the verify path, and the bus-balance replay
(`replay_transcript_phase_a`), so all three derive identical challenges for an
honest proof and diverge for any tampered statement.

Closes the frozen-heart class of issue as defense-in-depth and is a
prerequisite for recursion. Breaking proof-format change (challenges differ).

`tiny-keccak` is promoted from a dev-dependency to a dependency for the
statement digest.
Apply the simplifications to the statement-binding work:

- Drop tiny-keccak as a prod dep; the DefaultTranscript is already a
  Keccak256 absorber, so the outer keccak256 in statement_seed() was
  redundant. ELF digest now uses sha3::Keccak256 (already a transitive
  prod dep via crypto). tiny-keccak stays as a dev-dep for the in-VM
  keccak precompile cross-check.

- Replace encode_statement -> Vec<u8> + statement_seed -> [u8; 32] with
  a single absorb_statement(&mut transcript, ...) that streams bytes
  directly into the transcript. No intermediate allocation.

- Drop FORMAT_VERSION constant — the version is in DOMAIN_TAG suffix
  (LAMBDAVM_STARK_STATEMENT_V1). Bump the suffix on any encoding change.

- Drop the redundant 8-byte length prefix on the elf bytes: we hash to
  fixed 32-byte digest, so the prefix doesn't prevent any collision.

- replay_transcript_phase_a / compute_expected_commit_bus_balance now
  take &mut DefaultTranscript<E> instead of an opaque transcript_seed:
  &[u8]. Callers decide what's in the transcript; internal tests just
  hand a fresh empty one (no &[] magic sentinel).

- Fold the duplicated lde_trace_main_merkle_root append branches in
  replay_transcript_phase_a.

- verify_with_options absorbs the statement ONCE, then clones the
  transcript so multi_verify and the bus-balance replay both start
  from the same post-absorb state — no closure, no field-list drift.

- statement_tests.rs reframed around transcript.state(). 3 tests
  (state_is_deterministic, state_depends_on_every_field,
  public_output_length_prefix_prevents_collision) cover the same
  surface as the 8 byte-level encoder tests they replace, hitting the
  real absorb path. Drop the two-ELF e2e test that needed RISC-V
  artifacts and couldn't run in author env.

- 6 internal test sites (decode_tests.rs, prove_elfs_tests.rs): pass
  a fresh DefaultTranscript instead of &[].

Soundness commitment unchanged: same five statement fields bound, same
domain separation, same length prefixes where needed, same cross-site
agreement (now explicitly tested via deterministic absorb).

Follow-up: Phase A is still replayed twice on the verify path (once in
compute_expected_commit_bus_balance, once in multi_verify). That's
pre-existing, not from this PR; removing it would change multi_verify's
public API to take a closure that produces the bus balance offset given
(z, alpha). Separate scope.
@github-actions
Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

I reviewed the statement absorption changes, transcript replay path, prover/verifier transcript initialization, and updated tests. The changes appear consistent: prover, verifier, and commit bus replay all start from the same statement-bound transcript state, and tampered statement fields should cause verification failure.

Verification note: I attempted cargo test -p lambda-vm-prover statement_tests --no-default-features and cargo check -p lambda-vm-prover --no-default-features, but both were blocked by the environment because rustup cannot write to /home/runner/.rustup/tmp on this filesystem. git diff --check passed.

Comment thread prover/src/statement.rs
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Review: Fiat-Shamir statement binding

The security approach is sound. Binding the full statement into the transcript before Phase A prevents malleability attacks where a tampered ELF, public output, or table layout would otherwise produce valid-looking challenges.

Logic correctness:

  • The refactor of the phase-A replay is semantically equivalent to the original (the merkle root was always appended in both branches).
  • The transcript clone for the bus-balance replay is the correct pattern — both paths start from the same post-statement state.
  • Tests using the low-level prover directly are consistent (both prover and replay use empty transcripts at that level), and the security property is exercised end-to-end by the higher-level prove/verify tests.

One issue found:

Medium (soundness): The hardcoded TableCounts field array in absorb_statement silently excludes any field added to the struct in the future — see inline comment.

MauroToscano and others added 2 commits May 26, 2026 12:24
If a field is added to TableCounts or RuntimePageRange and not bound into
the transcript, statement binding is silently broken for that field — a
proof generated for one configuration could pass verification against a
different one.

The previous `[ table_counts.cpu, ... ]` array literal compiled fine even
when a field was missing; the new field was simply skipped. Same for
direct `.base` / `.count` access on RuntimePageRange.

Replace both with exhaustive `&Struct { ... }` patterns. Adding a field
now produces:

    error[E0027]: pattern does not mention field `<name>`
      --> prover/src/statement.rs:..

The compile error fires at the encoding site itself — on every build,
not just on test runs — pointing the dev directly at the code that
needs updating. No separate test needed; the safety is in the encoding
function.

Encoded transcript bytes are unchanged: same field order, same widths.
Pure source-level guard.
@diegokingston diegokingston added this pull request to the merge queue May 26, 2026
Merged via the queue into main with commit 78d7b99 May 26, 2026
12 checks passed
@diegokingston diegokingston deleted the refactor/fiat-shamir-statement-streamlined branch May 26, 2026 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants