Refactor/fiat shamir statement streamlined#617
Conversation
…script The VM prover and verifier built their transcript with `new(&[])` — the statement (program, public output, table layout) was never absorbed, so Fiat-Shamir challenges did not depend on it. Soundness was preserved only by out-of-band reconstruction checks in the verifier. Add a canonical, versioned, length-prefixed encoder (`prover/src/statement.rs`) that digests the full statement — ELF hash, public output, table counts, and page layout — into a 32-byte transcript seed. Seed the transcript with it on the prove path, the verify path, and the bus-balance replay (`replay_transcript_phase_a`), so all three derive identical challenges for an honest proof and diverge for any tampered statement. Closes the frozen-heart class of issue as defense-in-depth and is a prerequisite for recursion. Breaking proof-format change (challenges differ). `tiny-keccak` is promoted from a dev-dependency to a dependency for the statement digest.
Apply the simplifications to the statement-binding work: - Drop tiny-keccak as a prod dep; the DefaultTranscript is already a Keccak256 absorber, so the outer keccak256 in statement_seed() was redundant. ELF digest now uses sha3::Keccak256 (already a transitive prod dep via crypto). tiny-keccak stays as a dev-dep for the in-VM keccak precompile cross-check. - Replace encode_statement -> Vec<u8> + statement_seed -> [u8; 32] with a single absorb_statement(&mut transcript, ...) that streams bytes directly into the transcript. No intermediate allocation. - Drop FORMAT_VERSION constant — the version is in DOMAIN_TAG suffix (LAMBDAVM_STARK_STATEMENT_V1). Bump the suffix on any encoding change. - Drop the redundant 8-byte length prefix on the elf bytes: we hash to fixed 32-byte digest, so the prefix doesn't prevent any collision. - replay_transcript_phase_a / compute_expected_commit_bus_balance now take &mut DefaultTranscript<E> instead of an opaque transcript_seed: &[u8]. Callers decide what's in the transcript; internal tests just hand a fresh empty one (no &[] magic sentinel). - Fold the duplicated lde_trace_main_merkle_root append branches in replay_transcript_phase_a. - verify_with_options absorbs the statement ONCE, then clones the transcript so multi_verify and the bus-balance replay both start from the same post-absorb state — no closure, no field-list drift. - statement_tests.rs reframed around transcript.state(). 3 tests (state_is_deterministic, state_depends_on_every_field, public_output_length_prefix_prevents_collision) cover the same surface as the 8 byte-level encoder tests they replace, hitting the real absorb path. Drop the two-ELF e2e test that needed RISC-V artifacts and couldn't run in author env. - 6 internal test sites (decode_tests.rs, prove_elfs_tests.rs): pass a fresh DefaultTranscript instead of &[]. Soundness commitment unchanged: same five statement fields bound, same domain separation, same length prefixes where needed, same cross-site agreement (now explicitly tested via deterministic absorb). Follow-up: Phase A is still replayed twice on the verify path (once in compute_expected_commit_bus_balance, once in multi_verify). That's pre-existing, not from this PR; removing it would change multi_verify's public API to take a closure that produces the bus balance offset given (z, alpha). Separate scope.
Codex Code ReviewNo issues found in the PR diff. I reviewed the statement absorption changes, transcript replay path, prover/verifier transcript initialization, and updated tests. The changes appear consistent: prover, verifier, and commit bus replay all start from the same statement-bound transcript state, and tampered statement fields should cause verification failure. Verification note: I attempted |
Review: Fiat-Shamir statement bindingThe security approach is sound. Binding the full statement into the transcript before Phase A prevents malleability attacks where a tampered ELF, public output, or table layout would otherwise produce valid-looking challenges. Logic correctness:
One issue found: Medium (soundness): The hardcoded |
If a field is added to TableCounts or RuntimePageRange and not bound into
the transcript, statement binding is silently broken for that field — a
proof generated for one configuration could pass verification against a
different one.
The previous `[ table_counts.cpu, ... ]` array literal compiled fine even
when a field was missing; the new field was simply skipped. Same for
direct `.base` / `.count` access on RuntimePageRange.
Replace both with exhaustive `&Struct { ... }` patterns. Adding a field
now produces:
error[E0027]: pattern does not mention field `<name>`
--> prover/src/statement.rs:..
The compile error fires at the encoding site itself — on every build,
not just on test runs — pointing the dev directly at the code that
needs updating. No separate test needed; the safety is in the encoding
function.
Encoded transcript bytes are unchanged: same field order, same widths.
Pure source-level guard.
This is a follow up of Diego's #610
Notice while Fiat Shamir is better handled here, the single prover re feeds some of the meta data to the transcript, which is not a security problem but it's not really needed
Summary
Builds on #610. Same soundness commitment, smaller surface area. Five simplifications: drop the redundant outer keccak (and the
tiny-keccakprod dep), drop the intermediateVec<u8>encoder, dropFORMAT_VERSION(folded into the domain tag), drop the redundant ELF length prefix, and replace the opaquetranscript_seed: &[u8]helper parameter with&mut DefaultTranscript<E>. The statement is now absorbed straight into the transcript by one function (absorb_statement) and the verify path absorbs once +clone()s instead of re-seeding twice.What's preserved
All soundness properties from #610:
LAMBDAVM_STARK_STATEMENT_V1)public_output(boundary collision resistance with the next field)runtime_page_rangescountSimplifications
1. Drop the outer keccak, drop
tiny-keccakDefaultTranscriptis itself aKeccak256absorber. The original PR computedkeccak256(encode_statement(...))withtiny-keccakexternally to get a 32-byte seed, then handed it toDefaultTranscript::new(&seed)— which hashed it again with its ownsha3::Keccak256. Two crates, two keccak passes, no benefit.This PR streams the canonical bytes directly into the transcript via
append_bytes. The ELF digest usessha3::Keccak256(already a transitive prod dep viacrypto).tiny-keccakreturns to being a dev-only dep (still used in the in-VM keccak precompile cross-check).2. Drop the intermediate
Vec<u8>encode_statement() -> Vec<u8>allocated a heap buffer on every prove / verify just to be hashed once. The streamingabsorb_statement(&mut transcript, ...)writes the same bytes directly into the transcript hasher — no allocation.3. Drop
FORMAT_VERSIONVersioning lived in two places:
DOMAIN_TAG = "..._V1"andFORMAT_VERSION: u32 = 1. Two sources of truth that have to be bumped together. Kept the tag suffix as the single version source — bump it (_V2, ...) on any encoding change.4. Drop the ELF length prefix
elf_bytes.len().to_le_bytes()was followed immediately bykeccak256(elf_bytes)— a fixed-width 32-byte digest. Length prefixes prevent boundary collisions between variable-length raw fields; a fixed-width digest can't shift into its neighbors. The prefix was dead weight.5. Helper signatures:
transcript_seed: &[u8]→&mut DefaultTranscript<E>replay_transcript_phase_aandcompute_expected_commit_bus_balancepreviously took an opaque seed byte-slice and constructed the transcript internally. Internal tests bypassed binding by passing&[]— a magic sentinel.Now the caller builds and seeds the transcript; helpers operate on it directly. Tests that want an unbound transcript just hand over a fresh empty
DefaultTranscript::<E>::new(&[]). No sentinel.Also folded a pre-existing duplication: both branches of
if air.is_preprocessed()inreplay_transcript_phase_adidtranscript.append_bytes(&proof.lde_trace_main_merkle_root). Collapsed to one append after the conditional.Verify path: absorb once +
clone(), not seed twiceThe original PR derived the seed once and passed it to two
DefaultTranscript::new(...)calls (one forcompute_expected_commit_bus_balance, onefor
multi_verify). Eachnew()re-absorbed the seed bytes through the hasher.This PR absorbs the statement once into one transcript, then
clone()s the post-absorb hasher state to fork into the replay vs. verify transcripts.Single source of truth, no risk of one site's field list drifting from the other's.
Out of scope (follow-ups)
compute_expected_commit_bus_balanceruns Phase A once to recover(z, α), thenmulti_verifyruns it again. This is pre-existing — not introduced or worsened by the statement binding. Removing it requires changingmulti_verifyto take a closure that produces the bus balance offset given(z, α). Separate scope.VmProofversion discriminant. Old proofs currently reject by "challenges happen to differ." A version field on the proof bundle would let them reject at deserialize time. Worth a follow-up.