feat(prover): slim MEMW table and add MEMW_A aligned fast path#424
feat(prover): slim MEMW table and add MEMW_A aligned fast path#424diegokingston wants to merge 3 commits into
Conversation
Implement the new MEMW spec (PR #398) with two major changes: 1. MEMW slimdown (70→49 cols, 57→26 interactions): - Replace address_add[7] DWordHL (28 cols) with add_limb_overflow[7] Bit (7 cols), making address_add virtual via BusValue::linear - Remove 28 IS_HALFWORD interactions and 3 LT overflow checks - Replace AddConstraint pairs with IsBitConstraint::unconditional 2. MEMW_A aligned fast path (30 cols, 22 interactions): - New table for aligned accesses where all bytes share the same old_timestamp (covers all register ops + most memory ops) - Address decomposed into high/mid/low parts - AND_BYTE interaction for alignment check - Single old_timestamp instead of per-byte Operations are routed at trace build time: aligned ops with uniform timestamps go to MEMW_A, the rest go to the slimmed MEMW.
|
/bench |
Codex Code Review
Assumption
|
| ]); | ||
|
|
||
| // CM18/19: byte 1 with w2 | ||
| // For aligned accesses, adding 1 to the low byte never overflows to hi word |
There was a problem hiding this comment.
Medium — Overflow safety invariant depends on three independent range checks
The claim "alignment guarantees base_address_lo + width-1 < 2^32" holds, but only when all three lookup-enforced range checks are satisfied simultaneously:
IS_HALFWORD(MID)→MID ∈ [0, 2^16)IS_BYTE(LOW[1])→LOW[1] ∈ [0, 256)AND_BYTE(LOW[0], mask)→LOW[0] ∈ [0, 256)(implicit byte range) + alignment
With these, base_addr_lo ≤ 0xFFFFFFF8 for width=8 aligned (max LOW[0]=248), so +7 ≤ 0xFFFFFFFF. The property is correct, but there's no single algebraic constraint encoding it — it emerges from three separate bus interactions. If any of those range checks were ever relaxed or skipped (e.g., due to future refactoring of the Bitwise table), this no-overflow guarantee would silently break.
Consider adding a comment that explicitly names all three preconditions this relies on, to make the invariant auditable in isolation.
| } | ||
|
|
||
| /// Creates all constraints for the MEMW_A table (2 total). | ||
| pub fn constraints() -> Vec<Box<dyn TransitionConstraint<GoldilocksField, GoldilocksExtension>>> { |
There was a problem hiding this comment.
Low — No IS_BIT constraints for WRITE2, WRITE4, WRITE8, IS_REGISTER
MEMW_A has 2 algebraic constraints vs MEMW's 9 (which includes 7 IsBitConstraint::unconditional for ADD_LIMB_OVERFLOW). The write flags and IS_REGISTER here are only constrained to be binary via bus interactions (CO24/CO25 must match what CPU sends), not by local polynomial constraints.
This is consistent with the original MEMW design (write flags weren't IS_BIT-constrained there either), and the bus-level enforcement is sound. But it's worth noting that the AND_BYTE alignment mask write2*1 + write4*3 + write8*7 behaves correctly only if the flags are mutually exclusive and binary — a property that's enforced globally (via the bus) but not locally. Future auditors reading this table in isolation may miss this dependency.
| /// An operation is aligned if: | ||
| /// 1. For width > 1: base_address is aligned to width (low bits are zero) | ||
| /// 2. All accessed bytes share the same old_timestamp | ||
| fn is_aligned_op(op: &MemwOperation) -> bool { |
There was a problem hiding this comment.
Low — Routing logic is duplicated in two build paths
is_aligned_op + the partition call appear identically in both Traces::from_logs and the parallel build path. If the routing criteria ever change (e.g., a new alignment condition is added), both copies need to stay in sync — there's no compile-time guarantee they will.
Also note: the routing check here tests alignment on the full 32-bit low word (low & (width-1)), while MEMW_A's constraint tests only LOW[0] = addr & 0xFF. These are equivalent since masks ≤ 7 < 256, but they're expressed differently in two separate places. If width ever grows beyond 256, the two diverge.
|
No critical or high issues found. The column reduction (70->49 for MEMW, new 30-col MEMW_A), replacement of LT overflow checks with bit-column arithmetic, and aligned routing are all correct. Security analysis: Removed LT overflow checks (R1-R3): Replaced by add_limb_overflow[i] bit columns with IsBitConstraint::unconditional. Sound: wrong overflow bits produce incorrect memory bus tokens not matched by PAGE/REGISTER, so bus imbalance rejects the proof. MEMW_A address overflow invariant: base_addr_lo + i < 2^32 for aligned accesses is mathematically correct (max 8-byte aligned base_addr_lo = 0xFFFFFFF8, +7 = 0xFFFFFFFF). But this relies on IS_HALFWORD + IS_BYTE + AND_BYTE range checks all holding simultaneously. See inline comment at memw_aligned.rs:322. Alignment mask write21 + write43 + write8*7 is correct when flags are mutually exclusive. Exclusivity is enforced via CO24/CO25 bus interactions (must match CPU fingerprint), not local polynomial constraints. Issues (inline comments posted):
|
Benchmark — fib_iterative_2M (median of 3)Table parallelism: 32 (auto = cores / 3)
Memory GrowthMeasured with
Growth rate: 12145 MB / 1M steps (main: 13379, Δ: -9.2%)
Commit: 66ee8c6 · Baseline: cached · Runner: self-hosted bench |
Use Vec::with_capacity for known interaction counts (26 and 22) to avoid reallocations. Remove unnecessary clone on last mu_sum use.
|
/bench |
|
/bench 10 |
- Remove IS_HALFWORD[mid] and IS_BYTE[low[1]] bus interactions from MEMW_A: these are spec assumptions (MEMW_A-A2, MEMW_A-A3) satisfied by the CPU's IS_BYTE range checks on its byte decomposition plus Memw bus propagation. MID and LOW[1] only appear inside the linear combination 2^16*MID + 2^8*LOW[1] + LOW[0], which the bus constrains to equal the CPU's base_address_0, so individual range checks are redundant. Saves 2 bus interactions per MEMW_A row. - Remove corresponding IS_HALFWORD and IS_BYTE bitwise lookups from collect_bitwise_from_memw_aligned in trace_builder. - Update stale LT comment labels in memw.rs: C7-C10 → MEMW-C4 through MEMW-C7 to match current spec numbering.
|
/bench |
|
Closing, prove time improved but heap regressed too much. |
Implement the new MEMW spec (PR #398) with two major changes:
MEMW slimdown (70→49 cols, 57→26 interactions):
MEMW_A aligned fast path (30 cols, 22 interactions):
Operations are routed at trace build time: aligned ops with uniform timestamps go to MEMW_A, the rest go to the slimmed MEMW.