Summary
The entire Rust test suite (~895 tests) checks structural properties — opcode names, ELF format validity, no-crash — but none verify that compiled code actually computes correct results. There are no semantic correctness tests.
Current state
- Structural tests: verify things like "the output contains an ADD instruction", "the ELF header is valid", "compilation does not panic"
- Semantic tests: zero tests that compile a WASM program, execute the ARM output (or simulate it), and verify the result is correct
- Renode tests (Bazel-only,
tests/ directory): these may test end-to-end semantics via ARM Cortex-M4 emulation, but they are not part of the standard cargo test workflow
Why this matters
This is how issues like the silent NOP catch-all (#35) go undetected — if no test ever checks "did i32.add 2 3 produce 5?", a codegen bug that emits NOP instead of ADD passes the entire test suite.
The Rocq proofs provide formal assurance for a subset of operations, but as documented in the proof divergence issue, the proofs verify a different program than the Rust compiler actually produces. Without semantic tests, there is no automated check that the shipped compiler produces correct output for any operation.
Recommendation
Add tests that:
- Compile simple WASM programs (single-function, single-operation)
- Either simulate the ARM output or use a known-good interpreter to verify correctness
- At minimum, cover the i32 operations that have Rocq proofs (add, sub, mul, div, rem, and/or/xor, shl/shr, eq/ne/lt/gt/le/ge)
A lightweight approach: for each WASM op, compile (i32.const A) (i32.const B) (i32.OP) and verify the ARM instruction sequence would produce the correct result via a simple Thumb-2 interpreter or by cross-checking register values against expected outputs.
This would also serve as a regression test to detect any future divergence between the Rocq proofs and the Rust implementation.
Summary
The entire Rust test suite (~895 tests) checks structural properties — opcode names, ELF format validity, no-crash — but none verify that compiled code actually computes correct results. There are no semantic correctness tests.
Current state
tests/directory): these may test end-to-end semantics via ARM Cortex-M4 emulation, but they are not part of the standardcargo testworkflowWhy this matters
This is how issues like the silent NOP catch-all (#35) go undetected — if no test ever checks "did
i32.add 2 3produce5?", a codegen bug that emits NOP instead of ADD passes the entire test suite.The Rocq proofs provide formal assurance for a subset of operations, but as documented in the proof divergence issue, the proofs verify a different program than the Rust compiler actually produces. Without semantic tests, there is no automated check that the shipped compiler produces correct output for any operation.
Recommendation
Add tests that:
A lightweight approach: for each WASM op, compile
(i32.const A) (i32.const B) (i32.OP)and verify the ARM instruction sequence would produce the correct result via a simple Thumb-2 interpreter or by cross-checking register values against expected outputs.This would also serve as a regression test to detect any future divergence between the Rocq proofs and the Rust implementation.