test: zero tests verify compiled code produces correct results

## Summary

The entire Rust test suite (~895 tests) checks structural properties — opcode names, ELF format validity, no-crash — but none verify that compiled code actually computes correct results. There are no semantic correctness tests.

---

## Current state

- **Structural tests**: verify things like "the output contains an ADD instruction", "the ELF header is valid", "compilation does not panic"
- **Semantic tests**: zero tests that compile a WASM program, execute the ARM output (or simulate it), and verify the result is correct
- **Renode tests** (Bazel-only, `tests/` directory): these may test end-to-end semantics via ARM Cortex-M4 emulation, but they are not part of the standard `cargo test` workflow

---

## Why this matters

This is how issues like the silent NOP catch-all (#35) go undetected — if no test ever checks "did `i32.add 2 3` produce `5`?", a codegen bug that emits NOP instead of ADD passes the entire test suite.

The Rocq proofs provide formal assurance for a subset of operations, but as documented in the proof divergence issue, the proofs verify a different program than the Rust compiler actually produces. Without semantic tests, there is no automated check that the shipped compiler produces correct output for any operation.

---

## Recommendation

Add tests that:

1. Compile simple WASM programs (single-function, single-operation)
2. Either simulate the ARM output or use a known-good interpreter to verify correctness
3. At minimum, cover the i32 operations that have Rocq proofs (add, sub, mul, div, rem, and/or/xor, shl/shr, eq/ne/lt/gt/le/ge)

A lightweight approach: for each WASM op, compile `(i32.const A) (i32.const B) (i32.OP)` and verify the ARM instruction sequence would produce the correct result via a simple Thumb-2 interpreter or by cross-checking register values against expected outputs.

This would also serve as a regression test to detect any future divergence between the Rocq proofs and the Rust implementation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: zero tests verify compiled code produces correct results #74

Summary

Current state

Why this matters

Recommendation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

test: zero tests verify compiled code produces correct results #74

Description

Summary

Current state

Why this matters

Recommendation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions