diff --git a/docs/mutation-testing.md b/docs/mutation-testing.md
new file mode 100644
index 0000000..6a6983a
--- /dev/null
+++ b/docs/mutation-testing.md
@@ -0,0 +1,156 @@
+# Mutation testing with cargo-mutants
+
+This document captures rivet's mutation-testing pattern so other
+pulseengine repos can adopt it consistently. It is the canonical
+reference for the [`templates/cargo-mutants`](../templates/cargo-mutants/)
+template files.
+
+## Why mutation testing
+
+Mutation testing measures **test-suite adequacy**: it perturbs the
+code under test and counts how many perturbations the suite catches.
+A test suite that achieves high line/branch coverage but kills few
+mutants is a suite full of assertions that don't actually constrain
+behaviour.
+
+Mutation score is recognised under:
+
+- **IEC 61508 Annex C.5.12** — table C.13 lists "mutation analysis"
+  as a recommended technique for verifying test-case completeness.
+- **ISO 26262-6 Table 13** — recommended for ASIL C/D unit-test
+  coverage assessment.
+- **EN 50128** and **DO-178C** treat mutation analysis as
+  acceptable evidence of structural-coverage robustness.
+
+For Rust specifically, mutation score is the most credible answer to
+the open MC/DC-for-Rust problem: Rust has no production MC/DC tooling,
+and mutation analysis fills the same evidentiary slot.
+
+## When to run
+
+| Stage | Cost | Recommendation |
+|---|---|---|
+| Pre-commit | Too slow (minutes per file) | Skip |
+| Pre-push (smoke) | 1–5 min on one crate | Optional; rivet does this |
+| CI nightly | 30–90 min per crate | Required for ASIL ≥ B / DAL ≥ C |
+| Pre-release | Hours, full workspace | Required for ASIL D / DAL A |
+
+The nightly CI gate is the load-bearing one — it is the only level at
+which a meaningful mutation score is computed and recorded. The
+pre-push smoke is just a sanity check that mutation testing still
+runs at all.
+
+## Score targets per safety level
+
+These are PulseEngine internal targets.
They are stricter than any
+single standard requires because we apply mutation-score as the
+primary structural-coverage gate for Rust.
+
+| Safety level | Mutation-score floor | Rationale |
+|---|---|---|
+| QM / DAL E | no requirement | Mutation testing optional |
+| ASIL A / DAL D | ≥ 0.70 | Catch obvious assertion gaps |
+| ASIL B / DAL C | ≥ 0.80 | Match IEC 61508 SIL 2 expectations |
+| ASIL C / DAL B | ≥ 0.85 | |
+| ASIL D / DAL A | ≥ 0.90 | Closes the MC/DC-for-Rust gap |
+
+Record the target on the relevant `test-spec` artifact via
+`mutation-score-target` and the measured value on each `test-exec` via
+`mutation-score`. `rivet validate` will (in a future change tracked
+under #188) compare measured vs. target and surface drift.
+
+## Recording results in rivet
+
+The schema fields land in `schemas/score.yaml`:
+
+```yaml
+- id: TEST-SPEC-007
+  type: test-spec
+  title: rivet-core unit tests
+  fields:
+    safety-level: ASIL_C
+    mutation-score-target: 0.85
+
+- id: TEST-EXEC-2026-04-27
+  type: test-exec
+  fields:
+    version: v0.5.0
+    commit: 92ad95d
+    timestamp: 2026-04-27T02:00:00Z
+    mutation-score: 0.897
+    mutants-tested: 481
+    mutants-killed: 419
+    mutants-missed: 49
+    mutants-timeout: 8
+    mutants-unviable: 5
+  links:
+    - type: belongs-to
+      target: TEST-SPEC-007
+```
+
+`mutants-tested = mutants-killed + mutants-missed + mutants-timeout
++ mutants-unviable`. cargo-mutants treats `timeout` as caught and
+`unviable` (didn't compile) as excluded, so:
+
+```
+mutation-score = (killed + timeout) / (tested - unviable)
+```
+
+## Marking unreachable mutants
+
+Some mutants are unreachable by construction (defensive `assert!` on
+type-system invariants, debug-only `tracing::debug!` calls that have
+no observable effect, etc.). Skipping them is fine if you can justify
+the rationale — record the rationale alongside the skip so an
+auditor can read both together.
+
+### Per-call skip via `mutants.toml`
+
+For pattern-wide skips (e.g., all `tracing::debug!` calls), use the
+`skip_calls` array in `mutants.toml`. The template ships with
+`tracing::trace` and `tracing::debug` excluded by default.
+
+### Per-function skip via attribute
+
+For ad-hoc skips on a specific function, add an attribute and a
+comment justifying it:
+
+```rust
+// cargo-mutants: defensive bounds check; mutating the comparison
+// would corrupt unrelated proofs that rely on this invariant.
+#[cfg_attr(test, mutants::skip)]
+fn assert_index_in_bounds(i: usize, len: usize) {
+    assert!(i < len);
+}
+```
+
+The cfg_attr scoping keeps the attribute out of release builds.
+
+## Adopting in another pulseengine repo
+
+1. Copy the template files:
+   ```sh
+   cp rivet/templates/cargo-mutants/mutants.toml .
+   cp rivet/templates/cargo-mutants/mutants.yml .github/workflows/
+   ```
+2. Edit `.github/workflows/mutants.yml` — replace the `matrix.crate`
+   list with your crates.
+3. Edit `mutants.toml` — tighten `exclude_globs` and `skip_calls` for
+   crate-specific noise.
+4. Add (or update) a `test-spec` artifact that names the suite under
+   test and sets `mutation-score-target` per the table above.
+5. After the first nightly run, file `test-exec` artifacts to record
+   measured scores. Automate this in your CI workflow.
+
+## Open questions / non-goals
+
+- **Cross-repo aggregation** is tracked separately in
+  [#188](https://github.com/pulseengine/rivet/issues/188). The schema
+  fields above are designed to feed that dashboard; rendering belongs
+  to the coverage-matrix initiative.
+- **Mutation testing of proof code** (Verus, Rocq, Lean) is out of
+  scope. The proofs verify, by definition, the property they
+  state — there is no "test suite" to mutate against them.
+- **Differential mutation** (mutating only the code changed in a diff,
+  not the whole tree) is not yet templated; cargo-mutants supports it
+  via `--in-diff`, but our nightly schedule runs full suites today.
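Reviewer sketch (not part of the patch): the score formula in "Recording results in rivet" can be checked mechanically against the four outcome counts. The `MutantCounts` struct and its methods are hypothetical names for illustration; only the field names and the formula come from the document. The sketch assumes a run with no scored mutants should report no score rather than divide by zero.

```rust
/// Outcome counts from one cargo-mutants run. Field names mirror the
/// `test-exec` schema fields; the struct itself is hypothetical.
#[derive(Debug, Clone, Copy)]
struct MutantCounts {
    killed: u32,
    missed: u32,
    timeout: u32,
    unviable: u32,
}

impl MutantCounts {
    /// mutants-tested = killed + missed + timeout + unviable
    fn tested(&self) -> u32 {
        self.killed + self.missed + self.timeout + self.unviable
    }

    /// mutation-score = (killed + timeout) / (tested - unviable).
    /// Returns `None` when no viable mutants were scored (assumption:
    /// an empty run has no meaningful score).
    fn score(&self) -> Option<f64> {
        let scored = self.tested() - self.unviable;
        if scored == 0 {
            return None;
        }
        Some(f64::from(self.killed + self.timeout) / f64::from(scored))
    }
}

fn main() {
    // Counts taken from the TEST-EXEC-2026-04-27 example artifact.
    let run = MutantCounts { killed: 419, missed: 49, timeout: 8, unviable: 5 };
    println!("mutants-tested = {}", run.tested());
    if let Some(score) = run.score() {
        // (419 + 8) / (481 - 5) = 427 / 476
        println!("mutation-score = {score:.3}");
    }
}
```

Keeping the four counts as the source of truth and deriving both `mutants-tested` and `mutation-score` from them avoids recording a score that disagrees with its own basis.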
diff --git a/schemas/score.yaml b/schemas/score.yaml
index cd1d212..4272ac0 100644
--- a/schemas/score.yaml
+++ b/schemas/score.yaml
@@ -545,6 +545,14 @@ artifact-types:
       - name: expected-result
         type: text
         required: false
+      - name: mutation-score-target
+        type: number
+        required: false
+        description: >
+          Target mutation-kill rate for this test spec, as a fraction in
+          [0.0, 1.0]. Per-ASIL/DAL recommendations live in
+          docs/mutation-testing.md. Compared against the measured
+          `mutation-score` on the corresponding test-exec.
     link-fields:
       - name: fully-verifies
         link-type: fully-verifies
@@ -588,6 +596,38 @@ artifact-types:
         type: structured
         required: false
         description: OS, toolchain, hardware configuration
+      - name: mutation-score
+        type: number
+        required: false
+        description: >
+          Measured mutation-kill rate for this execution, as a fraction in
+          [0.0, 1.0]. Recorded by the cargo-mutants run that produced this
+          test-exec. See docs/mutation-testing.md for ASIL/DAL targets.
+      - name: mutants-tested
+        type: number
+        required: false
+        description: >
+          Total number of mutants generated and tested in this run.
+          Optional but useful for understanding the basis of `mutation-score`.
+      - name: mutants-killed
+        type: number
+        required: false
+        description: Number of mutants caught by the test suite.
+      - name: mutants-missed
+        type: number
+        required: false
+        description: >
+          Number of mutants that survived the test suite. Each missed
+          mutant is a coverage gap. Should equal `mutants-tested -
+          mutants-killed - mutants-timeout - mutants-unviable`.
+      - name: mutants-timeout
+        type: number
+        required: false
+        description: Number of mutants that timed out (treated as killed by cargo-mutants).
+      - name: mutants-unviable
+        type: number
+        required: false
+        description: Number of mutants that did not compile (excluded from score).
     link-fields:
       - name: belongs-to
         link-type: belongs-to
diff --git a/templates/cargo-mutants/README.md b/templates/cargo-mutants/README.md
new file mode 100644
index 0000000..aa2d701
--- /dev/null
+++ b/templates/cargo-mutants/README.md
@@ -0,0 +1,36 @@
+# cargo-mutants template
+
+Reusable cargo-mutants configuration extracted from rivet's pre-push hook.
+
+## Files
+
+- `mutants.toml` — base config (timeouts, exclusions, skip calls).
+- `mutants.yml` — nightly + manual-dispatch GitHub Actions workflow.
+
+## Quickstart for adopters
+
+```sh
+# From the root of the adopting repo:
+mkdir -p .github/workflows
+cp .../rivet/templates/cargo-mutants/mutants.toml ./mutants.toml
+cp .../rivet/templates/cargo-mutants/mutants.yml .github/workflows/mutants.yml
+
+# Edit mutants.yml and replace the matrix `crate` list with your crates.
+# Edit mutants.toml and tighten exclusions / skip_calls per crate.
+```
+
+## Three operating modes
+
+| Mode | Where | Cost | When |
+|---|---|---|---|
+| Pre-commit (off) | Local | Too slow for `pre-commit` | Skip; mutation testing should not block local edits. |
+| Pre-push smoke | Local `.pre-commit-config.yaml`, `stages: [pre-push]` | ~1–5 min | Optional, against a single crate's lib. |
+| CI nightly | `mutants.yml` | 30–90 min per crate | Required gate for safety-critical crates. |
+
+## Score targets
+
+See [`docs/mutation-testing.md`](../../docs/mutation-testing.md) for
+ASIL/DAL targets and the procedure for marking unreachable mutants.
+The schema field `mutation-score` on `test-exec` (in
+`schemas/score.yaml`) is the place to record measured scores so they
+flow into rivet traceability.
diff --git a/templates/cargo-mutants/mutants.toml b/templates/cargo-mutants/mutants.toml
new file mode 100644
index 0000000..dd15945
--- /dev/null
+++ b/templates/cargo-mutants/mutants.toml
@@ -0,0 +1,39 @@
+# cargo-mutants configuration template
+#
+# Reusable starting point for pulseengine repos that adopt mutation testing.
+# Copy to the repo root as `mutants.toml`. Tune per-crate as needed; the
+# defaults below match rivet's pre-push smoke profile.
+#
+# Reference: https://mutants.rs/mutants_toml.html
+# Schema field counterpart: see `mutation-score` on `test-exec` in
+# schemas/score.yaml; record measured scores there for traceability.
+
+# Per-mutant test timeout, in seconds. cargo-mutants kills the test process
+# if a mutant takes longer than (baseline_test_time * timeout_multiplier),
+# clamped to at least `minimum_test_timeout`. 60s is the rivet smoke value.
+minimum_test_timeout = 60
+
+# Skip generated, vendored, and proof-only paths. Mutating these costs CI
+# time without exercising real code paths.
+exclude_globs = [
+    "target/**",
+    "vendor/**",
+    "**/generated/**",
+    "proofs/**",
+    "verus/**",
+    "fuzz/fuzz_targets/**",
+]
+
+# Pass `--lib` to the underlying `cargo test`. Mutation testing typically
+# targets library code; integration / E2E tests are too slow to be useful
+# as kill criteria. Override per-crate if you need to include `--tests`.
+additional_cargo_test_args = ["--lib"]
+
+# Functions cargo-mutants should not touch. Common cases:
+#   - `Drop::drop` (mutating Drop usually corrupts unrelated tests)
+#   - logging / tracing macros (no behavioural effect)
+# Add patterns specific to your crate.
+skip_calls = [
+    "tracing::trace",
+    "tracing::debug",
+]
diff --git a/templates/cargo-mutants/mutants.yml b/templates/cargo-mutants/mutants.yml
new file mode 100644
index 0000000..2098f52
--- /dev/null
+++ b/templates/cargo-mutants/mutants.yml
@@ -0,0 +1,83 @@
+# cargo-mutants nightly mutation-testing workflow template.
+#
+# Adopters copy this file to `.github/workflows/mutants.yml` and adjust:
+#   - the matrix `crate` list to the crates they want covered
+#   - the schedule cron (default: 02:00 UTC nightly)
+#   - the `MUTANTS_TIMEOUT` if their crate's test suite needs more headroom
+#
+# See docs/mutation-testing.md (in rivet) for the rationale, ASIL/DAL
+# score targets, and how to mark unreachable mutants.
+#
+# This workflow is deliberately separate from CI:
+#   - CI runs on every PR and must stay fast (minutes).
+#   - cargo-mutants is slow (often >30 min for a non-trivial crate) and
+#     belongs on a nightly schedule + manual dispatch only.
+
+name: Mutants
+
+on:
+  schedule:
+    # Nightly at 02:00 UTC. Adjust to your maintenance window.
+    - cron: '0 2 * * *'
+  workflow_dispatch:
+    inputs:
+      crate:
+        description: 'Crate to mutate (default: all)'
+        required: false
+        default: ''
+
+permissions:
+  contents: read
+
+env:
+  CARGO_TERM_COLOR: always
+  # Per-mutant timeout in seconds. Should match `minimum_test_timeout`
+  # in mutants.toml or be slightly higher.
+  MUTANTS_TIMEOUT: '60'
+
+jobs:
+  mutants:
+    name: cargo mutants (${{ matrix.crate }})
+    runs-on: ubuntu-latest
+    # Mutation runs are intentionally slow. 90 minutes is a reasonable
+    # ceiling per crate; bump if your suite is larger.
+    timeout-minutes: 90
+    strategy:
+      fail-fast: false
+      matrix:
+        # CUSTOMIZE: list one entry per crate to mutate. Splitting per crate
+        # lets the matrix parallelise and keeps each shard under timeout.
+        crate:
+          - rivet-core
+          - rivet-cli
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          fetch-depth: 1
+
+      - uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+        with:
+          # Mutation runs invalidate caches frequently; keep a separate key.
+          key: mutants-${{ matrix.crate }}
+
+      - name: Install cargo-mutants
+        run: cargo install --locked cargo-mutants
+
+      - name: Run mutation tests
+        run: |
+          cargo mutants \
+            --jobs 4 \
+            --minimum-test-timeout "${MUTANTS_TIMEOUT}" \
+            -p "${{ matrix.crate }}" \
+            --in-place \
+            --no-shuffle
+
+      - name: Upload mutants.out
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: mutants-${{ matrix.crate }}
+          path: mutants.out/
+          retention-days: 14
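Reviewer sketch (not part of the patch): the measured-vs-target comparison that docs/mutation-testing.md defers to a future `rivet validate` change could look roughly like the following. This is only an illustration of the floor table under stated assumptions, not rivet's implementation. The `SafetyLevel` enum, `floor`, and `meets_floor` are hypothetical names; the thresholds are copied from the "Score targets per safety level" table, with each ASIL variant standing in for its paired DAL.

```rust
/// Safety levels from the score-target table (hypothetical enum).
/// Each variant also covers the DAL paired with it in that table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SafetyLevel {
    Qm,    // QM / DAL E
    AsilA, // ASIL A / DAL D
    AsilB, // ASIL B / DAL C
    AsilC, // ASIL C / DAL B
    AsilD, // ASIL D / DAL A
}

/// Mutation-score floor for a level, or `None` where the table says
/// "no requirement".
fn floor(level: SafetyLevel) -> Option<f64> {
    match level {
        SafetyLevel::Qm => None,
        SafetyLevel::AsilA => Some(0.70),
        SafetyLevel::AsilB => Some(0.80),
        SafetyLevel::AsilC => Some(0.85),
        SafetyLevel::AsilD => Some(0.90),
    }
}

/// True when the measured score satisfies the floor for the level.
/// Levels without a floor always pass.
fn meets_floor(level: SafetyLevel, measured: f64) -> bool {
    floor(level).map_or(true, |min| measured >= min)
}

fn main() {
    let measured = 0.872;
    for level in [SafetyLevel::AsilC, SafetyLevel::AsilD] {
        println!("{level:?}: floor met = {}", meets_floor(level, measured));
    }
}
```

A real gate would read `mutation-score-target` from the linked `test-spec` instead of hard-coding the table, so that a repo can set a stricter target than the floor.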