156 changes: 156 additions & 0 deletions docs/mutation-testing.md
@@ -0,0 +1,156 @@
# Mutation testing with cargo-mutants

This document captures rivet's mutation-testing pattern so other
pulseengine repos can adopt it consistently. It is the canonical
reference for the [`templates/cargo-mutants`](../templates/cargo-mutants/)
template files.

## Why mutation testing

Mutation testing measures **test-suite adequacy**: it perturbs the
code under test and counts how many perturbations the suite catches.
A test suite that achieves high line/branch coverage but kills few
mutants is a suite full of assertions that don't actually constrain
behaviour.

Mutation analysis is recognised under:

- **IEC 61508 Annex C.5.12** — table C.13 lists "mutation analysis"
as a recommended technique for verifying test-case completeness.
- **ISO 26262-6 Table 13** — recommended for ASIL C/D unit-test
coverage assessment.
- **EN 50128** and **DO-178C** treat mutation analysis as
acceptable evidence of structural-coverage robustness.

For Rust specifically, mutation score is the most credible answer to
the open MC/DC-for-Rust problem: Rust has no production MC/DC tooling,
and mutation analysis fills the same evidentiary slot.

## When to run

| Stage | Cost | Recommendation |
|---|---|---|
| Pre-commit | Too slow (minutes per file) | Skip |
| Pre-push (smoke) | 1–5 min on one crate | Optional; rivet does this |
| CI nightly | 30–90 min per crate | Required for ASIL ≥ B / DAL ≥ C |
| Pre-release | Hours, full workspace | Required for ASIL D / DAL A |

The nightly CI gate is the load-bearing one — it is the only level at
which a meaningful mutation score is computed and recorded. The
pre-push smoke is just a sanity check that mutation testing still
runs at all.

## Score targets per safety level

These are PulseEngine internal targets. They are stricter than any
single standard requires because we use mutation score as the
primary structural-coverage gate for Rust.

| Safety level | Mutation-score floor | Rationale |
|---|---|---|
| QM / DAL E | no requirement | Mutation testing optional |
| ASIL A / DAL D | ≥ 0.70 | Catch obvious assertion gaps |
| ASIL B / DAL C | ≥ 0.80 | Match IEC 61508 SIL 2 expectations |
| ASIL C / DAL B | ≥ 0.85 | |
| ASIL D / DAL A | ≥ 0.90 | Closes the MC/DC-for-Rust gap |

Record the target on the relevant `test-spec` artifact via
`mutation-score-target` and the measured value on each `test-exec` via
`mutation-score`. `rivet validate` will (in a future change tracked
under #188) compare measured vs. target and surface drift.
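Until `rivet validate` performs this comparison, a CI step could check drift itself. The following is a minimal Rust sketch under stated assumptions. `SafetyLevel`, `score_floor`, `Drift` and `check_drift` are hypothetical names and are not rivet APIs. The sketch also assumes a policy: when a `test-spec` has no `mutation-score-target`, it falls back to the table's floor.

```rust
/// Safety levels from the targets table; each ASIL is paired with the DAL
/// on the same row. Hypothetical type, not part of rivet.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SafetyLevel {
    Qm,    // QM / DAL E
    AsilA, // ASIL A / DAL D
    AsilB, // ASIL B / DAL C
    AsilC, // ASIL C / DAL B
    AsilD, // ASIL D / DAL A
}

/// Mutation-score floor for a level; `None` means no requirement.
fn score_floor(level: SafetyLevel) -> Option<f64> {
    match level {
        SafetyLevel::Qm => None,
        SafetyLevel::AsilA => Some(0.70),
        SafetyLevel::AsilB => Some(0.80),
        SafetyLevel::AsilC => Some(0.85),
        SafetyLevel::AsilD => Some(0.90),
    }
}

/// Result of comparing a measured `mutation-score` with its target.
#[derive(Debug, PartialEq)]
enum Drift {
    NoRequirement,
    Met { margin: f64 },
    BelowTarget { shortfall: f64 },
}

/// Use the spec's `mutation-score-target` when present, otherwise fall
/// back to the floor for its safety level (an assumed policy).
fn check_drift(level: SafetyLevel, target: Option<f64>, measured: f64) -> Drift {
    match target.or_else(|| score_floor(level)) {
        None => Drift::NoRequirement,
        Some(t) if measured >= t => Drift::Met { margin: measured - t },
        Some(t) => Drift::BelowTarget { shortfall: t - measured },
    }
}

fn main() {
    // An ASIL C suite with an explicit 0.85 target and a measured 0.872.
    println!("{:?}", check_drift(SafetyLevel::AsilC, Some(0.85), 0.872));
    // An ASIL D suite without an explicit target falls back to the 0.90 floor.
    println!("{:?}", check_drift(SafetyLevel::AsilD, None, 0.85));
}
```

Returning a `Drift` value rather than a bare boolean lets a report show how far each suite is above or below its target.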

## Recording results in rivet

The fields are defined in `schemas/score.yaml`. A `test-spec` and a
`test-exec` recording one nightly run look like this:

```yaml
- id: TEST-SPEC-007
  type: test-spec
  title: rivet-core unit tests
  fields:
    safety-level: ASIL_C
    mutation-score-target: 0.85

- id: TEST-EXEC-2026-04-27
  type: test-exec
  fields:
    version: v0.5.0
    commit: 92ad95d
    timestamp: 2026-04-27T02:00:00Z
    mutation-score: 0.897
    mutants-tested: 481
    mutants-killed: 419
    mutants-missed: 49
    mutants-timeout: 8
    mutants-unviable: 5
  links:
    - type: belongs-to
      target: TEST-SPEC-007
```

`mutants-tested = mutants-killed + mutants-missed + mutants-timeout
+ mutants-unviable`. cargo-mutants treats `timeout` as caught and
`unviable` (didn't compile) as excluded, so:

```
mutation-score = (killed + timeout) / (tested - unviable)
```
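The same formula can be written as a small Rust sketch. It checks that the four outcome buckets are consistent before dividing. `MutantCounts` is a hypothetical type that mirrors the `mutants-*` fields and is not part of rivet or cargo-mutants. For the counts in the example above, it gives 427 / 476 ≈ 0.897.

```rust
/// Per-run outcome counts, mirroring the `mutants-*` fields on `test-exec`.
/// Hypothetical type for illustration.
#[derive(Clone, Copy, Debug)]
struct MutantCounts {
    tested: u32,
    killed: u32,
    missed: u32,
    timeout: u32,
    unviable: u32,
}

impl MutantCounts {
    /// tested = killed + missed + timeout + unviable
    fn is_consistent(&self) -> bool {
        self.killed + self.missed + self.timeout + self.unviable == self.tested
    }

    /// mutation-score = (killed + timeout) / (tested - unviable).
    /// Returns `None` if the counts are inconsistent or no viable mutant was tested.
    fn score(&self) -> Option<f64> {
        if !self.is_consistent() {
            return None;
        }
        // Consistency guarantees unviable <= tested, so this cannot underflow.
        let viable = self.tested - self.unviable;
        if viable == 0 {
            return None;
        }
        Some(f64::from(self.killed + self.timeout) / f64::from(viable))
    }
}

fn main() {
    let run = MutantCounts { tested: 481, killed: 419, missed: 49, timeout: 8, unviable: 5 };
    // (419 + 8) / (481 - 5) = 427 / 476
    println!("{:.3}", run.score().unwrap());
}
```

Returning `None` for inconsistent counts keeps a mistyped artifact from silently producing a plausible-looking score.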

## Marking unreachable mutants

Some mutants are unreachable by construction (defensive `assert!` on
type-system invariants, debug-only `tracing::debug!` calls that have
no observable effect, etc.). Skipping them is fine when the skip is
justified; record the rationale alongside the skip so an auditor can
read both together.

### Per-call skip via `mutants.toml`

For pattern-wide skips (e.g., all `tracing::debug!` calls), use the
`skip_calls` array in `mutants.toml`. The template ships with
`tracing::trace` and `tracing::debug` excluded by default.

### Per-function skip via attribute

For ad-hoc skips on a specific function, add an attribute and a
comment justifying it:

```rust
// cargo-mutants: defensive bounds check; mutating the comparison
// would corrupt unrelated proofs that rely on this invariant.
#[cfg_attr(test, mutants::skip)]
fn assert_index_in_bounds(i: usize, len: usize) {
assert!(i < len);
}
```

The `cfg_attr(test, mutants::skip)` scoping keeps the attribute out of
non-test builds, so the `mutants` crate only needs to be a
dev-dependency.

## Adopting in another pulseengine repo

1. Copy the template files:
   ```sh
   mkdir -p .github/workflows
   cp rivet/templates/cargo-mutants/mutants.toml .
   cp rivet/templates/cargo-mutants/mutants.yml .github/workflows/
   ```
2. Edit `.github/workflows/mutants.yml` and replace the `matrix.crate`
   list with your crates.
3. Edit `mutants.toml` to tighten `exclude_globs` and `skip_calls` for
   crate-specific noise.
4. Add (or update) a `test-spec` artifact that names the suite under
   test and sets `mutation-score-target` per the table above.
5. After the first nightly run, file `test-exec` artifacts to record
   measured scores. Automate this in your CI workflow.
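One way to automate step 5 is a small program that runs after `cargo mutants` and prints a `test-exec` entry. This is a sketch under several assumptions:

- cargo-mutants writes one mutant per line to `caught.txt`, `missed.txt`, `timeout.txt` and `unviable.txt` under `mutants.out/`.
- The `TEST-EXEC-NIGHTLY` id and all function names are hypothetical.
- The program leaves out `version`, `commit` and `timestamp`, which the CI job would fill in from its own context.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome counts for one cargo-mutants run.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Counts {
    killed: u32,
    missed: u32,
    timeout: u32,
    unviable: u32,
}

/// Each mutant is listed on its own line, so count non-empty lines.
fn count_entries(text: &str) -> u32 {
    text.lines().filter(|line| !line.trim().is_empty()).count() as u32
}

/// Read one outcome list; a missing file means no mutant had that outcome.
fn read_count(dir: &Path, name: &str) -> io::Result<u32> {
    match fs::read_to_string(dir.join(name)) {
        Ok(text) => Ok(count_entries(&text)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(0),
        Err(e) => Err(e),
    }
}

/// Collect counts from an output directory (file names are an assumption).
fn read_counts(dir: &Path) -> io::Result<Counts> {
    Ok(Counts {
        killed: read_count(dir, "caught.txt")?,
        missed: read_count(dir, "missed.txt")?,
        timeout: read_count(dir, "timeout.txt")?,
        unviable: read_count(dir, "unviable.txt")?,
    })
}

/// Render a test-exec artifact; `None` if no viable mutant was tested.
fn render_test_exec(id: &str, spec: &str, counts: Counts) -> Option<String> {
    let Counts { killed, missed, timeout, unviable } = counts;
    let tested = killed + missed + timeout + unviable;
    let viable = tested - unviable;
    if viable == 0 {
        return None;
    }
    // mutation-score = (killed + timeout) / (tested - unviable)
    let score = f64::from(killed + timeout) / f64::from(viable);
    let lines = [
        format!("- id: {id}"),
        "  type: test-exec".to_string(),
        "  fields:".to_string(),
        format!("    mutation-score: {score:.3}"),
        format!("    mutants-tested: {tested}"),
        format!("    mutants-killed: {killed}"),
        format!("    mutants-missed: {missed}"),
        format!("    mutants-timeout: {timeout}"),
        format!("    mutants-unviable: {unviable}"),
        "  links:".to_string(),
        "    - type: belongs-to".to_string(),
        format!("      target: {spec}"),
    ];
    Some(lines.join("\n"))
}

fn main() -> io::Result<()> {
    let counts = read_counts(Path::new("mutants.out"))?;
    match render_test_exec("TEST-EXEC-NIGHTLY", "TEST-SPEC-007", counts) {
        Some(yaml) => println!("{yaml}"),
        None => eprintln!("no viable mutants recorded under mutants.out/"),
    }
    Ok(())
}
```

Run it from the directory that contains `mutants.out/`. It prints the YAML to stdout, where a workflow step could append it to the artifact file.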

## Open questions / non-goals

- **Cross-repo aggregation** is tracked separately in
[#188](https://github.com/pulseengine/rivet/issues/188). The schema
fields above are designed to feed that dashboard; rendering belongs
to the coverage-matrix initiative.
- **Mutation testing of proof code** (Verus, Rocq, Lean) is out of
scope. The proofs verify, by definition, the property they
state — there is no "test suite" to mutate against them.
- **Differential mutation** (mutating only the code touched by a diff
  rather than the whole crate) is not yet templated; cargo-mutants
  supports it via `--in-diff`, but our nightly schedule mutates whole
  crates today.
40 changes: 40 additions & 0 deletions schemas/score.yaml
@@ -545,6 +545,14 @@ artifact-types:
      - name: expected-result
        type: text
        required: false
      - name: mutation-score-target
        type: number
        required: false
        description: >
          Target mutation-kill rate for this test spec, as a fraction in
          [0.0, 1.0]. Per-ASIL/DAL recommendations live in
          docs/mutation-testing.md. Compared against the measured
          `mutation-score` on the corresponding test-exec.
    link-fields:
      - name: fully-verifies
        link-type: fully-verifies
@@ -588,6 +596,38 @@ artifact-types:
        type: structured
        required: false
        description: OS, toolchain, hardware configuration
      - name: mutation-score
        type: number
        required: false
        description: >
          Measured mutation-kill rate for this execution, as a fraction in
          [0.0, 1.0]. Recorded by the cargo-mutants run that produced this
          test-exec. See docs/mutation-testing.md for ASIL/DAL targets.
      - name: mutants-tested
        type: number
        required: false
        description: >
          Total number of mutants generated and tested in this run.
          Optional but useful for understanding the basis of `mutation-score`.
      - name: mutants-killed
        type: number
        required: false
        description: Number of mutants caught by the test suite.
      - name: mutants-missed
        type: number
        required: false
        description: >
          Number of mutants that survived the test suite. Each missed
          mutant is a coverage gap. Should equal `mutants-tested -
          mutants-killed - mutants-timeout - mutants-unviable`.
      - name: mutants-timeout
        type: number
        required: false
        description: Number of mutants that timed out (treated as killed by cargo-mutants).
      - name: mutants-unviable
        type: number
        required: false
        description: Number of mutants that did not compile (excluded from score).
    link-fields:
      - name: belongs-to
        link-type: belongs-to
36 changes: 36 additions & 0 deletions templates/cargo-mutants/README.md
@@ -0,0 +1,36 @@
# cargo-mutants template

Reusable cargo-mutants configuration extracted from rivet's pre-push hook.

## Files

- `mutants.toml` — base config (timeouts, exclusions, skip calls).
- `mutants.yml` — nightly + manual-dispatch GitHub Actions workflow.

## Quickstart for adopters

```sh
# From the root of the adopting repo:
mkdir -p .github/workflows
cp .../rivet/templates/cargo-mutants/mutants.toml ./mutants.toml
cp .../rivet/templates/cargo-mutants/mutants.yml .github/workflows/mutants.yml

# Edit mutants.yml and replace the matrix `crate` list with your crates.
# Edit mutants.toml and tighten exclusions / skip_calls per crate.
```

## Three operating modes

| Mode | Where | Cost | When |
|---|---|---|---|
| Pre-commit (off) | Local | Too slow for `pre-commit` | Skip; mutation testing should not block local edits. |
| Pre-push smoke | Local `.pre-commit-config.yaml`, `stages: [pre-push]` | ~1–5 min | Optional, against a single crate's lib. |
| CI nightly | `mutants.yml` | 30–90 min per crate | Required gate for safety-critical crates. |

## Score targets

See [`docs/mutation-testing.md`](../../docs/mutation-testing.md) for
ASIL/DAL targets and the procedure for marking unreachable mutants.
The schema field `mutation-score` on `test-exec` (in
`schemas/score.yaml`) is the place to record measured scores so they
flow into rivet traceability.
39 changes: 39 additions & 0 deletions templates/cargo-mutants/mutants.toml
@@ -0,0 +1,39 @@
# cargo-mutants configuration template
#
# Reusable starting point for pulseengine repos that adopt mutation testing.
# Copy to the repo root as `mutants.toml`. Tune per-crate as needed; the
# defaults below match rivet's pre-push smoke profile.
#
# Reference: https://mutants.rs/mutants_toml.html
# Schema field counterpart: see `mutation-score` on `test-exec` in
# schemas/score.yaml; record measured scores there for traceability.

# Per-mutant test timeout, in seconds. cargo-mutants kills the test process
# if a mutant takes longer than (baseline_test_time * timeout_multiplier),
# clamped to at least `minimum_test_timeout`. 60s is the rivet smoke value.
minimum_test_timeout = 60

# Skip generated, vendored, and proof-only paths. Mutating these costs CI
# time without exercising real code paths.
exclude_globs = [
    "target/**",
    "vendor/**",
    "**/generated/**",
    "proofs/**",
    "verus/**",
    "fuzz/fuzz_targets/**",
]

# Pass `--lib` to the underlying `cargo test`. Mutation testing typically
# targets library code; integration / E2E tests are too slow to be useful
# as kill criteria. Override per-crate if you need to include `--tests`.
additional_cargo_test_args = ["--lib"]

# Calls cargo-mutants should not mutate. The common case is logging /
# tracing calls with no behavioural effect. To skip whole functions (for
# example `Drop::drop` impls, whose mutants usually corrupt unrelated
# tests), use `exclude_re` instead. Add patterns specific to your crate.
skip_calls = [
    "tracing::trace",
    "tracing::debug",
]
83 changes: 83 additions & 0 deletions templates/cargo-mutants/mutants.yml
@@ -0,0 +1,83 @@
# cargo-mutants nightly mutation-testing workflow template.
#
# Adopters copy this file to `.github/workflows/mutants.yml` and adjust:
# - the matrix `crate` list to the crates they want covered
# - the schedule cron (default: 02:00 UTC nightly)
# - the `MUTANTS_TIMEOUT` if their crate's test suite needs more headroom
#
# See docs/mutation-testing.md (in rivet) for the rationale, ASIL/DAL
# score targets, and how to mark unreachable mutants.
#
# This workflow is deliberately separate from CI:
# - CI runs on every PR and must stay fast (minutes).
# - cargo-mutants is slow (often >30 min for a non-trivial crate) and
# belongs on a nightly schedule + manual dispatch only.

name: Mutants

on:
  schedule:
    # Nightly at 02:00 UTC. Adjust to your maintenance window.
    - cron: '0 2 * * *'
  workflow_dispatch:
    inputs:
      crate:
        description: 'Crate to mutate (default: all)'
        required: false
        default: ''

permissions:
  contents: read

env:
  CARGO_TERM_COLOR: always
  # Per-mutant timeout in seconds. Should match `minimum_test_timeout`
  # in mutants.toml or be slightly higher.
  MUTANTS_TIMEOUT: '60'

jobs:
  mutants:
    name: cargo mutants (${{ matrix.crate }})
    runs-on: ubuntu-latest
    # Mutation runs are intentionally slow. 90 minutes is a reasonable
    # ceiling per crate; bump if your suite is larger.
    timeout-minutes: 90
    strategy:
      fail-fast: false
      matrix:
        # CUSTOMIZE: list one entry per crate to mutate. Splitting per crate
        # lets the matrix parallelise and keeps each shard under timeout.
        crate:
          - rivet-core
          - rivet-cli
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - uses: dtolnay/rust-toolchain@stable

      - uses: Swatinem/rust-cache@v2
        with:
          # Mutation runs invalidate caches frequently; keep a separate key.
          key: mutants-${{ matrix.crate }}

      # A manual dispatch with a `crate` input runs only the matching shard;
      # scheduled runs (empty input) mutate every crate in the matrix.
      - name: Install cargo-mutants
        if: ${{ inputs.crate == '' || inputs.crate == matrix.crate }}
        run: cargo install --locked cargo-mutants

      # No --in-place: it mutates the checkout directly, so only one mutant
      # can be tested at a time and --jobs would have no effect.
      - name: Run mutation tests
        if: ${{ inputs.crate == '' || inputs.crate == matrix.crate }}
        run: |
          cargo mutants \
            --jobs 4 \
            --minimum-test-timeout "${MUTANTS_TIMEOUT}" \
            -p "${{ matrix.crate }}" \
            --no-shuffle

      - name: Upload mutants.out
        if: ${{ always() && (inputs.crate == '' || inputs.crate == matrix.crate) }}
        uses: actions/upload-artifact@v4
        with:
          name: mutants-${{ matrix.crate }}
          path: mutants.out/
          retention-days: 14