Summary
Add cargo-mutants to CI in diff-only mode (--in-diff) to catch undertested code in PRs. Full mutation testing is infeasible (~25,800 mutants at roughly the 1m43s baseline test time each comes to about 750 hours of serial runtime), but diff-scoped runs are practical.
Why
We enforce >95% line coverage via codecov, but line coverage doesn't measure test quality — a test that runs code without asserting on its output achieves coverage without catching bugs. Mutation testing fills this gap: if replacing + with - in a function doesn't fail any test, the tests are inadequate.
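To make the gap concrete, here is a minimal sketch. The function `total_weight`, the hand-written mutant, and both checks are hypothetical and not taken from this project. The weak check executes every line of the function, so line coverage is complete, yet it cannot tell `+` from `-`.

```rust
/// Hypothetical function under test.
fn total_weight(a: i64, b: i64) -> i64 {
    a + b
}

/// Hand-written stand-in for the mutant that replaces `+` with `-`.
fn total_weight_mutant(a: i64, b: i64) -> i64 {
    a - b
}

/// Weak check: executes the function (full line coverage), but with b = 0
/// the result is identical for `a + b` and `a - b`.
fn weak_check(f: fn(i64, i64) -> i64) -> bool {
    f(5, 0) == 5
}

/// Stronger check: a nonzero `b` makes the two operators disagree.
fn strong_check(f: fn(i64, i64) -> i64) -> bool {
    f(5, 3) == 8
}

fn main() {
    // Both checks pass on the original code.
    assert!(weak_check(total_weight) && strong_check(total_weight));
    // The mutant survives the weak check but is caught by the strong one.
    assert!(weak_check(total_weight_mutant));
    assert!(!strong_check(total_weight_mutant));
    println!("weak check misses the mutant; strong check catches it");
}
```

A mutation tool automates the middle step: it generates the mutant and reports it as surviving when only checks like `weak_check` exist.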
Proposal
PR CI job (diff-only)
Add a GitHub Actions job that runs mutations only on lines changed in the PR:

    - name: Mutation testing (changed lines only)
      run: |
        cargo mutants \
          --in-diff <(git diff origin/main) \
          --timeout-multiplier 3 \
          -j 4 \
          -- --features ilp-highs

A typical PR touching 5-10 files generates 50-200 mutants. With 4 parallel jobs and ~2 min per mutant, this is 25-100 minutes — further reducible with sharding across runners.
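The estimate above is simple arithmetic, sketched here so the inputs can be varied. The helper `wall_clock_minutes` is hypothetical, and it assumes every mutant takes the same time and that all jobs stay busy, which real runs will only approximate.

```rust
/// Estimated wall-clock minutes for a mutation run, assuming every mutant
/// takes the same time and the jobs stay fully busy (both simplifications).
fn wall_clock_minutes(mutants: u32, minutes_per_mutant: f64, jobs: u32) -> f64 {
    f64::from(mutants) * minutes_per_mutant / f64::from(jobs)
}

fn main() {
    // The two ends of the "typical PR" range, at ~2 min per mutant and 4 jobs.
    for mutants in [50, 200] {
        let minutes = wall_clock_minutes(mutants, 2.0, 4);
        println!("{mutants} mutants -> ~{minutes} min with 4 jobs");
    }
    // Splitting the larger run over 4 runners (assumes evenly balanced shards).
    let per_shard = wall_clock_minutes(200 / 4, 2.0, 4);
    println!("200 mutants over 4 shards -> ~{per_shard} min per runner");
}
```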
Configuration (.cargo/mutants.toml)

    timeout_multiplier = 3.0
    minimum_test_timeout = 60
    # Exclude generated/non-logic code from mutation
    exclude_re = ["fn fmt\\(", "fn default\\(", "fn clone\\("]
    exclude_globs = ["examples/**", "problemreductions-macros/**"]
    additional_args = ["--features", "ilp-highs"]
    jobs = 4

Rollout
- Phase 1 (Informational): Add the CI job as non-blocking (continue-on-error). Upload mutants.out/ as an artifact. Review surviving mutants manually to calibrate expectations.
- Phase 2 (Enforcing): Once the baseline is clean, make the job blocking. Surviving mutants in changed code fail the PR.
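The difference between the two phases can be sketched as a small gate over the list of surviving mutants. This is only an illustration: `count_missed` and `exit_status` are hypothetical helpers, the sample line is made up, and the path `mutants.out/missed.txt` with one mutant per line is an assumption about the tool's output layout that should be checked against the uploaded artifact.

```rust
/// Counts surviving mutants, assuming one mutant per non-empty line.
fn count_missed(contents: &str) -> usize {
    contents.lines().filter(|line| !line.trim().is_empty()).count()
}

/// Exit status the CI step would use: always 0 while informational (Phase 1),
/// nonzero in the enforcing phase (Phase 2) when any mutant survived.
fn exit_status(missed: usize, enforcing: bool) -> i32 {
    if enforcing && missed > 0 { 1 } else { 0 }
}

fn main() {
    // Illustrative sample only; in CI this text would come from
    // std::fs::read_to_string("mutants.out/missed.txt") (assumed path).
    let sample = "src/rules/example.rs:10:5: replace + with - in reduce\n\n";
    let missed = count_missed(sample);
    println!("surviving mutants: {missed}");
    println!("phase 1 exit status: {}", exit_status(missed, false));
    println!("phase 2 exit status: {}", exit_status(missed, true));
}
```

Keeping the count separate from the exit decision means the same report can be printed in both phases, with only the `enforcing` flag changing at the Phase 2 switch.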
Optional: sharding for large PRs

    strategy:
      matrix:
        # cargo-mutants shard indices are zero-based: k/n with k from 0 to n-1
        shard: [0, 1, 2, 3]
    steps:
      - run: cargo mutants --in-diff <(git diff origin/main) --shard ${{ matrix.shard }}/4

Numbers

| Metric | Value |
|---|---|
| Total mutants (full project) | 25,809 |
| src/rules/ | 15,638 |
| src/models/ | 8,318 |
| src/solvers/ | 237 |
| Baseline test time | ~1m43s |
| Full run estimate | ~750 hours (infeasible) |
| Typical PR diff run | 50-200 mutants → 25-100 min |

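The full-run figure can be checked from the table's own numbers. This is a rough sketch that assumes each mutant costs about one baseline test run (1m43s = 103 s), that mutants run serially, and that per-mutant build time is ignored:

```latex
\[
25{,}809 \times 103\,\mathrm{s} \approx 2.66 \times 10^{6}\,\mathrm{s} \approx 738\,\mathrm{h} \approx 31\ \text{days}
\]
```

Rounding up for build overhead gives the ~750 hours quoted in the table.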
Out of scope
- Full/scheduled mutation testing runs (infeasible at current project size)
- Mutation-based coverage metrics in codecov