Integrate cargo-mutants (diff-only mode) into CI #1053

@zazabap

Summary

Add cargo-mutants to CI in diff-only mode (--in-diff) to catch undertested code in PRs. Full mutation testing is infeasible (~25,800 mutants, each costing at least the ~1m43s baseline test run, comes to roughly 750 hours), but diff-scoped runs are practical.

Why

We enforce >95% line coverage via codecov, but line coverage doesn't measure test quality — a test that runs code without asserting on its output achieves coverage without catching bugs. Mutation testing fills this gap: if replacing + with - in a function doesn't fail any test, the tests are inadequate.
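To make the gap concrete, here is a minimal sketch of a covered-but-untested function. The function `combined_cost` and both tests are hypothetical and not taken from this repository; the mutation described in the comments (`+` replaced by `-`) is the one mentioned above.

```rust
/// Hypothetical function under test.
pub fn combined_cost(a: i64, b: i64) -> i64 {
    a + b // a mutant would replace `+` with `-`
}

#[cfg(test)]
mod tests {
    use super::*;

    // Executes every line of `combined_cost`, so line coverage is 100%,
    // but it passes on the `a - b` mutant as well: the mutant survives.
    #[test]
    fn weak_runs_without_asserting() {
        let _ = combined_cost(2, 3);
    }

    // Fails on the `a - b` mutant (which returns -1), so the mutant is caught.
    #[test]
    fn strong_asserts_on_output() {
        assert_eq!(combined_cost(2, 3), 5);
    }
}
```

With only the first test, codecov reports full coverage while a mutation run would flag the function as undertested.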

Proposal

PR CI job (diff-only)

Add a GitHub Actions job that runs mutations only on lines changed in the PR:

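- name: Mutation testing (changed lines only)
  # Requires origin/main to be fetched (e.g. checkout with fetch-depth: 0)
  # and a bash shell for the <(...) process substitution.
  run: |
    cargo mutants \
      --in-diff <(git diff origin/main) \
      --timeout-multiplier 3 \
      -j 4 \
      -- --features ilp-highs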

A typical PR touching 5-10 files generates 50-200 mutants. With 4 parallel jobs and ~2 min per mutant, this is 25-100 minutes — further reducible with sharding across runners.
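The 25-100 minute range follows from simple arithmetic, sketched below. The helper `estimated_minutes` is hypothetical, and it assumes every mutant costs about the same and that all jobs and runners stay fully busy, which real runs will only approximate.

```rust
/// Rough wall-clock estimate, in minutes, for a diff-scoped mutation run.
/// Assumes a uniform per-mutant cost and perfect parallel utilisation.
pub fn estimated_minutes(
    mutants: u32,
    minutes_per_mutant: f64,
    jobs_per_runner: u32,
    runners: u32,
) -> f64 {
    let parallel_slots = (jobs_per_runner * runners) as f64;
    (mutants as f64 * minutes_per_mutant) / parallel_slots
}
```

For example, `estimated_minutes(50, 2.0, 4, 1)` gives 25 minutes and `estimated_minutes(200, 2.0, 4, 1)` gives 100 minutes; spreading the larger run over 4 runners brings the estimate back to 25 minutes.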

Configuration (.cargo/mutants.toml)

timeout_multiplier = 3.0
minimum_test_timeout = 60

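# Exclude generated/non-logic code from mutation.
# exclude_re is matched against mutant names as printed by `cargo mutants --list`;
# check these patterns against that output before relying on them.
exclude_re = ["fn fmt\\(", "fn default\\(", "fn clone\\("]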
exclude_globs = ["examples/**", "problemreductions-macros/**"]

additional_cargo_args = ["--features", "ilp-highs"]
# Parallelism is set on the command line (-j 4) rather than in this file.

Rollout

  1. Phase 1 (informational): Add the CI job as non-blocking (continue-on-error: true). Upload mutants.out/ as an artifact. Review surviving mutants manually to calibrate expectations.
  2. Phase 2 (enforcing): Once the baseline is clean, make the job blocking. cargo-mutants exits with a non-zero status when mutants survive, so removing continue-on-error is enough for surviving mutants in changed code to fail the PR.

Optional: sharding for large PRs

strategy:
  matrix:
    shard: [0, 1, 2, 3]  # --shard k/n takes a zero-based k
steps:
  - run: cargo mutants --in-diff <(git diff origin/main) --shard ${{ matrix.shard }}/4
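The following sketch illustrates how `--shard k/n` splits the mutant list. cargo-mutants documents the shard index k as zero-based; the round-robin assignment shown here (mutant i goes to shard i % n) is an assumption for illustration, since the actual distribution may differ between versions. The helper `shard_members` is hypothetical.

```rust
/// Indices of the mutants that shard `k` of `n` would run,
/// assuming round-robin assignment (mutant i goes to shard i % n).
pub fn shard_members(total_mutants: usize, k: usize, n: usize) -> Vec<usize> {
    // k is zero-based, so valid shard indices are 0..n.
    assert!(n > 0 && k < n, "shard index must satisfy k < n");
    (0..total_mutants).filter(|i| i % n == k).collect()
}
```

Under this assumption, 10 mutants over 4 shards split as [0, 4, 8], [1, 5, 9], [2, 6] and [3, 7]. Every mutant lands in exactly one shard, provided the matrix covers indices 0 through n-1.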

Numbers

| Metric | Value |
| --- | --- |
| Total mutants (full project) | 25,809 |
| src/rules/ | 15,638 |
| src/models/ | 8,318 |
| src/solvers/ | 237 |
| Baseline test time | ~1m43s |
| Full run estimate | ~750 hours (infeasible) |
| Typical PR diff run | 50-200 mutants → 25-100 min |
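As a sanity check on the full-run figure, the sketch below multiplies the mutant count by the baseline test time. The helper `sequential_hours` is hypothetical, and it treats the ~1m43s baseline (103 seconds) as the cost of each mutant, which ignores incremental rebuild time and early test failures.

```rust
/// Sequential run time in hours, assuming every mutant costs
/// `seconds_per_mutant` and nothing runs in parallel.
pub fn sequential_hours(mutants: u64, seconds_per_mutant: u64) -> f64 {
    (mutants * seconds_per_mutant) as f64 / 3600.0
}
```

`sequential_hours(25_809, 103)` comes to a little over 738 hours, the same order of magnitude as the ~750 hour estimate in the table.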

Out of scope

  • Full/scheduled mutation testing runs (infeasible at current project size)
  • Mutation-based coverage metrics in codecov
