Deterministic experiments for discovering and preventing failure modes in complex systems.
Reproduce the failure → detect the violation → restore the invariant → prove it holds
Most failures come from small invariant violations:
- ambiguous state transitions
- retry amplification
- partial writes
- recovery logic drift
This repository turns those failures into deterministic experiments.
Every experiment follows the same loop:
flowchart TD
A[Define invariant]
A --> B[Reproduce failure]
B --> C[Detect violation]
C --> D[Introduce guardrail]
D --> E[Prove invariant holds]
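The loop can be sketched in a few lines of Python, using a partial write as the failure. This is an illustration only, not code from this repository: `transfer_unsafe`, `transfer_atomic`, `CrashAfterDebit`, and the money-conservation invariant are hypothetical names chosen for the example.

```python
class CrashAfterDebit(Exception):
    """Injected fault (hypothetical): the process dies between two writes."""


def invariant_holds(balances, expected_total):
    # Step 1: define the invariant. Money is conserved across accounts.
    return sum(balances.values()) == expected_total


def transfer_unsafe(balances, src, dst, amount, crash=False):
    # Two separate writes: a crash between them leaves a partial write.
    balances[src] -= amount
    if crash:
        raise CrashAfterDebit()
    balances[dst] += amount


def transfer_atomic(balances, src, dst, amount, crash=False):
    # Step 4: guardrail. Stage every write on a copy and commit only
    # after all staged writes have succeeded.
    staged = dict(balances)
    staged[src] -= amount
    if crash:
        raise CrashAfterDebit()
    staged[dst] += amount
    balances.update(staged)


def run(transfer):
    balances = {"a": 100, "b": 0}
    try:
        # Step 2: reproduce the failure with the same injected fault every time.
        transfer(balances, "a", "b", 30, crash=True)
    except CrashAfterDebit:
        pass
    # Step 3: detect the violation by checking the invariant, not the exception.
    return invariant_holds(balances, expected_total=100)


print(run(transfer_unsafe))  # False: the partial write broke the invariant
print(run(transfer_atomic))  # True: step 5, the invariant holds under the same fault
```

The fault is injected through an explicit `crash` flag rather than timing or randomness, so the failing run and the passing run are both repeatable.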
Each failure is packaged as a reproducible bundle.
failure_modes/FM_XXX_name/
    spec.md        # violated invariant, trigger, detection, recovery
    scenario.py    # deterministic reproduction
    tests/         # prove the failure and the fix
        test_repro_fmxxx.py
        test_prevent_fmxxx.py
        test_recover_fmxxx.py
Minimal job processor designed to demonstrate correctness under stress.
- duplicate execution
- partial writes
- retry amplification
- stuck queues
| Property | Guarantee |
|---|---|
| Delivery | at-least-once |
| Effects | exactly-once via idempotent handlers |
| Retries | bounded with dead-letter fallback |
| Recovery | invariants must be restored |
Correctness is defined by invariants, not uptime.
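As a rough sketch of the Retries row, the snippet below bounds the number of attempts per job and moves exhausted jobs to a dead-letter list, so a permanently failing job cannot stall the queue. The `process` function, the `max_attempts` parameter, and the `flaky_handler` are assumptions made for illustration; the repository's job processor may be structured differently.

```python
from collections import deque


def process(jobs, handler, max_attempts=3):
    """Drain `jobs`, giving each job at most `max_attempts` attempts."""
    queue = deque((job, 1) for job in jobs)
    done, dead_letter = [], []
    while queue:
        job, attempt = queue.popleft()
        try:
            handler(job)
            done.append(job)
        except Exception:
            if attempt >= max_attempts:
                # Bounded retries: stop retrying, but keep the job visible.
                dead_letter.append(job)
            else:
                # Retry later, behind the remaining work.
                queue.append((job, attempt + 1))
    return done, dead_letter


calls = {}


def flaky_handler(job):
    # Hypothetical handler: "poison" always fails, "flaky" fails once.
    calls[job] = calls.get(job, 0) + 1
    if job == "poison" or (job == "flaky" and calls[job] == 1):
        raise RuntimeError(f"failed: {job}")


done, dead_letter = process(["ok", "flaky", "poison"], flaky_handler)
print(done)             # ['ok', 'flaky']
print(dead_letter)      # ['poison']
print(calls["poison"])  # 3
```

The invariant being checked here is that every submitted job ends up in exactly one of `done` or `dead_letter`, and that no job is attempted more than `max_attempts` times.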
make sync
make test
FM_001: Retry duplication → double execution
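One way this failure can arise is sketched below, assuming a lost acknowledgement under at-least-once delivery: the handler succeeds, the ack never arrives, and the same message is delivered again. All names here (`deliver_with_lost_ack`, the handler factories, the message shape) are hypothetical and are not taken from the repository's `scenario.py`.

```python
def deliver_with_lost_ack(handler, message):
    # At-least-once delivery: the first call succeeds but its ack is lost,
    # so the same message is delivered a second time.
    handler(message)  # first delivery
    handler(message)  # redelivery after the lost ack


def make_naive_handler(effects):
    def handle(message):
        # Applies the effect on every delivery.
        effects.append(message["payload"])
    return handle


def make_idempotent_handler(effects, seen):
    def handle(message):
        # Prevention boundary: deduplicate on the message id.
        if message["id"] in seen:
            return
        seen.add(message["id"])
        effects.append(message["payload"])
    return handle


msg = {"id": "job-1", "payload": "charge"}

naive_effects = []
deliver_with_lost_ack(make_naive_handler(naive_effects), msg)

safe_effects = []
deliver_with_lost_ack(make_idempotent_handler(safe_effects, set()), msg)

print(len(naive_effects))  # 2: double execution
print(len(safe_effects))   # 1: the effect is applied once
```

Delivery stays at-least-once in both runs; only the handler changes, which matches the guarantee that exactly-once effects come from idempotent handlers.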
Reproduce and verify the fix:
make sync # install required packages
make test-fm1 # reproduces the failure
make test-fm1-fix # proves prevention boundary
- Scope: docs/00_scope.md
- Invariants: docs/01_invariants.md
- Failure-mode methodology: docs/03_failure_modes.md
- Happy path baseline: docs/04_happy_path.md
- Failure mode index: docs/failure_mode_index.md
- Failure-mode bundles: failure_modes/FM_XXX_*
This methodology applies to any system where correctness matters:
- distributed infrastructure
- financial transaction systems
- cryptographic protocols
- AI systems
The pattern is always the same:
flowchart LR
A[define the invariant]
A --> B[break it]
B --> C[build the guardrail]
Failure is inevitable.
Correctness is engineered.