Failure Lab

Deterministic experiments for discovering and preventing failure modes in complex systems.

Reproduce the failure → detect the violation → restore the invariant → prove it holds

Why this exists

Many failures in complex systems start as small invariant violations:

ambiguous state transitions
retry amplification
partial writes
recovery logic drift

This repository turns those failures into deterministic experiments.

Method

Every experiment follows the same loop:

flowchart TD
  A[Define invariant]
  A --> B[Reproduce failure]
  B --> C[Detect violation]
  C --> D[Introduce guardrail]
  D --> E[Prove invariant holds]
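
As a rough illustration of one pass through this loop, the sketch below uses the "partial writes" failure from the list above. Everything in it is hypothetical and not taken from this repository: the `store` dict, the `CrashMidWrite` fault, and both write functions are assumptions made for the example.

```python
class CrashMidWrite(Exception):
    """Injected fault: the process dies between two field writes."""


def invariant_holds(store):
    # 1. Define invariant: every record is complete or absent, never partial.
    return all(rec.keys() >= {"id", "payload"} for rec in store.values())


def naive_write(store, key, payload, crash=False):
    # 2. Reproduce failure: fields are written one at a time, so a crash
    #    between them leaves a half-written record in the store.
    store[key] = {"id": key}
    if crash:
        raise CrashMidWrite(key)
    store[key]["payload"] = payload


def guarded_write(store, key, payload, crash=False):
    # 4. Introduce guardrail: build the record off to the side and publish
    #    it with a single assignment.
    record = {"id": key}
    if crash:
        raise CrashMidWrite(key)
    record["payload"] = payload
    store[key] = record


def run(write):
    # 3 and 5. Detect the violation, then prove the invariant holds,
    # by checking the same invariant after the same injected crash.
    store = {}
    try:
        write(store, "job-1", "hello", crash=True)
    except CrashMidWrite:
        pass
    return invariant_holds(store)


print(run(naive_write))    # False: the crash left a partial record
print(run(guarded_write))  # True: the crash left nothing behind
```

The same invariant check is used for both the failing and the fixed variant, so the fix is judged by the property it restores rather than by the absence of an exception.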

Failure Mode Bundle

Each failure is packaged as a reproducible bundle.

failure_modes/FM_XXX_name/
  spec.md                      # violated invariant, trigger, detection, recovery
  scenario.py                  # deterministic reproduction
  tests/                       # prove the failure and the fix
    test_repro_fmxxx.py
    test_prevent_fmxxx.py
    test_recover_fmxxx.py
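
One way a reproduction can stay deterministic is to script faults by call number instead of relying on timing or randomness. The sketch below assumes that approach. `FaultPlan` and `run_scenario` are hypothetical names for illustration, not the contents of any `scenario.py` in this repository.

```python
from dataclasses import dataclass


@dataclass
class FaultPlan:
    """Hypothetical helper: raise a fault exactly on the listed call numbers."""
    fail_on_calls: frozenset
    calls: int = 0

    def check(self):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise TimeoutError(f"injected fault on call {self.calls}")


def run_scenario(plan, jobs):
    """Process jobs in order and return the full event trace."""
    trace = []
    for job in jobs:
        try:
            plan.check()
            trace.append(("ok", job))
        except TimeoutError:
            trace.append(("fault", job))
    return trace


# The same plan and the same input give the same trace on every run.
first = run_scenario(FaultPlan(frozenset({2})), ["a", "b", "c"])
second = run_scenario(FaultPlan(frozenset({2})), ["a", "b", "c"])
print(first)            # [('ok', 'a'), ('fault', 'b'), ('ok', 'c')]
print(first == second)  # True
```

Because the trace is a plain list, a test can compare it against an expected sequence directly.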

Experiment 01 — Predictable Job Queue

A minimal job processor designed to demonstrate correctness under stress.

Focus

duplicate execution
partial writes
retry amplification
stuck queues

Guarantees

Property	Guarantee
Delivery	at-least-once
Effects	exactly-once via idempotent handlers
Retries	bounded with dead-letter fallback
Recovery	invariants must be restored

Correctness is defined by invariants, not uptime.
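
The "bounded with dead-letter fallback" row can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's runtime: `MAX_ATTEMPTS`, `drain`, and the `(job_id, attempts)` tuple layout are all invented for the example.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed bound, not taken from the repository


def drain(queue, handler, dead_letter):
    """Process until the queue is empty.

    Every job ends up either handled or dead-lettered, so a job that
    always fails cannot keep the queue busy forever.
    """
    while queue:
        job_id, attempts = queue.popleft()
        try:
            handler(job_id)
        except Exception:
            attempts += 1
            if attempts >= MAX_ATTEMPTS:
                dead_letter.append(job_id)         # give up, keep the evidence
            else:
                queue.append((job_id, attempts))   # retry later


done, dead = [], []


def handler(job_id):
    if job_id == "poison":
        raise RuntimeError("always fails")
    done.append(job_id)


queue = deque([("ok-1", 0), ("poison", 0), ("ok-2", 0)])
drain(queue, handler, dead)
print(done)  # ['ok-1', 'ok-2']
print(dead)  # ['poison']
```

The invariant checked here is that the queue always drains: the poison job is attempted three times and then moved aside instead of being retried indefinitely.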

Quick Start

make sync
make test

Current Failure Mode

FM_001 — Retry duplication → double execution

Reproduce and verify the fix:

make sync           # install required packages
make test-fm1       # reproduces the failure (double execution)
make test-fm1-fix   # proves the prevention boundary holds
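
For readers who want the shape of FM_001 before running the targets, here is a hedged sketch of how a lost acknowledgement can turn an at-least-once retry into a double execution, and how an idempotency check prevents it. The `deliver` function and both handler factories are hypothetical and do not reflect the repository's actual code.

```python
def deliver(handler, job_id, lose_first_ack):
    """At-least-once delivery: redeliver until an ack gets through."""
    attempt = 0
    while True:
        attempt += 1
        handler(job_id)
        if lose_first_ack and attempt == 1:
            continue  # the handler ran, but the ack was lost, so retry
        return attempt


def make_naive_handler(effects):
    def handle(job_id):
        effects.append(job_id)  # side effect applied on every delivery
    return handle


def make_idempotent_handler(effects, seen):
    def handle(job_id):
        if job_id in seen:      # guardrail: job id used as idempotency key
            return
        seen.add(job_id)
        effects.append(job_id)
    return handle


naive_effects = []
deliver(make_naive_handler(naive_effects), "job-42", lose_first_ack=True)

safe_effects = []
deliver(make_idempotent_handler(safe_effects, set()), "job-42", lose_first_ack=True)

print(len(naive_effects))  # 2: the retry executed the job twice
print(len(safe_effects))   # 1: the retry was absorbed
```

In this sketch the `seen` set and the effect live in the same process. A real handler would likely need to record the key and apply the effect atomically, otherwise a crash between the two steps reintroduces the problem.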

Repository Map

Scope
docs/00_scope.md

Invariants
docs/01_invariants.md

Failure-mode methodology
docs/03_failure_modes.md

Happy path baseline
docs/04_happy_path.md

Failure mode index
docs/failure_mode_index.md

Failure-mode bundles
failure_modes/FM_XXX_*

Where this applies

This methodology applies to any system where correctness matters:

distributed infrastructure
financial transaction systems
cryptographic protocols
AI systems

The pattern is always the same:

flowchart LR
  A[define the invariant]
  A --> B[break it]
  B --> C[build the guardrail]

Philosophy

Failure is inevitable.
Correctness is engineered.
