buddingengineers12345 · Apr 10, 2026
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
@@ -56,9 +56,11 @@ for the full developer guide.
 | `pre-bash-block-no-verify.sh` | PreToolUse | Bash | Block `--no-verify` in git commands |
 | `pre-bash-git-push-reminder.sh` | PreToolUse | Bash | Warn to run `just review` before push |
 | `pre-bash-commit-quality.sh` | PreToolUse | Bash | Scan staged `.py` files for secrets/debug markers |
+| `pre-bash-coverage-gate.sh` | PreToolUse | Bash | Warn before `git commit` if coverage below threshold |
 | `pre-config-protection.sh` | PreToolUse | Write\|Edit\|MultiEdit | Block weakening ruff/pyright config edits |
 | `pre-protect-uv-lock.sh` | PreToolUse | Write\|Edit | Block direct edits to `uv.lock` |
-| `pre-write-src-test-reminder.sh` | PreToolUse | Write\|Edit | Warn if test file missing for new source module |
+| `pre-write-src-require-test.sh` | PreToolUse | Write\|Edit | Block if the matching test file does not exist (strict TDD) |
+| `pre-write-src-test-reminder.sh` | PreToolUse | Write\|Edit | Warn if test file missing for new source module (optional alternative to strict TDD) |
 | `pre-write-doc-file-warning.sh` | PreToolUse | Write | Block `.md` files written outside `docs/` |
 | `pre-write-jinja-syntax.sh` | PreToolUse | Write | Validate Jinja2 syntax before writing `.jinja` files |
 | `pre-suggest-compact.sh` | PreToolUse | Edit\|Write | Suggest `/compact` every ~50 operations |
@@ -68,6 +70,7 @@ for the full developer guide.
 | `post-edit-markdown.sh` | PostToolUse | Edit | Warn if existing `.md` edited outside `docs/` |
 | `post-edit-copier-migration.sh` | PostToolUse | Edit\|Write | Migration checklist after `copier.yml` edits |
 | `post-edit-template-mirror.sh` | PostToolUse | Edit\|Write | Remind to mirror `template/.claude/` ↔ root |
+| `post-edit-refactor-test-guard.sh` | PostToolUse | Edit\|Write | Remind to run tests after several `src/` or `scripts/` edits |
 | `post-bash-pr-created.sh` | PostToolUse | Bash | Log PR URL after `gh pr create` |
 | `stop-session-end.sh` | Stop | * | Persist session state JSON |
 | `stop-evaluate-session.sh` | Stop | * | Extract reusable patterns from transcript |
@@ -106,6 +109,9 @@ One Markdown file per slash command. Claude Code reads these files when you invo
 | `/ci` | Run `just ci` and report results |
 | `/test` | Run `just test` and summarise failures |
 | `/dependency-check` | Validate `uv.lock` is committed, in sync, and not stale |
+| `/tdd-red` | Validate RED phase: confirm a test fails for the right reason |
+| `/tdd-green` | Validate GREEN phase: confirm the test passes with no regressions |
+| `/ci-fix` | Autonomous CI fixer: diagnose failures, apply fixes, re-run until green |
 
 ## rules/
 

diff --git a/.claude/commands/ci-fix.md b/.claude/commands/ci-fix.md
@@ -0,0 +1,88 @@
+Diagnose and fix all CI failures in this project. You are an autonomous CI-fixing agent.
+
+## Step 1 — Run CI and capture output
+
+```bash
+just ci 2>&1
+```
+
+If CI passes (exit 0), report success and stop.
+
+## Step 2 — Classify failures
+
+Parse the output and classify each failure into one of these categories:
+
+| Category | Tool | Typical pattern | Fix strategy |
+|---|---|---|---|
+| **Format** | ruff format | "would reformat" | Run `just fmt` — fully automatic |
+| **Lint** | ruff check | `E`, `F`, `I`, `UP`, `B`, `SIM`, `C4`, `RUF`, `PERF`, `T20` | Run `just fix` for auto-fixable; manual fix for others |
+| **Docstring** | ruff `D` rules | `D1xx`, `D2xx`, `D3xx`, `D4xx` | Add or fix Google-style docstrings |
+| **Type** | basedpyright | "error: Type X cannot be assigned to Y" | Fix type annotations, add casts, narrow types |
+| **Test** | pytest | "FAILED tests/..." | Fix test or implementation logic |
+| **Coverage** | pytest-cov | "FAIL Required test coverage" | Add missing tests |
+| **Import** | ruff `I` rules | `I001` | Run `just fix` — auto-fixable |
+
+## Step 3 — Fix in priority order
+
+Fix failures in this order (cheapest and most impactful first):
+
+1. **Format + Import** — `just fix && just fmt` (fully automatic)
+2. **Lint** — `just fix` for auto-fixable, then manual fixes
+3. **Docstrings** — add missing Google-style docstrings. Read the `python-docstrings`
+   skill (`skills/python-docstrings/SKILL.md`) for guidance if available.
+4. **Type errors** — fix annotations. Read the `python-code-quality` skill
+   (`skills/python-code-quality/SKILL.md`) for basedpyright guidance if available.
+5. **Test failures** — read the failing test, understand what it asserts, fix the
+   implementation or the test. Read the `pytest` skill (`skills/pytest/SKILL.md`)
+   for patterns if available.
+6. **Coverage** — identify uncovered lines with `just coverage`, write tests.
+
+## Step 4 — Iterate
+
+After fixing each category:
+
+1. Re-run `just ci 2>&1`
+2. If new failures appear, classify and fix them
+3. Repeat until CI passes (exit 0)
+
+**Hard limit:** Do not attempt more than 5 full CI iterations. If still failing after 5
+rounds, report the remaining failures and ask the user for guidance.
+
+## Step 5 — Report
+
+When CI passes (or after hitting the iteration limit), produce a summary:
+
+```
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+CI Fix Report
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+Status:     ✅ ALL GREEN (or ❌ FAILURES REMAIN)
+Iterations: N
+Duration:   ~Xm
+
+Fixes applied:
+- [Format] Reformatted 3 files
+- [Lint] Fixed 5 ruff violations (2 auto, 3 manual)
+- [Docstring] Added docstrings to 2 functions
+- [Type] Fixed 1 type annotation in core.py
+- [Test] Fixed assertion in test_core.py
+
+Files changed:
+- src/mypackage/core.py
+- tests/test_core.py
+
+Remaining issues: (if any)
+- <description>
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+
+## Rules
+
+- Fix the root cause, not the symptom. A `# type: ignore` is not a fix.
+- Do not weaken linter or type checker configuration.
+- Do not delete or skip tests to make CI pass.
+- Do not use `--no-verify` on any git command.
+- Every fix must preserve existing test behaviour — run tests after each change.
+- If a fix requires architectural changes beyond the scope of the failing code,
+  stop and ask the user instead of making sweeping changes.
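Review note: the Step 4 hard limit can be pictured as a bounded loop. The following is a minimal Python sketch under stated assumptions, not part of the command file. `fix_until_green`, `run_ci`, and the `fix` callback are hypothetical names; only `just ci` and the limit of 5 come from the text above.

```python
import subprocess
from typing import Callable, Tuple

MAX_ITERATIONS = 5  # hard limit stated in Step 4


def run_ci() -> int:
    """Run `just ci` and return its exit code (assumes `just` is on PATH)."""
    return subprocess.run(["just", "ci"], check=False).returncode


def fix_until_green(
    run: Callable[[], int],
    fix: Callable[[int], None],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[bool, int]:
    """Run CI up to max_iterations times; return (passed, ci_runs_used)."""
    for iteration in range(1, max_iterations + 1):
        if run() == 0:
            return True, iteration
        # Skip fixing after the final run: nothing would verify that fix.
        if iteration < max_iterations:
            fix(iteration)
    return False, max_iterations
```

Passing `run` and `fix` as callables keeps the loop testable without invoking the real CI. In practice `run` would be `run_ci` and `fix` would apply the Step 3 priority order.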
diff --git a/.claude/commands/tdd-green.md b/.claude/commands/tdd-green.md
@@ -0,0 +1,48 @@
+Validate the GREEN phase of a TDD cycle: confirm that the previously-failing test now PASSES and no regressions were introduced.
+
+Run the full test suite with `just test`.
+
+Then analyse the output:
+
+1. **Identify the target test** — the test that was RED in the previous phase.
+
+2. **Check three conditions:**
+   - The target test now PASSES
+   - ALL other tests still pass (no regressions)
+   - No new warnings or errors were introduced
+
+3. **Report the result** in this format:
+
+   ```
+   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+   TDD GREEN Validation
+   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+   Target test:  <test name>
+   Status:       PASS ✅ (or FAIL ❌)
+
+   Full suite:   X passed, Y failed, Z skipped
+   Regressions:  None (or: <list of newly failing tests>)
+
+   Verdict:      GREEN CONFIRMED — ready for REFACTOR
+                 (or: NOT GREEN — <what to fix>)
+   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+   ```
+
+4. If GREEN is confirmed, also run a quick coverage check:
+   ```bash
+   uv run pytest --cov=src --cov-report=term-missing -q
+   ```
+   Report coverage on the module that was changed. Flag any uncovered lines.
+
+5. If GREEN is confirmed, remind the user: "Now review and refactor. Tests must stay green throughout. Run `just test` after every change."
+
+6. If GREEN is not confirmed:
+   - If the target test still fails: the implementation is incomplete. Show the error.
+   - If other tests regressed: the implementation broke existing behaviour. List the regressions.
+   - Advise fixing without over-engineering — GREEN means minimal, not perfect.
+
+7. Reset the refactor edit counter used by `post-edit-refactor-test-guard.sh`:
+   ```bash
+   echo "0" > .claude/.refactor-edit-count 2>/dev/null || true
+   ```
diff --git a/.claude/commands/tdd-red.md b/.claude/commands/tdd-red.md
@@ -0,0 +1,41 @@
+Validate the RED phase of a TDD cycle: confirm that a specific test FAILS for the right reason.
+
+Run the test suite with `just test` (or the specific test file if the user provides one).
+
+Then analyse the output:
+
+1. **Identify the new/target test** — the user should tell you which test they expect to fail, or you should identify the most recently written test.
+
+2. **Classify the failure** using this table:
+
+   | Failure type | Meaning | Verdict |
+   |---|---|---|
+   | `AssertionError` | Function exists, wrong behaviour | ✅ Ideal RED — proceed to GREEN |
+   | `AttributeError` / `ImportError` | Module or function doesn't exist yet | ✅ Good RED — proceed to GREEN |
+   | `NameError` | Name not defined | ✅ Good RED — proceed to GREEN |
+   | `SyntaxError` / `IndentationError` | Test itself is broken | ❌ Fix the test first |
+   | `TypeError` (wrong signature) | Signature mismatch in test | ❌ Fix the test first |
+   | Test passes unexpectedly | Implementation already exists or test is wrong | ❌ Review the test |
+   | Unrelated test fails | Regression in existing code | ❌ Investigate before continuing |
+
+3. **Report the result** in this format:
+
+   ```
+   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+   TDD RED Validation
+   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+   Test:     <test name>
+   File:     <test file path>
+   Status:   FAIL ✅ (or PASS ❌ or ERROR ❌)
+   Type:     <failure type from table>
+   Message:  <first line of error message>
+
+   Verdict:  RED CONFIRMED — ready for GREEN
+             (or: FIX NEEDED — <what to fix>)
+   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+   ```
+
+4. If RED is confirmed, remind the user: "Now write the minimal implementation to make this test pass. No more, no less."
+
+5. If RED is not confirmed, explain exactly what needs fixing before the TDD cycle can continue.
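Review note: the classification table in step 2 amounts to a small lookup. The following Python sketch is illustrative only. `classify_red` is a hypothetical helper, it treats every `TypeError` as a signature mismatch (the table only covers that case), and the fallback for unlisted exception types is an assumption.

```python
from typing import Optional

# Exception names the table accepts as a valid RED.
GOOD_RED = {
    "AssertionError",
    "AttributeError",
    "ImportError",
    "ModuleNotFoundError",  # subclass of ImportError, so treated the same way
    "NameError",
}
# Exception names that indicate the test itself is broken.
FIX_TEST = {"SyntaxError", "IndentationError", "TypeError"}


def classify_red(failure_type: Optional[str]) -> str:
    """Map a failure type (None means the test passed) to a verdict string."""
    if failure_type is None:
        return "FIX NEEDED: test passed unexpectedly, review the test"
    if failure_type in GOOD_RED:
        return "RED CONFIRMED"
    if failure_type in FIX_TEST:
        return "FIX NEEDED: fix the test first"
    # Assumption: anything not in the table needs a human look.
    return "FIX NEEDED: unclassified failure, investigate"
```

The "Unrelated test fails" row is not modelled here, because detecting it requires comparing the failing test's name against the target test rather than inspecting the exception type.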
diff --git a/.claude/hooks/README.md b/.claude/hooks/README.md
@@ -270,9 +270,11 @@ exit 0
 | `pre-bash-block-no-verify.sh` | PreToolUse | Bash | Block `git --no-verify` |
 | `pre-bash-git-push-reminder.sh` | PreToolUse | Bash | Warn to run `just review` before push |
 | `pre-bash-commit-quality.sh` | PreToolUse | Bash | Secret/debug scan before `git commit` |
+| `pre-bash-coverage-gate.sh` | PreToolUse | Bash | Warn before `git commit` if coverage below threshold |
 | `pre-config-protection.sh` | PreToolUse | Write\|Edit\|MultiEdit | Block weakening ruff/pyright config edits |
 | `pre-protect-uv-lock.sh` | PreToolUse | Write\|Edit | Block direct edits to `uv.lock` |
-| `pre-write-src-test-reminder.sh` | PreToolUse | Write\|Edit | Warn if `tests/<pkg>/test_<module>.py` missing for top-level `src/<pkg>/<module>.py` |
+| `pre-write-src-require-test.sh` | PreToolUse | Write\|Edit | Block if `tests/<pkg>/test_<module>.py` missing for top-level `src/<pkg>/<module>.py` (strict TDD) |
+| `pre-write-src-test-reminder.sh` | PreToolUse (optional) | Write\|Edit | Non-blocking alternative to `pre-write-src-require-test.sh`; **do not register both** |
 | `pre-write-doc-file-warning.sh` | PreToolUse | Write | Block `.md` files outside `docs/` |
 | `pre-write-jinja-syntax.sh` | PreToolUse | Write | Validate Jinja2 syntax before writing |
 | `pre-suggest-compact.sh` | PreToolUse | Edit\|Write | Suggest `/compact` every 50 operations |
@@ -283,6 +285,7 @@ exit 0
 | `post-edit-copier-migration.sh` | PostToolUse | Edit\|Write | Migration checklist after `copier.yml` edits |
 | `post-edit-template-mirror.sh` | PostToolUse | Edit\|Write | Remind to mirror `template/.claude/` ↔ root |
 | `post-bash-pr-created.sh` | PostToolUse | Bash | Log PR URL and suggest review commands |
+| `post-edit-refactor-test-guard.sh` | PostToolUse | Edit\|Write | Remind to run tests after several `src/` or `scripts/` edits |
 | `stop-session-end.sh` | Stop | * | Persist session state JSON |
 | `stop-evaluate-session.sh` | Stop | * | Extract reusable patterns from transcript |
 | `stop-cost-tracker.sh` | Stop | * | Track and accumulate session costs |

diff --git a/.claude/hooks/post-edit-refactor-test-guard.sh b/.claude/hooks/post-edit-refactor-test-guard.sh
@@ -0,0 +1,63 @@
+#!/usr/bin/env bash
+# Claude PostToolUse hook — Edit|Write
+# Refactor safety net: after editing a src/**/*.py or scripts/**/*.py file, remind the developer to
+# re-run tests if they haven't done so recently.
+#
+# This hook counts source edits in .claude/.refactor-edit-count (reset to 0 by
+# /tdd-green). On every third edit since the last reset it prints a reminder,
+# which helps catch silent regressions during the REFACTOR stage of TDD.
+#
+# Exits: 0 always (PostToolUse hooks cannot block)
+
+set -euo pipefail
+
+INPUT=$(cat)
+
+FILE_PATH=$(printf '%s' "$INPUT" | python3 -c '
+import json, sys
+
+data = json.load(sys.stdin)
+print(data.get("tool_input", {}).get("file_path", ""))
+') || {
+    echo "$INPUT"
+    exit 0
+}
+
+# Only care about existing Python files; the src/ and scripts/ filter follows below.
+if [[ "$FILE_PATH" != *.py ]] || [[ -z "$FILE_PATH" ]] || [[ ! -f "$FILE_PATH" ]]; then
+    exit 0
+fi
+
+# Normalise path.
+FP="${FILE_PATH//\\//}"
+FP="${FP#./}"
+
+# Only fire for src/ or scripts/ files.
+if [[ ! "$FP" =~ ^(.*/)?(src|scripts)/ ]]; then
+    exit 0
+fi
+
+# Track edit count since last test run.
+COUNTER_FILE=".claude/.refactor-edit-count"
+mkdir -p .claude
+
+if [[ -f "$COUNTER_FILE" ]]; then
+    COUNT=$(cat "$COUNTER_FILE" 2>/dev/null || echo "0")
+    COUNT=$((COUNT + 1))
+else
+    COUNT=1
+fi
+
+echo "$COUNT" > "$COUNTER_FILE"
+
+# Remind after every 3 edits.
+if [[ $((COUNT % 3)) -eq 0 ]]; then
+    echo "┌─ Refactor test guard"
+    echo "│"
+    echo "│  ${COUNT} source edits since last test run."
+    echo "│  Run \`just test\` to catch regressions early."
+    echo "│  In TDD: tests must stay green throughout REFACTOR."
+    echo "└─ ⚠  Consider running tests"
+fi
+
+exit 0
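Review note: the counter logic in this hook can be sketched in Python for easier reasoning about when the reminder fires. This is a hedged illustration, not a replacement for the script. `record_edit` is a hypothetical name, and unlike the bash version it treats a corrupt counter file as 0 instead of failing.

```python
from pathlib import Path
from typing import Tuple

REMIND_EVERY = 3  # the hook reminds on every third counted edit


def record_edit(counter_file: Path, remind_every: int = REMIND_EVERY) -> Tuple[int, bool]:
    """Increment the persisted edit count; return (count, should_remind)."""
    try:
        count = int(counter_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        # Assumption: a missing or non-numeric file restarts the count at 0.
        count = 0
    count += 1
    counter_file.parent.mkdir(parents=True, exist_ok=True)
    counter_file.write_text(f"{count}\n")
    return count, count % remind_every == 0
```

Because the check is `count % 3 == 0`, the reminder repeats at edits 3, 6, 9 and so on until something writes `0` back to the file, which is what step 7 of `/tdd-green` does.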
diff --git a/.claude/hooks/pre-bash-coverage-gate.sh b/.claude/hooks/pre-bash-coverage-gate.sh
@@ -0,0 +1,75 @@
+#!/usr/bin/env bash
+# Claude PreToolUse hook — Bash
+# Coverage gate: before git commit, run the test suite with coverage and warn if
+# total coverage is below the threshold (hardcoded to 85% in this script).
+#
+# This is a non-blocking warning. The pre-commit hook and CI will enforce the
+# hard gate — this hook gives earlier feedback so the developer can add missing
+# tests before committing.
+#
+# Exits: 0 always (non-blocking; warnings on stderr only)
+
+set -uo pipefail
+
+INPUT=$(cat)
+
+# Pipe the payload to python3 -c: a heredoc script would occupy stdin itself.
+COMMAND=$(printf '%s' "$INPUT" | python3 -c '
+import json, sys
+
+data = json.load(sys.stdin)
+print(data.get("tool_input", {}).get("command", ""))
+') || { echo "$INPUT"; exit 0; }
+
+# Only fire for git commit commands.
+if ! echo "$COMMAND" | grep -qE '^\s*git\s+commit\b'; then
+    echo "$INPUT"
+    exit 0
+fi
+
+# Check if any staged Python files exist.
+STAGED_PY=$(git diff --cached --name-only --diff-filter=ACMR 2>/dev/null \
+    | grep '\.py$' || true)
+
+if [[ -z "$STAGED_PY" ]]; then
+    echo "$INPUT"
+    exit 0
+fi
+
+# Run the test suite with coverage (collect-only would not execute tests, so no
+# coverage data would be produced). Non-zero exit from pytest is ignored; we
+# parse the terminal report for TOTAL coverage.
+echo "┌─ Coverage gate" >&2
+
+# Note: do not hardcode the package path here; let the project's coverage config
+# decide what is measured (e.g. src/ for generated projects, tests/ for this repo).
+COV_OUTPUT=$(uv run --active pytest -q --cov --cov-report=term-missing \
+    --cov-fail-under=85 2>&1 || true)
+
+# Extract the total coverage percentage.
+TOTAL_COV=$(echo "$COV_OUTPUT" | grep -oE 'TOTAL\s+[0-9]+\s+[0-9]+\s+([0-9]+)%' \
+    | grep -oE '[0-9]+%$' || echo "")
+
+if [[ -z "$TOTAL_COV" ]]; then
+    # Could not determine coverage: say so on stderr and skip the check.
+    echo "│  Could not determine coverage — skipping check" >&2
+    echo "└─" >&2
+    echo "$INPUT"
+    exit 0
+fi
+
+COV_NUM="${TOTAL_COV%\%}"
+
+if [[ "$COV_NUM" -lt 85 ]]; then
+    echo "│" >&2
+    echo "│  ⚠  Coverage is ${TOTAL_COV} (threshold: 85%)" >&2
+    echo "│  Consider adding tests before committing." >&2
+    echo "│  Run \`just coverage\` to see which lines are uncovered." >&2
+    echo "└─ ⚠  Coverage below threshold" >&2
+else
+    echo "│  Coverage: ${TOTAL_COV} ✓" >&2
+    echo "└─ ✓ Coverage gate passed" >&2
+fi
+
+echo "$INPUT"
+exit 0
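Review note: the two `grep -oE` calls pull the percentage out of the `TOTAL` row of the pytest-cov terminal report. The following Python sketch shows the same extraction, assuming the usual `TOTAL <stmts> <miss> <cover>%` row layout. `total_coverage` and `below_threshold` are hypothetical helpers, and unlike the hook they also accept a decimal percentage.

```python
import re
from typing import Optional

# Matches the last percentage on a line that starts with TOTAL.
TOTAL_RE = re.compile(r"^TOTAL\s+.*?(\d+(?:\.\d+)?)%\s*$", re.MULTILINE)


def total_coverage(report: str) -> Optional[float]:
    """Return the TOTAL coverage percentage, or None if no TOTAL row is found."""
    match = TOTAL_RE.search(report)
    return float(match.group(1)) if match else None


def below_threshold(report: str, threshold: float = 85.0) -> Optional[bool]:
    """Return True/False for a parsed report, or None when coverage is unknown."""
    cov = total_coverage(report)
    return None if cov is None else cov < threshold
```

Returning `None` for an unparseable report mirrors the hook's "could not determine coverage" branch, where the check is skipped and no warning is raised.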