fix(monitor): TAP-1201 — mid-loop visibility + accurate liveness detection by wtthornton · Pull Request #6 · wtthornton/ralph-claude-code

wtthornton · 2026-05-02T22:31:55Z

Summary

Eliminates the false "LIKELY DEAD" warning while a long Claude call is in flight, and surfaces "Working on:" / "Model:" mid-loop instead of after the on-stop hook fires.

New _classify_liveness (HEALTHY / STALE / DEAD / UNKNOWN) factors in three signals:

status.json mtime (existing)
live.log mtime within LIVE_LOG_FRESH_SECS (default 60s) → HEALTHY
ralph_loop.sh PID alive (pgrep) → never DEAD while alive

DEAD now requires BOTH a stale status.json AND no live process. In the April-2026 NLTlabsPE Loop 1 incident only the first condition held (Claude was still actively committing), so the new rule would not have raised the false alarm.
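
The three-signal decision described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's `_classify_liveness`: the function names, the `STALE_WARN_SECS` variable, both stale thresholds, the strict `<` comparison, and the rule that a missing status.json yields UNKNOWN are all assumptions. Only the four state names, `LIVE_LOG_FRESH_SECS` (default 60s), and the `STALE_DEAD_SECS` name come from the PR text.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a three-signal liveness classifier (not the PR's code).

LIVE_LOG_FRESH_SECS="${LIVE_LOG_FRESH_SECS:-60}"  # default stated in the PR
STALE_WARN_SECS="${STALE_WARN_SECS:-120}"         # assumed name and value
STALE_DEAD_SECS="${STALE_DEAD_SECS:-300}"         # name from the commit message; value assumed

# Print the age of a file in seconds, or an empty string if it is missing.
file_age_secs() {
  local f="$1" mtime
  [[ -e "$f" ]] || { echo ""; return; }
  # GNU stat first, BSD/macOS stat as a fallback.
  mtime=$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f" 2>/dev/null) || { echo ""; return; }
  echo $(( $(date +%s) - mtime ))
}

# $1 = status.json age in seconds ("" if missing)
# $2 = live.log age in seconds ("" if missing)
# $3 = 1 if a ralph_loop.sh process was found, else 0
classify_liveness_sketch() {
  local status_age="$1" live_age="$2" loop_alive="$3"

  # Fresh live.log output wins: a long Claude call is still streaming.
  if [[ -n "$live_age" ]] && (( live_age < LIVE_LOG_FRESH_SECS )); then
    echo HEALTHY; return
  fi
  # Assumption: with no status.json at all there is nothing to judge.
  if [[ -z "$status_age" ]]; then
    echo UNKNOWN; return
  fi
  if (( status_age < STALE_WARN_SECS )); then
    echo HEALTHY; return
  fi
  # DEAD needs both a stale status.json and no live loop process.
  if (( status_age >= STALE_DEAD_SECS )) && (( ! loop_alive )); then
    echo DEAD; return
  fi
  echo STALE
}

# Example wiring (paths are assumptions):
#   alive=0; pgrep -f ralph_loop.sh >/dev/null && alive=1
#   classify_liveness_sketch "$(file_age_secs .ralph/status.json)" \
#                            "$(file_age_secs .ralph/live.log)" "$alive"
```

Passing the ages and the process flag as arguments keeps the decision logic separate from `stat` and `pgrep`, so each corner of the state table can be exercised with plain values.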

The "Working on" and "Model" rows are now always rendered, showing the placeholder "(awaiting first loop)" when no value is available. "Working on" reads from the new .ralph/.current_issue file (mid-loop), then falls back to linear_issue, then last_linear_issue, then the placeholder.
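
A minimal sketch of that fallback chain, assuming the two JSON fields have already been read into shell variables. The function names, the treatment of a literal `null` string as empty, and the exact `tr -dc` character set are assumptions; the commit message only says the value is sanitized through `tr -dc`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "Working on:" fallback chain (not the PR's code).

PLACEHOLDER="(awaiting first loop)"

# Drop every byte outside a conservative printable set so stray escape
# bytes cannot reach the terminal. The set itself is an assumption.
sanitize_field() {
  printf '%s' "$1" | tr -dc 'A-Za-z0-9._:#/ -'
}

# $1 = path to .current_issue, $2 = linear_issue, $3 = last_linear_issue
resolve_working_on() {
  local file="$1" candidate
  local -a candidates=()
  # The mid-loop file takes priority when it exists and is readable.
  [[ -r "$file" ]] && candidates+=("$(head -n 1 "$file")")
  candidates+=("$2" "$3")
  for candidate in "${candidates[@]}"; do
    candidate=$(sanitize_field "$candidate")
    # Assumption: a literal "null" (as a JSON reader might print) counts as empty.
    if [[ -n "$candidate" && "$candidate" != "null" ]]; then
      printf '%s\n' "$candidate"
      return
    fi
  done
  printf '%s\n' "$PLACEHOLDER"
}
```

Because the row always prints something, an operator sees the placeholder rather than a missing line before the first hook has fired.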

New PreToolUse hook templates/hooks/on-linear-tool.sh writes .current_issue atomically when Claude calls a Linear MCP tool. Per-project opt-in via .claude/settings.json matcher mcp__plugin_linear_linear__.* — wiring is documented in the hook header.
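
The atomic write can be illustrated with a temp-file-then-rename pattern. This is a sketch under stated assumptions, not the contents of templates/hooks/on-linear-tool.sh: the function names and the identifier regex are hypothetical, and the PR does not show how the hook extracts the identifier from its input, so the sketch takes arbitrary text as an argument.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an atomic .current_issue writer (not the PR's hook).

# Print the first TAP-NNNN-style identifier found in the given text, if any.
# The pattern is an assumption about what counts as an identifier.
extract_issue_id() {
  grep -oE '[[:upper:]][[:upper:][:digit:]]*-[[:digit:]]+' <<<"$1" | head -n 1
}

# $1 = .ralph directory, $2 = issue identifier
write_current_issue() {
  local ralph_dir="$1" issue="$2" tmp
  [[ -n "$issue" ]] || return 0          # nothing to record
  mkdir -p "$ralph_dir" || return 0
  # Create the temp file in the same directory so the final mv is a rename
  # on one filesystem; readers see the old or the new content, never a
  # partial write.
  tmp=$(mktemp "$ralph_dir/.current_issue.XXXXXX") || return 0
  printf '%s\n' "$issue" > "$tmp" && mv -f "$tmp" "$ralph_dir/.current_issue"
}

# Example hook body (assumes the tool-call payload arrives on stdin):
#   payload=$(cat)
#   write_current_issue .ralph "$(extract_issue_id "$payload")"
#   exit 0   # a visibility hook should not block the tool call
```

Writing through a temp file matters here because the monitor may read .current_issue at any moment while the hook is running.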

Test plan

8 BATS cases pin every corner of the classifier
Manual: monitor a real loop and confirm no false DEAD during a long Claude call
Manual: confirm "Working on:" updates mid-loop after wiring the hook

🤖 Generated with Claude Code

…ction

The April-2026 NLTlabsPE Loop 1 incident had ralph-monitor flashing "LIKELY DEAD" for 3+ minutes while Claude was actively committing to main. Root cause: liveness was decided on status.json mtime alone, which on-stop.sh writes only after Claude returns. Long Claude calls made an active loop look dead; "Working on:" and "Model:" rows hid entirely when their JSON fields were null, forcing operators to grep logs for context.

Three behavior changes:

1. New _classify_liveness() (HEALTHY / STALE / DEAD / UNKNOWN) factors in three signals:
   - status.json mtime (existing)
   - live.log mtime within LIVE_LOG_FRESH_SECS (default 60s) → HEALTHY
   - ralph_loop.sh PID alive (pgrep) → never DEAD while alive
   DEAD now requires BOTH status_age >= STALE_DEAD_SECS AND no live process, so a stale status.json alone (the condition that originally fired the false alarm) no longer triggers it.

2. "Working on:" row is always rendered, with a placeholder "(awaiting first loop)" when no signal is available. Pulls from a new .ralph/.current_issue file (mid-loop) → linear_issue (last hook write) → last_linear_issue → placeholder. Sanitized through `tr -dc` so a malformed write can't break ANSI rendering.

3. "Model:" row is always rendered, with the same placeholder pattern. Previously hidden entirely until the first hook fired.

New PreToolUse hook templates/hooks/on-linear-tool.sh writes the issue identifier (TAP-NNNN-style) to .ralph/.current_issue atomically when Claude calls a Linear MCP tool. Wiring is per-project opt-in via .claude/settings.json (matcher: "mcp__plugin_linear_linear__.*") and is documented in the hook's header comment. The monitor reads the file defensively if present, so partial adoption works.

Tests: 8 BATS cases pin every corner of the classifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+ci: enforce hook+workflow invariants instead of brittle counts

Five tests were exposed once the bats count mismatch was fixed (PR #8). Three of the five were brittle assertions that locked out legitimate plugin ecosystems; two were a real workflow gap. Long-term right fix is to encode the actual invariants and close the gap, not silence the tests.

Workflow gap fixed:
- .github/workflows/codeql-analysis.yml now pins `defaults.run.shell: bash` per the TAP-667 standard. Previously the only hand-authored workflow without this. Two tests (#3, #6) were really one root cause.

Test invariants tightened (instead of "exactly N" counts):
- HOOKS-2: `bash <path>` commands must reference EITHER .ralph/hooks/ OR .claude/hooks/. The original test rejected .claude/hooks/ entries and broke as soon as tapps-mcp registered hooks there.
- "all hook commands start with bash": now also accepts the bare .claude/hooks/<name>.sh form that tapps-mcp emits when registering Linear MCP governance hooks. Catches garbage paths / tool names (Write/Edit) without policing plugin command-emission style.
- "PreToolUse has exactly two entries": rewritten to verify Ralph's two safety hooks (Bash → validate-command.sh, Edit|Write → protect-ralph-files.sh) are present AND wired to the right scripts. Plugin-injected entries are allowed; what's protected is removal or rewiring of Ralph's own defenses.

Local: 1455/1455 unit tests passing, no warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(integration): replace dead ALLOWED_TOOLS asserts with negative invariant

Two integration tests asserted that setup.sh-generated .ralphrc shipped with `ALLOWED_TOOLS="..."` containing Bash(npm *)/Bash(pytest), but ADR-0006 deleted the legacy `-p` mode and the ALLOWED_TOOLS allowlist along with it; tool surface now lives in .claude/agents/ralph.md (`tools:` allowlist + `disallowedTools:` blocklist).

Replaced both with a single negative assertion: if a future change re-introduces `ALLOWED_TOOLS=` to .ralphrc, this test fires so we don't silently split the tool-surface contract across two files again. The positive invariant (tool surface defined in agent file) is already covered by HOOKS-6 in tests/unit/test_hooks.bats.

Local: full integration suite (203 tests) passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(evals): split .ralphrc protection into create-allowed vs edit-blocked

The eval test "FILE PROTECTION: blocks edit to .ralphrc" asserted the hook returns exit 2 when no .ralphrc exists in the test fixture dir. But the hook's actual contract (HOOKS-5 in tests/unit/test_hooks.bats) is: ALLOW creating a new .ralphrc when absent, BLOCK editing once it exists. The test was asserting the wrong half of the contract; it went red the moment the eval suite started running end-to-end (post PR #8 / #9 fixes that unmasked the eval step).

Split into two tests that match the real invariant:
1. Edit on EXISTING .ralphrc → blocked (touch then test)
2. Create on ABSENT .ralphrc → allowed (HOOKS-5 already covers this for the hook script directly; this is the eval-level mirror)

Local: 69/69 deterministic evals passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
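
The create-allowed / edit-blocked contract for .ralphrc can be illustrated with a small check. This is a hypothetical sketch, not protect-ralph-files.sh: the function name and message are made up, and only the two outcomes (exit 2 to block an edit to an existing file, success when the file is absent) come from the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the .ralphrc protection contract (not the PR's hook).

# $1 = path the tool call wants to write.
# Returns 2 (block) for an existing .ralphrc, 0 (allow) otherwise.
check_ralphrc_write() {
  local target="$1"
  # Paths other than .ralphrc are outside this sketch's scope.
  [[ "$(basename "$target")" == ".ralphrc" ]] || return 0
  if [[ -e "$target" ]]; then
    echo "blocked: $target exists and is protected" >&2
    return 2
  fi
  return 0   # creating a new .ralphrc is allowed
}
```

A test fixture therefore has to create the file first (`touch`) before asserting the blocked case, which is the half of the contract the original eval test missed.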

wtthornton merged commit 3000189 into main May 3, 2026
2 checks passed

wtthornton deleted the tap-1201-monitor-visibility branch May 3, 2026 00:01

wtthornton mentioned this pull request May 3, 2026

test+ci: enforce hook+workflow invariants instead of brittle counts #9

Merged

3 tasks

wtthornton merged 1 commit into main from tap-1201-monitor-visibility
