fix(monitor): TAP-1201 — mid-loop visibility + accurate liveness detection#6
Merged
…ction

The April-2026 NLTlabsPE Loop 1 incident had ralph-monitor flashing "LIKELY DEAD" for 3+ minutes while Claude was actively committing to main. Root cause: liveness was decided on status.json mtime alone, which on-stop.sh writes only after Claude returns. Long Claude calls made an active loop look dead; "Working on:" and "Model:" rows hid entirely when their JSON fields were null, forcing operators to grep logs for context.

Three behavior changes:

1. New _classify_liveness() (HEALTHY / STALE / DEAD / UNKNOWN) factors in three signals:
   - status.json mtime (existing)
   - live.log mtime within LIVE_LOG_FRESH_SECS (default 60s) → HEALTHY
   - ralph_loop.sh PID alive (pgrep) → never DEAD while alive

   DEAD now requires BOTH status_age >= STALE_DEAD_SECS AND no live process, so a stale status.json on its own (the condition that originally fired the false alarm) no longer reports DEAD.

2. "Working on:" row is always rendered, with a placeholder "(awaiting first loop)" when no signal is available. Pulls from a new .ralph/.current_issue file (mid-loop) → linear_issue (last hook write) → last_linear_issue → placeholder. Sanitized through `tr -dc` so a malformed write can't break ANSI rendering.

3. "Model:" row is always rendered, with the same placeholder pattern. Previously hidden entirely until the first hook fired.

New PreToolUse hook templates/hooks/on-linear-tool.sh writes the issue identifier (TAP-NNNN-style) to .ralph/.current_issue atomically when Claude calls a Linear MCP tool. Wiring is per-project opt-in via .claude/settings.json (matcher: "mcp__plugin_linear_linear__.*") and is documented in the hook's header comment. The monitor reads the file defensively if present, so partial adoption works.

Tests: 8 BATS cases pin every corner of the classifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
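The three-signal decision described in the commit message can be sketched as a pure function over pre-computed ages. This is a minimal illustration, not the merged `_classify_liveness()`: the function name `classify_liveness`, the argument order, the `STALE_DEAD_SECS` default of 180, and the exact rule for STALE are all assumptions; only the 60-second `LIVE_LOG_FRESH_SECS` default and the "DEAD needs both conditions" rule come from the text above.

```shell
# Sketch only: thresholds and the STALE rule are assumptions, not the merged code.
LIVE_LOG_FRESH_SECS="${LIVE_LOG_FRESH_SECS:-60}"  # default stated in the commit message
STALE_DEAD_SECS="${STALE_DEAD_SECS:-180}"         # assumed value; the source only names the variable

# classify_liveness STATUS_AGE LIVE_LOG_AGE LOOP_ALIVE
#   STATUS_AGE, LIVE_LOG_AGE: whole seconds since last modification, or "" if the file is missing
#   LOOP_ALIVE: 1 if a ralph_loop.sh process was found (for example via pgrep), anything else otherwise
classify_liveness() {
  local status_age live_log_age loop_alive
  status_age="$1"
  live_log_age="$2"
  loop_alive="$3"

  # No files and no process: nothing to base a verdict on.
  if [ -z "$status_age" ] && [ -z "$live_log_age" ] && [ "$loop_alive" != 1 ]; then
    echo UNKNOWN
    return 0
  fi

  # A recently written live.log means output is still streaming mid-loop.
  if [ -n "$live_log_age" ] && [ "$live_log_age" -le "$LIVE_LOG_FRESH_SECS" ]; then
    echo HEALTHY
    return 0
  fi

  # A recently written status.json is the original (pre-existing) signal.
  if [ -n "$status_age" ] && [ "$status_age" -lt "$STALE_DEAD_SECS" ]; then
    echo HEALTHY
    return 0
  fi

  # DEAD requires BOTH an old status.json AND no live loop process.
  if [ -n "$status_age" ] && [ "$status_age" -ge "$STALE_DEAD_SECS" ] && [ "$loop_alive" != 1 ]; then
    echo DEAD
    return 0
  fi

  # Everything else (for example old status.json but the loop PID is alive) is STALE.
  echo STALE
}
```

Under these assumptions, the incident scenario (status.json several minutes old, loop process still running) maps to `classify_liveness 200 300 1`, which yields STALE rather than DEAD. Keeping the function free of `stat` and `pgrep` calls makes it straightforward to pin each branch in BATS with plain arguments.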
wtthornton added a commit that referenced this pull request on May 4, 2026
* test+ci: enforce hook+workflow invariants instead of brittle counts

  Five tests were exposed once the bats count mismatch was fixed (PR #8). Three of the five were brittle assertions that locked out legitimate plugin ecosystems; two were a real workflow gap. Long-term right fix is to encode the actual invariants and close the gap, not silence the tests.

  Workflow gap fixed:
  - .github/workflows/codeql-analysis.yml now pins `defaults.run.shell: bash` per the TAP-667 standard. Previously the only hand-authored workflow without this. Two tests (#3, #6) were really one root cause.

  Test invariants tightened (instead of "exactly N" counts):
  - HOOKS-2: `bash <path>` commands must reference EITHER .ralph/hooks/ OR .claude/hooks/. The original test rejected .claude/hooks/ entries and broke as soon as tapps-mcp registered hooks there.
  - "all hook commands start with bash": now also accepts the bare .claude/hooks/<name>.sh form that tapps-mcp emits when registering Linear MCP governance hooks. Catches garbage paths / tool names (Write/Edit) without policing plugin command-emission style.
  - "PreToolUse has exactly two entries": rewritten to verify Ralph's two safety hooks (Bash → validate-command.sh, Edit|Write → protect-ralph-files.sh) are present AND wired to the right scripts. Plugin-injected entries are allowed; what's protected is removal or rewiring of Ralph's own defenses.

  Local: 1455/1455 unit tests passing, no warnings.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(integration): replace dead ALLOWED_TOOLS asserts with negative invariant

  Two integration tests asserted that setup.sh-generated .ralphrc shipped with `ALLOWED_TOOLS="..."` containing Bash(npm *)/Bash(pytest), but ADR-0006 deleted the legacy `-p` mode and the ALLOWED_TOOLS allowlist along with it; tool surface now lives in .claude/agents/ralph.md (`tools:` allowlist + `disallowedTools:` blocklist).

  Replaced both with a single negative assertion: if a future change re-introduces `ALLOWED_TOOLS=` to .ralphrc, this test fires so we don't silently split the tool-surface contract across two files again. The positive invariant (tool surface defined in agent file) is already covered by HOOKS-6 in tests/unit/test_hooks.bats.

  Local: full integration suite (203 tests) passing.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(evals): split .ralphrc protection into create-allowed vs edit-blocked

  The eval test "FILE PROTECTION: blocks edit to .ralphrc" asserted the hook returns exit 2 when no .ralphrc exists in the test fixture dir. But the hook's actual contract (HOOKS-5 in tests/unit/test_hooks.bats) is: ALLOW creating a new .ralphrc when absent, BLOCK editing once it exists. The test was asserting the wrong half of the contract: it went red the moment the eval suite started running end-to-end (post PR #8 / #9 fixes that unmasked the eval step).

  Split into two tests that match the real invariant:
  1. Edit on EXISTING .ralphrc → blocked (touch then test)
  2. Create on ABSENT .ralphrc → allowed (HOOKS-5 already covers this for the hook script directly; this is the eval-level mirror)

  Local: 69/69 deterministic evals passing.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
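The two .ralphrc invariants from the commits above (create-allowed vs edit-blocked, and "no `ALLOWED_TOOLS=` in .ralphrc") can be illustrated with a pair of small checks. This is a hedged sketch: `check_ralphrc_write` and `ralphrc_has_allowed_tools` are hypothetical helper names, and the real protect-ralph-files.sh hook presumably decides from the tool-call payload rather than from a directory argument. Only the exit code 2 for a blocked edit and the `ALLOWED_TOOLS=` pattern come from the text.

```shell
# check_ralphrc_write DIR
# Hypothetical stand-in for the protection decision:
#   returns 0 (allow) when DIR/.ralphrc does not exist yet,
#   returns 2 (block) once DIR/.ralphrc exists.
check_ralphrc_write() {
  local dir
  dir="$1"
  if [ -e "$dir/.ralphrc" ]; then
    return 2
  fi
  return 0
}

# ralphrc_has_allowed_tools FILE
# Returns 0 if FILE assigns ALLOWED_TOOLS, non-zero otherwise.
# A negative-invariant test would fail whenever this returns 0.
ralphrc_has_allowed_tools() {
  grep -q '^[[:space:]]*ALLOWED_TOOLS=' "$1"
}
```

A test for the "create on absent" half calls `check_ralphrc_write` on an empty fixture directory and expects 0; the "edit on existing" half runs `touch "$dir/.ralphrc"` first and expects 2, mirroring the "touch then test" wording in the commit message.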
Summary
Eliminates the false "LIKELY DEAD" warning while a long Claude call is in flight, and surfaces "Working on:" / "Model:" mid-loop instead of after the on-stop hook fires.
- New `_classify_liveness` (HEALTHY / STALE / DEAD / UNKNOWN) factors three signals:
  - status.json mtime (existing)
  - live.log mtime within `LIVE_LOG_FRESH_SECS` (default 60s) → HEALTHY
  - `ralph_loop.sh` PID alive (pgrep) → never DEAD while alive

  DEAD now requires BOTH stale status.json AND no live process, which rules out the false alarm seen in the April-2026 NLTlabsPE Loop 1 incident, where status.json was stale but the loop was still running.
- Always-render rows: Working on (placeholder "(awaiting first loop)") and Model (same placeholder). Working on reads from new `.ralph/.current_issue` file (mid-loop) → linear_issue → last_linear_issue → placeholder.
- New PreToolUse hook `templates/hooks/on-linear-tool.sh` writes `.current_issue` atomically when Claude calls a Linear MCP tool. Per-project opt-in via `.claude/settings.json` matcher `mcp__plugin_linear_linear__.*`; wiring is documented in the hook header.

Test plan
🤖 Generated with Claude Code
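The `.current_issue` handoff between the hook and the monitor, as described in the Summary, can be sketched in two functions: an atomic write (temp file plus rename) on the hook side, and a defensive read with the fallback chain on the monitor side. The names `write_current_issue` and `resolve_working_on`, the identifier regex, and the `tr -dc` character set are assumptions for illustration; the real on-linear-tool.sh presumably extracts the identifier from the hook's tool-call input, which is not modeled here.

```shell
# write_current_issue RALPH_DIR RAW_TEXT
# Hook side (sketch): pull the first TAP-NNNN-style identifier out of RAW_TEXT
# and publish it via temp file + rename, so a reader never sees a half-written file.
write_current_issue() {
  local ralph_dir raw issue tmp
  ralph_dir="$1"
  raw="$2"
  issue="$(printf '%s' "$raw" | grep -oE '[A-Z]+-[0-9]+' | head -n 1)" || true
  # No identifier found: leave any existing file untouched.
  [ -n "$issue" ] || return 0
  tmp="$(mktemp "$ralph_dir/.current_issue.XXXXXX")" || return 1
  printf '%s\n' "$issue" > "$tmp"
  # Rename within the same directory replaces the target in one step.
  mv -f "$tmp" "$ralph_dir/.current_issue"
}

# resolve_working_on RALPH_DIR LINEAR_ISSUE LAST_LINEAR_ISSUE
# Monitor side (sketch): .current_issue, then linear_issue, then
# last_linear_issue, then the placeholder. The file content is stripped to a
# safe character set (assumed here) so stray bytes cannot disturb ANSI output.
resolve_working_on() {
  local ralph_dir linear_issue last_linear_issue value
  ralph_dir="$1"
  linear_issue="$2"
  last_linear_issue="$3"
  value=""
  if [ -r "$ralph_dir/.current_issue" ]; then
    value="$(head -n 1 "$ralph_dir/.current_issue" | tr -dc 'A-Za-z0-9_-')"
  fi
  printf '%s\n' "${value:-${linear_issue:-${last_linear_issue:-(awaiting first loop)}}}"
}
```

Because the reader falls through when the file is missing or empty, a project that never wires the hook still renders the row from the status.json fields or the placeholder, which matches the "partial adoption works" note in the commit message.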