fix(monitor): TAP-1201 — mid-loop visibility + accurate liveness detection#6

Merged
wtthornton merged 1 commit into main from tap-1201-monitor-visibility
May 3, 2026
@wtthornton
Owner

Summary

Eliminates the false "LIKELY DEAD" warning while a long Claude call is in flight, and surfaces "Working on:" / "Model:" mid-loop instead of after the on-stop hook fires.

New _classify_liveness (HEALTHY / STALE / DEAD / UNKNOWN) factors in three signals:

  1. status.json mtime (existing)
  2. live.log mtime within LIVE_LOG_FRESH_SECS (default 60s) → HEALTHY
  3. ralph_loop.sh PID alive (pgrep) → never DEAD while alive

DEAD now requires BOTH a stale status.json AND no live process. In the April-2026 NLTlabsPE Loop 1 incident, status.json staleness alone was enough to trigger the false alarm.
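
The decision rule above can be sketched as a pure function. This is an illustration, not the merged code: the function name, the STALE_WARN_SECS threshold, the numeric value of STALE_DEAD_SECS, and the meaning of UNKNOWN (no status.json to judge against) are all assumptions. Only LIVE_LOG_FRESH_SECS=60 and the "DEAD needs both conditions" rule come from this PR.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the implementation merged in this PR.
LIVE_LOG_FRESH_SECS="${LIVE_LOG_FRESH_SECS:-60}"  # default stated in the PR
STALE_WARN_SECS="${STALE_WARN_SECS:-120}"         # hypothetical name and value
STALE_DEAD_SECS="${STALE_DEAD_SECS:-300}"         # name from the PR; value assumed

# classify_liveness_sketch STATUS_AGE LIVE_LOG_AGE LOOP_ALIVE
#   STATUS_AGE, LIVE_LOG_AGE: seconds since last write, or "" if the file is missing
#   LOOP_ALIVE: 1 if the loop process was found, 0 otherwise
classify_liveness_sketch() {
  local status_age="$1" live_log_age="$2" loop_alive="$3"

  # A recently written live.log means output is streaming right now.
  if [[ -n "$live_log_age" ]] && (( live_log_age <= LIVE_LOG_FRESH_SECS )); then
    echo HEALTHY
    return 0
  fi

  # Assumption: without status.json there is nothing to measure staleness against.
  if [[ -z "$status_age" ]]; then
    echo UNKNOWN
    return 0
  fi

  if (( status_age < STALE_WARN_SECS )); then
    echo HEALTHY
    return 0
  fi

  # DEAD needs BOTH a very stale status.json AND no live loop process.
  if (( status_age >= STALE_DEAD_SECS )) && (( loop_alive == 0 )); then
    echo DEAD
    return 0
  fi

  echo STALE
}
```

A caller would derive the two ages from file mtimes and set LOOP_ALIVE from something like `pgrep -f ralph_loop.sh`. Keeping the classifier free of I/O is a design choice of this sketch: each combination of signals can then be checked with plain arguments.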

The "Working on" and "Model" rows now always render, falling back to the placeholder "(awaiting first loop)" when no value is available. "Working on" reads, in order: the new .ralph/.current_issue file (written mid-loop) → linear_issue → last_linear_issue → placeholder.
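
A minimal sketch of that fallback chain, assuming the monitor has already extracted linear_issue and last_linear_issue from status.json and passes them in as arguments. The function name, the 64-byte cap, and the exact allowed character set are assumptions; the PR only states that the value is filtered through `tr -dc`.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names and limits are assumptions.
PLACEHOLDER="(awaiting first loop)"

# resolve_working_on_sketch RALPH_DIR LINEAR_ISSUE LAST_LINEAR_ISSUE
resolve_working_on_sketch() {
  local ralph_dir="$1" linear_issue="$2" last_linear_issue="$3"
  local from_file="" candidate

  if [[ -r "$ralph_dir/.current_issue" ]]; then
    # Keep a conservative character set so a malformed write cannot
    # smuggle escape bytes into the rendered row.
    from_file=$(head -c 64 "$ralph_dir/.current_issue" | tr -dc 'A-Za-z0-9_-')
  fi

  # First non-empty, non-null candidate wins.
  for candidate in "$from_file" "$linear_issue" "$last_linear_issue"; do
    if [[ -n "$candidate" && "$candidate" != "null" ]]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done

  printf '%s\n' "$PLACEHOLDER"
}
```

Because the file is only read when present, a project that never wires the hook simply falls through to the status.json fields.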

New PreToolUse hook templates/hooks/on-linear-tool.sh writes .current_issue atomically when Claude calls a Linear MCP tool. Per-project opt-in via .claude/settings.json matcher mcp__plugin_linear_linear__.* — wiring is documented in the hook header.
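
The atomic write can be sketched as "write to a temp file in the same directory, then rename". This is not the contents of on-linear-tool.sh: the function name is hypothetical, and pulling the identifier out of the raw hook payload with a regex is an assumption, since the payload fields of the Linear MCP tools are not shown in this PR.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the hook shipped in templates/hooks/.

# write_current_issue_sketch RALPH_DIR   (hook payload is read from stdin)
write_current_issue_sketch() {
  local ralph_dir="$1" payload issue tmp
  payload=$(cat)

  # Assumption: the identifier appears somewhere in the payload as TEAM-NNNN.
  issue=$(printf '%s' "$payload" | grep -oE '[A-Z]{2,}-[0-9]+' | head -n 1 || true)

  # Nothing to record: succeed quietly so the tool call is never blocked.
  [[ -n "$issue" ]] || return 0

  mkdir -p "$ralph_dir" || return 0
  # The temp file lives in the target directory so the rename stays on one filesystem.
  tmp=$(mktemp "$ralph_dir/.current_issue.XXXXXX") || return 0
  printf '%s\n' "$issue" > "$tmp"
  # A rename replaces the destination in one step; readers never see a partial file.
  mv -f "$tmp" "$ralph_dir/.current_issue"
}
```

Every failure path returns 0 on purpose: a visibility hook should not be able to stop Claude from calling the Linear tool.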

Test plan

  • 8 BATS cases pin every corner of the classifier
  • Manual: monitor a real loop and confirm no false DEAD during a long Claude call
  • Manual: confirm "Working on:" updates mid-loop after wiring the hook

🤖 Generated with Claude Code

…ction

The April-2026 NLTlabsPE Loop 1 incident had ralph-monitor flashing
"LIKELY DEAD" for 3+ minutes while Claude was actively committing
to main. Root cause: liveness was decided on status.json mtime alone,
which on-stop.sh writes only after Claude returns. Long Claude calls
made an active loop look dead; "Working on:" and "Model:" rows hid
entirely when their JSON fields were null, forcing operators to
grep logs for context.

Three behavior changes:

1. New _classify_liveness() (HEALTHY / STALE / DEAD / UNKNOWN) factors
   in three signals:
   - status.json mtime (existing)
   - live.log mtime within LIVE_LOG_FRESH_SECS (default 60s) → HEALTHY
   - ralph_loop.sh PID alive (pgrep) → never DEAD while alive
   DEAD now requires BOTH status_age >= STALE_DEAD_SECS AND no live
   process. The false alarm fired on status.json staleness alone.

2. "Working on:" row is always rendered, with a placeholder
   "(awaiting first loop)" when no signal is available. Pulls from a
   new .ralph/.current_issue file (mid-loop) → linear_issue (last hook
   write) → last_linear_issue → placeholder. Sanitized through `tr -dc`
   so a malformed write can't break ANSI rendering.

3. "Model:" row is always rendered, with the same placeholder pattern.
   Previously hidden entirely until the first hook fired.

New PreToolUse hook templates/hooks/on-linear-tool.sh writes the issue
identifier (TAP-NNNN-style) to .ralph/.current_issue atomically when
Claude calls a Linear MCP tool. Wiring is per-project opt-in via
.claude/settings.json (matcher: "mcp__plugin_linear_linear__.*") —
documented in the hook's header comment. The monitor reads the file
defensively if present, so partial adoption works.

Tests: 8 BATS cases pin every corner of the classifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wtthornton merged commit 3000189 into main May 3, 2026
2 checks passed
wtthornton deleted the tap-1201-monitor-visibility branch May 3, 2026 00:01
wtthornton added a commit that referenced this pull request May 4, 2026
* test+ci: enforce hook+workflow invariants instead of brittle counts

Five tests were exposed once the bats count mismatch was fixed (PR #8).
Three of the five were brittle assertions that locked out legitimate
plugin ecosystems; two were a real workflow gap. Long-term right fix
is to encode the actual invariants and close the gap, not silence
the tests.

Workflow gap fixed:
- .github/workflows/codeql-analysis.yml now pins `defaults.run.shell:
  bash` per the TAP-667 standard. Previously the only hand-authored
  workflow without this. Two tests (#3, #6) were really one root cause.

Test invariants tightened (instead of "exactly N" counts):
- HOOKS-2: `bash <path>` commands must reference EITHER .ralph/hooks/
  OR .claude/hooks/. The original test rejected .claude/hooks/ entries
  and broke as soon as tapps-mcp registered hooks there.
- "all hook commands start with bash": now also accepts the bare
  .claude/hooks/<name>.sh form that tapps-mcp emits when registering
  Linear MCP governance hooks. Catches garbage paths / tool names
  (Write/Edit) without policing plugin command-emission style.
- "PreToolUse has exactly two entries": rewritten to verify Ralph's
  two safety hooks (Bash → validate-command.sh, Edit|Write →
  protect-ralph-files.sh) are present AND wired to the right scripts.
  Plugin-injected entries are allowed; what's protected is removal
  or rewiring of Ralph's own defenses.
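
The two path invariants above can be sketched as a predicate over a single hook command string. The regexes and the function name are assumptions made for illustration; the real assertions live in the BATS suite and may be stricter or looser.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the assertions in tests/unit/test_hooks.bats.

# hook_command_ok_sketch COMMAND
# Succeeds for `bash <path>` under .ralph/hooks/ or .claude/hooks/, and for
# the bare .claude/hooks/<name>.sh form; fails for anything else.
hook_command_ok_sketch() {
  local cmd="$1"
  local re_bash='^bash[[:space:]]+([^[:space:]]*/)?\.(ralph|claude)/hooks/[A-Za-z0-9._-]+\.sh([[:space:]].*)?$'
  local re_bare='^([^[:space:]]*/)?\.claude/hooks/[A-Za-z0-9._-]+\.sh$'
  [[ "$cmd" =~ $re_bash || "$cmd" =~ $re_bare ]]
}
```

Checking the shape of each command, rather than counting entries, is what lets plugin-registered hooks coexist with Ralph's own while still rejecting a stray tool name or an unrelated script path.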

Local: 1455/1455 unit tests passing, no warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(integration): replace dead ALLOWED_TOOLS asserts with negative invariant

Two integration tests asserted that setup.sh-generated .ralphrc shipped
with `ALLOWED_TOOLS="..."` containing Bash(npm *)/Bash(pytest), but
ADR-0006 deleted the legacy `-p` mode and the ALLOWED_TOOLS allowlist
along with it — tool surface now lives in .claude/agents/ralph.md
(`tools:` allowlist + `disallowedTools:` blocklist).

Replaced both with a single negative assertion: if a future change
re-introduces `ALLOWED_TOOLS=` to .ralphrc, this test fires so we
don't silently split the tool-surface contract across two files
again. The positive invariant (tool surface defined in agent file)
is already covered by HOOKS-6 in tests/unit/test_hooks.bats.
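
A negative invariant of this kind reduces to "the generated file must not contain the assignment". The helper below is a hypothetical stand-in for the integration test, assuming .ralphrc is a plain shell-style config file.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the integration test itself.

# ralphrc_has_no_allowed_tools_sketch PATH
# Succeeds when PATH is absent or contains no live ALLOWED_TOOLS= assignment.
ralphrc_has_no_allowed_tools_sketch() {
  local ralphrc="$1"
  [[ -f "$ralphrc" ]] || return 0
  # Commented-out lines do not match, because the pattern is anchored
  # to optional whitespace (and an optional `export`) at line start.
  ! grep -Eq '^[[:space:]]*(export[[:space:]]+)?ALLOWED_TOOLS=' "$ralphrc"
}
```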

Local: full integration suite (203 tests) passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(evals): split .ralphrc protection into create-allowed vs edit-blocked

The eval test "FILE PROTECTION: blocks edit to .ralphrc" asserted the
hook returns exit 2 when no .ralphrc exists in the test fixture dir.
But the hook's actual contract (HOOKS-5 in tests/unit/test_hooks.bats)
is: ALLOW creating a new .ralphrc when absent, BLOCK editing once it
exists. The test was asserting the wrong half of the contract — it
went red the moment the eval suite started running end-to-end (post
PR #8 / #9 fixes that unmasked the eval step).

Split into two tests that match the real invariant:
  1. Edit on EXISTING .ralphrc → blocked (touch then test)
  2. Create on ABSENT .ralphrc → allowed (HOOKS-5 already covers this
     for the hook script directly; this is the eval-level mirror)
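
The two halves of the contract can be sketched as one function whose answer depends only on whether the target already exists. This is a simplified stand-in, not protect-ralph-files.sh: the function name is hypothetical, and the only behavior taken from the text above is "exit 2 blocks, absent file is allowed".

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real hook protects more than .ralphrc.

# protect_ralphrc_sketch TARGET_PATH
# Returns 2 (block) when TARGET_PATH is an existing .ralphrc, 0 (allow) otherwise.
protect_ralphrc_sketch() {
  local target="$1"
  if [[ "$(basename "$target")" == ".ralphrc" && -e "$target" ]]; then
    echo "blocked: .ralphrc already exists and is protected" >&2
    return 2
  fi
  return 0
}
```

A fixture that never creates .ralphrc can only exercise the "allow" half, which is why the edit-blocked test has to touch the file first.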

Local: 69/69 deterministic evals passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>