In local Gastown, the Witness and Deacon are persistent AI agent sessions running continuous patrol loops in tmux. They burn LLM tokens on every cycle, even though ~90% of their behavior is mechanical — threshold checks, protocol message routing, session liveness detection, timer evaluation. Genuine reasoning is only needed for a small set of ambiguous situations.
The cloud should not replicate this. The TownDO alarm IS the patrol loop. It should run all mechanical checks as deterministic code, and only spawn short-lived LLM agent sessions when a check produces an ambiguous result that requires reasoning.
Parent: #204

Current State — What Already Works

The TownDO alarm loop (alarm() at src/dos/Town.do.ts) already runs 7 sub-tasks on a 5s (active) / 1m (idle) interval:

ensureContainerReady() (Done): checks container health, triggers restart if dead
processReviewQueue() (Done)
processConvoyLandings() (Done): processes ready_to_land convoys, creates landing MRs
schedulePendingWork() (Done): dispatches idle+hooked agents, marks beads failed after 5 dispatch attempts
witnessPatrol() (Partial): only does zombie detection (container status reconciliation) and basic GUPP mail (a 30-min stale last_activity_at sends a warning mail; no escalation or force-stop)
deliverPendingMail() (Done): pushes undelivered mail to working agents
reEscalateStaleEscalations() (Done): bumps severity of unacknowledged escalations after 4h thresholds
What's completely missing:
deaconPatrol() — no function exists; some behaviors are scattered across other alarm sub-tasks
Stale hook detection (idle agent + hook + no dispatch for extended period)
Stranded convoy detection (convoy has open beads with no assigned agent)
Agent GC (dead/completed agents accumulate in the DB indefinitely)
Per-bead timeout enforcement (no timer gates)
Triage request queue and LLM triage agent dispatch
External health watchdog (no Cron Trigger or independent alarm monitor)
Witness/deacon system prompts (only a one-line stub for witness)
What was intentionally eliminated:
agentDone() creates MR beads directly, completeReviewWithResult() closes them. No protocol messages needed.

Background: What Local Gastown's Witness & Deacon Actually Do
Three execution layers in local Gastown

Go Daemon: pure Go process on a 3-minute heartbeat. All behavior is mechanical. Handles session liveness, crash loop detection, orphan cleanup, GUPP checks, heartbeat freshness.
Deacon: persistent LLM session running the mol-deacon-patrol formula. Continuous loop: inbox check → orphan cleanup → spawn triggers → gate evaluation → convoy checks → health scan → zombie scan → plugin run → loop.
Witness: persistent LLM session per rig running the mol-witness-patrol formula. Continuous loop: inbox check → process cleanups → check refinery → survey workers → loop.

Mechanical behaviors (deterministic — no LLM needed)

These are implemented as Go handler functions that the LLM agents invoke but don't reason about:

Witness mechanical behaviors:
Heartbeat freshness checks (last_activity_at)
Gate evaluation (created_at + timeout < now)

Deacon mechanical behaviors:
Stranded convoy detection (open beads with no assigned agent)
Gate evaluation (elapsed time > timeout)
Crash loop detection (restart count + timing with exponential backoff)
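These mechanical checks are plain predicates over timestamps, which is why they need no LLM. A minimal TypeScript sketch (function and field names are illustrative, not the actual Gastown handlers):

```typescript
// Gate evaluation: a timer gate expires once created_at + timeout < now.
interface TimerGate {
  createdAt: number; // epoch ms
  timeoutMs: number;
}

function gateExpired(gate: TimerGate, now: number): boolean {
  return gate.createdAt + gate.timeoutMs < now;
}

// Heartbeat freshness: stale when last_activity_at is older than the
// threshold (e.g. the 30-minute GUPP warning).
function isStale(lastActivityAt: number, now: number, thresholdMs: number): boolean {
  return now - lastActivityAt > thresholdMs;
}

const t = Date.now();
console.log(gateExpired({ createdAt: t - 60_000, timeoutMs: 30_000 }, t)); // true: created 60s ago, 30s timeout
console.log(isStale(t - 10_000, t, 30 * 60_000)); // false: active 10s ago
```

Because these are pure functions of row data, they can run on every alarm tick at zero LLM cost.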
Intelligent behaviors (genuinely need LLM reasoning)
These are the ~10% of behaviors where deterministic code can't make the right call:
Dirty polecat triage: must read git status/git diff output and judge whether uncommitted changes are valuable work worth saving, disposable artifacts, or a confused state requiring escalation.
Refinery queue health assessment: must reason about queue depth, staleness patterns, and time context; no hardcoded thresholds.
Live agent progress inspection: must interpret agent conversation/activity output to determine whether an agent is stuck, thinking deeply, or making slow but real progress.
Help request handling: when a polecat sends HELP, must understand the problem domain and craft contextual guidance.
Escalation assessment: must understand an escalation's context to decide: handle locally, forward to Mayor, or alert a human.
Zombie scan confirmation: must verify that automated zombie detection results are correct before taking destructive action (nuke).
Cloud Architecture: Alarm + On-Demand Triage Agents

TownDO alarm handler — what needs to be added

The alarm already handles the items marked "exists" below. This issue adds the items marked NEW:
TownDO.alarm()
├── ensureContainerReady()          -- exists
├── processReviewQueue()            -- exists
├── processConvoyLandings()         -- exists
├── schedulePendingWork()           -- exists
├── witnessPatrol()                 -- expand with:
│   ├── detectZombies()             -- exists (container status reconciliation)
│   ├── detectGUPPViolations()      -- exists but only sends mail; add escalation + force-stop after threshold
│   ├── detectOrphanedWork()        -- NEW: idle+hooked agents with no dispatch activity
│   ├── agentGC()                   -- NEW: delete dead/completed agents past retention period
│   ├── checkTimerGates()           -- NEW: per-bead timeout enforcement
│   └── flagForTriage()             -- NEW: dirty/ambiguous → create triage request bead
├── deaconPatrol()                  -- NEW function:
│   ├── detectStaleHooks()          -- NEW: hooked for unreasonable duration without activity
│   ├── feedStrandedConvoys()       -- NEW: convoy has open beads with no assignee → auto-sling
│   └── detectCrashLoops()          -- NEW: same agent failing repeatedly in short window
├── deliverPendingMail()            -- exists
├── reEscalateStaleEscalations()    -- exists
└── maybeDispatchTriageAgent()      -- NEW: if triage request beads queued, spawn triage agent
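The deacon branch above is entirely mechanical. A sketch of how deaconPatrol() could run its three checks over plain row data (the row shapes and thresholds here are assumptions; the real TownDO reads its own tables):

```typescript
// Row shapes are assumptions standing in for the TownDO's SQLite tables.
interface AgentRow {
  id: string;
  status: 'working' | 'idle' | 'dead';
  hookedBeadId?: string;
  lastDispatchAt?: number; // epoch ms
  restartCount: number;
}

interface BeadRow {
  id: string;
  convoyId?: string;
  status: 'open' | 'closed';
  assigneeId?: string;
}

const STALE_HOOK_MS = 60 * 60_000; // assumed threshold
const CRASH_LOOP_LIMIT = 3;        // assumed threshold

function deaconPatrol(agents: AgentRow[], beads: BeadRow[], now: number) {
  // Stale hooks: idle agents holding a hook with no recent dispatch.
  const staleHooks = agents
    .filter(a => a.status === 'idle' && a.hookedBeadId !== undefined &&
      (a.lastDispatchAt === undefined || now - a.lastDispatchAt > STALE_HOOK_MS))
    .map(a => a.id);

  // Stranded convoys: open beads with no assigned agent.
  const stranded = new Set<string>();
  for (const b of beads) {
    if (b.convoyId !== undefined && b.status === 'open' && b.assigneeId === undefined) {
      stranded.add(b.convoyId);
    }
  }

  // Crash loops: restart count over a limit (a real check would also
  // window restarts by time and apply exponential backoff).
  const crashLoops = agents.filter(a => a.restartCount >= CRASH_LOOP_LIMIT).map(a => a.id);

  return { staleHooks, strandedConvoys: Array.from(stranded), crashLoops };
}
```

feedStrandedConvoys() would then auto-sling work for each flagged convoy, while crash-looping agents get paused rather than blindly restarted.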
On-demand triage agent (LLM, spawned when needed)
When the alarm's mechanical checks produce results that need reasoning, it queues them as triage request beads (type = 'triage_request') with structured context. When the queue is non-empty, the alarm dispatches a short-lived triage agent in the container.
The triage_request bead type must be added to the BeadType enum in src/types.ts and src/db/tables/beads.table.ts.
// In TownDO alarm handler
const triageQueue = await this.listBeads({ type: 'triage_request', status: 'open' });
if (triageQueue.length > 0) {
  await this.dispatchTriageAgent(triageQueue);
}
The triage agent gets a focused system prompt:
You are a Gastown triage agent. You will be given a list of situations that
require judgment. For each one, assess the situation and take one of the
prescribed actions. Be decisive. When done, call gt_done.
Situations to assess:
1. [DIRTY_POLECAT] Agent "Toast" has uncommitted changes after completion.
Git status: <output>
Git diff --stat: <output>
Options: COMMIT_AND_PUSH | DISCARD | ESCALATE
2. [STUCK_AGENT] Agent "Maple" has not made progress in 45 minutes.
Last activity: <timestamp>
Recent conversation tail: <last 20 lines>
Options: NUDGE | RESTART | ESCALATE
3. [HELP_REQUEST] Agent "Shadow" sent HELP: "Can't resolve merge conflict in auth.ts"
Context: <bead body>
Options: PROVIDE_GUIDANCE | ESCALATE_TO_MAYOR
The triage agent processes each item, takes an action (via tool calls back to the TownDO), and exits. Session lifetime: seconds to minutes, not hours. LLM cost: proportional to actual ambiguity in the system, not to wall-clock uptime.
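Assembling the situations list from queued triage beads can be a simple string build. A sketch (the TriageItem shape is hypothetical, mirroring the prompt format above):

```typescript
// Render queued triage requests into the numbered situation list.
interface TriageItem {
  tag: string;        // e.g. DIRTY_POLECAT, STUCK_AGENT, HELP_REQUEST
  summary: string;
  details: string[];  // pre-formatted context lines
  options: string[];
}

function renderTriagePrompt(items: TriageItem[]): string {
  const lines = ['Situations to assess:'];
  items.forEach((item, i) => {
    lines.push(`${i + 1}. [${item.tag}] ${item.summary}`);
    for (const d of item.details) lines.push(`   ${d}`);
    lines.push(`   Options: ${item.options.join(' | ')}`);
  });
  return lines.join('\n');
}
```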
Triage request bead schema

Triage requests are beads (consistent with #441):

-- No separate table needed. Uses the universal beads table.
-- type = 'triage_request'
-- metadata JSON contains the structured context:
{
  "triage_type": "dirty_polecat",   -- or "stuck_agent", "help_request", "queue_health", "zombie_confirm"
  "agent_bead_id": "...",           -- which agent this concerns
  "context": {                      -- type-specific context
    "git_status": "...",
    "git_diff_stat": "..."
  },
  "options": ["COMMIT_AND_PUSH", "DISCARD", "ESCALATE"]
}
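The metadata payload maps naturally onto a TypeScript type for src/types.ts. A sketch (field names follow the JSON example above; the exact type shape is an assumption):

```typescript
type TriageType = 'dirty_polecat' | 'stuck_agent' | 'help_request' | 'queue_health' | 'zombie_confirm';

interface TriageRequestMetadata {
  triage_type: TriageType;
  agent_bead_id: string;            // which agent this concerns
  context: Record<string, string>;  // type-specific context, e.g. git_status / git_diff_stat
  options: string[];                // actions the triage agent may choose from
}

// Example payload for a dirty polecat triage request:
const dirtyPolecat: TriageRequestMetadata = {
  triage_type: 'dirty_polecat',
  agent_bead_id: 'bead-123',
  context: { git_status: 'M src/auth.ts', git_diff_stat: '1 file changed' },
  options: ['COMMIT_AND_PUSH', 'DISCARD', 'ESCALATE'],
};
```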
Triage agent tools
The triage agent needs a narrow tool set (subset of the existing plugin):
gt_triage_resolve: resolve a triage request with a chosen action; the TownDO executes the action (nuke, restart, escalate, etc.)
gt_mail_send: send contextual guidance to a stuck agent
gt_escalate: forward to Mayor or human
gt_nudge: send a message to a running agent's session
gt_done: signal triage session complete
When the alarm does NOT spawn a triage agent
Most patrol cycles will have zero ambiguous situations. The alarm runs, all checks pass (or produce clear mechanical outcomes like agent GC), and no triage agent is needed. The LLM is only invoked when the system encounters genuine uncertainty.
Expected frequency: triage agents spawn on <10% of alarm cycles in a healthy town. In a town with many stuck or failing agents, they'll spawn more often — which is correct, because that's when reasoning is most valuable.
What This Replaces
Local Gastown → Cloud Gastown:

Go Daemon (3-min heartbeat) → TownDO alarm (5s active / 1m idle)
Boot agent (ephemeral AI triage per tick) → not needed; the TownDO alarm is the external observer
Deacon (persistent AI patrol loop) → deaconPatrol() in alarm handler (mechanical) + on-demand triage agent (intelligent)
Witness (persistent AI patrol loop per rig) → witnessPatrol() in alarm handler (mechanical) + on-demand triage agent (intelligent)
Why the watchdog chain simplifies
Local Gastown needs Boot→Deacon→Witness because "a hung Deacon can't detect it's hung" — Boot provides an external observer. In the cloud, DO alarms are the external observer. They're durable (re-fire after eviction), managed by the Cloudflare runtime (not by user code that can hang), and independent of the container. If the container dies, the alarm still fires and detects dead agents. The three-tier watchdog chain collapses to: DO alarm (always fires) → mechanical checks → triage agent (when needed).
One risk: a logic bug in the alarm handler could silently break the town. Mitigation: a Cron Trigger that pings each active town's health endpoint independently of the DO alarm, providing an external watchdog analogous to Boot.
Implementation Plan
Step 1: Expand witnessPatrol() with full mechanical checks
Enhance the existing witnessPatrol() to cover: GUPP escalation (not just mail — escalate after a second threshold, force-stop after a third), orphaned work detection (idle+hooked+no dispatch), agent GC (delete dead agents past retention), per-bead timeout enforcement.
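The GUPP ladder in Step 1 is a pure function of staleness: one threshold for mail, a second for escalation, a third for force-stop. A sketch (only the 30-minute warning exists today; the second and third thresholds are assumed values for illustration):

```typescript
type GuppAction = 'none' | 'warn_mail' | 'escalate' | 'force_stop';

const GUPP_WARN_MIN = 30;        // existing warning threshold
const GUPP_ESCALATE_MIN = 60;    // assumed second threshold
const GUPP_FORCE_STOP_MIN = 120; // assumed third threshold

// Pick the strongest action the staleness has earned; the alarm would
// also track which actions were already taken to avoid repeats.
function guppAction(staleMinutes: number): GuppAction {
  if (staleMinutes >= GUPP_FORCE_STOP_MIN) return 'force_stop';
  if (staleMinutes >= GUPP_ESCALATE_MIN) return 'escalate';
  if (staleMinutes >= GUPP_WARN_MIN) return 'warn_mail';
  return 'none';
}

console.log(guppAction(45));  // warn_mail
console.log(guppAction(130)); // force_stop
```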
Step 2: Add deaconPatrol()
New alarm sub-task covering: stale hook detection, stranded convoy feeding, crash loop detection.
Step 3: Triage request queue
Add triage_request to the BeadType enum in src/types.ts and src/db/tables/beads.table.ts. When mechanical checks produce ambiguous results (dirty polecat, stuck agent, help request), create triage request beads with structured context instead of taking immediate action.
Step 4: Triage agent dispatch
When triage requests are queued, the alarm dispatches a short-lived triage agent session in the container with a focused prompt and narrow tool set. The agent processes all pending requests and exits. Add a system prompt at src/prompts/triage-system.prompt.ts.
Step 5: Triage agent tools
Add gt_triage_resolve tool to the container plugin. This tool takes a triage request bead ID and a chosen action, and the TownDO executes the action (nuke agent, restart agent, send mail, escalate, etc.).
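The gt_triage_resolve round-trip could look like this on the TownDO side (action names come from the options lists above; the handler-table shape and names are assumptions):

```typescript
// TownDO side of gt_triage_resolve: run the handler for the chosen
// action, then close the triage request bead.
type TriageAction = 'COMMIT_AND_PUSH' | 'DISCARD' | 'ESCALATE' | 'RESTART' | 'NUDGE';

interface TriageResolution {
  requestBeadId: string;
  action: TriageAction;
}

function resolveTriage(
  res: TriageResolution,
  handlers: Partial<Record<TriageAction, (beadId: string) => void>>,
  closeBead: (beadId: string) => void,
): void {
  const handler = handlers[res.action];
  if (!handler) throw new Error(`no handler for action ${res.action}`);
  handler(res.requestBeadId);
  closeBead(res.requestBeadId);
}

// Usage with stubbed side effects:
const actionsTaken: string[] = [];
resolveTriage(
  { requestBeadId: 'triage-7', action: 'RESTART' },
  { RESTART: id => actionsTaken.push(`restart agent for ${id}`) },
  id => actionsTaken.push(`close ${id}`),
);
```

Keeping the side effects behind a handler table means the triage agent only ever names an action; the TownDO stays the sole actor with mutation rights.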
Step 6: External health watchdog
Add a Cron Trigger (or separate DO with its own alarm) that periodically verifies each active town's alarm is firing and its container is responsive. This replaces Boot's role as the external observer.
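The scheduled watchdog body reduces to one check per town: is the alarm firing, and is the town reachable? A sketch with the health lookup injected so it can be stubbed (names and the TownHealth shape are assumptions; the real version would fetch each town's health endpoint over HTTP):

```typescript
// External watchdog (Cron Trigger body): flag towns whose alarm has not
// fired recently or whose health endpoint is unreachable.
interface TownHealth {
  townId: string;
  lastAlarmAt: number; // epoch ms of the last alarm() run
}

function findUnhealthyTowns(
  townIds: string[],
  getHealth: (townId: string) => TownHealth | null, // null = unreachable
  now: number,
  maxAlarmGapMs: number,
): string[] {
  const unhealthy: string[] = [];
  for (const id of townIds) {
    const h = getHealth(id);
    if (h === null || now - h.lastAlarmAt > maxAlarmGapMs) unhealthy.push(id);
  }
  return unhealthy;
}
```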
Acceptance Criteria

witnessPatrol() expanded: GUPP escalation after threshold, orphaned work detection, agent GC, per-bead timeouts
deaconPatrol() added: stale hook detection, stranded convoy feeding, crash loop detection
triage_request bead type added to the BeadType enum in src/types.ts and src/db/tables/beads.table.ts
Triage agent dispatch with a focused system prompt (src/prompts/triage-system.prompt.ts) and narrow tool set
gt_triage_resolve tool implemented in the container plugin

Notes

agentDone() → MR bead, completeReviewWithResult() → close. No need to reintroduce protocol messages.
The existing PatrolResult type in src/types.ts defines dead_agents, stale_agents, orphaned_beads arrays but is unused; it can be repurposed or replaced for the expanded witness/deacon patrols.
The witness role exists as a town-wide singleton in getOrCreateAgent() but has only a one-line stub prompt in container-dispatch.ts. The triage agent replaces what would have been a persistent witness LLM session.