Problem
When a Gastown user's town breaks — agents stall, beads get stuck, containers die, merges fail, credentials expire — there is currently no way for a Kilo admin to diagnose or fix it without SSH-level access to Cloudflare infrastructure. All state is locked inside Durable Object SQLite databases and KV storage, errors are scattered across console.error logs with no correlation, and intervention requires writing ad-hoc scripts against internal APIs.
This issue covers a user-scoped admin panel: given a specific user, look up their GastownUserDO, see their towns, and drill into any TownDO to inspect and intervene. Fleet-wide views (all towns across all users) are out of scope here and will be addressed alongside #228 (Observability) once a secondary index exists.
User Failure Scenarios & Required Admin Capabilities
1. "My agent isn't doing anything" — Stuck Agent
Root causes: Container died mid-session, git clone failed (bad credentials, private repo), dispatch budget exhausted (5 attempts), polecat name pool exhausted, GUPP timeout, agent process crashed inside container.
Admin needs:
See agent status timeline: idle → working → (stalled? dead?)
See dispatch attempt history with error messages (currently only in console.error)
See the agent's last SDK events (from AgentDO) — did it receive its prompt? did it call any tools? did the session error out?
See container health: is the container running? when did it last restart? what's its process list?
Intervention: Force-reset agent to idle, force-unhook from bead, force-kill agent process in container, retry dispatch
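The force-reset intervention above can be sketched as a pure state transition. The `Agent` shape, status values, and the idea of returning the unhooked bead to the caller are assumptions for illustration, not the real Gastown types:

```typescript
// Hypothetical agent record; field names are assumptions.
interface Agent {
  agentId: string;
  status: "idle" | "working" | "stalled" | "dead";
  hookedBeadId?: string;
  dispatchAttempts: number;
}

// Returns the reset agent plus the unhooked bead id, so the caller can
// put the bead back in the dispatch queue and write an audit entry.
function forceResetAgent(a: Agent): { agent: Agent; unhookedBeadId?: string } {
  return {
    agent: { ...a, status: "idle", hookedBeadId: undefined, dispatchAttempts: 0 },
    unhookedBeadId: a.hookedBeadId,
  };
}
```

Keeping the transition pure makes it trivial to audit: the admin endpoint applies the result, records the intervention, and never mutates agent state in place.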
2. "My bead has been open forever" — Stuck Bead
Root causes: No idle agent available, all polecats stalled, dispatch keeps failing, bead assigned to dead agent, convoy dependency blocking it, review queue backed up.
Admin needs:
See bead state history (bead_events timeline)
See which agent is hooked (if any) and that agent's status
See dependency graph: is this bead blocked by another? part of a convoy? waiting for review?
See dispatch attempts against this bead specifically
Intervention: Force-close bead, force-fail bead, reassign to different agent, manually unhook current agent, change bead priority
3. "Code won't merge" — Refinery Failure
Root causes: Merge conflict, git push credentials expired, refinery agent hallucinated bad PR URL, PR-strategy entry stuck waiting for webhook, review queue single-item bottleneck.
Admin needs:
See per-MR-bead status: who's reviewing, how long in in_progress, branch names, PR URL
See refinery agent's reasoning (conversation/tool calls)
See git operation logs: what merge commands ran, what failed, stderr output
Intervention: Force-retry a failed review, force-close an MR bead, skip review and direct-merge, clear the review queue, change merge strategy on the fly
4. "Container keeps dying" — Container Instability
Root causes: OOM from too many concurrent agents, Cloudflare container platform issues, sleep/wake cycle problems, environment variable misconfiguration.
5. "GitHub says permission denied" — Credential Issues
Root causes: GitHub App token expired, integration revoked, wrong integration linked to rig, credential store in container has stale token, dynamic credential resolution endpoint returning errors.
Admin needs:
See current credential state: which integration is linked, when was the token last resolved, is the token valid?
See git credential verification results (currently only a console.warn)
See all git push/pull failures across agents in the town
Intervention: Force-refresh credentials from integration, manually set a git token, re-link integration
6. "My convoy never finished" — Convoy Progress Stall
Root causes: A tracked bead was deleted instead of closed (counter never catches up), a tracked bead's merge failed, intermediate merge to feature branch conflicted, landing MR failed.
Admin needs:
Convoy bead graph: all tracked beads with their statuses
Progress counters: closed_beads / total_beads, identifying exactly which beads are not yet closed
Review queue churn: how many times has the same MR been retried?
7. "The Mayor isn't responding" — Mayor Dysfunction
Root causes: Mayor agent crashed, container sleeping, Mayor stuck in a tool call loop, Mayor's system prompt is stale/wrong, Mayor's session errored.
8. "I'm being charged but nothing is happening" — Silent Resource Consumption
Root causes: Zombie agents consuming LLM tokens, agents in retry loops, refinery re-reviewing the same MR, container running with no active work.
Data Access Model — No Central Database
Gastown has no central Postgres table of towns or agents. All state is distributed across Durable Objects:
GastownUserDO (keyed by userId) — owns user_towns and user_rigs tables. This is the only way to discover which towns a user has.
TownDO (keyed by townId) — owns all beads, agents, rigs, events, config. This is where all the operational data lives.
AgentDO (keyed by agentId) — owns high-volume SDK event streams per agent.
This means the admin panel cannot start from a global town list — there is no table to query. The entry point must be user-scoped: look up a user, query their GastownUserDO for their towns, then drill into a specific TownDO.
Navigation flow
Admin searches for user (by email, userId, or name from kilocode_users in Postgres)
→ Hits GastownUserDO for that userId
→ Gets list of towns (user_towns) and rigs (user_rigs)
→ Selects a town → enters Town Inspector (queries TownDO)
→ Drills into beads, agents, container, config, events
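The lookup flow above can be sketched in TypeScript. The stub interface and method names (`getTowns`, `getRigs`) are assumptions standing in for whatever RPC the real GastownUserDO exposes; in a Worker the stub would come from `env.GASTOWN_USER_DO.get(...idFromName(userId))`:

```typescript
// Hypothetical summaries returned by the user DO; shapes are assumptions.
interface TownSummary { townId: string; name: string; openBeads: number }
interface RigSummary { rigId: string; repo: string }

// Stand-in for the GastownUserDO stub an admin route would obtain.
interface GastownUserStub {
  getTowns(): Promise<TownSummary[]>;
  getRigs(): Promise<RigSummary[]>;
}

// One DO per user is the only index of that user's towns, so every
// admin drill-down must start from a userId.
async function adminLookup(
  userId: string,
  getStub: (userId: string) => GastownUserStub,
) {
  const stub = getStub(userId);
  const [towns, rigs] = await Promise.all([stub.getTowns(), stub.getRigs()]);
  return { userId, towns, rigs }; // admin UI renders this, then drills into a TownDO
}

// In-memory stand-in for a user's DO state, for illustration only.
const fakeStub: GastownUserStub = {
  getTowns: async () => [{ townId: "t-1", name: "alpha", openBeads: 3 }],
  getRigs: async () => [{ rigId: "r-1", repo: "acme/api" }],
};
```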
This user-scoped lookup should live inside the existing Kilo admin dashboard's user detail page — not in a separate Gastown-only admin route. When an admin is already looking at a user (e.g., for a support case), they should see a "Gastown" section showing that user's towns with health indicators, and be able to drill straight into the Town Inspector from there.
Admin Panel Sections
User → Gastown Section (entry point, on existing admin user detail page)
List of user's towns (from GastownUserDO) with health indicators (green/yellow/red)
Per-town summary: active agents, open beads, container status, last activity
Quick actions: jump to Town Inspector, force-restart container
List of user's rigs with linked repo and integration status
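The green/yellow/red indicator could be a simple roll-up over a town snapshot. The thresholds and field names below are illustrative assumptions, not values from the Gastown codebase:

```typescript
type Health = "green" | "yellow" | "red";

// Hypothetical per-town snapshot; field names are assumptions.
interface TownSnapshot {
  containerRunning: boolean;
  stalledAgents: number;       // agents with no activity past some cutoff
  failedDispatches: number;    // recent dispatch errors
  oldestOpenBeadHours: number;
}

function townHealth(s: TownSnapshot): Health {
  // Red: something is definitively broken and needs intervention.
  if (!s.containerRunning || s.stalledAgents > 0) return "red";
  // Yellow: degraded but possibly self-recovering.
  if (s.failedDispatches > 0 || s.oldestOpenBeadHours > 24) return "yellow";
  return "green";
}
```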
Town Inspector (single town deep-dive)
State tab: All beads with current status, assigned agent, last event timestamp. Filterable by type/status. Clickable to bead detail view.
Agents tab: All agents with role, status, hooked bead, dispatch attempts, last activity. Clickable to agent detail view with SDK event stream.
Bead Inspector (single bead deep-dive)
Agent Inspector (single agent deep-dive)
Intervention Log
All admin actions taken (who, when, what, on which town/bead/agent)
Immutable audit trail — admins must not be able to intervene without a record
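The append-only property can be enforced at the type level: expose only an append operation and a read-only view. The record shape and action names below are assumptions for illustration:

```typescript
// Hypothetical intervention record; field names are assumptions.
interface InterventionRecord {
  readonly at: string;           // ISO timestamp
  readonly adminId: string;
  readonly action: string;       // e.g. "force-reset-agent"
  readonly target: { townId: string; beadId?: string; agentId?: string };
  readonly reason: string;
}

class InterventionLog {
  private readonly entries: InterventionRecord[] = [];

  // Only append is exposed; there is deliberately no update or delete,
  // so every admin action leaves a permanent record.
  append(e: Omit<InterventionRecord, "at">): InterventionRecord {
    const rec: InterventionRecord = { ...e, at: new Date().toISOString() };
    this.entries.push(rec);
    return rec;
  }

  list(): readonly InterventionRecord[] {
    return this.entries;
  }
}
```

In production this would be a table written inside the same transaction as the intervention itself, so an action cannot succeed without its audit row.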
Data Requirements
Several pieces of data that admins need are currently not persisted anywhere:
Dispatch attempt details — currently console.error only. Need to persist dispatch errors (container response, error message, timestamp) in a dispatch_attempts table or as bead events.
Container lifecycle events — currently console.log only. Need to emit bead events or a dedicated container event table for start/stop/error/sleep/wake.
Git operation results — currently container-side console.log only. Need to report clone/push/merge outcomes back to TownDO as events.
Admin intervention audit log — does not exist. Need a new table.
Credential resolution results — currently console.warn only. Need to persist token refresh success/failure.
These data gaps should be addressed in #228 or as a prerequisite to this issue.
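As one example of closing these gaps, the dispatch_attempts persistence could look like the sketch below. The table name comes from the list above; the column set and the row-builder are assumptions, and the actual `ctx.storage.sql.exec` call inside the DO is omitted:

```typescript
// Hypothetical DDL for TownDO's SQLite storage; columns are assumptions.
const DISPATCH_ATTEMPTS_DDL = `
  CREATE TABLE IF NOT EXISTS dispatch_attempts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    bead_id TEXT NOT NULL,
    agent_id TEXT,
    attempt INTEGER NOT NULL,        -- 1..5, matching the dispatch budget
    container_status INTEGER,        -- HTTP status from the container, if any
    error TEXT,
    created_at TEXT NOT NULL
  )`;

interface DispatchAttemptRow {
  bead_id: string;
  agent_id?: string;
  attempt: number;
  container_status?: number;
  error?: string;
  created_at: string;
}

// What the dispatch loop would persist on failure instead of console.error.
function toAttemptRow(
  beadId: string,
  attempt: number,
  err: Error,
  agentId?: string,
): DispatchAttemptRow {
  return {
    bead_id: beadId,
    agent_id: agentId,
    attempt,
    error: err.message,
    created_at: new Date().toISOString(),
  };
}
```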
Acceptance Criteria
User → Gastown section on existing admin user detail page (GastownUserDO lookup, town list with health indicators)
Town inspector with state, agents, review queue, container, config, and events tabs
Bead inspector with full state history, dependency graph, and agent conversation replay
Agent inspector with SDK event stream, dispatch history, and status timeline
Access gated to is_admin users (extends Gate Gastown UI to Kilo admins only #537)
Notes
The entry point is the existing admin user detail page — add a Gastown section there, not a separate /gastown/admin/ route tree
Town Inspector and deeper views can be standalone admin pages, linked from the user detail page
Fleet-wide overview (all towns across all users) is out of scope — will be addressed alongside [Gastown] PR 22: Observability #228 (Observability) once a secondary index exists
Parent
Part of #204 (Phase 4: Hardening)