Overview
A collection of capabilities that are either impossible in local Gastown or dramatically better in the cloud. These are not prioritized for any specific phase — they're ideas to inform architecture decisions and inspire future work.
Parent: #204
Organization-Level Towns
Local Gastown is single-player — one human, one machine, one Mayor. The cloud makes towns collaborative. A Kilo org or team owns a town, and any developer in that organization can interact with it — talking to the Mayor, slinging work, watching agents on the dashboard.
Ownership model
Towns are owned by a Kilo organization, not an individual user. The org admin creates the town, connects repos via the org's GitHub/GitLab integrations, and configures settings (gates, merge strategy, agent concurrency). Any member of the org can:
Talk to the Mayor through their own chat interface
Sling beads and create convoys
Watch agent streams and bead progress on the dashboard
View the activity feed across all org members' work
The Mayor has context about what everyone is doing. "Don't touch the auth module, Sarah's convoy is refactoring it right now" becomes something the Mayor can say because it has the full picture. Each user's messages to the Mayor are attributed — the Mayor knows who is asking and can factor in their role/permissions.
Cost attribution: always to the human, never to a bot
When a developer kicks off work (slings a bead, asks the Mayor to do something), all resulting LLM usage (polecat token consumption, refinery reviews, triage agent cycles) is attributed to that user, not to the org generically or to a bot account. The Kilo gateway already routes all LLM calls — each call carries the userId of the human who initiated the work chain.
This means:
Org admins see per-user cost breakdowns: "John's beads cost $47 this week across 3 rigs"
Individual developers see their own usage in their Kilo account
Billing rolls up to the org, but attribution is per-human
No "bot user" abstraction that obscures who is actually consuming resources
The userId should propagate from the Mayor message or gt_sling call through to every downstream agent dispatch and LLM call in that work chain.
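A minimal TypeScript sketch of that propagation, assuming a hypothetical WorkChainContext shape, a startPolecat helper, and a gateway header name rather than the actual Gastown or Kilo gateway APIs:

// Hypothetical shape: every piece of work carries the initiating human's id.
interface WorkChainContext {
  userId: string;     // the human who kicked off the chain (never a bot account)
  beadId: string;
  convoyId?: string;
}

declare function startPolecat(args: { task: string; attribution: WorkChainContext }): Promise<void>;

// Agent dispatch forwards the context unchanged...
async function dispatchAgent(ctx: WorkChainContext, task: string): Promise<void> {
  await startPolecat({ task, attribution: ctx });
}

// ...and every LLM call sends it to the gateway so cost lands on the human.
async function callModel(ctx: WorkChainContext, prompt: string): Promise<string> {
  const res = await fetch("https://gateway.kilo.example/v1/chat", {   // illustrative URL
    method: "POST",
    headers: { "content-type": "application/json", "x-kilo-user-id": ctx.userId },
    body: JSON.stringify({ prompt, metadata: { beadId: ctx.beadId } }),
  });
  return res.text();
}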
Git commit dual attribution
Git commits made by agents should carry two authors: the human who initiated the work and the agent that wrote the code. This applies to both org-scoped and user-scoped Gastown.
Git supports this natively via Author and Committer fields, or via Co-authored-by trailers:
Author: John Smith <john@company.com>
Committer: Toast (Polecat) <toast@gastown.kilo.ai>
feat: add token refresh to auth middleware
Co-authored-by: Toast (Polecat) <toast@gastown.kilo.ai>
Requested-by: John Smith <john@company.com>
Bead-ID: gt-abc-123
The exact format is a design decision (Author/Committer split vs. trailers), but the requirements are:
Human attribution: GitHub/GitLab contribution graphs credit the human. PRs show the human as the author. The human's identity is the Author field.
Agent attribution: The agent identity is visible in the commit metadata. Which polecat, which model, which bead — all traceable.
Bead linkage: Each commit references the bead ID it was produced for, enabling dashboard → commit → code traceability.
This dual attribution is important for compliance (who authorized this code?), for contribution tracking (humans get credit for work they directed), and for debugging (which agent produced this commit?).
Fleet management
Extends to org-level fleet views — a VP of Engineering sees aggregate metrics across all towns (cost, velocity, quality), and can spot patterns like "the payments team's polecats have a 40% rework rate on the billing repo, but the platform team's are at 8%."
See plans/gastown-org-level-architecture.md for the full org-level spec.
Agent Personas and Smarter Convoy Dispatch
Two related ideas about rethinking how agents are assigned to work.
Serial convoy agent reuse
Currently, each bead in a convoy gets a distinct polecat — even when beads run in series (B depends on A, C depends on B). This means 3 agents for sequential work: each one clones the repo, rebuilds context, and tries to understand what the previous agent did. A single agent carrying context from A → B → C would be faster, cheaper, and produce more coherent work.
The fix is a dispatch strategy change: when the alarm dispatches the next bead in a dependency chain, prefer re-hooking the agent that just completed the predecessor bead (if it's idle) rather than grabbing a random idle polecat. The agent keeps its worktree, its git state, and its LLM context. For parallel beads within a convoy, separate agents are still needed since the work is simultaneous.
This doesn't require rethinking agent identity — it's purely about dispatch preference. The Mayor could also influence this: "run this convoy with a single agent in series" vs. "fan out to separate agents."
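A rough sketch of that preference in TypeScript; the Bead and Agent shapes and the idle-agent fallback are assumptions, not the actual TownDO dispatch code:

// Hypothetical dispatch step: prefer the agent that just finished the
// predecessor bead so it keeps its worktree, git state, and LLM context.
interface Bead { id: string; dependsOn?: string; completedBy?: string }
interface Agent { id: string; status: "idle" | "working" }

function pickAgentForBead(
  bead: Bead,
  beads: Map<string, Bead>,
  agents: Map<string, Agent>,
): Agent | undefined {
  const predecessor = bead.dependsOn ? beads.get(bead.dependsOn) : undefined;
  const previous = predecessor?.completedBy ? agents.get(predecessor.completedBy) : undefined;
  if (previous?.status === "idle") return previous;            // serial reuse
  return [...agents.values()].find((a) => a.status === "idle"); // fallback: any idle polecat
}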
Agent personas
All polecats today are identical — same system prompt, same tools, same capabilities. The 20-name pool (Toast, Shadow, Maple, etc.) is cosmetic. There's no capability differentiation between agents.
Personas add a capability dimension. A persona is a prompt template + optional configuration (tool restrictions, quality gate priorities) applied at dispatch time:
Agent = Polecat role + Persona (optional) + Identity (name, work history)
Personas are defined in town config (custom per town) or from built-in defaults
Each persona has: a name, a system prompt overlay, optional tool restrictions, optional quality gate overrides
When the Mayor slings a bead, it can specify a persona: "sling this security audit to a Pentester"
The dispatch system finds an idle polecat, applies the persona's prompt overlay, and dispatches
The same polecat can wear different personas across different beads — "Toast" is a Frontend Master on bead A, then a Backend Architect on bead B
Personas don't restrict the agent pool. You can have 5 agents all running with the "Frontend Master" persona simultaneously
The Mayor becomes the casting director — deciding which persona fits each bead based on the work description
This also composes with serial convoy reuse: a convoy for "refactor the auth module" could be one agent, one persona ("Backend Architect"), carrying context through the whole chain.
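As a sketch of what a persona definition in town config could look like (the field names are illustrative, not an existing schema):

// A persona is just config applied at dispatch time; the polecat identity stays the same.
interface Persona {
  name: string;                                    // e.g. "Backend Architect", "Pentester"
  promptOverlay: string;                           // appended to the base polecat system prompt
  allowedTools?: string[];                         // optional tool restrictions
  qualityGateOverrides?: Record<string, unknown>;  // optional per-persona gate tuning
}

function buildSystemPrompt(basePolecatPrompt: string, persona?: Persona): string {
  return persona ? `${basePolecatPrompt}\n\n${persona.promptOverlay}` : basePolecatPrompt;
}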
Agent Marketplace and Shared Formulas
Local Gastown's Mol Mall is a concept in the docs but barely implemented. The cloud can make this real — a hosted formula registry where users publish and install workflow templates. "Here's a formula for migrating a React codebase from class components to hooks" or "Here's a 12-step molecule for adding comprehensive test coverage to an untested module." Formulas become a product surface, not just an internal tool.
Beyond formulas, agent configurations become shareable — system prompts, quality gate configs, model selections that produce good results for specific kinds of work. "This polecat config with Claude Opus and this system prompt produces 95% first-pass merge rate on TypeScript refactors" becomes community knowledge.
Cross-Session Intelligence at Scale
Local Gastown's seance system lets an agent query one predecessor. The cloud has every agent event stored in AgentDOs. Build cross-agent, cross-session search: "Which agent last touched the payment processing module, and what did they change?" The cloud can answer this by querying across all agent event histories in a town. This is institutional memory that scales — every agent's work is indexed and searchable.
When a polecat is assigned work on a file, the system can automatically surface relevant context from previous agents who touched that file. "Toast worked on this module 3 days ago and left a note about a tricky race condition in the checkout flow." The polecat gets this in its prime context without anyone asking for it. Local Gastown can't do this because session histories aren't centrally indexed.
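A sketch of that lookup against a hypothetical cross-agent event index (none of these functions exist today):

// Find the most recent agent event touching a file, across every AgentDO in the town,
// so it can be surfaced in a new polecat's prime context.
interface AgentEvent { agentName: string; beadId: string; file: string; note?: string; at: number }

declare function queryAgentEvents(filter: { file: string }): Promise<AgentEvent[]>;

async function lastAgentForFile(file: string): Promise<AgentEvent | undefined> {
  const events = await queryAgentEvents({ file });
  return events.sort((a, b) => b.at - a.at)[0];    // newest first
}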
Real-Time Cost Attribution and Budget Controls
The cloud routes all LLM calls through the Kilo gateway, which means per-bead cost tracking in real time. "This convoy cost $47 across 12 beads and 3 rigs." Local Gastown tracks costs retroactively via gt costs record on session stop, but the cloud can do it live — show a cost ticker on the dashboard that updates as agents stream tokens.
This enables budget guardrails: "This rig has a $200/day budget. When polecats approach the limit, stop slinging new work and notify the Mayor." Or per-convoy budgets: "This refactoring convoy is capped at $500. If we're at $450 and 3 beads are still open, escalate to the human to decide if it's worth continuing." Local Gastown has no concept of cost limits — agents run until the work is done or the human intervenes.
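A sketch of the guardrail check before slinging new work; getSpend and notifyMayor are stand-ins for whatever the gateway and TownDO end up exposing:

interface Budget { limitUsd: number }

declare function getSpend(scopeId: string): Promise<number>;       // live total from the gateway
declare function notifyMayor(scopeId: string, message: string): Promise<void>;

// Hypothetical pre-dispatch check: pause new work near the limit and escalate
// instead of silently running past the budget.
async function canSlingMoreWork(scopeId: string, budget: Budget): Promise<boolean> {
  const spentUsd = await getSpend(scopeId);
  if (spentUsd < budget.limitUsd * 0.9) return true;
  await notifyMayor(scopeId, `Spend is $${spentUsd} of a $${budget.limitUsd} budget; pausing new slings.`);
  return false;
}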
Speculative Execution and A/B Testing
With the container model, sling the same bead to multiple polecats with different configurations — different models, different system prompts, different temperatures — and let the Refinery pick the best result. "Give this task to three polecats: one with Opus, one with Sonnet, one with Gemini. Merge whichever passes quality gates first, discard the rest."
This turns agent orchestration into an experimentation platform. Over time, the system accumulates data: "For TypeScript refactoring tasks, Sonnet 4.6 produces first-pass merges 82% of the time at $0.40/bead. Opus produces 94% at $2.10/bead. For this task complexity, Sonnet is the better value." The cloud can make this recommendation automatically because it has the data. Local Gastown can compare models via the gt-model-eval framework, but it's manual and offline.
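A sketch of the fan-out, with slingBead, awaitFirstPassingGates, and discardAttempts standing in for real dispatch and Refinery hooks:

// Hypothetical speculative dispatch: same bead, several configurations,
// keep whichever result clears quality gates first.
const candidateConfigs = [
  { model: "anthropic/claude-opus" },
  { model: "anthropic/claude-sonnet" },
  { model: "google/gemini" },
];

declare function slingBead(beadId: string, cfg: { model: string }): Promise<string>;   // resolves to a branch name
declare function awaitFirstPassingGates(branches: Promise<string>[]): Promise<string>;
declare function discardAttempts(branches: Promise<string>[], winner: string): Promise<void>;

async function speculativeSling(beadId: string): Promise<string> {
  const attempts = candidateConfigs.map((cfg) => slingBead(beadId, cfg));
  const winner = await awaitFirstPassingGates(attempts);
  await discardAttempts(attempts, winner);
  return winner;
}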
Webhook-Driven Beads
The cloud sits behind a web server with existing GitHub/GitLab webhook infrastructure. GitHub issues can automatically become Gastown beads. A new PR from an external contributor triggers a review bead. A failing CI run creates an escalation. A Slack message creates a bead. The Mayor sees it all in its inbox and can decide to act.
Local Gastown is pull-only — agents check for work on their hooks. The cloud can be push-driven, reacting to external events in real time. A GitHub issue labeled gastown auto-creates a bead, the Mayor assesses it, slings it if appropriate, and the human who filed the issue sees a comment: "A polecat is working on this. Track progress at [dashboard link]."
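A sketch of the push-driven path as a Worker handler; the payload fields follow GitHub's issues webhook, while createBead and the label name are assumptions:

// Hypothetical webhook route: a GitHub issue labeled "gastown" becomes a bead
// the Mayor can assess on the next tick.
declare function createBead(args: { title: string; source: string }): Promise<void>;

export default {
  async fetch(req: Request): Promise<Response> {
    if (req.method !== "POST") return new Response("method not allowed", { status: 405 });
    const event = (await req.json()) as {
      action: string;
      label?: { name: string };
      issue?: { title: string; html_url: string };
    };
    if (event.action === "labeled" && event.label?.name === "gastown" && event.issue) {
      await createBead({ title: event.issue.title, source: event.issue.html_url });
    }
    return new Response("ok");
  },
};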
Persistent Agent Reputation
Local Gastown tracks agent CVs (work history per polecat identity), but it's per-machine and per-town. The cloud can build cross-town agent reputation at the platform level. Not just "Toast completed 47 beads" but "Across all Kilo Cloud Gastown towns, Claude Opus polecats with the standard polecat prompt have a 91% first-pass merge rate on Python repos, dropping to 73% on Rust repos." This becomes a data product — model performance benchmarks derived from real production work across the entire user base.
Always-On with Zero Maintenance
The most obvious but most impactful: the cloud town is always running. No laptop that needs to stay open, no tmux sessions to babysit, no gt mayor attach to check on things. Tell the Mayor to refactor a module at 5 PM, close your laptop, and check the dashboard from your phone at dinner. The convoy landed, the PR is merged, the tests pass. The entire watchdog chain (alarm-driven health checks, triage agents for ambiguous situations) runs without any human presence.
Local CLI Bridge (Stretch Goal)
Connect your local Kilo CLI to your cloud Gastown instance. kilo gastown connect <town-url> authenticates and registers your local instance as a crew-like agent in the cloud town. You get all the coordination benefits (beads, mail, identity, attribution, convoy tracking) while running locally with full filesystem access and your own dev environment. Your local session appears in the cloud dashboard as an active agent.
This bridges "fully cloud-hosted" and "I want to work locally but with cloud coordination." The tool plugin's HTTP API surface is the same whether the agent runs in a Cloudflare Container or on someone's laptop.
SolidJS Dashboard: Native Integration with Kilo's Web UI
The Gastown dashboard currently lives in the Next.js (React) app. The Kilo desktop/web app (packages/app/) is built with SolidJS — including all the session UI, message rendering, tool execution cards, diff viewer, file tabs, terminal panel, and prompt dock. These are mature, production-quality components that represent significant engineering investment.
A future direction is to migrate the Gastown dashboard to SolidJS and serve it directly from the gastown Cloudflare Worker, making the UI a first-class part of the gastown service rather than a feature embedded in the main Next.js app. This enables:
Native reuse of Kilo's UI components. The session viewer, message timeline, tool execution cards, diff viewer, and terminal panel from @opencode-ai/ui and packages/app/ can be used directly — no framework bridging, no React ports, no iframe isolation. The same <SessionTurn>, <BasicTool>, <Code>, and <Diff> components that render the desktop app render the cloud agent streams.
Two interaction modes for agents. The xterm.js terminal (Phase 2.5) provides raw PTY access to agent sessions. The SolidJS session viewer provides the rich, structured view — markdown message bubbles, expandable tool call cards with input/output, inline diffs, file tabs. Both show the same agent session. Users pick the view that suits their workflow. The terminal is power-user/debugging; the session viewer is the polished product experience.
Independent deployment. The gastown dashboard deploys with the gastown worker on its own release cycle, decoupled from the main Next.js app. UI changes to the gastown experience don't require rebuilding and deploying the entire Kilo web app.
Server connection is already compatible. The Kilo app connects to kilo serve via HTTP REST + SSE (via @kilocode/sdk). The cloud container runs kilo serve instances. The SDK client is framework-agnostic. The SolidJS dashboard would use the same createOpencodeClient() to connect to agent sessions through the gastown worker proxy — the same code path, just routed through a different network layer.
What this looks like architecturally:
Browser
│
├── Main Kilo App (Next.js/React) — account, billing, integrations, cloud agent
│   └── /gastown/* routes → redirect to gastown UI
│
└── Gastown UI (SolidJS, served from gastown worker)
    ├── Town dashboard, rig workbench, bead board, convoy tracker
    ├── Mayor chat (using Kilo app's session components)
    ├── Agent stream viewer (using Kilo app's message timeline)
    ├── Agent terminal (xterm.js — kept from Phase 2.5)
    └── All backed by gastown worker API + TownDO
The main Next.js app handles everything outside of Gastown (account management, billing, integrations, standalone cloud agent sessions). Gastown owns its own UI surface, served from its own worker, using the same component library as the rest of Kilo.
This is a significant architectural shift — not a near-term task. It depends on the Kilo app's component library being extractable as a standalone SolidJS package, and on the gastown worker being able to serve static assets (or using Cloudflare Pages alongside the worker). But it's the natural end state: the gastown UI is a SolidJS app that shares components with the Kilo desktop/web app, giving both the same quality of agent interaction rendering.
UI-Aware Mayor: Dashboard Context Injection
The Mayor should know what the user is looking at. When the user is staring at a failed bead, the Mayor shouldn't need to be told "bead gt-abc failed" — it should already know, because the dashboard is feeding the user's navigation context into the Mayor's session.
How it works
The dashboard client maintains a user activity stream — a rolling log of what the user has done and what they're currently viewing. On every Mayor message submission, this context is injected as a system block alongside the user's message:
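For illustration, the injected block might look something like this (the field names are assumptions, not a finalized schema):

// Illustrative context payload attached to a Mayor message.
const uiContext = {
  currentView: "rig-detail",
  viewing: { type: "bead", id: "gt-abc", status: "failed" },
  recentActivity: [
    { action: "opened bead gt-abc", secondsAgo: 40 },
    { action: "viewed rig frontend", secondsAgo: 95 },
  ],
};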
The Mayor sees this context and can respond intelligently without the user having to explain: "I see you're looking at bead gt-abc which failed because of a test failure in auth.test.ts. The agent Toast that was working on it is dead. Would you like me to re-sling this to a new polecat, or should I look at the test failure first?"
What gets tracked
Current page/view: from the router location (e.g. rig-detail, town-home, convoy-detail)
Currently viewed object: from the active slide-over / detail panel (a bead, agent, convoy, or escalation)
Recent navigation: from the page view history (last 10-15 actions, 30 min window)
Implementation
The frontend maintains a bounded circular buffer of UserActivityEvent objects; a sketch of the event shape is shown below. When sending a Mayor message, the frontend includes the current context and recent activity in a context field on the message payload. The TownDO resolves the referenced objects (fetches current bead status, agent state, etc.) and injects the resolved context into the Mayor's prompt as a system block.
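A possible shape for those events, with illustrative field names:

interface UserActivityEvent {
  at: number;                                                   // epoch ms
  kind: "page_view" | "open_detail" | "action";
  view: string;                                                 // e.g. "rig-detail", "convoy-detail"
  objectRef?: { type: "bead" | "agent" | "convoy" | "escalation"; id: string };
}

// Rolling window: keep roughly the last 10-15 events within a ~30 minute window.
const MAX_EVENTS = 15;
const MAX_AGE_MS = 30 * 60 * 1000;

function pruneActivity(events: UserActivityEvent[], now: number): UserActivityEvent[] {
  return events.filter((e) => now - e.at < MAX_AGE_MS).slice(-MAX_EVENTS);
}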
Progressive richness
The context injection can start simple and get richer over time:
Phase 1: Current page + currently viewed object (just the bead/agent/rig the user has open)
Phase 3: Inferred intent — "the user has been looking at failed beads for the last 5 minutes, they're probably trying to diagnose a problem"
Phase 4: Proactive Mayor suggestions — the Mayor notices the user's pattern and offers help before they ask. "I see you've been looking at 3 failed beads in the frontend rig. They all failed on the same test. Want me to investigate the common root cause?"
Why this matters
Without UI context, the Mayor is a blank-slate chatbot that requires the user to explain everything. With UI context, the Mayor becomes a copilot that shares the user's visual field. The conversation goes from:
User: "The auth fix failed, can you look at bead gt-abc and re-sling it?"
to:
User: "Can you fix this?" Mayor: "The bead you're viewing (gt-abc: Fix token refresh) failed because auth.test.ts:42 expected a 401 but got a 500. Toast, the agent that worked on it, is dead. I'll create a new bead with the test failure context and sling it to a fresh polecat. The new agent will know exactly what went wrong."
The user's dashboard becomes a shared workspace where the Mayor can see what the user sees.
Agent Status Bubbles: Live Progress Updates
Agents periodically emit short, plain-language status updates — one or two sentences describing what they're currently doing. "Installing packages," "Setting up Vite," "Creating the main UI components," "Running tests — 3 failures to fix." These aren't debug logs or SDK events — they're human-readable summaries of intent, written by the agent for the user.
Where they show up
Agent cards: A chat-bubble-style tooltip above the agent card on the rig view. Shows the most recent status. Fades/rotates as new updates arrive.
Activity feed: Appears as a timeline entry with the agent's name and avatar: "Toast: Connecting the frontend UI to the backend for the new todo app." Interleaved with bead events, convoy progress, and Mayor messages.
Bead detail panel: Status history for the assigned agent, giving a narrative of how the work progressed.
How it works
A new tool gt_status (or an implicit periodic prompt injection) lets agents emit a status string. The status is lightweight — just a string + timestamp, stored as a bead event or a dedicated agent_status field on the agent metadata. The container streams it to the dashboard via the existing WebSocket event sink.
The agent's system prompt includes guidance: "Periodically call gt_status with a brief, user-friendly description of what you're doing. Write it as if you're telling a teammate — not a log line, not a stack trace. One sentence. Do this when you start a new phase of work, not on every tool call."
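A sketch of the container-side handler; saveAgentStatus and emitToEventSink are placeholders for the real persistence and WebSocket sink calls:

// Hypothetical gt_status handler: persist the latest status and stream it
// to the dashboard over the existing event sink.
interface StatusUpdate { agentId: string; text: string; at: number }

declare function saveAgentStatus(update: StatusUpdate): Promise<void>;   // bead event / agent_status field
declare function emitToEventSink(update: StatusUpdate): Promise<void>;   // WebSocket push to the dashboard

async function gtStatus(agentId: string, text: string): Promise<void> {
  const update: StatusUpdate = { agentId, text, at: Date.now() };
  await saveAgentStatus(update);
  await emitToEventSink(update);
}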
Why this matters
Right now, watching agents work means either opening the raw terminal stream (noisy, technical) or waiting for the bead to close (no intermediate visibility). Status bubbles give users the feeling of a team working — you glance at the dashboard and see "Maple is writing tests for the API endpoints" and "Shadow is fixing the lint errors from the previous commit." It makes the town feel alive without requiring users to dig into agent streams.
Code Diffs Linked to Beads
Every bead that results in code changes should have its diff viewable directly from the dashboard. When a polecat works on a bead and pushes a branch, the diff between the branch and its base (main) is the artifact of that work. It should be a first-class, clickable object on the bead detail panel — not something you have to go to GitHub to find.
How it works
When a polecat calls gt_done and pushes its branch, the TownDO records the branch name and base commit on the bead (or its review_metadata satellite). The dashboard can then fetch and render the diff on demand.
Two sources for the diff data:
Container git: The container's git manager can run git diff main...<branch> in the rig's repo and return the result via a new API endpoint (GET /agents/:agentId/diff or GET /git/diff?rig=X&branch=Y). This works while the container is running and the branch exists locally.
GitHub/GitLab API: For diffs that outlive the container session (or when the container has slept), fetch the diff via the platform API using the org's integration token. GET /repos/{owner}/{repo}/compare/{base}...{head} on GitHub returns the full diff. This is the durable path — it works as long as the branch exists on the remote.
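A sketch of the durable path against GitHub's compare endpoint (token plumbing and response typing simplified):

// Fetch the diff for a bead's branch after the container is gone.
// GET /repos/{owner}/{repo}/compare/{base}...{head} returns per-file patches.
async function fetchBeadDiff(owner: string, repo: string, base: string, head: string, token: string) {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/compare/${base}...${head}`,
    { headers: { authorization: `Bearer ${token}`, accept: "application/vnd.github+json" } },
  );
  if (!res.ok) throw new Error(`GitHub compare failed: ${res.status}`);
  const data = (await res.json()) as {
    files?: { filename: string; additions: number; deletions: number; patch?: string }[];
  };
  return data.files ?? [];
}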
What the user sees
On the bead detail panel, a "Diff" tab appears when the bead has associated code changes:
Expandable per-file unified diff with syntax highlighting
Click a file to see the full diff for that file
For beads that went through the Refinery and merged, the diff shows what landed on main (the squash-merge commit). For beads still in progress, the diff shows the current state of the polecat's branch vs main.
What the Mayor sees
When the Mayor checks on a bead or convoy, diff summaries can be included in the context: "Toast's bead changed 4 files (+120 -45): src/auth.ts, src/middleware.ts, test/auth.test.ts, package.json." This gives the Mayor a sense of the scope and nature of the work without reading the full diff.
Convoy-level diffs
At the convoy level, aggregate the diffs across all tracked beads: total files changed, total lines added/removed, and a merged file list. "The auth refactor convoy touched 12 files across 2 rigs, +340 -180." This is the high-level summary for the user who wants to know the blast radius of a batch of work.
Automated PR Review Response: Agents Address Human Feedback
When using the pr merge strategy, the TownDO already polls open PRs for merge/close status (#473). Extend this to detect new review comments and dispatch agents to address them — fix the code, push updates, reply to threads, and resolve conversations.
The loop
TownDO alarm polls open PRs (already exists in pollPendingPRs()). Extend to also fetch review comments via the GitHub/GitLab API.
New comments detected — for each unaddressed review comment (not authored by Gastown, not already resolved), the TownDO creates a sub-bead of the original work bead with the comment context: file path, line range, comment body, reviewer name.
Agent dispatched — the sub-bead is slung to a polecat (or the Refinery itself, since it already has the review context and worktree). The agent's prompt includes the full comment, the file context, and instructions to either fix the issue or respond explaining why no change is needed.
Agent works — reads the comment, examines the code, makes the fix (or determines no fix is needed), commits and pushes to the same PR branch.
Agent replies — via the GitHub/GitLab API, posts a reply to the review thread explaining what was done: "Fixed: moved the null check before the destructuring as suggested" or "This is intentional — the fallback handles the null case on line 47. Marking as resolved."
Agent resolves the thread — calls the resolve conversation API (GitHub: POST /graphql mutation resolveReviewThread, GitLab: PUT /discussions/:id/resolve).
Cycle repeats — if the reviewer posts more comments, the next alarm tick picks them up and the loop continues.
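A sketch of the detection step layered onto the poll loop; listReviewComments, hasSubBeadForComment, and createSubBead stand in for the real platform-pr.util.ts and TownDO calls:

interface ReviewComment { id: number; author: string; path: string; line?: number; body: string }

declare function listReviewComments(prNumber: number): Promise<ReviewComment[]>;
declare function hasSubBeadForComment(parentBeadId: string, commentId: number): Promise<boolean>;
declare function createSubBead(args: { parentBeadId: string; title: string; metadata: Record<string, unknown> }): Promise<void>;

// Hypothetical extension of the PR poll: each unaddressed human review comment
// becomes a sub-bead carrying the thread context for the dispatched agent.
async function detectReviewWork(prNumber: number, parentBeadId: string, botLogin: string): Promise<void> {
  for (const c of await listReviewComments(prNumber)) {
    if (c.author === botLogin) continue;                              // skip our own replies
    if (await hasSubBeadForComment(parentBeadId, c.id)) continue;     // already tracked
    await createSubBead({
      parentBeadId,
      title: `Address review comment on ${c.path}`,
      metadata: { commentId: c.id, path: c.path, line: c.line, body: c.body, reviewer: c.author },
    });
  }
}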
Comment classification
Not every comment needs a code fix. The agent should classify each comment:
Code fix request ("this should use ?. instead of !"): fix the code, push, reply with what changed, resolve.
Question ("why did you use X instead of Y?"): reply with explanation, resolve.
Nitpick / style ("prefer const over let here"): fix it (low effort), push, resolve.
Architectural concern ("this should be in a separate service"): respond acknowledging the feedback. If the change is within scope, fix it. If it's a larger concern, escalate to the Mayor with context.
Approval / positive comment ("LGTM", "nice"): no action needed, skip.
What the agent sees in its prompt
You are addressing review feedback on PR #42 for bead "Fix token refresh".
Review comment by @sarah (on src/auth.ts, lines 15-20):
> This function doesn't handle the case where the token is expired
> but the refresh endpoint is down. We should return a 503 here
> instead of silently failing.
The relevant code:
[file context with surrounding lines]
Your task:
1. Fix the issue described in the comment
2. Commit and push to the PR branch
3. Reply to the review thread explaining your fix
4. Resolve the conversation
If the comment is a question or doesn't require a code change, reply
with an explanation and resolve the thread. If the request requires
changes beyond the scope of this bead, escalate to the Mayor.
Tracking
Each review comment creates a sub-bead linked to the MR bead via parent_bead_id. The sub-bead's metadata stores the comment ID, thread ID, file path, and line range. When the agent resolves the thread, the sub-bead closes. The MR bead's progress reflects how many review comments have been addressed.
The dashboard shows review comment sub-beads in the MR bead's detail drawer — each comment with its status (open/addressed/resolved) and the agent's reply.
Platform API calls needed
GitHub:
GET /repos/{owner}/{repo}/pulls/{number}/reviews — list reviews
GET /repos/{owner}/{repo}/pulls/{number}/comments — list review comments
POST /repos/{owner}/{repo}/pulls/{number}/comments/{id}/replies — reply to thread
resolveReviewThread(input: { threadId }) GraphQL mutation — resolve conversation
GitLab:
GET /projects/{id}/merge_requests/{iid}/discussions — list discussions
POST /projects/{id}/merge_requests/{iid}/discussions/{discussion_id}/notes — reply
PUT /projects/{id}/merge_requests/{iid}/discussions/{discussion_id}?resolved=true — resolve
Building on existing infrastructure
The platform-pr.util.ts from #473 already has parseGitUrl(), GitHub/GitLab auth patterns, and Zod response parsing. The comment fetching and reply posting would be natural extensions of the same module. The pollPendingPRs() method in TownDO already iterates open PRs on every alarm tick — adding comment detection is an additional API call per open PR in the same loop.
Wasteland Onboarding + Dolt Integration
Make Gastown by Kilo towns visible to the Wasteland federation by syncing town state to a user-provided DoltHub database. The Wasteland expects Dolt databases — our internal storage is DO SQLite. Rather than replacing or mirroring the entire DO, we sync at the boundary: emit Wasteland-compatible records (beads, completions, agent identity) to DoltHub so the town is a queryable participant in the federation.
Design: Event-sourced sync, not database replication
The TownDO remains the source of truth for all real-time orchestration. Dolt is the town's "public face" — the subset of state that the Wasteland cares about, published on a cadence.
What gets synced to Dolt:
Bead created/updated/closed → wanted posters and completions
What does NOT get synced:
Triage requests, mail delivery state, escalation re-escalation counters
Sync mechanisms
Incremental sync (steady-state): The TownDO alarm loop (or a slower-cadence task, e.g., every 5 minutes) batches recent bead_events since the last sync cursor, translates them into Dolt-schema inserts, and pushes a commit to DoltHub via their SQL or HTTP API. This is append-mostly — new beads, status transitions, completions. Dolt's versioning gives history for free.
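A sketch of one incremental pass; the cursor key, batch shape, and Dolt push helper are assumptions:

// Hypothetical incremental sync: translate bead events since the last cursor into
// Wasteland-schema rows and advance the cursor only after the Dolt push succeeds.
declare function loadBeadEventsSince(seq: number): Promise<{ seq: number; payload: Record<string, unknown> }[]>;
declare function toWastelandRow(event: { seq: number; payload: Record<string, unknown> }): Record<string, unknown>;
declare function pushDoltCommit(rows: Record<string, unknown>[]): Promise<void>;

async function incrementalDoltSync(storage: DurableObjectStorage): Promise<void> {
  const cursor = (await storage.get<number>("dolt_sync_cursor")) ?? 0;
  const events = await loadBeadEventsSince(cursor);
  if (events.length === 0) return;
  await pushDoltCommit(events.map(toWastelandRow));
  await storage.put("dolt_sync_cursor", events[events.length - 1].seq);
}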
Full sync (Wasteland onboarding for existing towns): When a user connects an existing town to the Wasteland for the first time, they have historical beads, convoys, and agent records that predate the Dolt integration. A full sync publishes this backlog.
Full syncs are handled by a dedicated GastownDoltSyncDO (Durable Object, keyed by {townId}:{syncId}). The flow:
User completes Wasteland onboarding in the UI — provides DoltHub org/repo and API token
TownDO creates a GastownDoltSyncDO instance for the initial full sync
The sync DO queries the TownDO for all relevant historical data (closed beads, landed convoys, agent identities) in paginated batches
An alarm loop on the sync DO processes batches over time — pushing commits to DoltHub in chunks, tracking progress via a cursor, re-arming until complete
On completion, the sync DO marks itself done and the TownDO switches to incremental sync mode
This avoids blocking the TownDO alarm with a potentially large historical export. The sync DO is a one-shot worker that runs to completion and then goes dormant. Future full syncs (e.g., if the user resets their Dolt database) create a new sync DO instance.
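A sketch of the sync DO's alarm loop; the batch size, cursor key, and historical query are assumptions about the eventual implementation:

declare function fetchHistoricalBatch(cursor: string, limit: number): Promise<{ rows: Record<string, unknown>[]; nextCursor?: string }>;
declare function pushDoltCommit(rows: Record<string, unknown>[]): Promise<void>;

// Hypothetical GastownDoltSyncDO: each alarm tick exports one batch, records
// progress via a cursor, and re-arms until the backlog is fully published.
export class GastownDoltSyncDO {
  constructor(private readonly state: DurableObjectState) {}

  async alarm(): Promise<void> {
    const cursor = (await this.state.storage.get<string>("cursor")) ?? "";
    const batch = await fetchHistoricalBatch(cursor, 500);
    if (batch.rows.length > 0) await pushDoltCommit(batch.rows);
    if (batch.nextCursor) {
      await this.state.storage.put("cursor", batch.nextCursor);
      await this.state.storage.setAlarm(Date.now() + 30_000);   // keep processing
    } else {
      await this.state.storage.put("done", true);               // switch TownDO to incremental mode
    }
  }
}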
User setup flow
User navigates to town settings → Wasteland section
User creates a DoltHub database (or provides an existing one) — org, repo name, API token
We initialize the Dolt schema (beads table, agents table, completions table — matching the Wasteland's expected structure)
If the town has existing data, a full sync is triggered via GastownDoltSyncDO
Once the full sync completes, incremental sync begins automatically
The town is now visible to the Wasteland federation
Inbound Wasteland work (stretch)
For the town to consume work from the Wasteland (not just publish), a poll-or-webhook mechanism reads wanted posters from the Wasteland's coordination database and creates beads in the TownDO. The Mayor could get a gt_wasteland_check tool that surfaces available federation work. This is a separate effort from the outbound sync described above.
Key architectural points
DO SQLite is the source of truth. Dolt is a downstream projection, not a primary store. If Dolt is unavailable, the town operates normally — sync catches up when connectivity returns.
GastownDoltSyncDO isolates sync work. Batching, retries, and progress tracking don't pollute the TownDO's alarm loop. Each sync is a self-contained DO with its own alarm cadence.
Dolt schema matches Wasteland expectations. We don't invent our own Dolt schema — we implement whatever table structure the Wasteland protocol expects, translating from our internal bead model on the way out.
Sync is idempotent. Re-running a full sync or replaying incremental events produces the same Dolt state. Bead IDs are stable UUIDs.
Kiloclaw Integration
Sub-issues
Kiloclaw as Meta-Mayor (#1290) — a Kiloclaw instance as a fleet coordinator across multiple Gastown towns. Connects to each town via a Gastown MCP server, can query bead boards, create convoys, send messages to individual town Mayors, track cross-town dependencies, and report on fleet-wide health and spend. The VP of Engineering AI.
Other Kiloclaw integration opportunities (not yet scoped)
Kiloclaw as Mayor — Replace a single town's built-in Mayor with a Kiloclaw instance. Gets persistent memory (solves the container restart context loss problem), custom MCP tools (Slack, Jira, Linear), and user-customizable behavior via custom instructions. Integration: route Mayor messages to Kiloclaw instead of the container; Kiloclaw uses the Gastown MCP server to call gt_sling, gt_list_beads, etc.
Kiloclaw as Polecat — A Kiloclaw instance doing coding work. Benefits: persistent codebase knowledge across sessions, custom MCP servers for test infrastructure. Challenge: polecats need direct filesystem access (git worktrees, shell) — Kiloclaw would need MCP access to the container's filesystem or use the GitHub/GitLab API for code changes.
Kiloclaw as Refinery — Persistent knowledge of codebase quality standards (learned across reviews), MCP tools for external quality checks (SonarQube, security scanners), org-specific review standards from custom instructions.
Kiloclaw as Triage Agent — Persistent memory of past triage decisions. "Last time this agent OOM'd, I increased the memory limit and it worked." Institutional memory for operational decisions.
Shared prerequisite: Gastown MCP Server
All Kiloclaw integrations depend on a Gastown MCP server package that exposes town operations as MCP tools. This is independently useful — any MCP-compatible agent (not just Kiloclaw) could use it to interact with Gastown towns.
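A sketch of how such a server might expose one town operation, using the TypeScript MCP SDK; the tool name mirrors gt_sling, but the Gastown HTTP endpoint and env vars are assumptions:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Hypothetical Gastown MCP server: town operations become MCP tools that
// Kiloclaw (or any MCP client) can call.
const server = new McpServer({ name: "gastown", version: "0.1.0" });

server.tool(
  "gt_sling",
  { rig: z.string(), title: z.string(), description: z.string() },
  async ({ rig, title, description }) => {
    const res = await fetch(`${process.env.GASTOWN_API_URL}/beads`, {   // assumed endpoint
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${process.env.GASTOWN_TOKEN}`,           // assumed auth
      },
      body: JSON.stringify({ rig, title, description }),
    });
    return { content: [{ type: "text" as const, text: await res.text() }] };
  },
);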