Overview
Replace the current stdin/stdout-based agent process management in the Town Container with Kilo's built-in HTTP server (kilo serve). The container currently spawns kilo code --non-interactive as fire-and-forget child processes and communicates via raw stdin pipes. This is fragile and provides no structured observability.
Decision: We are going forward with the kilo serve route. See analysis: docs/gt/opencode-server-analysis.md
Context
kilo serve starts a headless HTTP server (OpenAPI 3.1) with session management, structured message sending, SSE event streaming, abort/fork/revert, diff inspection, and more. The SDK (@kilocode/sdk/v2/server) provides createOpencodeServer() to manage server lifecycle.
Current flow:
Container Control Server (port 8080)
└── Bun.spawn('kilo code --non-interactive') × N agents
    └── stdin/stdout pipes (fragile, unstructured)
Target flow:
Container Control Server (port 8080)
└── kilo serve (port 4096+N) × M server instances (one per worktree)
    └── HTTP API: POST /session/:id/message, GET /event (SSE), etc.
Scope
1. Replace process-manager.ts internals
- Instead of Bun.spawn(['kilo', 'code', '--non-interactive', ...]), use createOpencodeServer() from @kilocode/sdk/v2/server (or equivalent) to start kilo serve instances
- One kilo serve instance per worktree/project directory (since a server is scoped to one project)
- Manage port allocation for multiple server instances within the container
- Track server instances and their sessions instead of raw child processes
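The bookkeeping described above could look roughly like the following. This is a minimal sketch, not the actual process-manager.ts: ServerRegistry, ServerEntry, and BASE_PORT are hypothetical names, the base port 4096 is taken from the port strategy floated in this issue, and actually starting kilo serve (for example via createOpencodeServer()) is deliberately left out.

```typescript
// Hypothetical sketch: tracks one server entry per worktree plus its sessions.
type ServerEntry = { worktree: string; port: number; sessions: Set<string> };

const BASE_PORT = 4096; // assumption: "4096 + incrementing counter"

class ServerRegistry {
  private servers = new Map<string, ServerEntry>();
  private freePorts: number[] = []; // ports released by dropped servers
  private nextOffset = 0;

  private allocatePort(): number {
    const reused = this.freePorts.pop();
    return reused !== undefined ? reused : BASE_PORT + this.nextOffset++;
  }

  /** Return the entry for a worktree, creating one with a fresh port if needed. */
  ensureServer(worktree: string): ServerEntry {
    let entry = this.servers.get(worktree);
    if (!entry) {
      entry = { worktree, port: this.allocatePort(), sessions: new Set<string>() };
      this.servers.set(worktree, entry);
      // A real implementation would start `kilo serve` on entry.port here.
    }
    return entry;
  }

  addSession(worktree: string, sessionId: string): ServerEntry {
    const entry = this.ensureServer(worktree);
    entry.sessions.add(sessionId);
    return entry;
  }

  /** Remove a session; returns true when the server became idle and was dropped. */
  removeSession(worktree: string, sessionId: string): boolean {
    const entry = this.servers.get(worktree);
    if (!entry) return false;
    entry.sessions.delete(sessionId);
    if (entry.sessions.size > 0) return false;
    this.servers.delete(worktree);
    this.freePorts.push(entry.port); // make the port available again
    return true;
  }

  /** Counts suitable for a health endpoint. */
  counts(): { servers: number; sessions: number } {
    let sessions = 0;
    this.servers.forEach((e) => { sessions += e.sessions.size; });
    return { servers: this.servers.size, sessions };
  }
}
```

Agents that share a worktree share one entry and one port; the entry is dropped (and its port recycled) only when its last session is removed.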
2. Replace stdin-based messaging with HTTP API
- sendMessage(agentId, prompt) → POST /session/:id/message or POST /session/:id/prompt_async
- getProcessStatus(agentId) → GET /session/status (structured session-level status)
- Agent abort → POST /session/:id/abort (clean abort instead of SIGTERM)
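To illustrate the mapping above, here is a small sketch that turns an agent-level action into an HTTP request descriptor. The names AgentAction, HttpRequest, and toKiloRequest are hypothetical, and the message body shape (a parts array with a text part) is an assumption that should be checked against the kilo serve OpenAPI document.

```typescript
// Hypothetical sketch: maps agent actions onto the kilo serve routes named above.
type AgentAction =
  | { kind: "message"; sessionId: string; prompt: string; async?: boolean }
  | { kind: "abort"; sessionId: string }
  | { kind: "status" };

type HttpRequest = { method: "GET" | "POST"; url: string; body?: unknown };

function toKiloRequest(baseUrl: string, action: AgentAction): HttpRequest {
  switch (action.kind) {
    case "message": {
      const id = encodeURIComponent(action.sessionId);
      const route = action.async ? "prompt_async" : "message";
      return {
        method: "POST",
        url: `${baseUrl}/session/${id}/${route}`,
        // ASSUMPTION: request body shape, not confirmed by this issue.
        body: { parts: [{ type: "text", text: action.prompt }] },
      };
    }
    case "abort": {
      const id = encodeURIComponent(action.sessionId);
      return { method: "POST", url: `${baseUrl}/session/${id}/abort` };
    }
    case "status":
      return { method: "GET", url: `${baseUrl}/session/status` };
  }
}
```

The descriptor can then be handed to fetch (method, URL, and JSON-encoded body). Keeping the mapping pure makes it easy to unit-test without a running server.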
3. Replace agent-runner.ts startup flow
- After git clone/worktree setup, start a kilo serve instance for the worktree (if not already running)
- Create a new session on the server: POST /session
- Send the initial prompt via POST /session/:id/message with model/agent/system-prompt configuration
- Return session ID as the agent's handle (instead of process PID)
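The startup steps above can be sketched as a single function. This is an illustration under assumptions: startAgent, PostJson, and StartOptions are hypothetical names, the HTTP call is injected so the flow can be exercised without a server, and the field names in the message body (parts, agent, system) as well as the id field on the session response are assumed rather than taken from this issue.

```typescript
// Hypothetical sketch of the agent-runner startup flow after worktree setup.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

interface StartOptions {
  baseUrl: string; // URL of the kilo serve instance for this worktree
  prompt: string;  // initial prompt for the agent
  agent?: string;  // ASSUMPTION: agent selector field
  system?: string; // ASSUMPTION: system prompt field
}

async function startAgent(post: PostJson, opts: StartOptions): Promise<string> {
  // 1. Create a new session: POST /session
  const created = (await post(`${opts.baseUrl}/session`, {})) as { id?: unknown };
  const sessionId = created.id;
  if (typeof sessionId !== "string") {
    throw new Error("session response did not contain a string id");
  }

  // 2. Send the initial prompt: POST /session/:id/message
  await post(`${opts.baseUrl}/session/${encodeURIComponent(sessionId)}/message`, {
    parts: [{ type: "text", text: opts.prompt }],
    agent: opts.agent,
    system: opts.system,
  });

  // 3. The session ID is the agent's handle (instead of a process PID).
  return sessionId;
}
```

In the container, post would typically wrap fetch with a JSON body and response parsing; if the initial prompt may run for a long time, the prompt_async route mentioned earlier may be the better fit.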
4. Wire up SSE event streaming
- Subscribe to GET /event on each kilo serve instance
- Forward relevant events (tool calls, completions, errors) to the heartbeat reporter
- This replaces the raw stdout pipe reading with typed, structured events
- Enables the future WebSocket streaming endpoint (/agents/:agentId/stream) referenced in the TODO
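As a rough illustration of consuming the event stream, the sketch below parses Server-Sent Events text into JSON events and filters the ones worth forwarding. parseSse, isRelevant, and KiloEvent are hypothetical names, and the event type strings in RELEVANT_TYPES are assumptions about what kilo serve emits; the real names should come from the server's OpenAPI document or SDK types.

```typescript
// Hypothetical sketch: incremental SSE parsing for GET /event.
type KiloEvent = { type: string; properties?: unknown };

// ASSUMPTION: event type names; verify against the kilo serve event schema.
const RELEVANT_TYPES = ["message.part.updated", "session.idle", "session.error"];

function isRelevant(event: KiloEvent): boolean {
  return RELEVANT_TYPES.indexOf(event.type) !== -1;
}

/** Parse complete SSE frames from `buffer`; `rest` holds any trailing partial frame. */
function parseSse(buffer: string): { events: KiloEvent[]; rest: string } {
  const events: KiloEvent[] = [];
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // last element is incomplete (or empty)
  for (const frame of frames) {
    // Join all "data:" lines of the frame, as the SSE format allows multi-line data.
    const data = frame
      .split("\n")
      .filter((line) => line.indexOf("data:") === 0)
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data === "") continue; // comment or keep-alive frame
    try {
      const parsed = JSON.parse(data);
      if (parsed && typeof parsed.type === "string") events.push(parsed as KiloEvent);
    } catch {
      // Ignore frames whose data is not valid JSON.
    }
  }
  return { events, rest };
}
```

A reader loop would append each decoded chunk to rest from the previous call, pass the result to parseSse, and forward events.filter(isRelevant) to the heartbeat reporter.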
5. Update control server endpoints
| Endpoint | Current | After |
| --- | --- | --- |
| POST /agents/start | Spawns kilo process | Creates session on kilo server |
| POST /agents/:id/message | Writes to stdin pipe | POST /session/:id/message |
| GET /agents/:id/status | Process lifecycle (pid, exit code) | Session status (active tools, message count, etc.) |
| POST /agents/:id/stop | SIGTERM/SIGKILL on process | POST /session/:id/abort + optionally stop server if no more sessions |
| GET /health | Process count | Server instance count + session count |
6. Update heartbeat reporter
- Report session-level status instead of process-level status
- Include active tool calls and last message info from SSE events
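One way to derive session-level status for the heartbeat is to fold events into a small status record. The sketch below assumes events have already been normalized from the SSE stream into a hypothetical AgentEvent shape; SessionStatus, emptyStatus, and applyEvent are illustrative names, not existing code.

```typescript
// Hypothetical sketch: fold normalized events into a per-session heartbeat status.
type AgentEvent =
  | { kind: "tool"; sessionId: string; callId: string; tool: string; done: boolean }
  | { kind: "message"; sessionId: string; messageId: string };

type SessionStatus = {
  sessionId: string;
  activeTools: Record<string, string>; // callId -> tool name, for running calls
  messageCount: number;
  lastMessageId: string | null;
};

function emptyStatus(sessionId: string): SessionStatus {
  return { sessionId, activeTools: {}, messageCount: 0, lastMessageId: null };
}

function applyEvent(status: SessionStatus, event: AgentEvent): SessionStatus {
  if (event.sessionId !== status.sessionId) return status; // not our session

  if (event.kind === "message") {
    // Repeated updates for the same message are counted once.
    if (event.messageId === status.lastMessageId) return status;
    return {
      ...status,
      messageCount: status.messageCount + 1,
      lastMessageId: event.messageId,
    };
  }

  // Tool event: add the call while it runs, drop it once it is done.
  const activeTools = { ...status.activeTools };
  if (event.done) delete activeTools[event.callId];
  else activeTools[event.callId] = event.tool;
  return { ...status, activeTools };
}
```

The heartbeat reporter could keep one SessionStatus per session and send the current snapshot on each tick, replacing the pid and exit-code fields reported today.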
What stays the same
- Git clone/worktree management (git-manager.ts): unchanged
- Container control server (port 8080): same interface for TownContainer DO
- Agent environment variable setup: still needed for gastown plugin config
- Dockerfile: still needs kilo installed globally
Acceptance Criteria
- [...] kilo serve instances instead of kilo code --non-interactive processes
Risks & Notes
- Port management: Each kilo serve instance needs its own port, so a port allocation strategy is required (e.g., 4096 + incrementing counter)
- One server per worktree: A kilo server is scoped to one project dir. Multiple agents sharing a worktree can share a server with separate sessions; agents in different worktrees need separate servers
- Resource overhead: Marginal. kilo serve is a single Bun process either way, just with HTTP server overhead instead of raw stdin/stdout
- Migration path: Can be done incrementally. Start with HTTP messaging, then add SSE, then refine status reporting
Parent issue: #204