feat(polecat): persist agent conversation across container restarts #1300
Add AgentEventOutput schema and RpcAgentEventOutput wrapper to trpc/schemas.ts, then wire up a getAgentEvents gastownProcedure in trpc/router.ts that delegates to TownDO.getAgentEvents() with cursor-based pagination (afterId, limit).
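The cursor-based pagination described in this commit can be illustrated with a minimal in-memory sketch. The `AgentEvent` shape and the `pageEvents` and `readAll` helpers are hypothetical names introduced for illustration only; the actual AgentEventOutput schema and TownDO.getAgentEvents() behavior may differ. It assumes events have numeric, ascending ids and that `afterId` is exclusive.

```typescript
// Hypothetical event record; the real AgentEventOutput schema may differ.
interface AgentEvent {
  id: number;
  type: string;
}

// Returns up to `limit` events whose id is strictly greater than `afterId`,
// assuming `events` is already sorted by ascending id.
function pageEvents(events: AgentEvent[], afterId = 0, limit = 100): AgentEvent[] {
  return events.filter((e) => e.id > afterId).slice(0, limit);
}

// A client walks the log by feeding the last id it saw back in as the cursor.
function readAll(events: AgentEvent[], pageSize: number): AgentEvent[] {
  const out: AgentEvent[] = [];
  let cursor = 0;
  for (;;) {
    const page = pageEvents(events, cursor, pageSize);
    const last = page[page.length - 1];
    if (last === undefined) break; // empty page: nothing left to read
    out.push(...page);
    cursor = last.id;
  }
  return out;
}
```

Because the cursor is an id rather than an offset, events appended while a client is paging do not shift earlier pages.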
…n-across-contai/017955a4/gt/toast/c278af77' into convoy/persist-agent-conversation-across-contai/017955a4/head
Adds reconstructConversation() that takes a sequence of AgentDO streaming
events and reassembles them into clean { role, content } turns. Handles
message.updated / message.completed (info payload), message_part.updated
(both underscore and dot variants), tool-only turns, synthetic/ignored
parts, parts arriving before message info, and malformed events. Supports
configurable maxTurns truncation (default 50, most-recent kept).
21 unit tests covering happy path, edge cases, and truncation.
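As a rough illustration of the reassembly this commit describes, the sketch below rebuilds turns from a stream of events. It is not the PR's implementation: the event payload layout (`data.info.id`, `data.info.role`, `data.part.messageID`, `data.part.text`, and the `synthetic` and `ignored` flags) is an assumption, and `reconstructSketch` is a hypothetical name.

```typescript
type Role = "user" | "assistant";
interface Turn { role: Role; content: string }
interface StreamEvent { type: string; data?: unknown }

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Assumed payload shapes: { info: { id, role } } and
// { part: { id, messageID, type, text, synthetic?, ignored? } }.
function reconstructSketch(events: StreamEvent[], maxTurns = 50): Turn[] {
  const roles = new Map<string, Role>();                // messageId -> role
  const parts = new Map<string, Map<string, string>>(); // messageId -> partId -> latest text
  const order: string[] = [];                           // first-seen message order

  const touch = (messageId: string): Map<string, string> => {
    let m = parts.get(messageId);
    if (!m) { m = new Map(); parts.set(messageId, m); order.push(messageId); }
    return m;
  };

  for (const ev of events) {
    if (!isRecord(ev.data)) continue; // malformed event: skip
    if (ev.type === "message.updated" || ev.type === "message.completed") {
      const info = ev.data["info"];
      if (!isRecord(info)) continue;
      const id = info["id"];
      const role = info["role"];
      if (typeof id === "string" && (role === "user" || role === "assistant")) {
        touch(id);
        roles.set(id, role);
      }
    } else if (ev.type === "message_part.updated" || ev.type === "message.part.updated") {
      const part = ev.data["part"];
      if (!isRecord(part) || part["type"] !== "text") continue; // tool parts carry no text here
      if (part["synthetic"] === true || part["ignored"] === true) continue;
      const id = part["id"];
      const messageId = part["messageID"];
      const text = part["text"];
      if (typeof id !== "string" || typeof messageId !== "string" || typeof text !== "string") continue;
      touch(messageId).set(id, text); // a later update overwrites the earlier one
    }
  }

  const turns: Turn[] = [];
  for (const id of order) {
    const role = roles.get(id);
    const content = Array.from(parts.get(id)?.values() ?? []).join("");
    if (role && content.trim() !== "") turns.push({ role, content }); // drops tool-only turns
  }
  return maxTurns > 0 ? turns.slice(-maxTurns) : [];
}
```

Keying parts by message id means a part that arrives before its message info is simply held until the role is known, and truncation keeps the most recent turns.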
…atch When the Mayor container is dead and needs re-dispatch, reconstruct the prior conversation using reconstructConversation() and inject it into beadBody so the Mayor resumes with full context. Also fix the checkpoint: null bug in sendMayorMessage — now reads mayor.checkpoint from the agent record instead of always passing null.
When a polecat is re-dispatched after a container restart, reconstruct the agent's prior session from AgentDO events and inject it as 'Prior conversation:...' in beadBody, matching the same pattern used for Mayor re-dispatch in sendMayorMessage(). This prevents duplicate work and gives the new container full context of what was done before.
nonneg() does not exist in the installed version of zod; min(0) is the correct equivalent.
.query(async ({ ctx, input }) => {
  const rig = await verifyRigOwnership(ctx.env, ctx.userId, input.rigId, ctx.orgMemberships);
  const townStub = getTownDOStub(ctx.env, rig.town_id);
  return townStub.getAgentEvents(input.agentId, input.afterId, input.limit);
CRITICAL: Missing rig-to-agent authorization check
This endpoint authorizes input.rigId but then returns events for an arbitrary input.agentId. Because TownDO.getAgentEvents() simply proxies to the AgentDO by id, a caller who knows another agent UUID can read its full transcript without proving that agent belongs to the requested rig/town.
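One way to close this gap is to confirm the agent belongs to the authorized rig before returning any events. The sketch below is an assumption about how such a check could look, not the fix that was merged; `AgentRecord`, `AgentLookup`, and `assertAgentInRig` are hypothetical names, and the lookup stands in for whatever the TownDO actually exposes.

```typescript
// Hypothetical agent record; the real one likely has more fields.
interface AgentRecord {
  id: string;
  rigId: string;
}

// Hypothetical lookup standing in for a TownDO query by agent id.
type AgentLookup = (agentId: string) => AgentRecord | undefined;

// Throws unless the agent exists and belongs to the rig the caller was authorized for.
function assertAgentInRig(lookup: AgentLookup, rigId: string, agentId: string): AgentRecord {
  const agent = lookup(agentId);
  // Same error for "missing" and "wrong rig" so callers cannot probe for valid ids.
  if (!agent || agent.rigId !== rigId) {
    throw new Error("agent not found in rig");
  }
  return agent;
}
```

In the procedure, a call like this would sit between verifyRigOwnership() and getAgentEvents(), so that a known agent UUID from another rig yields the same error as a nonexistent one.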
// Reconstruct the agent's prior session transcript and inject it on
// re-dispatch (after a container restart) so work isn't duplicated.
// The presence of prior events is the signal: a fresh container has none.
const rawEvents = await this.getAgentEvents(agent.id);
WARNING: Prior transcript is not scoped to the current bead/session
getAgentEvents(agent.id) returns the agent's entire event log. Polecats are reused once they go idle, and refineries are singleton per rig, so after a later container restart this will replay transcript from previous beads into the next assignment. This needs a bead/session boundary or an event-log reset before reconstructing context.
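The bead/session boundary suggested here could take several forms. The sketch below shows two, both under stated assumptions: that each stored event records the bead it was produced for (a hypothetical `beadId` field), or that the dispatcher records the id of the last event seen when a bead is assigned (a hypothetical `sessionStartId`). Neither field is confirmed by the diff.

```typescript
// Hypothetical event shape with a bead tag; the real event log may not carry one.
interface ScopedEvent {
  id: number;
  beadId: string;
  type: string;
}

// Option A: keep only events tagged with the bead currently being re-dispatched.
function eventsForBead(events: ScopedEvent[], beadId: string): ScopedEvent[] {
  return events.filter((e) => e.beadId === beadId);
}

// Option B: keep only events newer than the id recorded when the bead was assigned.
function eventsSince(events: ScopedEvent[], sessionStartId: number): ScopedEvent[] {
  return events.filter((e) => e.id > sessionStartId);
}
```

Either filter would run before reconstruction, so a reused polecat or a singleton refinery only replays the transcript for its current assignment.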
const priorTranscript = priorTurns
  .map(t => `[${t.role === 'user' ? 'User' : 'Assistant'}]: ${t.content}`)
  .join('\n\n');
beadBody = `Prior conversation:\n\n${priorTranscript}`;
WARNING: This drops the bead body on resume
When prior turns exist, beadBody becomes only the reconstructed transcript, so the original bead description and acceptance criteria in bead.body disappear from the restart prompt. A restarted agent can lose the task details even though the transcript is restored.
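A minimal sketch of one possible remedy is to append the transcript to the original body rather than replace it. `buildResumeBody` is a hypothetical helper, and the ordering (task description first, transcript second) is an assumption rather than something the review prescribes.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Keeps the original bead description and appends the prior transcript after it.
function buildResumeBody(originalBody: string, priorTurns: Turn[]): string {
  if (priorTurns.length === 0) return originalBody; // fresh container: nothing to add
  const transcript = priorTurns
    .map((t) => `[${t.role === "user" ? "User" : "Assistant"}]: ${t.content}`)
    .join("\n\n");
  return `${originalBody}\n\nPrior conversation:\n\n${transcript}`;
}
```

With this shape the restarted agent still sees the acceptance criteria from bead.body, and the transcript formatting matches the diff above.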
Code Review Summary
Status: 3 Issues Found | Recommendation: Address before merge
Issue Details: CRITICAL, WARNING
Other Observations (not in diff): None.
Files Reviewed (5 files)
Reviewed by gpt-5.4-20260305 · 1,085,843 tokens
Refinery Review: Request Changes
All CI checks pass and the core … 1. CRITICAL: Authorization gap in …
Refinery code review passed. Previously requested review issues appear resolved in the latest diff.
Refinery re-review complete. Previously raised issues are still present: the new tRPC endpoint authorizes only the rig id but does not verify that the requested agent belongs to that rig/town, and the restart transcript recovery still replays the agent's full historical event log while replacing the original bead body. PR is not ready to merge.
Summary
Adds conversation transcript reconstruction and injection so that agents resume with full context after a container restart, rather than starting fresh.
- Adds a reconstructConversation utility (cloudflare-gastown/src/util/reconstruct-conversation.util.ts) that rebuilds { role, content } turns from the raw stream of AgentDO events (message.updated, message.completed, message_part.updated). Handles streaming edge cases: parts arriving before message info, synthetic/ignored parts, tool-only turns, and mid-stream crashes.
- On Mayor and polecat re-dispatch, injects the reconstructed transcript into beadBody so the new container sees the prior session.
- Fixes sendMayorMessage to read mayor.checkpoint from the agent record instead of always passing a null checkpoint.
- Exposes getAgentEvents via a new tRPC endpoint (with rig ownership verification) so clients can query agent events directly.
- Replaces z.number().nonneg() with z.number().min(0), since nonneg() is not available in the installed Zod version.

Verification
cloudflare-gastown/test/unit/reconstruct-conversation.test.ts covers: basic user↔assistant exchanges, multi-part concatenation, streaming updates, part-before-message ordering, tool-only skipping, synthetic/ignored parts, summary.body fallback, malformed/unknown events, and truncation to maxTurns.

Visual Changes
N/A
Reviewer Notes
The getAgentEvents method on TownDO already existed and typed its return as Promise<unknown[]> for cross-DO type safety. The new dispatch code uses RigAgentEventRecord.array().safeParse(rawEvents) defensively and falls back to an empty transcript if parsing fails, so the re-dispatch path is never broken by unexpected event shapes.
The maxTurns default of 50 is conservative: it keeps the most recent context while bounding prompt size.