fix(gastown): add JWT token refresh for persistent agents #964

Closed
jrf0110 wants to merge 1 commit into main from 923-agent-jwt-token-refresh

jrf0110 commented Mar 10, 2026

Summary

  • Fixes #923 (agent JWT expires after 8h with no refresh, so Mayor tool calls return 401): Agent JWTs (8h expiry) are now proactively refreshed so persistent agents (Mayor, long-running refinery) never encounter 401s on tool calls, heartbeats, or event persistence.
  • Hybrid approach (Option D): Fresh tokens are pushed on every user interaction (sendMayorMessage/ensureMayor isAlive paths) AND proactively by the TownDO alarm (hourly for all working/stalled agents). This covers both the common case (user keeps chatting) and the edge case (agents running without messages).
  • Full token pipeline: Refresh updates the ManagedAgent record (broadcastEvent, completion reporting), the heartbeat module, AND the plugin client instances (tool-call API requests) via a globalThis symbol-keyed registry that bridges the container src/ and plugin/ TypeScript project roots.
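
The registry described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `TokenRefreshable`, `getRegistry` and `registerPluginClient` are names that appear in the diff, while `refreshPluginClientTokens` is a hypothetical helper, and pushing one token to every registered client is a simplification (how clients are matched to a specific agent is not shown on this page).

```typescript
// Sketch only. refreshPluginClientTokens is a hypothetical name for this example.
export interface TokenRefreshable {
  setToken(token: string): void;
}

// Symbol.for() returns the same symbol for the same key anywhere in the
// process, so both project roots can compute it without importing each other.
const REGISTRY_KEY = Symbol.for('gastown.pluginClientRegistry');

function getRegistry(): TokenRefreshable[] {
  const host = globalThis as unknown as Record<symbol, TokenRefreshable[] | undefined>;
  let registry = host[REGISTRY_KEY];
  if (registry === undefined) {
    // First caller (either side) creates the shared array.
    registry = [];
    host[REGISTRY_KEY] = registry;
  }
  return registry;
}

/** Called from the plugin side when a client is created. */
export function registerPluginClient(client: TokenRefreshable): void {
  getRegistry().push(client);
}

/** Called from the control-server side; returns how many clients were updated. */
export function refreshPluginClientTokens(token: string): number {
  const registry = getRegistry();
  for (const client of registry) {
    client.setToken(token);
  }
  return registry.length;
}
```

Because the array lives on `globalThis`, neither side needs a type or value import from the other; each side only has to agree on the symbol key and on the shape of `setToken()`.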

Closes #923

Verification

  • pnpm typecheck — passes with zero errors
  • pnpm vitest run — all 108 tests pass across 7 test files (including 6 new token refresh tests)

Visual Changes

N/A

Reviewer Notes

  • The plugin client registry uses Symbol.for('gastown.pluginClientRegistry') on globalThis to bridge the container/src/ and container/plugin/ TypeScript project roots (they can't cross-import). Both sides access the same array in the same Bun process. This is a deliberate design choice — the alternative would be restructuring the TS project boundaries.
  • refreshRunningAgentTokens() is throttled to once per hour via a simple timestamp check (not persisted across DO restarts, which is fine — a restart re-dispatches agents with fresh tokens anyway).
  • The ensureMayor path fires a void (fire-and-forget) token refresh because it returns immediately without waiting for the agent to process a message. The sendMayorMessage path awaits the refresh before sending the message to ensure the agent has a valid token for any tool calls triggered by that message.
  • Test fixes: Added townId to TEST_ENV and updated URL expectations to include /towns/{townId}/ — the test fixture was stale (missing the townId field added during the beads-centric refactor).
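
For illustration, the in-memory hourly throttle mentioned for refreshRunningAgentTokens() could look something like the sketch below. `RefreshThrottle` and `ONE_HOUR_MS` are hypothetical names introduced here; the PR's actual timestamp check is not shown on this page.

```typescript
// Hypothetical sketch of a "simple timestamp check" throttle.
const ONE_HOUR_MS = 60 * 60 * 1000;

export class RefreshThrottle {
  // Kept in memory only: a restart resets it, which simply allows an early refresh.
  private lastRunAt: number | null = null;
  private readonly intervalMs: number;

  constructor(intervalMs: number = ONE_HOUR_MS) {
    this.intervalMs = intervalMs;
  }

  /** Returns true (and records the run) if at least intervalMs has elapsed. */
  shouldRun(now: number = Date.now()): boolean {
    if (this.lastRunAt !== null && now - this.lastRunAt < this.intervalMs) {
      return false;
    }
    this.lastRunAt = now;
    return true;
  }
}
```

An alarm handler would call `shouldRun()` on each tick and only mint and push new tokens when it returns true, so alarms that fire more often than hourly do not trigger redundant refreshes.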

Agent JWTs minted once at dispatch with 8h expiry caused 401s on all
tool calls for persistent agents (Mayor) running longer than 8 hours.

Hybrid approach (Option D):
- On message delivery: sendMayorMessage/ensureMayor mint and push a
  fresh JWT when the agent is already alive in the container
- Proactive alarm refresh: TownDO alarm refreshes tokens for all
  working/stalled agents hourly (safety net for agents not receiving
  user messages, e.g. long refinery reviews)

Container side:
- POST /agents/:id/refresh-token endpoint hot-swaps tokens on running
  agents (ManagedAgent record, heartbeat module, plugin clients)
- GastownClient/MayorGastownClient gain setToken() for live updates
- Global plugin client registry (Symbol.for keyed on globalThis) lets
  the control server reach plugin instances across TS project roots

Closes #923
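
As a rough illustration of the setToken() hot-swap described in the commit message, a client can read its current token each time it builds request headers rather than capturing it once at construction. `HotSwappableClient` and `buildHeaders` are hypothetical names, and the Bearer scheme is an assumption; the real GastownClient/MayorGastownClient internals are not shown on this page.

```typescript
// Hypothetical sketch: a client whose token can be replaced while it is running.
export class HotSwappableClient {
  private token: string;

  constructor(token: string) {
    this.token = token;
  }

  /** Replace the JWT; the next request picks it up automatically. */
  setToken(token: string): void {
    this.token = token;
  }

  /** Headers are built per request so they always reflect the latest token. */
  buildHeaders(): Record<string, string> {
    return {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${this.token}`, // assumed auth scheme
    };
  }
}
```

The point of the design is that no request path holds a stale copy of the token: anything cached outside `buildHeaders()` would have to be refreshed separately.
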

cloudflare-gastown/container/src/control-server.ts

// Also update the heartbeat module's token so heartbeat POSTs use the
// fresh JWT. The heartbeat uses a single module-level token shared
// across all agents, so any refresh keeps heartbeats alive.
updateHeartbeatToken(parsed.data.token);

WARNING: Refreshing one agent's token breaks heartbeats for every other running agent

updateHeartbeatToken() stores a single module-level JWT, but /api/towns/:townId/rigs/:rigId/agents/:agentId/heartbeat is protected by agentOnlyMiddleware, so the token has to match the specific :agentId on each request. As soon as one agent refreshes here, the heartbeat loop starts sending that agent's token for all other active agents and their heartbeats will begin returning 403/401.
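
One possible way to address this, sketched under assumptions, is to key heartbeat tokens by agent ID so that refreshing one agent never changes what another agent sends. All function names below are hypothetical, and the Bearer scheme is assumed; this is not the fix that was eventually adopted, which is not shown on this page.

```typescript
// Hypothetical sketch: per-agent heartbeat tokens instead of one module-level JWT.
const heartbeatTokens = new Map<string, string>();

/** Store or replace the token for a single agent. */
export function setAgentHeartbeatToken(agentId: string, token: string): void {
  heartbeatTokens.set(agentId, token);
}

/** Drop the token when the agent stops, so the map does not grow forever. */
export function clearAgentHeartbeatToken(agentId: string): void {
  heartbeatTokens.delete(agentId);
}

/** Authorization header value for this agent's heartbeat, if a token is known. */
export function heartbeatAuthHeader(agentId: string): string | undefined {
  const token = heartbeatTokens.get(agentId);
  return token === undefined ? undefined : `Bearer ${token}`;
}
```

With this shape the heartbeat loop would look up the header per agent on each POST, so a token scoped to one :agentId is only ever sent on that agent's heartbeat route.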


cloudflare-gastown/container/plugin/client.ts

/** Register a plugin client so its token can be refreshed externally. */
export function registerPluginClient(client: TokenRefreshable): void {
  getRegistry().push(client);

WARNING: The plugin client registry never releases completed sessions

Every plugin initialization pushes another client into this global array, but there is no matching unregister on session teardown. In a long-lived town container this grows without bound and every token refresh will sweep stale clients from old sessions, which is both a memory leak and an avoidable O(n) cost on each refresh.
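
A minimal sketch of one way to release clients on teardown is to have registration return an unregister callback and to store clients in a Set. This is an assumption-laden illustration: the returned disposer and `refreshPluginClientTokens` are hypothetical, and a module-level Set stands in for the globalThis-backed storage to keep the example short.

```typescript
// Hypothetical sketch: registry with explicit release on session teardown.
export interface TokenRefreshable {
  setToken(token: string): void;
}

// Stand-in for the shared registry; the PR keeps its registry on globalThis.
const activeClients = new Set<TokenRefreshable>();

/** Register a client and return a function that removes it again. */
export function registerPluginClient(client: TokenRefreshable): () => void {
  activeClients.add(client);
  return () => {
    // Set.delete is a no-op if the client was already removed.
    activeClients.delete(client);
  };
}

/** Push a token to every live client; returns how many were updated. */
export function refreshPluginClientTokens(token: string): number {
  for (const client of activeClients) {
    client.setToken(token);
  }
  return activeClients.size;
}
```

Calling the returned function from the session's teardown path keeps the registry bounded by the number of live sessions, so each refresh only touches clients that can still make requests.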

kilo-code-bot Bot commented Mar 10, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

| Severity   | Count |
|------------|-------|
| CRITICAL   | 0     |
| WARNING    | 2     |
| SUGGESTION | 0     |
Issue Details

WARNING

| File | Line | Issue |
|------|------|-------|
| cloudflare-gastown/container/src/control-server.ts | 185 | Refreshing one agent's token overwrites the shared heartbeat JWT, so heartbeats for other running agents will start failing auth. |
| cloudflare-gastown/container/plugin/client.ts | 366 | Plugin clients are only appended to the global registry, so long-lived containers leak stale sessions and make each token refresh sweep an ever-growing array. |

Other Observations (not in diff)

None.

Files Reviewed (9 files)
  • cloudflare-gastown/container/plugin/client.test.ts - 0 issues
  • cloudflare-gastown/container/plugin/client.ts - 1 issue
  • cloudflare-gastown/container/plugin/index.ts - 0 issues
  • cloudflare-gastown/container/src/control-server.ts - 1 issue
  • cloudflare-gastown/container/src/heartbeat.ts - 0 issues
  • cloudflare-gastown/container/src/process-manager.ts - 0 issues
  • cloudflare-gastown/container/src/types.ts - 0 issues
  • cloudflare-gastown/src/dos/Town.do.ts - 0 issues
  • cloudflare-gastown/src/dos/town/container-dispatch.ts - 0 issues

jrf0110 commented Mar 10, 2026

Closing in favor of #988

@jrf0110 jrf0110 closed this Mar 10, 2026

Development

Successfully merging this pull request may close these issues.

Bug: Agent JWT expires after 8h with no refresh — Mayor tool calls 401