Bug
Town containers never reach their sleepAfter = 30m idle threshold, even when no work is active and no user is chatting. Containers stay alive indefinitely, consuming resources and accruing billing charges.
Root Cause
Three factors compound to keep the container permanently awake:
1. Mayor stays working forever (primary cause)
When the mayor is started (ensureMayor or sendMayorMessage), its agent status is set to working. The status never transitions back unless the container process calls agentCompleted. But the mayor is designed to stay alive waiting for user input: it does not exit on idle the way polecats do. Per handleIdleEvent in process-manager.ts:429, the 2-minute idle timer applies to polecats and refineries, not to the mayor.
Because the mayor is working:
- hasActiveWork() in scheduling.ts returns true
- The alarm stays on the 5-second interval (instead of 60s for idle towns)
- ensureContainerReady() fires every 5s → sends GET /health to the container
- Container status observation fires every 5s → sends status check per working agent
- The container receives HTTP requests every 5 seconds, resetting its sleepAfter idle timer every time
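The interval selection described above can be sketched as follows. This is a minimal illustration, not the actual Town.do.ts code: the Agent shape, ACTIVE_ALARM_INTERVAL_MS, and nextAlarmDelayMs are hypothetical names, and only the 5s/60s values, IDLE_ALARM_INTERVAL_MS, and the hasActiveWork() name come from this report.

```typescript
// Hypothetical model of the alarm cadence decision.
type AgentStatus = "idle" | "working";

interface Agent {
  role: "mayor" | "polecat" | "refinery";
  status: AgentStatus;
}

const ACTIVE_ALARM_INTERVAL_MS = 5_000; // assumed name for the 5-second interval
const IDLE_ALARM_INTERVAL_MS = 60_000; // 60s for idle towns

// Current behavior as described: any agent in "working" counts as active work,
// including a mayor that is only waiting for user input.
function hasActiveWork(agents: Agent[]): boolean {
  return agents.some((agent) => agent.status === "working");
}

function nextAlarmDelayMs(agents: Agent[]): number {
  return hasActiveWork(agents) ? ACTIVE_ALARM_INTERVAL_MS : IDLE_ALARM_INTERVAL_MS;
}

// A town whose only agent is a mayor stuck in "working" stays on the fast interval.
const mayorOnly: Agent[] = [{ role: "mayor", status: "working" }];
console.log(nextAlarmDelayMs(mayorOnly));
```

Under this model, a mayor that never leaves "working" pins the town to the 5-second cadence, which is what drives the repeated container requests.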
2. ensureContainerReady() health check resets the container idle timer
Even when the mayor is the only agent, ensureContainerReady() (Town.do.ts:3452) calls container.fetch("http://container/health") every 5 seconds because hasActiveWork() is true (the mayor is working). Each fetch resets the container's sleepAfter 30-minute idle countdown.
3. refreshContainerToken in-memory throttle resets on DO eviction (#1409)
The hourly token refresh throttle is stored as an in-memory property (lastContainerTokenRefreshAt = 0). When the DO is evicted and re-instantiated, the throttle resets and the refresh fires on the next alarm tick. The refresh sends two requests to the container:
- container.setEnvVar(...)
- container.fetch("/refresh-token")
Both reset the container idle timer. Even with the mayor issue fixed and the alarm on its 60-second idle interval, the refresh would fire once per DO eviction rather than once per hour.
The Wake-Up Chain
Mayor started → status = "working" → never transitions to idle
→ hasActiveWork() = true
→ alarm fires every 5s
→ ensureContainerReady() → GET /health → resets container idle timer
→ container status observation → GET /agents/:mayorId/status → resets container idle timer
→ refreshContainerToken() (when throttle is cold) → POST /refresh-token → resets container idle timer
→ container NEVER reaches 30 min of inactivity
→ sleepAfter = "30m" NEVER triggers
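The chain above can be checked with a small simulation. This is a sketch under simplifying assumptions: requests arrive at a fixed interval, every request resets the idle countdown, and containerSleeps is a hypothetical helper, not part of the codebase.

```typescript
const SLEEP_AFTER_MS = 30 * 60 * 1000; // sleepAfter = "30m"

// Returns true if the container would reach 30 minutes without a request
// at any point during the observation window.
function containerSleeps(requestIntervalMs: number, observedMs: number): boolean {
  let lastRequestAt = 0;
  for (let now = 0; now <= observedMs; now += 1000) {
    if (now - lastRequestAt >= SLEEP_AFTER_MS) return true;
    // A request arriving at this tick resets the idle countdown.
    if (now % requestIntervalMs === 0) lastRequestAt = now;
  }
  return false;
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;
console.log(containerSleeps(5_000, TWO_HOURS_MS)); // health check every 5s
console.log(containerSleeps(60 * 60 * 1000, TWO_HOURS_MS)); // one request per hour
```

Any request interval shorter than the sleepAfter window keeps the container awake in this model, so even the 60-second idle cadence would be enough to prevent sleep if each tick still reached the container.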
Fix
Fix 1 (Critical): Mayor lifecycle — idle the mayor when not actively processing
When the mayor finishes processing a prompt and returns to waiting-for-input (session.idle event), the container should transition the mayor to an idle-like state that the TownDO recognizes as "not actively working."
Options:
- A) The container calls agentCompleted when the mayor goes idle. The reconciler transitions it to idle. When the user sends a new message, sendMayorMessage re-dispatches. This is the simplest option, but it adds latency to mayor responses (re-dispatch takes a few seconds).
- B) Introduce a new agent status waiting (or use idle with a flag) that means "alive in container, waiting for input, but not doing LLM work." hasActiveWork() treats waiting agents as inactive. The alarm drops to 60s. Container health checks stop. But the container stays alive for the sleepAfter window, so the mayor is still available for instant responses if the user messages within 30 minutes.
- C) hasActiveWork() explicitly excludes mayors: WHERE status = "working" AND role != "mayor". This is the smallest code change, but it is a bit hacky because it conflates role with lifecycle.
Recommendation: Option B provides the best UX (instant mayor responses within the sleep window) with correct semantics. Option C is an acceptable stopgap.
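To make the difference between Options B and C concrete, here is a minimal in-memory sketch. The real hasActiveWork() is a query in scheduling.ts; the AgentRow shape and both function names below are hypothetical stand-ins for that query.

```typescript
// Option B adds "waiting" to the status set; Option C leaves statuses untouched.
type AgentStatus = "idle" | "working" | "waiting";

interface AgentRow {
  role: string;
  status: AgentStatus;
}

// Option B: only "working" counts. A mayor parked in "waiting" is inactive,
// but a mayor that is actually processing a prompt still counts as work.
function hasActiveWorkOptionB(agents: AgentRow[]): boolean {
  return agents.some((agent) => agent.status === "working");
}

// Option C: mayors never count, regardless of what they are doing.
function hasActiveWorkOptionC(agents: AgentRow[]): boolean {
  return agents.some((agent) => agent.status === "working" && agent.role !== "mayor");
}

const busyMayor: AgentRow[] = [{ role: "mayor", status: "working" }];
console.log(hasActiveWorkOptionB(busyMayor)); // mayor mid-prompt is active work
console.log(hasActiveWorkOptionC(busyMayor)); // mayor is ignored entirely
```

The sketch shows the trade-off: under Option C a mayor that is mid-prompt is also treated as inactive, so the alarm would slow down while the mayor is still doing LLM work. Option B avoids this by tying the decision to status rather than role.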
Fix 2: Gate health checks on non-mayor active work
If Fix 1 uses Option C (exclude mayors from hasActiveWork), also update ensureContainerReady() to skip when the only working agent is the mayor:
if (!hasWork) {
  // Skip if the only "work" is the mayor waiting for user input
  if (!isRecentlyConfigured) return;
}
With Option B, this is automatic — hasActiveWork() returns false when the mayor is waiting.
Fix 3: Persist token refresh throttle (#1409)
Already filed as #1409. Store lastContainerTokenRefreshAt in ctx.storage instead of an in-memory property. This prevents the throttle from resetting on DO eviction.
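A persisted throttle might look like the sketch below. It assumes a storage object with promise-based get and put, which is the shape Durable Object storage exposes. KeyValueStorage, MemoryStorage, isRefreshDue, refreshIfDue, and TOKEN_REFRESH_INTERVAL_MS are hypothetical names introduced for illustration, and MemoryStorage only stands in for ctx.storage so the example runs on its own.

```typescript
// Minimal subset of a key-value storage API (assumed shape).
interface KeyValueStorage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

const TOKEN_REFRESH_INTERVAL_MS = 60 * 60 * 1000; // hourly throttle
const STORAGE_KEY = "lastContainerTokenRefreshAt";

function isRefreshDue(lastRefreshAt: number, now: number): boolean {
  return now - lastRefreshAt >= TOKEN_REFRESH_INTERVAL_MS;
}

// Reads the last refresh time from storage instead of an in-memory property,
// so a freshly instantiated object sees the previous refresh.
async function refreshIfDue(
  storage: KeyValueStorage,
  now: number,
  refresh: () => Promise<void>,
): Promise<boolean> {
  const lastRefreshAt = (await storage.get<number>(STORAGE_KEY)) ?? 0;
  if (!isRefreshDue(lastRefreshAt, now)) return false;
  await refresh();
  await storage.put(STORAGE_KEY, now);
  return true;
}

// In-memory stand-in for ctx.storage, for demonstration only.
class MemoryStorage implements KeyValueStorage {
  private data = new Map<string, unknown>();
  async get<T>(key: string): Promise<T | undefined> {
    return this.data.get(key) as T | undefined;
  }
  async put<T>(key: string, value: T): Promise<void> {
    this.data.set(key, value);
  }
}
```

Because the timestamp lives in storage, a call made one minute after a refresh is skipped even if the calling object was re-created in between. The timestamp is written only after refresh() resolves, so a failed refresh is retried on the next tick.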
Fix 4: Increase idle alarm interval
Already proposed in #1409 — change IDLE_ALARM_INTERVAL_MS from 60s to 300s (5 min). With the mayor fix, the alarm would drop to 5 min when no polecats/refineries are active. At 5 min intervals, ensureContainerReady() would not fire (no active work), and the container would receive at most 1 request per 5 minutes (token refresh, if the throttle expired). The container would idle and sleep after 30 minutes.
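As a quick arithmetic check on the intervals discussed here, the sketch below counts how many alarm ticks fit into one sleepAfter window at each cadence. alarmTicksPerSleepWindow is a hypothetical helper, and a tick only matters for sleep if it actually sends a request to the container.

```typescript
const SLEEP_AFTER_MS = 30 * 60 * 1000; // sleepAfter = "30m"

// Number of alarm ticks that fall within one 30-minute sleep window.
function alarmTicksPerSleepWindow(alarmIntervalMs: number): number {
  return Math.floor(SLEEP_AFTER_MS / alarmIntervalMs);
}

console.log(alarmTicksPerSleepWindow(5_000)); // 360 ticks at the 5s active interval
console.log(alarmTicksPerSleepWindow(60_000)); // 30 ticks at the current 60s idle interval
console.log(alarmTicksPerSleepWindow(300_000)); // 6 ticks at the proposed 300s idle interval
```

Raising the idle interval reduces the number of opportunities for a stray request per window from 30 to 6, but the container only sleeps if none of those ticks touches it, which is why the mayor and throttle fixes are still required.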
Expected Result After All Fixes
User stops chatting with mayor
→ Mayor session.idle fires → container sets mayor to "waiting"
→ agentCompleted or status update propagated to TownDO
→ hasActiveWork() returns false (no polecats/refineries working)
→ Alarm drops to 5-minute interval
→ No health checks, no status observations
→ Token refresh: at most once/hour, persisted throttle survives eviction
→ Container receives ~0 requests
→ After 30 minutes of inactivity → sleepAfter triggers → container sleeps
→ User sends new message → sendMayorMessage dispatches mayor → container wakes
Files
src/dos/Town.do.ts — hasActiveWork() usage (alarm interval), ensureContainerReady(), refreshContainerToken()
src/dos/town/scheduling.ts — hasActiveWork() query
container/src/process-manager.ts — handleIdleEvent(), mayor idle timer exclusion
container/src/control-server.ts — agentCompleted reporting for mayor
src/dos/town/agents.ts — agent status enum (if adding waiting)
Related