As of 2026-04-01 15:00 UTC, polecat dispatch is failing across 151+ towns at a rate of ~4,075 failures per hour (~300 per 5-minute bucket). The failure rate has been steady for at least 3 hours (since ~12:05 UTC). All dispatch failures have empty error strings — the actual failure reason is not being logged.
Evidence
Analytics Engine data for the last hour:
151+ unique town IDs with agent.dispatch_failed events
All failures are for polecat role (refinery and mayor not affected)
Error field (blob5) is empty on all events
Failure rate is steady (not a spike — chronic issue)
Top offender: town d498f44e-... with 595 failures in 1 hour
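The per-town counts above came from Analytics Engine; a query along these lines reproduces them. This is a sketch only: the dataset name `AGENT_EVENTS` and the `index1`/`blob1`/`blob2` column mapping are assumptions, since only `blob5` (the error field) is confirmed in this report.

```typescript
// Builds the Analytics Engine SQL used to count polecat dispatch failures
// per town over the last `hours` hours. Column/dataset names are assumed.
function dispatchFailureQuery(hours: number): string {
  return [
    "SELECT index1 AS town_id, count() AS failures",
    "FROM AGENT_EVENTS",
    "WHERE blob1 = 'agent.dispatch_failed'",
    "  AND blob2 = 'polecat'",
    `  AND timestamp > NOW() - INTERVAL '${hours}' HOUR`,
    "GROUP BY town_id",
    "ORDER BY failures DESC",
  ].join("\n");
}
```

Run it against the Analytics Engine SQL API to get the per-town breakdown (the top offender shows up as the first row).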
Additionally:
One DO (7c017069-...) is in overload state: 477 "Durable Object is overloaded" errors in 1 hour
One DO (505b54c4-...) hitting SQLITE_TOOBIG errors on agent-events.create
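The SQLITE_TOOBIG errors on agent-events.create suggest an oversized bound value. One defensive sketch is to clamp event payloads before they reach the DO's SQLite write; the 256 KiB cap and function name here are illustrative assumptions, not documented limits or actual gastown code.

```typescript
// Clamp an event payload so a single oversized value cannot trigger
// SQLITE_TOOBIG on insert. The cap is an arbitrary safety margin (assumption).
const MAX_EVENT_BLOB_BYTES = 256 * 1024;

function clampEventPayload(payload: string): string {
  const bytes = new TextEncoder().encode(payload);
  if (bytes.length <= MAX_EVENT_BLOB_BYTES) return payload;
  // Keep a prefix and mark the truncation so the stored event is still useful.
  const truncated = new TextDecoder().decode(bytes.slice(0, MAX_EVENT_BLOB_BYTES));
  return truncated + "[truncated]";
}
```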
Impact
High — polecats are the primary work-doing agents. If dispatch is failing, no beads are being worked on across the platform. Towns appear to be "doing nothing" even when there is open work.
Likely Causes
Container infrastructure issue — polecat containers may be failing to start, hitting resource limits, or encountering image pull failures. The empty error string suggests the failure happens before the error can be captured (e.g., the startAgentInContainer HTTP call times out or gets a non-JSON error response).
Polecat-specific configuration issue — since refinery and mayor are NOT affected, the issue may be specific to how polecats are dispatched (different Container image, different startup sequence, different env vars).
The #1653 pattern at scale — no circuit breaker on dispatch failures means every failed polecat gets retried every tick across every town. 151 towns × ~27 retries/hour/town ≈ 4,075 failures/hour. The actual number of "stuck" polecats could be much smaller — most failures are retries.
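A minimal per-town circuit breaker of the kind #1653 calls for could look like the following. The class name, thresholds, and cooldown are illustrative assumptions, not the actual gastown implementation.

```typescript
// Illustrative circuit breaker: after `threshold` consecutive dispatch
// failures for a town, stop retrying for `cooldownMs` instead of retrying
// every tick. Names and numbers are hypothetical.
class DispatchBreaker {
  private failures = 0;
  private openUntil = 0;

  constructor(
    private readonly threshold = 5,
    private readonly cooldownMs = 15 * 60 * 1000,
    private readonly now: () => number = Date.now,
  ) {}

  canDispatch(): boolean {
    return this.now() >= this.openUntil;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.openUntil = this.now() + this.cooldownMs; // trip the breaker
      this.failures = 0;
    }
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openUntil = 0;
  }
}
```

With a breaker per town (or per polecat), the retry volume collapses from every-tick retries to one probe per cooldown window, which would also make the genuinely stuck polecats visible in the failure counts.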
Investigation Needed
Why does startAgentInContainer fail? The empty error string needs to be fixed (#1653 Fix 3).
Related
fix(gastown): No circuit breaker on dispatch failures — dead container causes 70h runaway loop (+ spend) #1653
Files
src/dos/town/actions.ts — dispatch_agent action handler (where the error should be logged)
src/dos/town/container-dispatch.ts — startAgentInContainer (where the actual failure occurs)
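A sketch of how the empty error string could be eliminated: time-box the container start call, read the response as text before assuming JSON, and normalize whatever was thrown into a non-empty string for blob5. The endpoint shape and helper names are assumptions; only startAgentInContainer and blob5 appear in this report.

```typescript
// Sketch: always capture *some* error string, even when the container start
// call times out or returns a non-JSON body (the two suspected failure modes).
async function startAgentInContainer(url: string, timeoutMs = 10_000): Promise<void> {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), timeoutMs);
  try {
    const res = await fetch(url, { method: "POST", signal: ctrl.signal });
    const body = await res.text(); // read text first; never assume JSON
    if (!res.ok) {
      throw new Error(`container start ${res.status}: ${body.slice(0, 512)}`);
    }
  } finally {
    clearTimeout(timer);
  }
}

// In the dispatch_agent handler, normalize anything thrown so the logged
// error field is never empty (timeouts surface as "AbortError: ...").
function describeError(err: unknown): string {
  if (err instanceof Error) return `${err.name}: ${err.message}`;
  return String(err) || "unknown dispatch error";
}
```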