Parent
Part of #204 (Phase 4: Hardening)
Problem
The Town DO alarm loop introduces noticeable latency in the UI. Mayor-slung beads take seconds to appear, agent hookings take a while to transition beads to in_progress, and PR merges take multiple alarm cycles to propagate. The root causes:
- armAlarmIfNeeded() doesn't bump a pending idle alarm. When a town is idle (alarm set 60s in the future) and new work arrives, the method checks if (!current || current < Date.now()); if an alarm is already pending, it does nothing, so the work waits up to 60s for the existing idle alarm to fire. Affects slingConvoy(), agentDone(), agentCompleted(), submitToReviewQueue(), and all other callers.
- Active interval (5s) runs every phase on every tick. Bead assignment, event drain, PR polling, patrol, GC, mail delivery, and status broadcast all run every 5s. Most of this work is unnecessary on most ticks, but because the phases are bundled, the fast work (event drain + reconcile + dispatch) can't run more often without also running the slow work more often.
- PR polling has no rate limiting. Every in-progress MR bead with a pr_url hits the GitHub API every 5s. A town with 5 open PRs makes ~60 API calls/minute against GitHub's 5000/hour (~83/min) authenticated rate limit, leaving little headroom for other consumers sharing the token.
- Event-driven work routes through the poll loop. agentDone inserts a town_event → waits for alarm → Phase 0 drains → Phase 1 reconciles → Phase 2 side effects. That's 5-10s of latency for an operation that already has full context at call time. Similarly, slingConvoy creates beads but relies on the reconciler to assign agents on the next tick.
Proposed Changes
A. Fix armAlarmIfNeeded() — bump idle alarms to active interval
Effort: Low | Impact: High | Risk: Very low
When new work arrives during an idle period, reschedule the alarm to fire within the active interval instead of waiting for the idle alarm:
private async armAlarmIfNeeded(): Promise<void> {
  // Towns that were never initialized have nothing to schedule.
  const storedId = await this.ctx.storage.get<string>('town:id');
  if (!storedId) return;
  const current = await this.ctx.storage.getAlarm();
  const activeDeadline = Date.now() + ACTIVE_ALARM_INTERVAL_MS;
  // Arm when no alarm exists, or pull a far-off (idle) alarm in to the active deadline.
  if (!current || current > activeDeadline) {
    await this.ctx.storage.setAlarm(activeDeadline);
  }
}
Eliminates the 60s idle-to-active transition penalty.
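The decision made by the new condition can be checked in isolation. The sketch below pulls it into a pure function; nextAlarmTime is a hypothetical helper written only for illustration (it is not in Town.do.ts), and the 5s value for ACTIVE_ALARM_INTERVAL_MS is taken from the active interval described in the Problem section.

```typescript
// Hypothetical pure helper mirroring the proposed condition; not part of Town.do.ts.
const ACTIVE_ALARM_INTERVAL_MS = 5_000; // assumed from the 5s active interval above

// Returns the time to pass to setAlarm(), or null when the pending alarm should be left alone.
function nextAlarmTime(current: number | null, now: number): number | null {
  const activeDeadline = now + ACTIVE_ALARM_INTERVAL_MS;
  // Arm when nothing is scheduled, or when the pending alarm is later than the active deadline.
  if (current === null || current > activeDeadline) return activeDeadline;
  return null;
}

const now = 1_000_000;
console.log(nextAlarmTime(null, now));         // no alarm pending: arm at now + 5s
console.log(nextAlarmTime(now + 60_000, now)); // idle alarm 60s out: pulled in to now + 5s
console.log(nextAlarmTime(now + 2_000, now));  // alarm already due within 5s: left alone (null)
```

The third case matters as much as the second: an alarm that is already due sooner than the active deadline is never pushed later, so repeated calls during a burst of work cannot starve the loop.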
B. Split the alarm into fast and slow phases
Effort: Medium | Impact: High | Risk: Medium
Not all work needs to run every tick. Separate into:
- Fast path (1-2s): Event drain → reconcile beads/agents → dispatch → status broadcast
- Slow path (30-60s): PR polling, patrol/GUPP, GC, mail delivery, escalation checks, container health
Use a timestamp to throttle slow phases:
const FAST_ALARM_INTERVAL_MS = 2_000;
const SLOW_PHASE_INTERVAL_MS = 30_000;
// In alarm():
const now = Date.now();
const runSlowPhase = !this.lastSlowPhaseAt || (now - this.lastSlowPhaseAt) >= SLOW_PHASE_INTERVAL_MS;
if (runSlowPhase) this.lastSlowPhaseAt = now;
This lets bead assignment and agent hooking happen in ~2s while keeping expensive operations throttled.
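To see how the timestamp throttle behaves across ticks, here is a minimal simulation. SlowPhaseThrottle is a hypothetical wrapper around the same check, and the 2s tick spacing is the proposed FAST_ALARM_INTERVAL_MS; neither is existing code.

```typescript
const FAST_ALARM_INTERVAL_MS = 2_000;
const SLOW_PHASE_INTERVAL_MS = 30_000;

// Hypothetical wrapper around the lastSlowPhaseAt check shown above.
class SlowPhaseThrottle {
  private lastSlowPhaseAt: number | null = null;

  // Returns true (and records the run) when the slow phase is due at `now`.
  shouldRun(now: number): boolean {
    const due =
      this.lastSlowPhaseAt === null ||
      now - this.lastSlowPhaseAt >= SLOW_PHASE_INTERVAL_MS;
    if (due) this.lastSlowPhaseAt = now;
    return due;
  }
}

// Simulate one minute of fast ticks and record when the slow phase runs.
const throttle = new SlowPhaseThrottle();
const slowRuns: number[] = [];
let fastTicks = 0;
for (let t = 0; t <= 60_000; t += FAST_ALARM_INTERVAL_MS) {
  fastTicks++;
  if (throttle.shouldRun(t)) slowRuns.push(t);
}
console.log(fastTicks, slowRuns); // 31 fast ticks; slow phase at 0, 30000, 60000
```

If lastSlowPhaseAt is kept as an in-memory field, it resets whenever the Durable Object instance is evicted, so the slow phase would run on the first tick after a restart. That is probably harmless, but persisting the timestamp in storage is an option if it turns out to matter.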
C. Add PR polling rate limiting
Effort: Low | Impact: Medium | Risk: Low
- Track last_polled_at on MR beads, skip if polled within the last 30s
- Consider GitHub webhooks for immediate PR status updates (eliminates polling for GitHub repos entirely)
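A rough sketch of the last_polled_at filter follows, along with the call-rate arithmetic from the Problem section. The MrBead shape, selectBeadsToPoll, callsPerMinute, and the constant name are all assumptions made for illustration; the real bead schema may differ.

```typescript
// Hypothetical bead shape; only pr_url and last_polled_at come from the proposal.
interface MrBead {
  id: string;
  prUrl: string;
  lastPolledAt: number | null; // epoch ms, null if never polled
}

const PR_POLL_MIN_INTERVAL_MS = 30_000;

// Keep only beads that were never polled or were last polled at least 30s ago.
function selectBeadsToPoll(beads: MrBead[], now: number): MrBead[] {
  return beads.filter(
    (b) => b.lastPolledAt === null || now - b.lastPolledAt >= PR_POLL_MIN_INTERVAL_MS,
  );
}

// Steady-state GitHub calls per minute when each open PR is polled once per interval.
function callsPerMinute(openPrs: number, pollIntervalMs: number): number {
  return (openPrs * 60_000) / pollIntervalMs;
}

const t0 = 100_000;
const mrBeads: MrBead[] = [
  { id: 'mr-1', prUrl: 'https://example.invalid/pr/1', lastPolledAt: null },
  { id: 'mr-2', prUrl: 'https://example.invalid/pr/2', lastPolledAt: t0 - 10_000 },
  { id: 'mr-3', prUrl: 'https://example.invalid/pr/3', lastPolledAt: t0 - 45_000 },
];
const due = selectBeadsToPoll(mrBeads, t0).map((b) => b.id);
console.log(due); // mr-1 and mr-3 are due; mr-2 was polled 10s ago
console.log(callsPerMinute(5, 5_000), callsPerMinute(5, 30_000)); // 60 vs 10 calls/min
```

For the 5-PR town in the example, a 30s floor cuts polling from about 60 to about 10 calls per minute, well under the ~83/min budget.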
D. Process events inline where possible
Effort: Medium | Impact: Medium | Risk: Medium
For operations that already have full context (like agentDone), apply the state transition immediately in the RPC handler instead of inserting an event and waiting for the alarm to drain it. The event can still be inserted for audit purposes. This eliminates the 5-10s round-trip through the alarm loop for common operations.
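One way this could look is sketched below as an in-memory toy, not the real handler. Every name here is hypothetical (TownSketch, the 'done' status, the appliedInline flag). The flag in particular is an assumption about how the alarm-side drain might avoid re-applying a transition that the RPC handler already made.

```typescript
// All names below are hypothetical stand-ins, not the real Town DO types.
type BeadStatus = 'in_progress' | 'done';

interface TownEvent {
  type: 'agent_done';
  beadId: string;
  appliedInline: boolean; // assumption: lets the drain skip already-applied events
}

class TownSketch {
  readonly beads = new Map<string, BeadStatus>();
  private events: TownEvent[] = [];

  // Current behaviour: only record the event; the transition waits for the alarm.
  agentDoneQueued(beadId: string): void {
    this.events.push({ type: 'agent_done', beadId, appliedInline: false });
  }

  // Proposed behaviour: apply the transition now, keep the event for audit.
  agentDoneInline(beadId: string): void {
    this.beads.set(beadId, 'done');
    this.events.push({ type: 'agent_done', beadId, appliedInline: true });
  }

  // Alarm-side drain; returns how many events it had to apply itself.
  drainEvents(): number {
    let applied = 0;
    for (const event of this.events) {
      if (event.appliedInline) continue;
      this.beads.set(event.beadId, 'done');
      applied++;
    }
    this.events = [];
    return applied;
  }
}

const town = new TownSketch();
town.beads.set('bead-1', 'in_progress');
town.beads.set('bead-2', 'in_progress');
town.agentDoneInline('bead-1');
town.agentDoneQueued('bead-2');
const beforeDrain = [town.beads.get('bead-1'), town.beads.get('bead-2')];
const appliedByDrain = town.drainEvents();
console.log(beforeDrain, appliedByDrain); // bead-1 is done before any alarm; the drain applies only bead-2
```

Whatever mechanism is used, the inline transition and the drained event need to be idempotent with respect to each other, which is likely where most of the Medium risk for this change sits.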
E. Immediate dispatch for slingConvoy
Effort: Low | Impact: Medium | Risk: Low
After creating convoy beads, run a targeted mini-reconcile that assigns agents to the initially-unblocked beads, the same way slingBead() already does fire-and-forget dispatch. Eliminates the 5s wait for the first batch of convoy beads.
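As an illustration of what a targeted mini-reconcile might do, the sketch below assigns idle agents to convoy beads that have no unmet dependencies. ConvoyBead, blockedBy, and dispatchUnblocked are hypothetical names; the real dispatch logic lives in scheduling.ts and may work quite differently.

```typescript
// Hypothetical shapes for illustration only.
interface ConvoyBead {
  id: string;
  blockedBy: string[]; // ids of beads that must finish first
  assignee: string | null;
}

// Assign idle agents to beads that are unassigned and not blocked; returns the beads dispatched.
function dispatchUnblocked(beads: ConvoyBead[], idleAgents: string[]): ConvoyBead[] {
  const available = [...idleAgents];
  const dispatched: ConvoyBead[] = [];
  for (const bead of beads) {
    if (available.length === 0) break; // no agents left; the reconciler picks up the rest
    if (bead.assignee !== null || bead.blockedBy.length > 0) continue;
    bead.assignee = available.shift()!;
    dispatched.push(bead);
  }
  return dispatched;
}

const convoy: ConvoyBead[] = [
  { id: 'a', blockedBy: [], assignee: null },
  { id: 'b', blockedBy: ['a'], assignee: null },
  { id: 'c', blockedBy: [], assignee: null },
];
const started = dispatchUnblocked(convoy, ['agent-1', 'agent-2']).map(
  (b) => `${b.id}:${b.assignee}`,
);
console.log(started); // a and c start immediately; b stays blocked on a
```

Beads that stay blocked, or that find no idle agent, are simply left for the regular reconciler tick, so the immediate pass is an optimization and not a second source of truth.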
Priority Order
| Change | Effort | Impact | Risk |
|---|---|---|---|
| A. Fix armAlarmIfNeeded | Low | High (eliminates 60s idle penalty) | Very low |
| B. Split fast/slow phases | Medium | High (2s bead assignment) | Medium |
| C. PR polling rate limit | Low | Medium (prevents rate limit issues) | Low |
| D. Inline event processing | Medium | Medium (eliminates 5-10s for agent done) | Medium |
| E. Immediate convoy dispatch | Low | Medium (faster convoy starts) | Low |
Key Files
cloudflare-gastown/src/dos/Town.do.ts — alarm handler (line 2830), armAlarmIfNeeded() (line 3442), interval constants (line 121)
cloudflare-gastown/src/dos/town/scheduling.ts — dispatch logic, hasActiveWork()
cloudflare-gastown/src/dos/town/reconciler.ts — all reconciliation rules, PR polling actions
cloudflare-gastown/src/dos/town/actions.ts — action application and deferred side effects
cloudflare-gastown/src/dos/town/patrol.ts — GUPP thresholds and patrol checks
Notes
- Change A can be shipped independently and immediately: it's a one-line change to the alarm condition with very low risk
- Change B is the big architectural improvement but needs careful testing to ensure slow-phase operations still run reliably
- Change C is important for correctness regardless of latency goals — without it, towns with many open PRs can hit GitHub rate limits
- Changes D and E are incremental optimizations that can be done independently