fix(gastown): Agents permanently stuck after exhausting dispatch_attempts — no reset mechanism

## Bug

When agents exhaust their `dispatch_attempts` limit (currently 2), they go permanently idle with no mechanism to recover. The reconciler detects the stuck state (emitting ~1,400 invariant violations per hour) and emits dispatch actions, but agents won't retry past their limit. Towns go permanently dead until manual intervention.

## Evidence

Customer town `a7b9c59e` had a convoy running overnight. At ~03:00 UTC on Apr 2, all agents hit `dispatch_attempts: 2` simultaneously (likely a container infrastructure issue). The town has been stuck for **12+ hours**:

- 9 agents idle, 6 with `dispatch_attempts: 2`
- 2 MR reviews `in_progress`, assigned to an idle refinery
- 2 issues `in_review` waiting on reviews that will never come
- 1 issue not started
- Reconciler: ~700 ticks/hour, ~1,400 violations/hour, ~1,400 actions/hour — all wasted
- Last productive work: 04:13 UTC. Zero work since.

## Root Cause

`dispatch_attempts` is incremented on each failed dispatch but **never reset**. Once an agent reaches the max (2), it's permanently excluded from dispatch. There's no mechanism to:
1. Reset attempts after a cooldown period (e.g., reset to 0 after 30 minutes)
2. Reset attempts when the container comes back healthy (heartbeat received)
3. Reset attempts when the user manually intervenes (settings change, model change)
4. Distinguish between "transient failure" (container briefly unavailable) and "permanent failure" (bad config)
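
The failure mode can be illustrated with a minimal model. This is a sketch, not the actual code in `agents.ts`: the type and function names are hypothetical, and only the `dispatch_attempts` field and the max of 2 come from this issue.

```ts
// Hypothetical model of the current behavior; names are illustrative only.
const MAX_DISPATCH_ATTEMPTS = 2;

interface AgentState {
  bead_id: string;
  dispatch_attempts: number;
}

// Each failed dispatch bumps the counter; nothing ever lowers it.
function recordFailedDispatch(agent: AgentState): AgentState {
  return { ...agent, dispatch_attempts: agent.dispatch_attempts + 1 };
}

// Agents at or above the max are skipped by dispatch.
function isDispatchable(agent: AgentState): boolean {
  return agent.dispatch_attempts < MAX_DISPATCH_ATTEMPTS;
}

let stuckAgent: AgentState = { bead_id: 'agent-1', dispatch_attempts: 0 };
stuckAgent = recordFailedDispatch(stuckAgent);
stuckAgent = recordFailedDispatch(stuckAgent);
// isDispatchable(stuckAgent) is now false, and no code path makes it true again.
```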

## Fix

### Fix 1 (Critical): Auto-reset dispatch_attempts after a cooldown

When an agent has `dispatch_attempts >= max` and `last_activity_at` is older than a cooldown period (e.g., 30 minutes), reset `dispatch_attempts` to 0. This allows agents to retry after a reasonable backoff.

Add to `reconcileAgents` or `reconcileBeads`:

```ts
// Rule: Reset exhausted agents after cooldown
for (const agent of exhaustedAgents) {
  if (staleMs(agent.last_activity_at, DISPATCH_RESET_COOLDOWN_MS)) {
    actions.push({
      type: 'transition_agent',
      agent_id: agent.bead_id,
      reason: 'dispatch attempts reset after cooldown',
      dispatch_attempts: 0, // proposed: the action also resets the counter (field name is an assumption)
    });
  }
}
```
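
A self-contained version of the rule above might look like the following. It is a sketch under assumptions: the `reset_dispatch_attempts` action type, the `staleMs` signature (with an injectable `now` for testing), and the ISO-string type of `last_activity_at` are all hypothetical and may not match the real reconciler.

```ts
// Sketch only: the action type, helper signature, and field types are assumptions.
const MAX_DISPATCH_ATTEMPTS = 2;
const DISPATCH_RESET_COOLDOWN_MS = 30 * 60 * 1000; // 30 minutes

interface Agent {
  bead_id: string;
  dispatch_attempts: number;
  last_activity_at: string; // ISO 8601 timestamp (assumed)
}

interface ResetAttemptsAction {
  type: 'reset_dispatch_attempts'; // hypothetical action type
  agent_id: string;
  reason: string;
}

// True when `timestamp` is more than `thresholdMs` before `now`.
function staleMs(timestamp: string, thresholdMs: number, now: number = Date.now()): boolean {
  return now - Date.parse(timestamp) > thresholdMs;
}

// Emit one reset action per agent that is both exhausted and past the cooldown.
function planCooldownResets(agents: Agent[], now: number = Date.now()): ResetAttemptsAction[] {
  return agents
    .filter((a) => a.dispatch_attempts >= MAX_DISPATCH_ATTEMPTS)
    .filter((a) => staleMs(a.last_activity_at, DISPATCH_RESET_COOLDOWN_MS, now))
    .map((a) => ({
      type: 'reset_dispatch_attempts' as const,
      agent_id: a.bead_id,
      reason: 'dispatch attempts reset after cooldown',
    }));
}
```

Agents that failed recently stay excluded until the cooldown elapses, so a persistent failure costs at most one retry burst per cooldown window.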

### Fix 2: Reset attempts on successful heartbeat

When the container sends a heartbeat confirming an agent is running, reset its `dispatch_attempts` to 0. A heartbeat proves the container is functional — there's no reason to keep a stale failure count.
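
One way this could be wired up, as a rough sketch: the `Heartbeat` shape, the `last_heartbeat_at` field, and the `applyHeartbeat` function are assumptions for illustration, not existing code.

```ts
// Sketch only: the heartbeat payload and handler name are hypothetical.
interface Agent {
  bead_id: string;
  dispatch_attempts: number;
  last_heartbeat_at: number | null; // epoch ms (assumed field)
}

interface Heartbeat {
  agent_id: string;
  received_at: number; // epoch ms
}

// A heartbeat for an agent clears its failure count; other agents are untouched.
function applyHeartbeat(agents: Agent[], hb: Heartbeat): Agent[] {
  return agents.map((a) =>
    a.bead_id === hb.agent_id
      ? { ...a, dispatch_attempts: 0, last_heartbeat_at: hb.received_at }
      : a,
  );
}
```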

### Fix 3: Increase the max from 2

`dispatch_attempts: 2` is very aggressive: two failures and the agent is permanently dead. Increase the max to somewhere in the 5-10 range, with roughly exponential backoff between attempts (e.g., 30s, 1m, 2m, 5m, 10m).
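
The backoff schedule could be expressed as a lookup table. This is a sketch: the function names are hypothetical, and the max of 6 is just one value from the proposed 5-10 range, picked so that all five delays get used.

```ts
// Sketch only: constants and function names are assumptions.
const BACKOFF_SCHEDULE_MS = [30, 60, 120, 300, 600].map((s) => s * 1000);
const MAX_DISPATCH_ATTEMPTS = 6; // any value in the proposed 5-10 range

// Delay to wait after `failedAttempts` failures, or null once attempts are exhausted.
function retryDelayMs(failedAttempts: number): number | null {
  if (failedAttempts >= MAX_DISPATCH_ATTEMPTS) return null;
  if (failedAttempts <= 0) return 0;
  // Clamp to the last entry if the max is raised beyond the schedule length.
  const index = Math.min(failedAttempts, BACKOFF_SCHEDULE_MS.length) - 1;
  return BACKOFF_SCHEDULE_MS[index] ?? null;
}

// True when the agent may be dispatched again at time `now` (epoch ms).
function isRetryDue(failedAttempts: number, lastAttemptAt: number, now: number): boolean {
  const delay = retryDelayMs(failedAttempts);
  return delay !== null && now - lastAttemptAt >= delay;
}
```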

### Fix 4: Reset on container restart

When a container is confirmed restarted (first heartbeat after being `not_found`/`exited`), reset `dispatch_attempts` for ALL agents in that town. The container restart likely fixed whatever caused the dispatch failures.
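
A possible shape for the restart check, as a sketch: the `not_found` and `exited` statuses come from this issue, while the `running` status, the `town_id` field, and both function names are assumptions.

```ts
// Sketch only: status values other than not_found/exited and all names are hypothetical.
type ContainerStatus = 'running' | 'not_found' | 'exited';

interface Agent {
  bead_id: string;
  town_id: string; // assumed field linking an agent to its town
  dispatch_attempts: number;
}

// A heartbeat arriving while the last known status was not_found/exited means a restart.
function isConfirmedRestart(previousStatus: ContainerStatus): boolean {
  return previousStatus === 'not_found' || previousStatus === 'exited';
}

// On a confirmed restart, reset every agent in the affected town; otherwise change nothing.
function onContainerHeartbeat(
  agents: Agent[],
  townId: string,
  previousStatus: ContainerStatus,
): Agent[] {
  if (!isConfirmedRestart(previousStatus)) return agents;
  return agents.map((a) => (a.town_id === townId ? { ...a, dispatch_attempts: 0 } : a));
}
```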

### Fix 5: Add manual user control to Agent Drawer UI

In the agent drawer UI, next to the dispatch attempts count, add a button that lets users reset the `dispatch_attempts` counter to 0.

## Related

- #1653: Dispatch circuit breaker (the opposite problem: no limit → $100+ burn. This issue: limit too low → agents permanently stuck)
- #1850: Platform-wide polecat dispatch failures (the underlying cause of the dispatch failures that hit the limit)

## Files

- `src/dos/town/reconciler.ts` — dispatch attempt checks in `reconcileBeads` and `reconcileReviewQueue`
- `src/dos/town/agents.ts` — `dispatch_attempts` field, increment logic
- `src/dos/town/actions.ts` — `dispatch_agent` action handler (increments attempts)

## Acceptance Criteria

- [ ] Agents auto-reset `dispatch_attempts` to 0 after a configurable cooldown (default: 30 min)
- [ ] Agents reset `dispatch_attempts` on successful heartbeat
- [ ] Agents reset `dispatch_attempts` on container restart detection
- [ ] Max dispatch attempts increased to 5-10 with exponential backoff
- [ ] Towns self-recover from mass dispatch failures within 30-60 minutes without manual intervention
