Bug: Deleting a town doesn't clean up TownDO, TownContainerDO, or AgentDOs — resources leak indefinitely #1182

@jrf0110

Description

Parent

Part of #204 (Phase 3: Multi-Rig + Scaling)

Problem

Deleting a town from the UI does not clean up the underlying Durable Objects or containers. With only 3 active users and 4 active towns, there are dozens of TownDOs still running alarm loops and ~10 extra TownContainerDOs consuming container instances. Deleted towns keep running indefinitely, burning Cloudflare resources.

Root Causes (6 compounding issues)

1. tRPC deleteTown doesn't call TownDO.destroy() — THE PRIMARY BUG

File: src/trpc/router.ts:156-162

The UI calls deleteTown via tRPC, which calls GastownUserDO.deleteTown(). This only deletes rows from the user_towns and user_rigs tables in GastownUserDO's storage. It never touches the TownDO, TownContainerDO, or AgentDOs. The town is removed from the user's list but continues running everything.

The HTTP handler at src/handlers/towns.handler.ts:128-146 does call townDOStub.destroy(), but the tRPC path (which the UI uses) doesn't.

2. Neither deletion path destroys the TownContainerDO

TownDO.destroy() at Town.do.ts:4044-4058 deletes agents and clears TownDO storage, but never calls getTownContainerStub().destroy(). The HTTP handler also doesn't destroy the container. No code path stops the container when a town is deleted.

3. TownDO alarm unconditionally re-arms

File: Town.do.ts:2638-2640

const active = this.hasActiveWork();
const interval = active ? ACTIVE_ALARM_INTERVAL_MS : IDLE_ALARM_INTERVAL_MS;
await this.ctx.storage.setAlarm(Date.now() + interval);

There is no exit condition. Even idle towns with no work fire every 60 seconds forever. The alarm never checks if the town has been deleted.

4. TownDO alarm keeps the container alive

ensureContainerReady() at Town.do.ts:3775 calls container.fetch('http://container/health') on every alarm tick when the town has rigs. Each health check resets the TownContainerDO's 30-minute sleepAfter timer. Since the alarm fires every 5-60 seconds, the container never sleeps, even for deleted towns that still have rigs in their tables.
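
For context, this is roughly how the container class would be configured, assuming TownContainerDO extends the Container class from @cloudflare/containers (the port value is a placeholder):

import { Container } from '@cloudflare/containers';

export class TownContainerDO extends Container {
  defaultPort = 8080;  // placeholder; the real port is not shown in this issue
  sleepAfter = '30m';  // each proxied fetch, including health checks, resets this idle timeout
}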

5. Constructor re-arms alarm on any interaction

initializeDatabase() at Town.do.ts:437-439 calls armAlarmIfNeeded(). Even after a successful destroy(), if anything touches the DO stub (a stale request, dashboard polling, a late API call), the constructor fires and re-creates the alarm loop from scratch — on a DO with empty tables. This is a resurrection problem.

6. Compatibility date too old for deleteAll() to clear alarms

wrangler.jsonc:5 pins the compatibility date to 2026-01-27, but deleteAll() only started clearing alarms at compatibility date 2026-02-24. The code compensates by calling deleteAlarm() explicitly in destroy(), which is fragile when destroy() is never called (root cause #1).

Resource Leak Summary

| Resource | tRPC deleteTown (UI path) | HTTP DELETE handler |
|---|---|---|
| GastownUserDO rows | Deleted | Deleted |
| TownDO storage | Never cleared | Cleared via destroy() |
| TownDO alarm | Never deleted; fires every 5-60s forever | Deleted, but can resurrect |
| TownContainerDO | Never stopped | Never stopped |
| AgentDOs | Never cleaned | Cleaned via TownDO.destroy() |
| Container process | Runs indefinitely | Runs indefinitely |

Fix

Fix 1: tRPC deleteTown must call TownDO.destroy()

In src/trpc/router.ts:156-162, after verifyTownOwnership, call townDOStub.destroy() before (or alongside) userStub.deleteTown(). Match the HTTP handler's behavior.
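
A minimal sketch of the patched mutation, assuming a standard tRPC router shape; getTownDOStub and getGastownUserStub are hypothetical helper names, and the real mutation at src/trpc/router.ts:156-162 may be shaped differently:

deleteTown: protectedProcedure
  .input(z.object({ townId: z.string() }))
  .mutation(async ({ ctx, input }) => {
    await verifyTownOwnership(ctx, input.townId);

    // NEW: stop the town's alarms, agents, and storage first, so a
    // partial failure still halts the resource leak.
    const townDOStub = getTownDOStub(ctx.env, input.townId);
    await townDOStub.destroy();

    // Existing behavior: remove rows from user_towns and user_rigs.
    const userStub = getGastownUserStub(ctx.env, ctx.userId);
    await userStub.deleteTown(input.townId);
  }),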

Fix 2: TownDO.destroy() must destroy ALL sub-DOs and their storage

TownDO.destroy() at Town.do.ts:4044-4058 currently iterates AgentDOs and calls their destroy() (which clears their agent_events tables and alarms). But it's missing:

Sub-DOs with independent storage that must be cleaned up:

| Sub-DO | Keyed by | Has destroy()? | Has storage? | Currently cleaned? |
|---|---|---|---|---|
| AgentDO | agentId | Yes (Agent.do.ts:113) | Yes: rig_agent_events table (high-volume event stream) | Yes: destroy() called in loop |
| TownContainerDO | townId | Yes (inherited from Container) | Yes: Container class internal state | No: never called |
| AgentIdentityDO | agentIdentity | No destroy() method | No (stub with ping() only) | N/A: no data to clean |

Add to TownDO.destroy():

// Kill the container process and clear TownContainerDO state
try {
  const containerStub = getTownContainerStub(this.env, this.townId);
  await containerStub.destroy(); // Sends SIGKILL to container process
} catch {
  // Best-effort — container may already be stopped
}

When AgentIdentityDO gains real storage (Phase 3, #224), add cleanup for those instances as well.

Fix 3: TownDO alarm must have an exit condition

Add a deleted flag to TownDO storage (or check if the town has zero config/zero rigs). At the top of alarm(), check the flag and skip re-arming:

if (this.isDeleted()) {
  await this.ctx.storage.deleteAlarm();
  return; // Do NOT re-arm
}
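
One way to make the flag durable yet cheap to check synchronously, sketched as a TownDO excerpt; the 'town:deleted' storage key is an assumption:

private deleted = false; // cached in memory; loaded during initialization

async destroy(): Promise<void> {
  // ... existing agent and container cleanup ...
  await this.ctx.storage.deleteAll();
  await this.ctx.storage.put('town:deleted', true); // written after deleteAll() so it survives the wipe
  await this.ctx.storage.deleteAlarm();
  this.deleted = true;
}

isDeleted(): boolean {
  return this.deleted;
}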

Fix 4: Prevent alarm resurrection after destroy

In initializeDatabase(), check for a deleted sentinel before calling armAlarmIfNeeded(). If destroy() was called, it should have written a sentinel value that survives deleteAll() (i.e., written after the wipe), or used a different mechanism such as a flag on the DO class instance.
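
A sketch of that guard, reusing the sentinel key assumed in Fix 3:

// At the top of initializeDatabase(): load the tombstone before any setup.
this.deleted = (await this.ctx.storage.get('town:deleted')) === true;
if (this.deleted) {
  return; // tombstoned: rebuild nothing, arm no alarm
}
await this.armAlarmIfNeeded();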

Alternative: update the compatibility date to 2026-02-24 so deleteAll() also clears alarms, making resurrection less likely.
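
If the alternative is taken, the change is a single field (file and line per root cause #6):

// wrangler.jsonc
{
  "compatibility_date": "2026-02-24"
}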

Fix 5: ensureContainerReady() should not ping containers for idle/deleted towns

Only call ensureContainerReady() if hasActiveWork() returns true. Idle towns don't need a running container. This lets the sleepAfter timer expire naturally for towns with no active work.
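
A sketch of the guard in the alarm handler, assuming ensureContainerReady() is currently called unconditionally on each tick:

// Only keep the container warm while there is real work; idle and deleted
// towns let the container's sleepAfter timer expire naturally.
if (this.hasActiveWork()) {
  await this.ensureContainerReady();
}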

Fix 6: Orphan sweep (safety net)

Add a mechanism to detect and clean up orphaned DOs (a sketch follows the list):

  • A Cron Trigger or admin endpoint that lists all active towns (from all GastownUserDOs) and compares against TownDOs that still have active alarms
  • Any TownDO not referenced by a GastownUserDO and with no active beads gets destroy() called
  • This is the catch-all for any leaks the primary fixes miss
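
A sketch of the sweep, written as a scheduled() entry point. Cloudflare provides no way to enumerate Durable Object instances, so listReferencedTownIds() and listAllTownIds() are assumed app-level bookkeeping (e.g., the GastownUserDOs plus a registry), and the TownDO HTTP endpoints shown are hypothetical:

interface Env {
  TOWN_DO: DurableObjectNamespace;
}

// Both helpers are assumptions; DO namespaces cannot be listed natively.
declare function listReferencedTownIds(env: Env): Promise<string[]>;
declare function listAllTownIds(env: Env): Promise<string[]>;

export async function sweepOrphanedTowns(env: Env): Promise<void> {
  const referenced = new Set(await listReferencedTownIds(env));
  for (const townId of await listAllTownIds(env)) {
    if (referenced.has(townId)) continue; // still owned by some user
    const stub = env.TOWN_DO.get(env.TOWN_DO.idFromName(townId));
    try {
      // Hypothetical endpoints: skip towns with active beads, destroy the rest.
      const res = await stub.fetch('http://town/active-beads');
      const { active } = (await res.json()) as { active: boolean };
      if (active) continue;
      await stub.fetch('http://town/destroy', { method: 'POST' });
    } catch {
      // Best-effort; missed orphans are retried on the next sweep.
    }
  }
}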

Acceptance Criteria

  • tRPC deleteTown calls TownDO.destroy() and TownContainerDO.destroy()
  • TownDO.destroy() destroys the TownContainerDO (kills container process + clears state) and all AgentDOs (clears rig_agent_events tables)
  • All sub-DO storage is cleared: AgentDO event tables, TownContainerDO internal state, TownDO SQLite (beads, agents, rigs, events, config, convoy_metadata, review_metadata, bead_dependencies — everything)
  • TownDO alarm does not re-arm after deletion (exit condition + resurrection prevention)
  • ensureContainerReady() only pings containers for towns with active work
  • Deleting a town from the UI results in: alarm stopped, container killed, storage cleared
  • Orphan sweep mechanism for catching leaked DOs
  • Compatibility date updated to 2026-02-24 or later

Notes

  • This is actively burning Cloudflare resources in production right now
  • The fix for the primary bug (root cause #1) is a one-line change: add townDOStub.destroy() to the tRPC mutation. The other fixes are defense-in-depth.
  • Cloudflare DOs cannot be truly "deleted" — they exist forever once created. But we can ensure they stop consuming resources by clearing storage, deleting alarms, and preventing resurrection.
  • After fixing, we should manually clean up the existing orphaned DOs. An admin script that calls destroy() on each known orphaned TownDO/TownContainerDO would suffice.
