[Gastown] PR 21: Edge Case Handling #227

@jrf0110

Parent: #204 | Phase 4: Hardening

Revised: Edge cases updated for container-per-town model (container OOM, ephemeral disk, process-level isolation).

Goal

Handle edge cases and failure modes gracefully.

Edge Cases

  • Split-brain: Two processes for the same agent (race on restart) → Rig DO enforces single-writer per agent, container checks DO state before starting
  • Concurrent writes to the same bead: SQLite serialization within a single DO handles this; add optimistic locking for cross-DO operations
  • DO eviction during alarm: Alarms are durable and will re-fire
  • Container OOM: Kills every agent in the container. DO alarms detect the dead agents, a new container starts, and agents are re-dispatched from DO state
  • Container sleep during active work: Disk is ephemeral, so agents must have pushed their work to the remote. The DO re-dispatches on wake, and checkpoint data in the DO enables resumption
  • Gateway outage: Agent retries built into Kilo CLI; escalation if persistent
  • Partial agentDone: The polecat pushed the branch but the gt_done call failed → checkpoint-based recovery
  • Duplicate mail delivery: Idempotency on mail delivery marking
  • Convoy with failed beads: Policy for partial convoy completion
  • Git worktree conflicts: Two agents accidentally assigned same branch → Rig DO enforces unique branch per agent
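
The split-brain and worktree-conflict cases above both reduce to a uniqueness check made before dispatch. A minimal sketch of that check follows, assuming an in-memory stand-in for the state the Rig DO would hold. The names `RigRegistry`, `AgentLease`, `tryDispatch`, and `release` are hypothetical and are not taken from the Gastown codebase; a real implementation would presumably persist this in DO storage rather than in `Map`s.

```typescript
// Hypothetical lease record: one live process per agent, one agent per branch.
type AgentLease = { agentId: string; processId: string; branch: string };

type DispatchResult =
  | { ok: true; lease: AgentLease }
  | { ok: false; reason: "agent_already_running" | "branch_in_use" };

// In-memory stand-in for Rig DO state (assumption: the real state lives in DO storage).
class RigRegistry {
  private leasesByAgent = new Map<string, AgentLease>();
  private agentByBranch = new Map<string, string>();

  // Grant a lease only if the agent has no live process and the branch is unclaimed.
  tryDispatch(agentId: string, processId: string, branch: string): DispatchResult {
    if (this.leasesByAgent.has(agentId)) {
      // A second process for the same agent (e.g. a race on restart) is rejected.
      return { ok: false, reason: "agent_already_running" };
    }
    if (this.agentByBranch.has(branch)) {
      // Another agent already owns this branch.
      return { ok: false, reason: "branch_in_use" };
    }
    const lease: AgentLease = { agentId, processId, branch };
    this.leasesByAgent.set(agentId, lease);
    this.agentByBranch.set(branch, agentId);
    return { ok: true, lease };
  }

  // Release succeeds only for the process that currently holds the lease,
  // so a stale process cannot free a lease owned by its replacement.
  release(agentId: string, processId: string): boolean {
    const existing = this.leasesByAgent.get(agentId);
    if (existing === undefined || existing.processId !== processId) {
      return false;
    }
    this.leasesByAgent.delete(agentId);
    this.agentByBranch.delete(existing.branch);
    return true;
  }
}
```

In this sketch the container would call `tryDispatch` before starting an agent process and treat a rejection as "another writer exists". After a container OOM, the DO side would need to clear dead leases before re-dispatching; that step is left out here.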

Dependencies

  • PR 5 (Rig DO Alarm — witness patrol)
  • PR 10 (Multiple Polecats)

Acceptance Criteria

  • Single-writer enforcement per agent (reject duplicate dispatch)
  • Container OOM recovery flow tested (DO re-dispatches all agents)
  • Optimistic locking for cross-DO operations
  • Checkpoint-based recovery for partial done flows
  • Idempotent mail delivery
  • Convoy partial completion policy implemented
  • All edge cases documented with test coverage
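
Two of the criteria above, optimistic locking and idempotent mail delivery, can be illustrated with a small sketch. It assumes a per-bead version counter and a set of already-delivered message IDs. `BeadStore`, `Mailbox`, `updateIfVersion`, and `markDelivered` are hypothetical names chosen for illustration, and the in-memory collections stand in for whatever DO storage the real implementation uses.

```typescript
// Hypothetical bead record; `version` increases by one on every successful write.
type Bead = { id: string; status: string; version: number };

class BeadStore {
  private beads = new Map<string, Bead>();

  put(bead: Bead): void {
    this.beads.set(bead.id, { ...bead });
  }

  get(id: string): Bead | undefined {
    const bead = this.beads.get(id);
    return bead === undefined ? undefined : { ...bead };
  }

  // Optimistic locking: apply the write only if the caller read the latest version.
  // Returns false on a missing bead or a stale version; the caller re-reads and retries.
  updateIfVersion(id: string, expectedVersion: number, status: string): boolean {
    const current = this.beads.get(id);
    if (current === undefined || current.version !== expectedVersion) {
      return false;
    }
    this.beads.set(id, { id, status, version: expectedVersion + 1 });
    return true;
  }
}

class Mailbox {
  private delivered = new Set<string>();

  // Idempotent marking: returns true only the first time a message ID is seen,
  // so a retried delivery of the same message becomes a no-op.
  markDelivered(messageId: string): boolean {
    if (this.delivered.has(messageId)) {
      return false;
    }
    this.delivered.add(messageId);
    return true;
  }
}
```

A cross-DO caller would read a bead, do its work, then call `updateIfVersion` with the version it read. A `false` result means another writer got there first, and the caller should re-read before deciding whether to retry.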

Metadata

Labels

P3 (Backlog / future), gt:core (Reconciler, state machine, bead lifecycle, convoy flow)
