bug(gastown): create_landing_mr loops infinitely when convoy branches have no PR URL #2260

@jrf0110

Summary

The reconciler's create_landing_mr action has no circuit breaker. When a convoy's branches don't have associated PR URLs, the landing MR creation fails immediately at the system level, the reconciler sees the convoy still needs a landing MR, and retries on the next tick — forever.

Observed in production on town 5f5fda7f-00fe-4656-8493-602a4da9f8dc (org 9d278969-5453-4ae3-a51f-a8d2274a7b56):

| Metric | Value |
| --- | --- |
| Convoys stuck | 2 (efc19f22, f8bec2cd) |
| Loop duration | ~42 min (16:57–17:38 UTC, still ongoing) |
| Failed MR beads created | 240 |
| Invariant violations | 257 (mostly "Rig has 2 in_progress MR beads") |
| Cycle rate | ~1 every 10-15s per convoy |
| create_landing_mr reconciler actions | 186 in last 42 min |

The Loop

reconciler tick
  → detects convoy needs landing MR
  → emits create_landing_mr action
  → MR bead created ("Review: convoy/fix-.../head")
  → MR bead immediately fails (system-level, "by agent system")
  → reconciler tick
  → detects convoy STILL needs landing MR (old one failed)
  → emits create_landing_mr again
  → repeat forever

The debug endpoint shows sideEffectsAttempted: 0, sideEffectsSucceeded: 0, sideEffectsFailed: 0 — the failure is happening at validation before the Git API call, likely because there's no PR URL on the convoy's source branches.
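
The loop can be reproduced in a simplified model. The sketch below is illustrative only: `Convoy`, `MrBead`, `needsLandingMr`, `createLandingMr`, and `simulateTicks` are hypothetical names, not the actual gastown types or functions, and it assumes the failure comes from a missing PR URL as suspected above.

```typescript
// Hypothetical, simplified model of the loop. None of these names come from
// the gastown codebase; they only illustrate the control flow described above.
type BeadStatus = "in_progress" | "failed" | "merged";

interface MrBead {
  convoyId: string;
  status: BeadStatus;
}

interface Convoy {
  id: string;
  branchPrUrls: (string | null)[];
}

// Unguarded check: a convoy "needs" a landing MR unless one is active or merged.
// Failed beads are ignored, so a failure makes the convoy eligible again.
function needsLandingMr(convoy: Convoy, beads: MrBead[]): boolean {
  return !beads.some(
    (b) => b.convoyId === convoy.id && b.status !== "failed",
  );
}

// Assumed failure mode: validation rejects the MR before any Git API call
// when a source branch has no PR URL, so the bead fails immediately.
function createLandingMr(convoy: Convoy): MrBead {
  const valid = convoy.branchPrUrls.every((u) => u !== null);
  return { convoyId: convoy.id, status: valid ? "in_progress" : "failed" };
}

// Runs the unguarded reconciler for a fixed number of ticks.
function simulateTicks(convoy: Convoy, ticks: number): MrBead[] {
  const beads: MrBead[] = [];
  for (let i = 0; i < ticks; i++) {
    if (needsLandingMr(convoy, beads)) {
      beads.push(createLandingMr(convoy));
    }
  }
  return beads;
}

const stuckConvoy: Convoy = { id: "efc19f22", branchPrUrls: [null, null] };
const failedBeads = simulateTicks(stuckConvoy, 10);
console.log(failedBeads.length); // 10: one failed bead per tick
```

In this model a convoy whose branches all have PR URLs produces a single bead and then stops, while the stuck convoy produces one failed bead per tick with no upper bound.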

Root Cause

The reconciler has no guards on the create_landing_mr path:

  1. No max retry count: create_landing_mr is emitted every tick as long as the convoy needs one and the last one failed
  2. No PR URL validation: the action is emitted even when the convoy's branches have no associated PRs
  3. No cooldown: new MR beads are created before old ones fully transition to failed, causing the "2 in_progress MR beads" invariant violation
  4. No convoy-level failure escalation: the convoy stays open forever, never fails or escalates

Suggested Fixes

  1. Guard create_landing_mr — before emitting, check that the convoy's completed beads actually have PR URLs. If not, skip and log a warning (or escalate).
  2. Max landing MR attempts per convoy — track landing_mr_attempts on the convoy record. After N failures (e.g., 3), fail the convoy and escalate rather than looping.
  3. Deduplicate MR bead creation — don't create a new MR bead if one for the same convoy is already open or in_progress.
  4. Cooldown between attempts — if a landing MR fails, wait at least 60s before retrying (not every 5s tick).
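
The four fixes above could be combined into a single pre-emission check. The following is a minimal sketch under stated assumptions, not the reconciler's real API: `ConvoyState`, `LandingMrBead`, `Decision`, `decideLandingMr`, and the constants are hypothetical, `landingMrAttempts` mirrors the proposed `landing_mr_attempts` field, and the caller is assumed to increment it each time it acts on an `emit` decision.

```typescript
// Hypothetical types and names throughout. This is a sketch of the suggested
// guards, not the gastown reconciler's actual data model.
type MrStatus = "open" | "in_progress" | "failed" | "merged";

interface LandingMrBead {
  status: MrStatus;
  updatedAtMs: number;
}

interface ConvoyState {
  id: string;
  branchPrUrls: (string | null)[];
  landingMrAttempts: number; // assumed to be incremented by the caller on emit
  landingMrBeads: LandingMrBead[];
}

type Decision =
  | { kind: "emit" }
  | { kind: "skip"; reason: string }
  | { kind: "fail_convoy"; reason: string };

const MAX_ATTEMPTS = 3; // the "N" from fix 2
const COOLDOWN_MS = 60_000; // the 60s from fix 4

function decideLandingMr(convoy: ConvoyState, nowMs: number): Decision {
  // Fix 3: never create a second bead while one is open or in_progress.
  const hasActive = convoy.landingMrBeads.some(
    (b) => b.status === "open" || b.status === "in_progress",
  );
  if (hasActive) {
    return { kind: "skip", reason: "landing MR already active" };
  }

  // Fix 2: after too many failures, fail the convoy instead of retrying.
  if (convoy.landingMrAttempts >= MAX_ATTEMPTS) {
    return { kind: "fail_convoy", reason: "max landing MR attempts reached" };
  }

  // Fix 1: require a PR URL on every branch before emitting.
  if (
    convoy.branchPrUrls.length === 0 ||
    convoy.branchPrUrls.some((u) => !u)
  ) {
    return { kind: "skip", reason: "branch without PR URL" };
  }

  // Fix 4: wait out the cooldown after the most recent failure.
  const lastFailureMs = Math.max(
    -Infinity,
    ...convoy.landingMrBeads
      .filter((b) => b.status === "failed")
      .map((b) => b.updatedAtMs),
  );
  if (nowMs - lastFailureMs < COOLDOWN_MS) {
    return { kind: "skip", reason: "cooldown after failed landing MR" };
  }

  return { kind: "emit" };
}

const exampleConvoy: ConvoyState = {
  id: "f8bec2cd",
  branchPrUrls: [null],
  landingMrAttempts: 0,
  landingMrBeads: [],
};
console.log(decideLandingMr(exampleConvoy, Date.now()).kind); // "skip"
```

Note that in this sketch a missing PR URL only skips the action, so a convoy in that state would still wait indefinitely; counting such skips toward escalation would be one way to also cover the fourth root cause.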

Relevant Code

  • Reconciler create_landing_mr emission: services/gastown/src/dos/town/reconciler.ts (convoy reconciliation section, ~line 1493+)
  • Landing MR action handler: services/gastown/src/dos/town/actions.ts
  • Review queue MR creation: services/gastown/src/dos/town/review-queue.ts (submitToReviewQueue, ~line 110)
  • Convoy progress tracking: services/gastown/src/dos/town/reconciler.ts (reconcileConvoys, ~line 1516)

Labels: P0 (Blocks soft launch), bug (Something isn't working), gt:core (Reconciler, state machine, bead lifecycle, convoy flow)
