Summary
The reconciler's create_landing_mr action has no circuit breaker. When a convoy's branches don't have associated PR URLs, the landing MR creation fails immediately at the system level, the reconciler sees the convoy still needs a landing MR, and retries on the next tick — forever.
Observed in production on town 5f5fda7f-00fe-4656-8493-602a4da9f8dc (org 9d278969-5453-4ae3-a51f-a8d2274a7b56):
| Metric | Value |
| --- | --- |
| Convoys stuck | 2 (efc19f22, f8bec2cd) |
| Loop duration | ~42 min (16:57–17:38 UTC, still ongoing) |
| Failed MR beads created | 240 |
| Invariant violations | 257 (mostly "Rig has 2 in_progress MR beads") |
| Cycle rate | ~1 every 10-15s per convoy |
| create_landing_mr reconciler actions | 186 in last 42 min |
The Loop
reconciler tick
→ detects convoy needs landing MR
→ emits create_landing_mr action
→ MR bead created ("Review: convoy/fix-.../head")
→ MR bead immediately fails (system-level, "by agent system")
→ reconciler tick
→ detects convoy STILL needs landing MR (old one failed)
→ emits create_landing_mr again
→ repeat forever
The debug endpoint shows sideEffectsAttempted: 0, sideEffectsSucceeded: 0, sideEffectsFailed: 0 — the failure is happening at validation before the Git API call, likely because there's no PR URL on the convoy's source branches.
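The loop above can be modeled in a few lines to show why it never terminates. This is only an illustrative sketch: the `LoopConvoy` shape, `needsLandingMr`, `tick`, and `simulate` are hypothetical names, and the predicate (a convoy needs a landing MR whenever it has no non-failed one) is an assumption inferred from the observed behavior, not taken from reconciler.ts.

```typescript
// Hypothetical model of the retry loop; none of these names come from the gastown codebase.
type LoopBeadStatus = "in_progress" | "failed";

interface LoopConvoy {
  landingMrBeads: LoopBeadStatus[];
}

// Assumed predicate: only a non-failed landing MR bead satisfies the convoy.
function needsLandingMr(convoy: LoopConvoy): boolean {
  return !convoy.landingMrBeads.some((status) => status !== "failed");
}

// One reconciler tick: emit create_landing_mr if needed. The new bead is created
// and then fails validation immediately, before any Git API call is made.
function tick(convoy: LoopConvoy): void {
  if (!needsLandingMr(convoy)) return;
  convoy.landingMrBeads.push("in_progress");
  convoy.landingMrBeads[convoy.landingMrBeads.length - 1] = "failed";
}

// Run a number of ticks and report how many failed beads accumulated.
function simulate(ticks: number): number {
  const convoy: LoopConvoy = { landingMrBeads: [] };
  for (let i = 0; i < ticks; i++) tick(convoy);
  return convoy.landingMrBeads.length;
}
```

In this model every tick adds one failed bead, so the bead count grows linearly with the number of ticks and nothing ever stops it.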
Root Cause
The reconciler has no guards on the create_landing_mr path:
- No max retry count: create_landing_mr is emitted every tick as long as the convoy needs one and the last one failed
- No PR URL validation: the action is emitted even when the convoy's branches have no associated PRs
- No cooldown: new MR beads are created before old ones fully transition to failed, causing the "2 in_progress MR beads" invariant violation
- No convoy-level failure escalation: the convoy stays open forever, never fails or escalates
Suggested Fixes
- Guard create_landing_mr: before emitting, check that the convoy's completed beads actually have PR URLs. If not, skip and log a warning (or escalate).
- Max landing MR attempts per convoy: track landing_mr_attempts on the convoy record. After N failures (e.g., 3), fail the convoy and escalate rather than looping.
- Deduplicate MR bead creation: don't create a new MR bead if one for the same convoy is already open or in_progress.
- Cooldown between attempts: if a landing MR fails, wait at least 60s before retrying (not every 5s tick).
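The four fixes above could be combined into a single check that runs before create_landing_mr is emitted. The following is a minimal sketch under stated assumptions: `ConvoyLandingState`, `LandingDecision`, and `decideLandingMr` are hypothetical names that do not come from reconciler.ts, the real convoy record may be shaped differently, and the thresholds are just the example values suggested above (3 attempts, 60s). It also assumes every completed bead must have a PR URL.

```typescript
// Hypothetical types; the real convoy and bead records in gastown may differ.
type MrBeadStatus = "open" | "in_progress" | "failed" | "closed";

interface ConvoyLandingState {
  completedBeadPrUrls: (string | null)[]; // one entry per completed bead
  landingMrStatuses: MrBeadStatus[]; // statuses of this convoy's existing landing MR beads
  landingMrAttempts: number; // the proposed landing_mr_attempts counter
  lastLandingMrFailureAt: number | null; // epoch milliseconds, null if never failed
}

type LandingDecision =
  | { kind: "emit" }
  | { kind: "skip"; reason: string }
  | { kind: "fail_convoy"; reason: string };

const MAX_LANDING_MR_ATTEMPTS = 3; // example value from the suggestion above
const LANDING_MR_COOLDOWN_MS = 60000; // 60s, example value from the suggestion above

function decideLandingMr(convoy: ConvoyLandingState, nowMs: number): LandingDecision {
  // Deduplicate: never create a second bead while one is still open or in_progress.
  if (convoy.landingMrStatuses.some((s) => s === "open" || s === "in_progress")) {
    return { kind: "skip", reason: "landing MR bead already open or in_progress" };
  }
  // Circuit breaker: after N failures, fail the convoy and escalate instead of looping.
  if (convoy.landingMrAttempts >= MAX_LANDING_MR_ATTEMPTS) {
    return { kind: "fail_convoy", reason: "max landing MR attempts reached" };
  }
  // PR URL guard (assumption: every completed bead needs a PR URL).
  const urls = convoy.completedBeadPrUrls;
  if (urls.length === 0 || urls.some((url) => !url)) {
    return { kind: "skip", reason: "completed beads are missing PR URLs" };
  }
  // Cooldown: wait after a failure rather than retrying on the next tick.
  const lastFailure = convoy.lastLandingMrFailureAt;
  if (lastFailure !== null && nowMs - lastFailure < LANDING_MR_COOLDOWN_MS) {
    return { kind: "skip", reason: "cooling down after last failure" };
  }
  return { kind: "emit" };
}
```

For this to work, whatever handles a failed landing MR bead would need to increment the attempt counter and record the failure time on the convoy. A caller would presumably log a warning on `skip` and mark the convoy failed and escalate on `fail_convoy`.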
Relevant Code
- Reconciler create_landing_mr emission: services/gastown/src/dos/town/reconciler.ts (convoy reconciliation section, ~line 1493+)
- Landing MR action handler: services/gastown/src/dos/town/actions.ts
- Review queue MR creation: services/gastown/src/dos/town/review-queue.ts (submitToReviewQueue, ~line 110)
- Convoy progress tracking: services/gastown/src/dos/town/reconciler.ts (reconcileConvoys, ~line 1516)
Related
pr_url for non-convoy beads), same missing guards but a different trigger path. The per-bead path goes through agentDone → submitToReviewQueue → refinery dispatch → fail → reopen source → re-dispatch polecat → loop.