spec 015: router job dispatch failure recovery by zbigniewsobiecki · Pull Request #1203 · mongrel-intelligence/cascade

zbigniewsobiecki · 2026-04-26T16:51:59Z

Summary

Closes the silent black-hole failure mode verified live in prod on 2026-04-26 (ucho/MNG-350): a transient capacity miss or Docker spawn error during worker dispatch turned a webhook-driven job into a permanently failed BullMQ entry while stranding the in-memory work-item lock for ~30 minutes — silently rejecting every subsequent webhook for the same (projectId, workItemId, agentType) trio.

Two plans landed under spec 015 with safety-net-first sequencing:

Plan 1 (f3139a50) — hooks BullMQ's worker.on('failed') event on both queues to release the work-item lock + agent-type counter + recently-dispatched dedup mark via a new releaseLocksForFailedJob compensator. Splits the webhook decision-reason vocabulary into three states: Job queued / Awaiting worker slot: ... / Work item locked (no active dispatch): ... (the last is a Sentry-tagged wedged_lock_canary regression invariant).
Plan 2 (f0ac965f) — replaces guardedSpawn's synchronous 'No worker slots available' throw with an in-process slot-waiter (default 5min timeout, configurable via SLOT_WAIT_TIMEOUT_MS). Adds a dispatch-error classifier (transient: ECONNREFUSED/ECONNRESET/ENOTFOUND/HTTP 429/HTTP 409/SLOT_WAIT_TIMEOUT/unknown; terminal: TypeError/ZodError/image-not-found-after-fallback). Both cascade-jobs and cascade-dashboard-jobs now use attempts: 4 + backoff: { type: 'exponential', delay: 5000 } (~75s before exhaustion). Terminal errors wrap in BullMQ's UnrecoverableError to skip retries.

After this, a transient capacity miss or Docker hiccup is invisible to the user (the agent runs once a slot frees, with no further drag-and-drop); a sustained Docker outage exhausts the retry budget within ~90s and surfaces a failed run + Sentry capture; no dispatch failure path can strand the locks.

Doc impact: CLAUDE.md gets a new "Dispatch failure semantics" section documenting the unified contract (file 169 → 182 lines, under the 200 cap). CHANGELOG.md gets two entries under Unreleased — one per plan.

Out of scope (per spec): worker-pool autoscaling, BullMQ migration, work-item-lock semantics changes, snapshot fallback redesign, manual-run lock-bypass changes, failed-set inspection UI, startup re-enqueue sweep. Cleanup of the existing dead linear-1777217350854-2qvhjo job in prod Redis is operational and handled out-of-band.

Test plan

CI green: `npm run lint`, `npm run typecheck`, `npm test`, `npm run test:integration` all pass
Spot-check that 8539 unit tests + 8 router integration tests run green (already verified locally pre-PR)
Inspect the new `Dispatch failure semantics` section of `CLAUDE.md` for accuracy
Verify that the existing `tests/unit/router/worker-manager.test.ts` no longer contains the `processFn throws when at capacity` assertion (replaced, not deleted, per spec AC fix: improve PostgreSQL startup reliability and agent documentation #9)
After merge: `docker exec redis redis-cli ZREM bull:cascade-jobs:failed linear-1777217350854-2qvhjo` to clean up the existing dead prod job (operational, not part of the merge gate)
After merge: monitor production webhook logs for any `Work item locked (no active dispatch)` decision reason or Sentry capture tagged `wedged_lock_canary` — if it fires under normal operation, that's a regression in the compensator
After merge: drag a Linear card to Todo on a busy project to confirm the `Awaiting worker slot: ...` decision reason appears in webhook logs (the old `Work item locked: ...` wording for in-flight dispatches is gone)

Files changed

New modules (5): `dispatch-compensator.ts`, `lock-state-classifier.ts`, `slot-waiter.ts`, `dispatch-error-classifier.ts`, plus an exported predicate `isImageNotFoundError` from container-manager
Modified (8): `active-workers.ts`, `agent-type-lock.ts`, `bullmq-workers.ts`, `config.ts`, `container-manager.ts`, `queue.ts`, `webhook-processor.ts`, `worker-manager.ts`, `queue/client.ts`
New tests (5): one unit test per new module + 2 module-integration tests under `tests/integration/router/`
Extended tests (5): existing router-area unit tests gain new assertions for the new contracts
Docs: `CLAUDE.md`, `CHANGELOG.md`

🤖 Generated with Claude Code

…covery Spec captures the silent black-hole bug class verified live on 2026-04-26 (ucho/MNG-350): a transient capacity miss or Docker error during worker spawn turns a webhook-driven job into a permanently failed BullMQ entry while stranding the work-item / agent-type locks for up to 30 minutes, silently rejecting subsequent webhooks for the same work item. Decomposed into two plans with safety-net-first sequencing: plan 1 hooks the BullMQ failed event to release locks on every dispatch failure path; plan 2 replaces the throw-on-capacity with a wait-for-slot semaphore, adds bounded retry with exponential backoff, and a transient/terminal error classifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the stranded-lock half of spec 015's bug class verified live in prod on 2026-04-26 (ucho/MNG-350). When a webhook-driven job's dispatch fails — capacity throw, Docker spawn error, or any future throw site — the work-item lock, agent-type concurrency counter, and recently-dispatched dedup mark established by the webhook → enqueue path are now released by a compensator hooked to BullMQ's `worker.on('failed')` event. What landed: - `src/router/dispatch-compensator.ts` (new) — `releaseLocksForFailedJob` wraps `extractProjectIdFromJob` / `extractWorkItemId` / `extractAgentType` and calls into `clearWorkItemEnqueued` / `clearAgentTypeEnqueued` / `clearRecentlyDispatched`. Never propagates errors; captures to Sentry with `tags: { source: 'dispatch_compensator' }`. - `src/router/agent-type-lock.ts` — exports new `clearRecentlyDispatched` for the compensator. The existing `markRecentlyDispatched` semantics are unchanged (60s TTL, NOT cleared on completion); this helper exists solely so a permanently-failed dispatch doesn't keep deduping a fresh webhook for ~60s while the user retries. - `src/router/bullmq-workers.ts` — extends the existing `worker.on('failed')` handler to invoke `releaseLocksForFailedJob` alongside the existing logger + Sentry calls. Wraps the call in a defensive `.catch` so a future regression in the compensator can't poison the worker. - `src/router/lock-state-classifier.ts` (new) — `classifyLockState` returns `'awaiting-slot'` when an active worker or queued/waiting job matches the trio, `'wedged'` when neither correlation matches. Defaults to `'awaiting-slot'` on classifier error so a Redis blip doesn't mis-emit the wedged canary. - `src/router/active-workers.ts` — `getActiveWorkers()` now exposes `(projectId, workItemId, agentType)` so the classifier can correlate. Backwards-compatible (existing callers work unchanged; new fields are additive optional). - `src/router/webhook-processor.ts` — Step 8 (work-item lock check) now splits the decision-reason vocabulary into three states: * `Job queued: ...` (success path) * `Awaiting worker slot: ...` (lock held + dispatch in flight; healthy) * `Work item locked (no active dispatch): ...` (wedged-lock canary) The wedged branch additionally fires `captureException` with `tags: { source: 'wedged_lock_canary' }` so any regression in compensation is loud in production. What this does NOT change (intentional, all in plan 015/2): - `guardedSpawn` still throws on capacity (BullMQ marks the job failed, the compensator now releases the locks, but the job itself is still lost). Plan 2 replaces the throw with a wait-for-slot semaphore. - Both queues still default to `attempts: 1`. Plan 2 raises this with exponential backoff and adds a transient/terminal error classifier. - CLAUDE.md is intentionally not updated by this plan — the unified passage describing both halves of the new contract lands in plan 015/2. Tests: - 5 new unit tests in `dispatch-compensator.test.ts` - 3 new unit tests in `agent-type-lock.test.ts` for `clearRecentlyDispatched` - 4 new unit tests in `bullmq-workers.test.ts` for the failed-event seam - 5 new unit tests in `lock-state-classifier.test.ts` - 2 new unit tests in `active-workers.test.ts` for the extended shape - 4 new unit tests in `webhook-processor.test.ts` for the three-way taxonomy - 3 new module-integration tests in `tests/integration/router/dispatch-failure-compensation.test.ts` exercise the real lock modules + real bullmq-workers.ts failed-event handler + real compensator end-to-end (only BullMQ's Worker constructor + the worker-env extractors are mocked). Full suite: 8515 passed / 23 skipped / 0 failed. Lint + typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…fier Closes the lost-job half of spec 015's bug class. Combined with plan 015/1, the silent black-hole failure mode verified live in prod on 2026-04-26 (ucho/MNG-350) is now fully closed. What landed: - `src/router/slot-waiter.ts` (new) — semaphore-style primitive: `acquireSlot({ timeoutMs })` resolves immediately when capacity is below `routerConfig.maxWorkers`, otherwise queues a FIFO waiter with a bounded timeout that rejects with `code: 'SLOT_WAIT_TIMEOUT'`. `slotReleased()` pops the head waiter; `clearAllWaiters()` rejects every pending waiter with `code: 'SHUTDOWN'` on router stop. - `src/router/dispatch-error-classifier.ts` (new) — classifies thrown errors into `'transient'` (Docker socket Node codes, HTTP 429/409, SLOT_WAIT_TIMEOUT, anything unknown — default-to-retry) vs `'terminal'` (TypeError, ZodError, image-not-found-after-fallback). - `src/router/worker-manager.ts` — `guardedSpawn` rewritten: `await acquireSlot(...)` replaces the synchronous capacity throw; on spawn error, terminal errors are wrapped in BullMQ's `UnrecoverableError` so retries skip; transient errors propagate unchanged so BullMQ retries via attempts/backoff. - `src/router/active-workers.ts` — `cleanupWorker` now calls `slotReleased()` exactly once per cleanup, including on the crash path. The existing `if (worker)` guard ensures idempotence. - `src/router/config.ts` — new `slotWaitTimeoutMs` field (default 5min, configurable via `SLOT_WAIT_TIMEOUT_MS`). - `src/router/queue.ts` and `src/queue/client.ts` — both queues now default to `attempts: 4` with `backoff: { type: 'exponential', delay: 5000 }` (~75s total before exhaustion). Terminal errors bypass via `UnrecoverableError`. - `src/router/container-manager.ts` — exports the existing `isImageNotFoundError` predicate so the classifier can reuse it. Test contract change (spec AC #9): The previous `tests/unit/router/worker-manager.test.ts:179` assertion `'processFn throws when at capacity'` is REPLACED (not deleted) with `'processFn awaits a slot when at capacity, then dispatches when one frees'`. The throw-on-capacity contract is gone forever. Tests: - 7 new unit tests in `slot-waiter.test.ts` (FIFO, timeout, no-op, shutdown rejection) - 11 new unit tests in `dispatch-error-classifier.test.ts` covering every transient/terminal class - 4 new unit tests in `worker-manager.test.ts` (replaced original capacity-throw test + 3 for retry classification) - 3 new unit tests in `active-workers.test.ts` for slotReleased integration - 5 new module-integration tests in `dispatch-retry.test.ts` exercise REAL guardedSpawn + REAL slot-waiter + REAL dispatch-error-classifier against both queues, mocking only spawnWorker + BullMQ Worker constructor. Plan 1's 3 module-integration tests continue to pass alongside plan 2's 5. Full unit suite: 8539 passed / 23 skipped / 0 failed. CLAUDE.md updated with a new "Dispatch failure semantics" section documenting the unified contract (capacity wait, retry budget, classifier, three-way decision-reason taxonomy from plan 1, wedged-lock canary). File now 182 lines, under the 200-line cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ns complete Closes the silent black-hole bug class verified live on 2026-04-26 (ucho/MNG-350). Plan 1 added failed-event lock compensation + three-way decision-reason taxonomy; plan 2 replaced the throw-on-capacity with wait-for-slot, added bounded retry with exponential backoff, and introduced a transient/terminal error classifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-04-26T16:56:26Z

Codecov Report

❌ Patch coverage is 95.36082% with 9 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/router/lock-state-classifier.ts	93.54%	1 Missing and 1 partial ⚠️
src/router/queue.ts	0.00%	2 Missing ⚠️
src/router/bullmq-workers.ts	93.33%	1 Missing ⚠️
src/router/dispatch-compensator.ts	96.29%	1 Missing ⚠️
src/router/slot-waiter.ts	97.61%	1 Missing ⚠️
src/router/webhook-processor.ts	96.66%	1 Missing ⚠️
src/router/worker-manager.ts	91.66%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

nhopeatall

LGTM

🕵️ claude-code · claude-opus-4-6 · run details

* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents (#1201) * fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents * fix(triggers): use case-insensitive JIRA status comparison in isInPlanningStatus Match the established pattern from status-changed.ts and label-added.ts which both use .toLowerCase() for JIRA status comparisons, since status names are user-configurable and the API does not guarantee consistent casing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Cascade Bot <bot@cascade.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(linear): populate inlineMedia from descriptions/comments and add downloadAttachment (#1202) Co-authored-by: Cascade Bot <bot@cascade.dev> * fix(claude-code): pin pathToClaudeCodeExecutable so SDK skips broken native-binary probe (#1206) The agent-harness SDK bump in #1197 (claude-agent-sdk 0.2.91 → 0.2.119) broke every review run on cascade-prod with: ReferenceError: Claude Code native binary not found at /app/node_modules/@anthropic-ai/claude-agent-sdk-linux-x64-musl/claude The new SDK probes its own platform-specific optional-dependency subpackages for a bundled `claude` binary. Two failure modes hit at once: 1. Cascade installs `@anthropic-ai/claude-code@2.1.119` globally at /usr/local/bin/claude — the SDK never looks there. 2. The SDK probes the `-musl` variant first regardless of host libc and errors on ENOENT instead of falling through to the glibc variant. Pass an explicit `pathToClaudeCodeExecutable` to short-circuit the probe. The resolver checks (in order): - $CLAUDE_CODE_EXECUTABLE_PATH env override (local-dev escape hatch) - `which claude` in $PATH - /usr/local/bin/claude (Docker default from Dockerfile.worker) Two TDD tests pin the option onto query() and prove the env override wins. No Dockerfile change needed; the existing global install at /usr/local/bin/claude becomes the resolver's runtime target. Confirmed broken on ucho PR #72 (cascade-prod review agent crash). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * spec 015: router job dispatch failure recovery (#1203) * docs(spec/plans): add spec 015 + plans for router dispatch failure recovery Spec captures the silent black-hole bug class verified live on 2026-04-26 (ucho/MNG-350): a transient capacity miss or Docker error during worker spawn turns a webhook-driven job into a permanently failed BullMQ entry while stranding the work-item / agent-type locks for up to 30 minutes, silently rejecting subsequent webhooks for the same work item. Decomposed into two plans with safety-net-first sequencing: plan 1 hooks the BullMQ failed event to release locks on every dispatch failure path; plan 2 replaces the throw-on-capacity with a wait-for-slot semaphore, adds bounded retry with exponential backoff, and a transient/terminal error classifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): lock 015/1 failed-event-lock-compensation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(router): plan 015/1 done — release locks on dispatch failure Closes the stranded-lock half of spec 015's bug class verified live in prod on 2026-04-26 (ucho/MNG-350). When a webhook-driven job's dispatch fails — capacity throw, Docker spawn error, or any future throw site — the work-item lock, agent-type concurrency counter, and recently-dispatched dedup mark established by the webhook → enqueue path are now released by a compensator hooked to BullMQ's `worker.on('failed')` event. What landed: - `src/router/dispatch-compensator.ts` (new) — `releaseLocksForFailedJob` wraps `extractProjectIdFromJob` / `extractWorkItemId` / `extractAgentType` and calls into `clearWorkItemEnqueued` / `clearAgentTypeEnqueued` / `clearRecentlyDispatched`. Never propagates errors; captures to Sentry with `tags: { source: 'dispatch_compensator' }`. - `src/router/agent-type-lock.ts` — exports new `clearRecentlyDispatched` for the compensator. The existing `markRecentlyDispatched` semantics are unchanged (60s TTL, NOT cleared on completion); this helper exists solely so a permanently-failed dispatch doesn't keep deduping a fresh webhook for ~60s while the user retries. - `src/router/bullmq-workers.ts` — extends the existing `worker.on('failed')` handler to invoke `releaseLocksForFailedJob` alongside the existing logger + Sentry calls. Wraps the call in a defensive `.catch` so a future regression in the compensator can't poison the worker. - `src/router/lock-state-classifier.ts` (new) — `classifyLockState` returns `'awaiting-slot'` when an active worker or queued/waiting job matches the trio, `'wedged'` when neither correlation matches. Defaults to `'awaiting-slot'` on classifier error so a Redis blip doesn't mis-emit the wedged canary. - `src/router/active-workers.ts` — `getActiveWorkers()` now exposes `(projectId, workItemId, agentType)` so the classifier can correlate. Backwards-compatible (existing callers work unchanged; new fields are additive optional). - `src/router/webhook-processor.ts` — Step 8 (work-item lock check) now splits the decision-reason vocabulary into three states: * `Job queued: ...` (success path) * `Awaiting worker slot: ...` (lock held + dispatch in flight; healthy) * `Work item locked (no active dispatch): ...` (wedged-lock canary) The wedged branch additionally fires `captureException` with `tags: { source: 'wedged_lock_canary' }` so any regression in compensation is loud in production. What this does NOT change (intentional, all in plan 015/2): - `guardedSpawn` still throws on capacity (BullMQ marks the job failed, the compensator now releases the locks, but the job itself is still lost). Plan 2 replaces the throw with a wait-for-slot semaphore. - Both queues still default to `attempts: 1`. Plan 2 raises this with exponential backoff and adds a transient/terminal error classifier. - CLAUDE.md is intentionally not updated by this plan — the unified passage describing both halves of the new contract lands in plan 015/2. Tests: - 5 new unit tests in `dispatch-compensator.test.ts` - 3 new unit tests in `agent-type-lock.test.ts` for `clearRecentlyDispatched` - 4 new unit tests in `bullmq-workers.test.ts` for the failed-event seam - 5 new unit tests in `lock-state-classifier.test.ts` - 2 new unit tests in `active-workers.test.ts` for the extended shape - 4 new unit tests in `webhook-processor.test.ts` for the three-way taxonomy - 3 new module-integration tests in `tests/integration/router/dispatch-failure-compensation.test.ts` exercise the real lock modules + real bullmq-workers.ts failed-event handler + real compensator end-to-end (only BullMQ's Worker constructor + the worker-env extractors are mocked). Full suite: 8515 passed / 23 skipped / 0 failed. Lint + typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): lock 015/2 wait-for-slot-and-retry-classifier Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): mark 015/2 status: wip * feat(router): plan 015/2 done — wait-for-slot + retry budget + classifier Closes the lost-job half of spec 015's bug class. Combined with plan 015/1, the silent black-hole failure mode verified live in prod on 2026-04-26 (ucho/MNG-350) is now fully closed. What landed: - `src/router/slot-waiter.ts` (new) — semaphore-style primitive: `acquireSlot({ timeoutMs })` resolves immediately when capacity is below `routerConfig.maxWorkers`, otherwise queues a FIFO waiter with a bounded timeout that rejects with `code: 'SLOT_WAIT_TIMEOUT'`. `slotReleased()` pops the head waiter; `clearAllWaiters()` rejects every pending waiter with `code: 'SHUTDOWN'` on router stop. - `src/router/dispatch-error-classifier.ts` (new) — classifies thrown errors into `'transient'` (Docker socket Node codes, HTTP 429/409, SLOT_WAIT_TIMEOUT, anything unknown — default-to-retry) vs `'terminal'` (TypeError, ZodError, image-not-found-after-fallback). - `src/router/worker-manager.ts` — `guardedSpawn` rewritten: `await acquireSlot(...)` replaces the synchronous capacity throw; on spawn error, terminal errors are wrapped in BullMQ's `UnrecoverableError` so retries skip; transient errors propagate unchanged so BullMQ retries via attempts/backoff. - `src/router/active-workers.ts` — `cleanupWorker` now calls `slotReleased()` exactly once per cleanup, including on the crash path. The existing `if (worker)` guard ensures idempotence. - `src/router/config.ts` — new `slotWaitTimeoutMs` field (default 5min, configurable via `SLOT_WAIT_TIMEOUT_MS`). - `src/router/queue.ts` and `src/queue/client.ts` — both queues now default to `attempts: 4` with `backoff: { type: 'exponential', delay: 5000 }` (~75s total before exhaustion). Terminal errors bypass via `UnrecoverableError`. - `src/router/container-manager.ts` — exports the existing `isImageNotFoundError` predicate so the classifier can reuse it. Test contract change (spec AC #9): The previous `tests/unit/router/worker-manager.test.ts:179` assertion `'processFn throws when at capacity'` is REPLACED (not deleted) with `'processFn awaits a slot when at capacity, then dispatches when one frees'`. The throw-on-capacity contract is gone forever. Tests: - 7 new unit tests in `slot-waiter.test.ts` (FIFO, timeout, no-op, shutdown rejection) - 11 new unit tests in `dispatch-error-classifier.test.ts` covering every transient/terminal class - 4 new unit tests in `worker-manager.test.ts` (replaced original capacity-throw test + 3 for retry classification) - 3 new unit tests in `active-workers.test.ts` for slotReleased integration - 5 new module-integration tests in `dispatch-retry.test.ts` exercise REAL guardedSpawn + REAL slot-waiter + REAL dispatch-error-classifier against both queues, mocking only spawnWorker + BullMQ Worker constructor. Plan 1's 3 module-integration tests continue to pass alongside plan 2's 5. Full unit suite: 8539 passed / 23 skipped / 0 failed. CLAUDE.md updated with a new "Dispatch failure semantics" section documenting the unified contract (capacity wait, retry budget, classifier, three-way decision-reason taxonomy from plan 1, wedged-lock canary). File now 182 lines, under the 200-line cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(spec): 015 done — router job dispatch failure recovery, all plans complete Closes the silent black-hole bug class verified live on 2026-04-26 (ucho/MNG-350). Plan 1 added failed-event lock compensation + three-way decision-reason taxonomy; plan 2 replaced the throw-on-capacity with wait-for-slot, added bounded retry with exponential backoff, and introduced a transient/terminal error classifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(deps): bump postcss from 8.5.8 to 8.5.12 (#1204) Bumps [postcss](https://github.com/postcss/postcss) from 8.5.8 to 8.5.12. - [Release notes](https://github.com/postcss/postcss/releases) - [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md) - [Commits](postcss/postcss@8.5.8...8.5.12) --- updated-dependencies: - dependency-name: postcss dependency-version: 8.5.12 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump uuid, bullmq and dockerode (#1192) Removes [uuid](https://github.com/uuidjs/uuid). It's no longer used after updating ancestor dependencies [uuid](https://github.com/uuidjs/uuid), [bullmq](https://github.com/taskforcesh/bullmq) and [dockerode](https://github.com/apocas/dockerode). These dependencies need to be updated together. Removes `uuid` Updates `bullmq` from 5.72.0 to 5.76.2 - [Release notes](https://github.com/taskforcesh/bullmq/releases) - [Commits](taskforcesh/bullmq@v5.72.0...v5.76.2) Updates `dockerode` from 4.0.10 to 5.0.0 - [Release notes](https://github.com/apocas/dockerode/releases) - [Commits](apocas/dockerode@v4.0.10...v5.0.0) --- updated-dependencies: - dependency-name: uuid dependency-version: dependency-type: indirect - dependency-name: bullmq dependency-version: 5.76.2 dependency-type: direct:production - dependency-name: dockerode dependency-version: 5.0.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: aaight <aaight42@gmail.com> Co-authored-by: Cascade Bot <bot@cascade.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents (#1201) * fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents * fix(triggers): use case-insensitive JIRA status comparison in isInPlanningStatus Match the established pattern from status-changed.ts and label-added.ts which both use .toLowerCase() for JIRA status comparisons, since status names are user-configurable and the API does not guarantee consistent casing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Cascade Bot <bot@cascade.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(linear): populate inlineMedia from descriptions/comments and add downloadAttachment (#1202) Co-authored-by: Cascade Bot <bot@cascade.dev> * fix(claude-code): pin pathToClaudeCodeExecutable so SDK skips broken native-binary probe (#1206) The agent-harness SDK bump in #1197 (claude-agent-sdk 0.2.91 → 0.2.119) broke every review run on cascade-prod with: ReferenceError: Claude Code native binary not found at /app/node_modules/@anthropic-ai/claude-agent-sdk-linux-x64-musl/claude The new SDK probes its own platform-specific optional-dependency subpackages for a bundled `claude` binary. Two failure modes hit at once: 1. Cascade installs `@anthropic-ai/claude-code@2.1.119` globally at /usr/local/bin/claude — the SDK never looks there. 2. The SDK probes the `-musl` variant first regardless of host libc and errors on ENOENT instead of falling through to the glibc variant. Pass an explicit `pathToClaudeCodeExecutable` to short-circuit the probe. The resolver checks (in order): - $CLAUDE_CODE_EXECUTABLE_PATH env override (local-dev escape hatch) - `which claude` in $PATH - /usr/local/bin/claude (Docker default from Dockerfile.worker) Two TDD tests pin the option onto query() and prove the env override wins. No Dockerfile change needed; the existing global install at /usr/local/bin/claude becomes the resolver's runtime target. Confirmed broken on ucho PR #72 (cascade-prod review agent crash). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * spec 015: router job dispatch failure recovery (#1203) * docs(spec/plans): add spec 015 + plans for router dispatch failure recovery Spec captures the silent black-hole bug class verified live on 2026-04-26 (ucho/MNG-350): a transient capacity miss or Docker error during worker spawn turns a webhook-driven job into a permanently failed BullMQ entry while stranding the work-item / agent-type locks for up to 30 minutes, silently rejecting subsequent webhooks for the same work item. Decomposed into two plans with safety-net-first sequencing: plan 1 hooks the BullMQ failed event to release locks on every dispatch failure path; plan 2 replaces the throw-on-capacity with a wait-for-slot semaphore, adds bounded retry with exponential backoff, and a transient/terminal error classifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): lock 015/1 failed-event-lock-compensation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(router): plan 015/1 done — release locks on dispatch failure Closes the stranded-lock half of spec 015's bug class verified live in prod on 2026-04-26 (ucho/MNG-350). When a webhook-driven job's dispatch fails — capacity throw, Docker spawn error, or any future throw site — the work-item lock, agent-type concurrency counter, and recently-dispatched dedup mark established by the webhook → enqueue path are now released by a compensator hooked to BullMQ's `worker.on('failed')` event. What landed: - `src/router/dispatch-compensator.ts` (new) — `releaseLocksForFailedJob` wraps `extractProjectIdFromJob` / `extractWorkItemId` / `extractAgentType` and calls into `clearWorkItemEnqueued` / `clearAgentTypeEnqueued` / `clearRecentlyDispatched`. Never propagates errors; captures to Sentry with `tags: { source: 'dispatch_compensator' }`. - `src/router/agent-type-lock.ts` — exports new `clearRecentlyDispatched` for the compensator. The existing `markRecentlyDispatched` semantics are unchanged (60s TTL, NOT cleared on completion); this helper exists solely so a permanently-failed dispatch doesn't keep deduping a fresh webhook for ~60s while the user retries. - `src/router/bullmq-workers.ts` — extends the existing `worker.on('failed')` handler to invoke `releaseLocksForFailedJob` alongside the existing logger + Sentry calls. Wraps the call in a defensive `.catch` so a future regression in the compensator can't poison the worker. - `src/router/lock-state-classifier.ts` (new) — `classifyLockState` returns `'awaiting-slot'` when an active worker or queued/waiting job matches the trio, `'wedged'` when neither correlation matches. Defaults to `'awaiting-slot'` on classifier error so a Redis blip doesn't mis-emit the wedged canary. - `src/router/active-workers.ts` — `getActiveWorkers()` now exposes `(projectId, workItemId, agentType)` so the classifier can correlate. Backwards-compatible (existing callers work unchanged; new fields are additive optional). - `src/router/webhook-processor.ts` — Step 8 (work-item lock check) now splits the decision-reason vocabulary into three states: * `Job queued: ...` (success path) * `Awaiting worker slot: ...` (lock held + dispatch in flight; healthy) * `Work item locked (no active dispatch): ...` (wedged-lock canary) The wedged branch additionally fires `captureException` with `tags: { source: 'wedged_lock_canary' }` so any regression in compensation is loud in production. What this does NOT change (intentional, all in plan 015/2): - `guardedSpawn` still throws on capacity (BullMQ marks the job failed, the compensator now releases the locks, but the job itself is still lost). Plan 2 replaces the throw with a wait-for-slot semaphore. - Both queues still default to `attempts: 1`. Plan 2 raises this with exponential backoff and adds a transient/terminal error classifier. - CLAUDE.md is intentionally not updated by this plan — the unified passage describing both halves of the new contract lands in plan 015/2. Tests: - 5 new unit tests in `dispatch-compensator.test.ts` - 3 new unit tests in `agent-type-lock.test.ts` for `clearRecentlyDispatched` - 4 new unit tests in `bullmq-workers.test.ts` for the failed-event seam - 5 new unit tests in `lock-state-classifier.test.ts` - 2 new unit tests in `active-workers.test.ts` for the extended shape - 4 new unit tests in `webhook-processor.test.ts` for the three-way taxonomy - 3 new module-integration tests in `tests/integration/router/dispatch-failure-compensation.test.ts` exercise the real lock modules + real bullmq-workers.ts failed-event handler + real compensator end-to-end (only BullMQ's Worker constructor + the worker-env extractors are mocked). Full suite: 8515 passed / 23 skipped / 0 failed. Lint + typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): lock 015/2 wait-for-slot-and-retry-classifier Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): mark 015/2 status: wip * feat(router): plan 015/2 done — wait-for-slot + retry budget + classifier Closes the lost-job half of spec 015's bug class. Combined with plan 015/1, the silent black-hole failure mode verified live in prod on 2026-04-26 (ucho/MNG-350) is now fully closed. What landed: - `src/router/slot-waiter.ts` (new) — semaphore-style primitive: `acquireSlot({ timeoutMs })` resolves immediately when capacity is below `routerConfig.maxWorkers`, otherwise queues a FIFO waiter with a bounded timeout that rejects with `code: 'SLOT_WAIT_TIMEOUT'`. `slotReleased()` pops the head waiter; `clearAllWaiters()` rejects every pending waiter with `code: 'SHUTDOWN'` on router stop. - `src/router/dispatch-error-classifier.ts` (new) — classifies thrown errors into `'transient'` (Docker socket Node codes, HTTP 429/409, SLOT_WAIT_TIMEOUT, anything unknown — default-to-retry) vs `'terminal'` (TypeError, ZodError, image-not-found-after-fallback). - `src/router/worker-manager.ts` — `guardedSpawn` rewritten: `await acquireSlot(...)` replaces the synchronous capacity throw; on spawn error, terminal errors are wrapped in BullMQ's `UnrecoverableError` so retries skip; transient errors propagate unchanged so BullMQ retries via attempts/backoff. - `src/router/active-workers.ts` — `cleanupWorker` now calls `slotReleased()` exactly once per cleanup, including on the crash path. The existing `if (worker)` guard ensures idempotence. - `src/router/config.ts` — new `slotWaitTimeoutMs` field (default 5min, configurable via `SLOT_WAIT_TIMEOUT_MS`). - `src/router/queue.ts` and `src/queue/client.ts` — both queues now default to `attempts: 4` with `backoff: { type: 'exponential', delay: 5000 }` (~75s total before exhaustion). Terminal errors bypass via `UnrecoverableError`. - `src/router/container-manager.ts` — exports the existing `isImageNotFoundError` predicate so the classifier can reuse it. Test contract change (spec AC #9): The previous `tests/unit/router/worker-manager.test.ts:179` assertion `'processFn throws when at capacity'` is REPLACED (not deleted) with `'processFn awaits a slot when at capacity, then dispatches when one frees'`. The throw-on-capacity contract is gone forever. Tests: - 7 new unit tests in `slot-waiter.test.ts` (FIFO, timeout, no-op, shutdown rejection) - 11 new unit tests in `dispatch-error-classifier.test.ts` covering every transient/terminal class - 4 new unit tests in `worker-manager.test.ts` (replaced original capacity-throw test + 3 for retry classification) - 3 new unit tests in `active-workers.test.ts` for slotReleased integration - 5 new module-integration tests in `dispatch-retry.test.ts` exercise REAL guardedSpawn + REAL slot-waiter + REAL dispatch-error-classifier against both queues, mocking only spawnWorker + BullMQ Worker constructor. Plan 1's 3 module-integration tests continue to pass alongside plan 2's 5. Full unit suite: 8539 passed / 23 skipped / 0 failed. CLAUDE.md updated with a new "Dispatch failure semantics" section documenting the unified contract (capacity wait, retry budget, classifier, three-way decision-reason taxonomy from plan 1, wedged-lock canary). File now 182 lines, under the 200-line cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(spec): 015 done — router job dispatch failure recovery, all plans complete Closes the silent black-hole bug class verified live on 2026-04-26 (ucho/MNG-350). Plan 1 added failed-event lock compensation + three-way decision-reason taxonomy; plan 2 replaced the throw-on-capacity with wait-for-slot, added bounded retry with exponential backoff, and introduced a transient/terminal error classifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(deps): bump postcss from 8.5.8 to 8.5.12 (#1204) Bumps [postcss](https://github.com/postcss/postcss) from 8.5.8 to 8.5.12. - [Release notes](https://github.com/postcss/postcss/releases) - [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md) - [Commits](postcss/postcss@8.5.8...8.5.12) --- updated-dependencies: - dependency-name: postcss dependency-version: 8.5.12 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump uuid, bullmq and dockerode (#1192) Removes [uuid](https://github.com/uuidjs/uuid). It's no longer used after updating ancestor dependencies [uuid](https://github.com/uuidjs/uuid), [bullmq](https://github.com/taskforcesh/bullmq) and [dockerode](https://github.com/apocas/dockerode). These dependencies need to be updated together. Removes `uuid` Updates `bullmq` from 5.72.0 to 5.76.2 - [Release notes](https://github.com/taskforcesh/bullmq/releases) - [Commits](taskforcesh/bullmq@v5.72.0...v5.76.2) Updates `dockerode` from 4.0.10 to 5.0.0 - [Release notes](https://github.com/apocas/dockerode/releases) - [Commits](apocas/dockerode@v4.0.10...v5.0.0) --- updated-dependencies: - dependency-name: uuid dependency-version: dependency-type: indirect - dependency-name: bullmq dependency-version: 5.76.2 dependency-type: direct:production - dependency-name: dockerode dependency-version: 5.0.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * spec 016: PM image delivery reliability (#1209) * docs(spec/plans): add spec 016 + plans for PM image delivery reliability Closes the silent screenshot-drop bug class verified live on 2026-04-26 (ucho/MNG-357): Linear's user-pasted-image URLs (uploads.linear.app/<uuid> with no file extension) were dropped at the pre-download MIME filter because mimeTypeFromUrl returned 'application/octet-stream' and filterImageMedia excluded them. This affected all engines on the disk-write path, regardless of PR #948's Claude-Code SDK delivery fix. Three plans, safety-net-first sequencing matching spec 015: - Plan 1 (boot-path-mime-fix-and-diagnostic-log): defers MIME authority to download response Content-Type via image/* wildcard sentinel; adds the grep-stable diagnostic log line at extract time. Independently fixes MNG-357. - Plan 2 (runtime-gadget-image-delivery): makes the runtime cascade-tools pm read-work-item gadget actually download + write images to disk with file paths returned in text. Closes the mid-run pickup gap. Depends on Plan 1's shared download-and-prepare helper. - Plan 3 (linear-fixture-and-extraction-coverage): captures a Linear GraphQL Issue payload fixture for an issue with a pasted screenshot; pins extraction with a regression test that fails loudly if Linear ever changes the payload shape. Mostly tests + docs. 9 ACs, 0 manual-only. CLAUDE.md not updated (already covered by spec 015's silent-failure → diagnostic-line pattern). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): lock 016/1 boot-path-mime-fix-and-diagnostic-log * feat(pm): plan 016/1 done — boot-path image MIME fix + diagnostic log Closes the silent screenshot-drop bug class verified live on 2026-04-26 (ucho/MNG-357). Linear's extension-less pasted-image URLs (uploads.linear.app/<uuid>) now survive the pre-download MIME filter via an image/* wildcard sentinel. The download response's Content-Type header is the authoritative MIME — wildcard is resolved before bytes are written. What landed: - src/pm/media.ts — new IMAGE_HOST_ALLOWLIST (currently 'uploads.linear.app'); mimeTypeFromUrl returns 'image/*' for extension-less URLs from allowlisted hosts; isImageMimeType accepts the wildcard. - src/pm/download-and-prepare.ts (new) — shared helper for the per-provider download dispatch loop (jira/linear/trello). Returns { images, failures }. Spec 016/2's runtime gadget will import this. - src/agents/definitions/contextSteps.ts — fetchWorkItemStep refactored to use the shared helper; emits the new grep-stable diagnostic line '[image-pipeline] work-item-fetch summary' with stable fields: { provider, workItemId, urlsDetected, urlsAfterFilter, urlsDownloaded, urlsFailed, urlsByMimeType }. Tests: - 6 new unit tests in tests/unit/pm/media.test.ts (wildcard sentinel, Linear extension-less, regression for extensioned + non-PM URLs) - 7 new unit tests in tests/unit/pm/download-and-prepare.test.ts - 3 new diagnostic-log tests in contextSteps.test.ts; existing log message expectations updated to the new helper-prefix - 3 module-integration tests in tests/integration/pm/image-pipeline.test.ts pinning the MNG-357 reproduction end-to-end with real mimeTypeFromUrl + filterImageMedia + extractMarkdownImages PR #948's Claude-Code initial-input ImageBlockParam path is unchanged; existing regression test (claude-code.test.ts:939 'logs image injection and strips images before buildTaskPrompt') confirms. Docs: - CHANGELOG.md entry under Unreleased. - src/integrations/README.md gains a new 'Image delivery contract' section documenting the shared resolution path, allowlist semantics, diagnostic log line schema, and the rule that providers shouldn't write their own MIME-detection. Full unit suite: 8521 passed / 23 skipped / 0 failed. Lint + typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): lock 016/2 runtime-gadget-image-delivery * feat(pm): plan 016/2 done — runtime gadget delivers images on disk Closes the mid-run image pickup gap from spec 016. The runtime gadget `cascade-tools pm read-work-item` now downloads any image media and writes it to .cascade/context/images/work-item-<id>-img-<index>.<ext>, returning text whose new "Local Image Files" section lists actual file paths the agent's file-read tool can consume. What landed: - src/gadgets/pm/core/writeRuntimeImages.ts (new) — writes ContextImage arrays to .cascade/context/images/ with stable naming convention (work-item-<id>-img-<i>.<ext>); extension derived from resolved MIME; falls back to .bin + warn log for unresolved image/* sentinel. - src/gadgets/pm/core/readWorkItem.ts — readWorkItem now calls downloadAndPrepareImages (Plan 1's helper) + writeRuntimeImages (this plan), then mutates the returned text to include the local file paths via formatRuntimeImagePaths. Same diagnostic log line '[image-pipeline] work-item-fetch summary' as the boot path. Failed downloads surface in a "Failed Image Downloads" subsection. Tests: - 8 new unit tests in tests/unit/gadgets/pm/core/writeRuntimeImages.test.ts - 5 new unit tests in tests/unit/gadgets/pm/core/readWorkItem.test.ts (spec 016/2 sub-describe) - 4 new module-integration tests in tests/integration/gadgets/runtime-image-delivery.test.ts pinning the mid-run pickup contract end-to-end. CHANGELOG.md entry added. Full unit suite (single-fork): 8534 passed / 23 skipped / 0 failed. Lint + typecheck clean. Three PM manifest test suites occasionally time out under parallel load on this machine — verified to pass in isolation; not a code regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(plan): lock 016/3 linear-fixture-and-extraction-coverage * test(pm): plan 016/3 done — Linear fixture + extraction-coverage regression Closes spec 016 with the regression net for the contract Plans 1+2 established. If Linear ever changes its Issue payload shape in a way that loses inline images, the extraction-coverage test fails loudly with a specific URL-missing message. What landed: - tests/fixtures/linear-issue-with-screenshot.json (new) — reconstructed Linear GraphQL Issue payload covering: extension-less uploads.linear.app URL in description, extensioned Linear URL with alt text, external URL with image/svg+xml MIME, non-image markdown link (must NOT be picked up), one comment with a pasted screenshot, one comment without, and three formal Attachment records (Slack/GitHub/Sentry link previews). - tests/unit/pm/linear/extraction-coverage.test.ts (new) — 9 tests: description coverage with explicit expected-URL list, image/* sentinel for extension-less, concrete MIME for extensioned, image/svg+xml for external SVG, non-image link exclusion, comment coverage, comment source field, attachment-NOT-leaked rule, meta-test of regression net. - src/integrations/README.md — new "Linear: GraphQL surface for inline images" subsection documenting the conclusion: Issue.description markdown is canonical for inline-pasted screenshots; Issue.attachments is for formal Attachment records (link previews) and is the wrong surface for inline images. Links to the fixture and the test. No production code change — Plan 1's mimeTypeFromUrl + extractMarkdownImages already cover the cases. This plan ships the regression armor. CHANGELOG.md entry added. Lint + typecheck clean. 9/9 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(spec): 016 done — pm-image-delivery-reliability, all plans complete Closes the silent screenshot-drop bug class verified live on 2026-04-26 (ucho/MNG-357). Plan 1 added the Linear-extension-less MIME wildcard sentinel + diagnostic log line; plan 2 made the runtime cascade-tools pm read-work-item gadget actually deliver images on disk; plan 3 captured a Linear GraphQL fixture and pinned extraction coverage with a regression test. CLAUDE.md untouched by this spec — already covered by spec 015's broader silent-failure → diagnostic-line pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address code review concerns * test(image-pipeline): supply urlsDetected in readWorkItemWithMedia mocks The diagnostic-line assertion expected urlsDetected on the log payload, but the mocked readWorkItemWithMedia return values omitted it, so the field arrived as undefined and the toHaveBeenCalledWith match failed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cascade Bot <bot@cascade.dev> * fix(linear): drop comment-mention planning-state gate that prod payload never satisfies (#1210) PR #1201 added a `currentStateId !== planningStateId` gate to the Linear comment @mention trigger that read `data.issue.stateId` from the webhook payload. Linear's Comment webhook does not ship `stateId` on the nested issue (verified across four prod payloads on 2026-04-26 — 8cd0108a, b93e4925, 6548cd14, 3d95b210). The gate therefore always evaluated to true and silently dropped every legitimate bot @mention, including the one on MNG-346 that motivated this fix. The agent (respond-to-planning-comment) is now responsible for any planning-only behavior; the trigger no longer gates on state and avoids an extra Linear GraphQL round-trip per comment. Also corrects `LinearWebhookCommentTriggerData.issue` to match what Linear actually ships (six keys, no `stateId`, optional `team`) — the old type lied and PR #1201 trusted it. Tests pin a real prod-shape Comment payload as a regression. JIRA's equivalent gate is unaffected (its `comment_created` payload does ship `issue.fields.status.name`). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: aaight <aaight42@gmail.com> Co-authored-by: Cascade Bot <bot@cascade.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

zbigniewsobiecki and others added 7 commits April 26, 2026 16:20

chore(plan): lock 015/1 failed-event-lock-compensation

e2fb2f8

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(plan): lock 015/2 wait-for-slot-and-retry-classifier

a12ad40

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(plan): mark 015/2 status: wip

cf0c59b

zbigniewsobiecki requested a review from aaight April 26, 2026 16:53

nhopeatall approved these changes Apr 26, 2026

View reviewed changes

zbigniewsobiecki merged commit 6994eff into dev Apr 26, 2026
9 checks passed

zbigniewsobiecki mentioned this pull request Apr 26, 2026

Merge dev → main (deps + spec 015) #1208

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spec 015: router job dispatch failure recovery#1203

spec 015: router job dispatch failure recovery#1203
zbigniewsobiecki merged 7 commits intodevfrom
feat/spec-015-router-dispatch-recovery

zbigniewsobiecki commented Apr 26, 2026

Uh oh!

codecov Bot commented Apr 26, 2026

Uh oh!

nhopeatall left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zbigniewsobiecki commented Apr 26, 2026

Summary

Test plan

Files changed

Uh oh!

codecov Bot commented Apr 26, 2026

Codecov Report

Uh oh!

nhopeatall left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants