Skip to content

fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents#1201

Merged
aaight merged 2 commits intodevfrom
feature/pm-feedback-inconsistencies
Apr 26, 2026
Merged

fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents#1201
aaight merged 2 commits intodevfrom
feature/pm-feedback-inconsistencies

Conversation

@aaight
Copy link
Copy Markdown
Collaborator

@aaight aaight commented Apr 26, 2026

Summary

Fixes 8 inconsistencies in how CASCADE agents post PM feedback and handle comment-mention triggers, as tracked in https://trello.com/c/69ee2e17cf3b0fb4a6616e28.

  • Linear ack support: postPMAckComment() now posts acknowledgment comments for Linear projects (was silently no-op)
  • respond-to-pr-comment in PM summary: Added respond-to-pr-comment to AGENT_OUTPUT_CONFIG so it posts review responses back to PM work items
  • Planning-status gating for JIRA/Linear: Comment-mention triggers now skip issues not in the configured PLANNING status/state, matching Trello's existing behavior
  • Consolidated bot identity caching: JIRA and Trello comment triggers now use the shared resolveJiraBotIdentity / resolveTrelloBotIdentity helpers (with per-project TTL cache) instead of per-trigger-instance module-level caches
  • Unified AgentInput comment fields: triggerCommentBody is now the canonical field; triggerCommentText is retained as a @deprecated alias for one release. All triggers populate both, and buildTaskPromptContext prefers triggerCommentBody
  • workItemUrl/workItemTitle on GitHub triggers: check-suite-failure, check-suite-success, pr-review-submitted, and pr-comment-mention triggers now resolve and populate work item display data via a new resolveWorkItemDisplayData helper
  • Shared resolveWorkItemDisplayData helper: Extracted best-effort helper in src/triggers/github/utils.ts using getPMProviderOrNull() to avoid hard failures

Changes

Core sources

  • src/triggers/shared/pm-ack.ts — add Linear case
  • src/triggers/shared/agent-pm-poster.ts — add respond-to-pr-comment to AGENT_OUTPUT_CONFIG
  • src/router/bot-identity-resolvers.ts — add TrelloBotIdentity / JiraBotIdentity interfaces and full-identity resolvers; existing resolveTrelloBotMemberId / resolveJiraBotAccountId delegate to them
  • src/router/acknowledgments.ts — re-export new types and resolvers
  • src/triggers/trello/comment-mention.ts — use shared resolveTrelloBotIdentity; add triggerCommentBody
  • src/triggers/jira/comment-mention.ts — use shared resolveJiraBotIdentity; add PLANNING-status gate; add triggerCommentBody; extract isInPlanningStatus helper to satisfy complexity limit
  • src/triggers/linear/comment-mention.ts — add PLANNING-state gate; add triggerCommentBody
  • src/triggers/github/utils.ts — add resolveWorkItemDisplayData helper
  • src/triggers/github/{check-suite-failure,check-suite-success,pr-review-submitted,pr-comment-mention}.ts — populate workItemUrl/workItemTitle on TriggerResult
  • src/types/index.ts — document triggerCommentBody as canonical; deprecate triggerCommentText
  • src/agents/prompts/index.ts — prefer triggerCommentBody ?? triggerCommentText in prompt context

Tests

  • tests/unit/triggers/trello-comment-mention.test.ts — mock resolveTrelloBotIdentity instead of trelloClient.getMe
  • tests/unit/triggers/jira-comment-mention.test.ts — mock resolveJiraBotIdentity; add statuses.planning to project config; add PLANNING-gate tests
  • tests/unit/triggers/linear-comment-mention.test.ts — add linear.statuses.planning to project config; add PLANNING-gate tests; fix fallback-issue test to include stateId

Test plan

  • All 460 test files pass (8497 tests)
  • npm run lint passes with no errors
  • npm run typecheck passes with no errors
  • Trello/JIRA/Linear comment-mention trigger tests: 57 tests across 3 files

🕵️ claude-code · claude-sonnet-4-6 · run details

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 82.22222% with 24 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/triggers/github/utils.ts 42.85% 12 Missing ⚠️
src/triggers/jira/comment-mention.ts 89.74% 4 Missing ⚠️
src/triggers/shared/pm-ack.ts 0.00% 4 Missing ⚠️
src/triggers/trello/comment-mention.ts 60.00% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid set of consistency fixes — the bot identity consolidation, Linear ack support, planning-status gating, and unified comment fields are all well-executed. One real issue with JIRA status comparison.

Code Issues

Should Fix

  • src/triggers/jira/comment-mention.ts:89 — The isInPlanningStatus helper uses strict equality (currentStatusName !== planningStatusName) for JIRA status comparison, but every other JIRA trigger in the codebase uses case-insensitive comparison via .toLowerCase(). See status-changed.ts:46 (jiraStatus.toLowerCase() === lower) and label-added.ts:90 (jiraStatus.toLowerCase() === currentStatus.toLowerCase()). JIRA status names are user-configurable and the API does not guarantee consistent casing — a project configured with "Planning" could receive a webhook with "planning" and the gate would silently reject valid comments. This should use case-insensitive comparison to match the established pattern.

Questions

  • resolveWorkItemDisplayData depends on getPMProviderOrNull() which uses AsyncLocalStorage. The JSDoc says it requires withPMScope but there are no unit tests for this helper. The function is simple and best-effort so this is not blocking, but was this a deliberate omission or an oversight?

🕵️ claude-code · claude-opus-4-6 · run details

Comment thread src/triggers/jira/comment-mention.ts Outdated
);
return false;
}
if (currentStatusName !== planningStatusName) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD_FIX] Strict equality here is inconsistent with the rest of the JIRA trigger codebase. status-changed.ts:46 and label-added.ts:90 both use .toLowerCase() for status comparison because JIRA status names are user-configurable and the API does not guarantee consistent casing.

This should be:

if (currentStatusName?.toLowerCase() !== planningStatusName.toLowerCase()) {

Otherwise a project with statuses.planning = "Planning" that receives a webhook reporting "planning" would silently skip the trigger.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! Changed the comparison to currentStatusName?.toLowerCase() !== planningStatusName.toLowerCase() to match the case-insensitive pattern used in status-changed.ts:46 and label-added.ts:90. JIRA status names are user-configurable and casing can vary between config and webhook payloads.

…nningStatus

Match the established pattern from status-changed.ts and label-added.ts which
both use .toLowerCase() for JIRA status comparisons, since status names are
user-configurable and the API does not guarantee consistent casing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - testing

🕵️ claude-code · claude-opus-4-6 · run details

@aaight aaight merged commit c2ce419 into dev Apr 26, 2026
9 checks passed
Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid PR that addresses real PM feedback inconsistencies across all three providers. The changes are well-structured, correctly scoped, and well-tested. CI is fully green.

Should Fix

  • src/triggers/github/utils.tsresolveWorkItemDisplayData has no unit tests. This is a new helper exercised by 4 GitHub trigger handlers. While it's best-effort (try/catch, returns {} on error), it contains branching logic (no workItemId, no provider, API error) that deserves basic coverage. A test verifying provider-present, provider-absent, and error paths would catch regressions — especially since none of the 4 caller trigger tests mock this function (Grep shows zero test files reference it).

Notes

  • The package-lock.json changes (removal of libc fields from 4 optional native deps) appear unrelated to the PR scope — likely an npm version artifact. Harmless, but worth noting for commit hygiene.
  • The test mocks for resolveJiraBotIdentity and resolveTrelloBotIdentity use the (...args) => mockFn(...args) wrapper pattern that tests/README.md calls out as an anti-pattern (loses toHaveBeenCalledWith tracking). Since these tests only check mock.calls.length, it works, but direct references via vi.hoisted() would be more idiomatic per project conventions.

🕵️ claude-code · claude-opus-4-6 · run details

});
return {};
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD_FIX] This new utility has no unit tests. It handles three distinct branches (no workItemId, no PM provider in scope, API error) plus the happy path. Since it's invoked by 4 trigger handlers and does an API call, basic test coverage would catch regressions.

Suggested test scenarios:

  1. Returns {} when workItemId is undefined
  2. Returns {} when getPMProviderOrNull() returns null
  3. Returns { workItemUrl, workItemTitle } when provider returns a valid work item
  4. Returns {} and logs debug on provider error

getMyself: mockJiraClientGetMyself,
},
vi.mock('../../../src/router/bot-identity-resolvers.js', () => ({
resolveJiraBotIdentity: (...args: unknown[]) => mockResolveJiraBotIdentity(...args),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[NITPICK] The (...args: unknown[]) => mockResolveJiraBotIdentity(...args) wrapper is the pattern tests/README.md calls out as an anti-pattern — toHaveBeenCalledWith won't work on the outer mock. Since the mock is already vi.hoisted(), you can use a direct reference:

vi.mock('../../../src/router/bot-identity-resolvers.js', () => ({
  resolveJiraBotIdentity: mockResolveJiraBotIdentity,
}));

Same applies to the Trello and Linear test files.

@zbigniewsobiecki zbigniewsobiecki mentioned this pull request Apr 26, 2026
2 tasks
zbigniewsobiecki added a commit that referenced this pull request Apr 26, 2026
* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents (#1201)

* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents

* fix(triggers): use case-insensitive JIRA status comparison in isInPlanningStatus

Match the established pattern from status-changed.ts and label-added.ts which
both use .toLowerCase() for JIRA status comparisons, since status names are
user-configurable and the API does not guarantee consistent casing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(linear): populate inlineMedia from descriptions/comments and add downloadAttachment (#1202)

Co-authored-by: Cascade Bot <bot@cascade.dev>

---------

Co-authored-by: aaight <aaight42@gmail.com>
Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
zbigniewsobiecki added a commit that referenced this pull request Apr 26, 2026
* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents (#1201)

* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents

* fix(triggers): use case-insensitive JIRA status comparison in isInPlanningStatus

Match the established pattern from status-changed.ts and label-added.ts which
both use .toLowerCase() for JIRA status comparisons, since status names are
user-configurable and the API does not guarantee consistent casing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(linear): populate inlineMedia from descriptions/comments and add downloadAttachment (#1202)

Co-authored-by: Cascade Bot <bot@cascade.dev>

* fix(claude-code): pin pathToClaudeCodeExecutable so SDK skips broken native-binary probe (#1206)

The agent-harness SDK bump in #1197 (claude-agent-sdk 0.2.91 → 0.2.119)
broke every review run on cascade-prod with:

  ReferenceError: Claude Code native binary not found at
  /app/node_modules/@anthropic-ai/claude-agent-sdk-linux-x64-musl/claude

The new SDK probes its own platform-specific optional-dependency
subpackages for a bundled `claude` binary. Two failure modes hit at once:

  1. Cascade installs `@anthropic-ai/claude-code@2.1.119` globally at
     /usr/local/bin/claude — the SDK never looks there.
  2. The SDK probes the `-musl` variant first regardless of host libc and
     errors on ENOENT instead of falling through to the glibc variant.

Pass an explicit `pathToClaudeCodeExecutable` to short-circuit the probe.
The resolver checks (in order):

  - $CLAUDE_CODE_EXECUTABLE_PATH env override (local-dev escape hatch)
  - `which claude` in $PATH
  - /usr/local/bin/claude (Docker default from Dockerfile.worker)

Two TDD tests pin the option onto query() and prove the env override
wins. No Dockerfile change needed; the existing global install at
/usr/local/bin/claude becomes the resolver's runtime target.

Confirmed broken on ucho PR #72 (cascade-prod review agent crash).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: aaight <aaight42@gmail.com>
Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
zbigniewsobiecki added a commit that referenced this pull request Apr 26, 2026
* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents (#1201)

* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents

* fix(triggers): use case-insensitive JIRA status comparison in isInPlanningStatus

Match the established pattern from status-changed.ts and label-added.ts which
both use .toLowerCase() for JIRA status comparisons, since status names are
user-configurable and the API does not guarantee consistent casing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(linear): populate inlineMedia from descriptions/comments and add downloadAttachment (#1202)

Co-authored-by: Cascade Bot <bot@cascade.dev>

* fix(claude-code): pin pathToClaudeCodeExecutable so SDK skips broken native-binary probe (#1206)

The agent-harness SDK bump in #1197 (claude-agent-sdk 0.2.91 → 0.2.119)
broke every review run on cascade-prod with:

  ReferenceError: Claude Code native binary not found at
  /app/node_modules/@anthropic-ai/claude-agent-sdk-linux-x64-musl/claude

The new SDK probes its own platform-specific optional-dependency
subpackages for a bundled `claude` binary. Two failure modes hit at once:

  1. Cascade installs `@anthropic-ai/claude-code@2.1.119` globally at
     /usr/local/bin/claude — the SDK never looks there.
  2. The SDK probes the `-musl` variant first regardless of host libc and
     errors on ENOENT instead of falling through to the glibc variant.

Pass an explicit `pathToClaudeCodeExecutable` to short-circuit the probe.
The resolver checks (in order):

  - $CLAUDE_CODE_EXECUTABLE_PATH env override (local-dev escape hatch)
  - `which claude` in $PATH
  - /usr/local/bin/claude (Docker default from Dockerfile.worker)

Two TDD tests pin the option onto query() and prove the env override
wins. No Dockerfile change needed; the existing global install at
/usr/local/bin/claude becomes the resolver's runtime target.

Confirmed broken on ucho PR #72 (cascade-prod review agent crash).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* spec 015: router job dispatch failure recovery (#1203)

* docs(spec/plans): add spec 015 + plans for router dispatch failure recovery

Spec captures the silent black-hole bug class verified live on 2026-04-26
(ucho/MNG-350): a transient capacity miss or Docker error during worker
spawn turns a webhook-driven job into a permanently failed BullMQ entry
while stranding the work-item / agent-type locks for up to 30 minutes,
silently rejecting subsequent webhooks for the same work item.

Decomposed into two plans with safety-net-first sequencing: plan 1 hooks
the BullMQ failed event to release locks on every dispatch failure path;
plan 2 replaces the throw-on-capacity with a wait-for-slot semaphore,
adds bounded retry with exponential backoff, and a transient/terminal
error classifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): lock 015/1 failed-event-lock-compensation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(router): plan 015/1 done — release locks on dispatch failure

Closes the stranded-lock half of spec 015's bug class verified live in
prod on 2026-04-26 (ucho/MNG-350). When a webhook-driven job's dispatch
fails — capacity throw, Docker spawn error, or any future throw site —
the work-item lock, agent-type concurrency counter, and recently-dispatched
dedup mark established by the webhook → enqueue path are now released by
a compensator hooked to BullMQ's `worker.on('failed')` event.

What landed:

- `src/router/dispatch-compensator.ts` (new) — `releaseLocksForFailedJob`
  wraps `extractProjectIdFromJob` / `extractWorkItemId` / `extractAgentType`
  and calls into `clearWorkItemEnqueued` / `clearAgentTypeEnqueued` /
  `clearRecentlyDispatched`. Never propagates errors; captures to Sentry
  with `tags: { source: 'dispatch_compensator' }`.
- `src/router/agent-type-lock.ts` — exports new `clearRecentlyDispatched`
  for the compensator. The existing `markRecentlyDispatched` semantics
  are unchanged (60s TTL, NOT cleared on completion); this helper exists
  solely so a permanently-failed dispatch doesn't keep deduping a fresh
  webhook for ~60s while the user retries.
- `src/router/bullmq-workers.ts` — extends the existing
  `worker.on('failed')` handler to invoke `releaseLocksForFailedJob`
  alongside the existing logger + Sentry calls. Wraps the call in a
  defensive `.catch` so a future regression in the compensator can't
  poison the worker.
- `src/router/lock-state-classifier.ts` (new) — `classifyLockState` returns
  `'awaiting-slot'` when an active worker or queued/waiting job matches
  the trio, `'wedged'` when neither correlation matches. Defaults to
  `'awaiting-slot'` on classifier error so a Redis blip doesn't mis-emit
  the wedged canary.
- `src/router/active-workers.ts` — `getActiveWorkers()` now exposes
  `(projectId, workItemId, agentType)` so the classifier can correlate.
  Backwards-compatible (existing callers work unchanged; new fields are
  additive optional).
- `src/router/webhook-processor.ts` — Step 8 (work-item lock check) now
  splits the decision-reason vocabulary into three states:
    * `Job queued: ...` (success path)
    * `Awaiting worker slot: ...` (lock held + dispatch in flight; healthy)
    * `Work item locked (no active dispatch): ...` (wedged-lock canary)
  The wedged branch additionally fires `captureException` with
  `tags: { source: 'wedged_lock_canary' }` so any regression in
  compensation is loud in production.

What this does NOT change (intentional, all in plan 015/2):
- `guardedSpawn` still throws on capacity (BullMQ marks the job failed,
  the compensator now releases the locks, but the job itself is still
  lost). Plan 2 replaces the throw with a wait-for-slot semaphore.
- Both queues still default to `attempts: 1`. Plan 2 raises this with
  exponential backoff and adds a transient/terminal error classifier.
- CLAUDE.md is intentionally not updated by this plan — the unified
  passage describing both halves of the new contract lands in plan 015/2.

Tests:
- 5 new unit tests in `dispatch-compensator.test.ts`
- 3 new unit tests in `agent-type-lock.test.ts` for `clearRecentlyDispatched`
- 4 new unit tests in `bullmq-workers.test.ts` for the failed-event seam
- 5 new unit tests in `lock-state-classifier.test.ts`
- 2 new unit tests in `active-workers.test.ts` for the extended shape
- 4 new unit tests in `webhook-processor.test.ts` for the three-way taxonomy
- 3 new module-integration tests in
  `tests/integration/router/dispatch-failure-compensation.test.ts` exercise
  the real lock modules + real bullmq-workers.ts failed-event handler +
  real compensator end-to-end (only BullMQ's Worker constructor + the
  worker-env extractors are mocked).

Full suite: 8515 passed / 23 skipped / 0 failed. Lint + typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): lock 015/2 wait-for-slot-and-retry-classifier

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): mark 015/2 status: wip

* feat(router): plan 015/2 done — wait-for-slot + retry budget + classifier

Closes the lost-job half of spec 015's bug class. Combined with plan 015/1,
the silent black-hole failure mode verified live in prod on 2026-04-26
(ucho/MNG-350) is now fully closed.

What landed:

- `src/router/slot-waiter.ts` (new) — semaphore-style primitive:
  `acquireSlot({ timeoutMs })` resolves immediately when capacity is
  below `routerConfig.maxWorkers`, otherwise queues a FIFO waiter with
  a bounded timeout that rejects with `code: 'SLOT_WAIT_TIMEOUT'`.
  `slotReleased()` pops the head waiter; `clearAllWaiters()` rejects
  every pending waiter with `code: 'SHUTDOWN'` on router stop.
- `src/router/dispatch-error-classifier.ts` (new) — classifies thrown
  errors into `'transient'` (Docker socket Node codes, HTTP 429/409,
  SLOT_WAIT_TIMEOUT, anything unknown — default-to-retry) vs
  `'terminal'` (TypeError, ZodError, image-not-found-after-fallback).
- `src/router/worker-manager.ts` — `guardedSpawn` rewritten:
  `await acquireSlot(...)` replaces the synchronous capacity throw;
  on spawn error, terminal errors are wrapped in BullMQ's
  `UnrecoverableError` so retries skip; transient errors propagate
  unchanged so BullMQ retries via attempts/backoff.
- `src/router/active-workers.ts` — `cleanupWorker` now calls
  `slotReleased()` exactly once per cleanup, including on the crash
  path. The existing `if (worker)` guard ensures idempotence.
- `src/router/config.ts` — new `slotWaitTimeoutMs` field (default
  5min, configurable via `SLOT_WAIT_TIMEOUT_MS`).
- `src/router/queue.ts` and `src/queue/client.ts` — both queues now
  default to `attempts: 4` with `backoff: { type: 'exponential', delay: 5000 }`
  (~75s total before exhaustion). Terminal errors bypass via
  `UnrecoverableError`.
- `src/router/container-manager.ts` — exports the existing
  `isImageNotFoundError` predicate so the classifier can reuse it.

Test contract change (spec AC #9):

The previous `tests/unit/router/worker-manager.test.ts:179` assertion
`'processFn throws when at capacity'` is REPLACED (not deleted) with
`'processFn awaits a slot when at capacity, then dispatches when one
frees'`. The throw-on-capacity contract is gone forever.

Tests:

- 7 new unit tests in `slot-waiter.test.ts` (FIFO, timeout, no-op,
  shutdown rejection)
- 11 new unit tests in `dispatch-error-classifier.test.ts` covering
  every transient/terminal class
- 4 new unit tests in `worker-manager.test.ts` (replaced original
  capacity-throw test + 3 for retry classification)
- 3 new unit tests in `active-workers.test.ts` for slotReleased
  integration
- 5 new module-integration tests in `dispatch-retry.test.ts` exercise
  REAL guardedSpawn + REAL slot-waiter + REAL dispatch-error-classifier
  against both queues, mocking only spawnWorker + BullMQ Worker
  constructor.

Plan 1's 3 module-integration tests continue to pass alongside plan 2's 5.
Full unit suite: 8539 passed / 23 skipped / 0 failed.

CLAUDE.md updated with a new "Dispatch failure semantics" section
documenting the unified contract (capacity wait, retry budget,
classifier, three-way decision-reason taxonomy from plan 1, wedged-lock
canary). File now 182 lines, under the 200-line cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(spec): 015 done — router job dispatch failure recovery, all plans complete

Closes the silent black-hole bug class verified live on 2026-04-26
(ucho/MNG-350). Plan 1 added failed-event lock compensation +
three-way decision-reason taxonomy; plan 2 replaced the throw-on-capacity
with wait-for-slot, added bounded retry with exponential backoff, and
introduced a transient/terminal error classifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): bump postcss from 8.5.8 to 8.5.12 (#1204)

Bumps [postcss](https://github.com/postcss/postcss) from 8.5.8 to 8.5.12.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](postcss/postcss@8.5.8...8.5.12)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.12
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump uuid, bullmq and dockerode (#1192)

Removes [uuid](https://github.com/uuidjs/uuid). It's no longer used after updating ancestor dependencies [uuid](https://github.com/uuidjs/uuid), [bullmq](https://github.com/taskforcesh/bullmq) and [dockerode](https://github.com/apocas/dockerode). These dependencies need to be updated together.


Removes `uuid`

Updates `bullmq` from 5.72.0 to 5.76.2
- [Release notes](https://github.com/taskforcesh/bullmq/releases)
- [Commits](taskforcesh/bullmq@v5.72.0...v5.76.2)

Updates `dockerode` from 4.0.10 to 5.0.0
- [Release notes](https://github.com/apocas/dockerode/releases)
- [Commits](apocas/dockerode@v4.0.10...v5.0.0)

---
updated-dependencies:
- dependency-name: uuid
  dependency-version: 
  dependency-type: indirect
- dependency-name: bullmq
  dependency-version: 5.76.2
  dependency-type: direct:production
- dependency-name: dockerode
  dependency-version: 5.0.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: aaight <aaight42@gmail.com>
Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
zbigniewsobiecki added a commit that referenced this pull request Apr 26, 2026
…ad never satisfies (#1210)

PR #1201 added a `currentStateId !== planningStateId` gate to the Linear
comment @mention trigger that read `data.issue.stateId` from the webhook
payload. Linear's Comment webhook does not ship `stateId` on the nested
issue (verified across four prod payloads on 2026-04-26 — 8cd0108a,
b93e4925, 6548cd14, 3d95b210). The gate therefore always evaluated to
true and silently dropped every legitimate bot @mention, including the
one on MNG-346 that motivated this fix.

The agent (respond-to-planning-comment) is now responsible for any
planning-only behavior; the trigger no longer gates on state and avoids
an extra Linear GraphQL round-trip per comment.

Also corrects `LinearWebhookCommentTriggerData.issue` to match what
Linear actually ships (six keys, no `stateId`, optional `team`) — the
old type lied and PR #1201 trusted it.

Tests pin a real prod-shape Comment payload as a regression. JIRA's
equivalent gate is unaffected (its `comment_created` payload does ship
`issue.fields.status.name`).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zbigniewsobiecki added a commit that referenced this pull request Apr 26, 2026
* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents (#1201)

* fix(triggers): audit & fix PM feedback inconsistencies across respond-to-* agents

* fix(triggers): use case-insensitive JIRA status comparison in isInPlanningStatus

Match the established pattern from status-changed.ts and label-added.ts which
both use .toLowerCase() for JIRA status comparisons, since status names are
user-configurable and the API does not guarantee consistent casing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(linear): populate inlineMedia from descriptions/comments and add downloadAttachment (#1202)

Co-authored-by: Cascade Bot <bot@cascade.dev>

* fix(claude-code): pin pathToClaudeCodeExecutable so SDK skips broken native-binary probe (#1206)

The agent-harness SDK bump in #1197 (claude-agent-sdk 0.2.91 → 0.2.119)
broke every review run on cascade-prod with:

  ReferenceError: Claude Code native binary not found at
  /app/node_modules/@anthropic-ai/claude-agent-sdk-linux-x64-musl/claude

The new SDK probes its own platform-specific optional-dependency
subpackages for a bundled `claude` binary. Two failure modes hit at once:

  1. Cascade installs `@anthropic-ai/claude-code@2.1.119` globally at
     /usr/local/bin/claude — the SDK never looks there.
  2. The SDK probes the `-musl` variant first regardless of host libc and
     errors on ENOENT instead of falling through to the glibc variant.

Pass an explicit `pathToClaudeCodeExecutable` to short-circuit the probe.
The resolver checks (in order):

  - $CLAUDE_CODE_EXECUTABLE_PATH env override (local-dev escape hatch)
  - `which claude` in $PATH
  - /usr/local/bin/claude (Docker default from Dockerfile.worker)

Two TDD tests pin the option onto query() and prove the env override
wins. No Dockerfile change needed; the existing global install at
/usr/local/bin/claude becomes the resolver's runtime target.

Confirmed broken on ucho PR #72 (cascade-prod review agent crash).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* spec 015: router job dispatch failure recovery (#1203)

* docs(spec/plans): add spec 015 + plans for router dispatch failure recovery

Spec captures the silent black-hole bug class verified live on 2026-04-26
(ucho/MNG-350): a transient capacity miss or Docker error during worker
spawn turns a webhook-driven job into a permanently failed BullMQ entry
while stranding the work-item / agent-type locks for up to 30 minutes,
silently rejecting subsequent webhooks for the same work item.

Decomposed into two plans with safety-net-first sequencing: plan 1 hooks
the BullMQ failed event to release locks on every dispatch failure path;
plan 2 replaces the throw-on-capacity with a wait-for-slot semaphore,
adds bounded retry with exponential backoff, and a transient/terminal
error classifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): lock 015/1 failed-event-lock-compensation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(router): plan 015/1 done — release locks on dispatch failure

Closes the stranded-lock half of spec 015's bug class verified live in
prod on 2026-04-26 (ucho/MNG-350). When a webhook-driven job's dispatch
fails — capacity throw, Docker spawn error, or any future throw site —
the work-item lock, agent-type concurrency counter, and recently-dispatched
dedup mark established by the webhook → enqueue path are now released by
a compensator hooked to BullMQ's `worker.on('failed')` event.

What landed:

- `src/router/dispatch-compensator.ts` (new) — `releaseLocksForFailedJob`
  wraps `extractProjectIdFromJob` / `extractWorkItemId` / `extractAgentType`
  and calls into `clearWorkItemEnqueued` / `clearAgentTypeEnqueued` /
  `clearRecentlyDispatched`. Never propagates errors; captures to Sentry
  with `tags: { source: 'dispatch_compensator' }`.
- `src/router/agent-type-lock.ts` — exports new `clearRecentlyDispatched`
  for the compensator. The existing `markRecentlyDispatched` semantics
  are unchanged (60s TTL, NOT cleared on completion); this helper exists
  solely so a permanently-failed dispatch doesn't keep deduping a fresh
  webhook for ~60s while the user retries.
- `src/router/bullmq-workers.ts` — extends the existing
  `worker.on('failed')` handler to invoke `releaseLocksForFailedJob`
  alongside the existing logger + Sentry calls. Wraps the call in a
  defensive `.catch` so a future regression in the compensator can't
  poison the worker.
- `src/router/lock-state-classifier.ts` (new) — `classifyLockState` returns
  `'awaiting-slot'` when an active worker or queued/waiting job matches
  the trio, `'wedged'` when neither correlation matches. Defaults to
  `'awaiting-slot'` on classifier error so a Redis blip doesn't mis-emit
  the wedged canary.
- `src/router/active-workers.ts` — `getActiveWorkers()` now exposes
  `(projectId, workItemId, agentType)` so the classifier can correlate.
  Backwards-compatible (existing callers work unchanged; new fields are
  additive optional).
- `src/router/webhook-processor.ts` — Step 8 (work-item lock check) now
  splits the decision-reason vocabulary into three states:
    * `Job queued: ...` (success path)
    * `Awaiting worker slot: ...` (lock held + dispatch in flight; healthy)
    * `Work item locked (no active dispatch): ...` (wedged-lock canary)
  The wedged branch additionally fires `captureException` with
  `tags: { source: 'wedged_lock_canary' }` so any regression in
  compensation is loud in production.

What this does NOT change (intentional, all in plan 015/2):
- `guardedSpawn` still throws on capacity (BullMQ marks the job failed,
  the compensator now releases the locks, but the job itself is still
  lost). Plan 2 replaces the throw with a wait-for-slot semaphore.
- Both queues still default to `attempts: 1`. Plan 2 raises this with
  exponential backoff and adds a transient/terminal error classifier.
- CLAUDE.md is intentionally not updated by this plan — the unified
  passage describing both halves of the new contract lands in plan 015/2.

Tests:
- 5 new unit tests in `dispatch-compensator.test.ts`
- 3 new unit tests in `agent-type-lock.test.ts` for `clearRecentlyDispatched`
- 4 new unit tests in `bullmq-workers.test.ts` for the failed-event seam
- 5 new unit tests in `lock-state-classifier.test.ts`
- 2 new unit tests in `active-workers.test.ts` for the extended shape
- 4 new unit tests in `webhook-processor.test.ts` for the three-way taxonomy
- 3 new module-integration tests in
  `tests/integration/router/dispatch-failure-compensation.test.ts` exercise
  the real lock modules + real bullmq-workers.ts failed-event handler +
  real compensator end-to-end (only BullMQ's Worker constructor + the
  worker-env extractors are mocked).

Full suite: 8515 passed / 23 skipped / 0 failed. Lint + typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): lock 015/2 wait-for-slot-and-retry-classifier

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): mark 015/2 status: wip

* feat(router): plan 015/2 done — wait-for-slot + retry budget + classifier

Closes the lost-job half of spec 015's bug class. Combined with plan 015/1,
the silent black-hole failure mode verified live in prod on 2026-04-26
(ucho/MNG-350) is now fully closed.

What landed:

- `src/router/slot-waiter.ts` (new) — semaphore-style primitive:
  `acquireSlot({ timeoutMs })` resolves immediately when capacity is
  below `routerConfig.maxWorkers`, otherwise queues a FIFO waiter with
  a bounded timeout that rejects with `code: 'SLOT_WAIT_TIMEOUT'`.
  `slotReleased()` pops the head waiter; `clearAllWaiters()` rejects
  every pending waiter with `code: 'SHUTDOWN'` on router stop.
- `src/router/dispatch-error-classifier.ts` (new) — classifies thrown
  errors into `'transient'` (Docker socket Node codes, HTTP 429/409,
  SLOT_WAIT_TIMEOUT, anything unknown — default-to-retry) vs
  `'terminal'` (TypeError, ZodError, image-not-found-after-fallback).
- `src/router/worker-manager.ts` — `guardedSpawn` rewritten:
  `await acquireSlot(...)` replaces the synchronous capacity throw;
  on spawn error, terminal errors are wrapped in BullMQ's
  `UnrecoverableError` so retries skip; transient errors propagate
  unchanged so BullMQ retries via attempts/backoff.
- `src/router/active-workers.ts` — `cleanupWorker` now calls
  `slotReleased()` exactly once per cleanup, including on the crash
  path. The existing `if (worker)` guard ensures idempotence.
- `src/router/config.ts` — new `slotWaitTimeoutMs` field (default
  5min, configurable via `SLOT_WAIT_TIMEOUT_MS`).
- `src/router/queue.ts` and `src/queue/client.ts` — both queues now
  default to `attempts: 4` with `backoff: { type: 'exponential', delay: 5000 }`
  (~75s total before exhaustion). Terminal errors bypass via
  `UnrecoverableError`.
- `src/router/container-manager.ts` — exports the existing
  `isImageNotFoundError` predicate so the classifier can reuse it.

Test contract change (spec AC #9):

The previous `tests/unit/router/worker-manager.test.ts:179` assertion
`'processFn throws when at capacity'` is REPLACED (not deleted) with
`'processFn awaits a slot when at capacity, then dispatches when one
frees'`. The throw-on-capacity contract is gone forever.

Tests:

- 7 new unit tests in `slot-waiter.test.ts` (FIFO, timeout, no-op,
  shutdown rejection)
- 11 new unit tests in `dispatch-error-classifier.test.ts` covering
  every transient/terminal class
- 4 new unit tests in `worker-manager.test.ts` (replaced original
  capacity-throw test + 3 for retry classification)
- 3 new unit tests in `active-workers.test.ts` for slotReleased
  integration
- 5 new module-integration tests in `dispatch-retry.test.ts` exercise
  REAL guardedSpawn + REAL slot-waiter + REAL dispatch-error-classifier
  against both queues, mocking only spawnWorker + BullMQ Worker
  constructor.

Plan 1's 3 module-integration tests continue to pass alongside plan 2's 5.
Full unit suite: 8539 passed / 23 skipped / 0 failed.

CLAUDE.md updated with a new "Dispatch failure semantics" section
documenting the unified contract (capacity wait, retry budget,
classifier, three-way decision-reason taxonomy from plan 1, wedged-lock
canary). File now 182 lines, under the 200-line cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(spec): 015 done — router job dispatch failure recovery, all plans complete

Closes the silent black-hole bug class verified live on 2026-04-26
(ucho/MNG-350). Plan 1 added failed-event lock compensation +
three-way decision-reason taxonomy; plan 2 replaced the throw-on-capacity
with wait-for-slot, added bounded retry with exponential backoff, and
introduced a transient/terminal error classifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): bump postcss from 8.5.8 to 8.5.12 (#1204)

Bumps [postcss](https://github.com/postcss/postcss) from 8.5.8 to 8.5.12.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](postcss/postcss@8.5.8...8.5.12)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.12
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump uuid, bullmq and dockerode (#1192)

Removes [uuid](https://github.com/uuidjs/uuid). It's no longer used after updating ancestor dependencies [uuid](https://github.com/uuidjs/uuid), [bullmq](https://github.com/taskforcesh/bullmq) and [dockerode](https://github.com/apocas/dockerode). These dependencies need to be updated together.


Removes `uuid`

Updates `bullmq` from 5.72.0 to 5.76.2
- [Release notes](https://github.com/taskforcesh/bullmq/releases)
- [Commits](taskforcesh/bullmq@v5.72.0...v5.76.2)

Updates `dockerode` from 4.0.10 to 5.0.0
- [Release notes](https://github.com/apocas/dockerode/releases)
- [Commits](apocas/dockerode@v4.0.10...v5.0.0)

---
updated-dependencies:
- dependency-name: uuid
  dependency-version: 
  dependency-type: indirect
- dependency-name: bullmq
  dependency-version: 5.76.2
  dependency-type: direct:production
- dependency-name: dockerode
  dependency-version: 5.0.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* spec 016: PM image delivery reliability (#1209)

* docs(spec/plans): add spec 016 + plans for PM image delivery reliability

Closes the silent screenshot-drop bug class verified live on 2026-04-26
(ucho/MNG-357): Linear's user-pasted-image URLs (uploads.linear.app/<uuid>
with no file extension) were dropped at the pre-download MIME filter
because mimeTypeFromUrl returned 'application/octet-stream' and
filterImageMedia excluded them. This affected all engines on the
disk-write path, regardless of PR #948's Claude-Code SDK delivery fix.

Three plans, safety-net-first sequencing matching spec 015:
- Plan 1 (boot-path-mime-fix-and-diagnostic-log): defers MIME authority
  to download response Content-Type via image/* wildcard sentinel; adds
  the grep-stable diagnostic log line at extract time. Independently
  fixes MNG-357.
- Plan 2 (runtime-gadget-image-delivery): makes the runtime
  cascade-tools pm read-work-item gadget actually download + write
  images to disk with file paths returned in text. Closes the mid-run
  pickup gap. Depends on Plan 1's shared download-and-prepare helper.
- Plan 3 (linear-fixture-and-extraction-coverage): captures a Linear
  GraphQL Issue payload fixture for an issue with a pasted screenshot;
  pins extraction with a regression test that fails loudly if Linear
  ever changes the payload shape. Mostly tests + docs.

9 ACs, 0 manual-only. CLAUDE.md not updated (already covered by spec 015's
silent-failure → diagnostic-line pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): lock 016/1 boot-path-mime-fix-and-diagnostic-log

* feat(pm): plan 016/1 done — boot-path image MIME fix + diagnostic log

Closes the silent screenshot-drop bug class verified live on 2026-04-26
(ucho/MNG-357). Linear's extension-less pasted-image URLs
(uploads.linear.app/<uuid>) now survive the pre-download MIME filter via
an image/* wildcard sentinel. The download response's Content-Type
header is the authoritative MIME — wildcard is resolved before bytes
are written.

What landed:

- src/pm/media.ts — new IMAGE_HOST_ALLOWLIST (currently 'uploads.linear.app');
  mimeTypeFromUrl returns 'image/*' for extension-less URLs from allowlisted
  hosts; isImageMimeType accepts the wildcard.
- src/pm/download-and-prepare.ts (new) — shared helper for the per-provider
  download dispatch loop (jira/linear/trello). Returns { images, failures }.
  Spec 016/2's runtime gadget will import this.
- src/agents/definitions/contextSteps.ts — fetchWorkItemStep refactored to
  use the shared helper; emits the new grep-stable diagnostic line
  '[image-pipeline] work-item-fetch summary' with stable fields:
  { provider, workItemId, urlsDetected, urlsAfterFilter, urlsDownloaded,
    urlsFailed, urlsByMimeType }.

Tests:
- 6 new unit tests in tests/unit/pm/media.test.ts (wildcard sentinel,
  Linear extension-less, regression for extensioned + non-PM URLs)
- 7 new unit tests in tests/unit/pm/download-and-prepare.test.ts
- 3 new diagnostic-log tests in contextSteps.test.ts; existing log
  message expectations updated to the new helper-prefix
- 3 module-integration tests in tests/integration/pm/image-pipeline.test.ts
  pinning the MNG-357 reproduction end-to-end with real
  mimeTypeFromUrl + filterImageMedia + extractMarkdownImages

PR #948's Claude-Code initial-input ImageBlockParam path is unchanged;
existing regression test (claude-code.test.ts:939
'logs image injection and strips images before buildTaskPrompt') confirms.

Docs:
- CHANGELOG.md entry under Unreleased.
- src/integrations/README.md gains a new 'Image delivery contract' section
  documenting the shared resolution path, allowlist semantics, diagnostic
  log line schema, and the rule that providers shouldn't write their own
  MIME-detection.

Full unit suite: 8521 passed / 23 skipped / 0 failed. Lint + typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): lock 016/2 runtime-gadget-image-delivery

* feat(pm): plan 016/2 done — runtime gadget delivers images on disk

Closes the mid-run image pickup gap from spec 016. The runtime gadget
`cascade-tools pm read-work-item` now downloads any image media and
writes it to .cascade/context/images/work-item-<id>-img-<index>.<ext>,
returning text whose new "Local Image Files" section lists actual file
paths the agent's file-read tool can consume.

What landed:

- src/gadgets/pm/core/writeRuntimeImages.ts (new) — writes ContextImage
  arrays to .cascade/context/images/ with stable naming convention
  (work-item-<id>-img-<i>.<ext>); extension derived from resolved MIME;
  falls back to .bin + warn log for unresolved image/* sentinel.
- src/gadgets/pm/core/readWorkItem.ts — readWorkItem now calls
  downloadAndPrepareImages (Plan 1's helper) + writeRuntimeImages
  (this plan), then mutates the returned text to include the local
  file paths via formatRuntimeImagePaths. Same diagnostic log line
  '[image-pipeline] work-item-fetch summary' as the boot path.
  Failed downloads surface in a "Failed Image Downloads" subsection.

Tests:
- 8 new unit tests in tests/unit/gadgets/pm/core/writeRuntimeImages.test.ts
- 5 new unit tests in tests/unit/gadgets/pm/core/readWorkItem.test.ts
  (spec 016/2 sub-describe)
- 4 new module-integration tests in
  tests/integration/gadgets/runtime-image-delivery.test.ts pinning
  the mid-run pickup contract end-to-end.

CHANGELOG.md entry added.

Full unit suite (single-fork): 8534 passed / 23 skipped / 0 failed.
Lint + typecheck clean. Three PM manifest test suites occasionally
time out under parallel load on this machine — verified to pass in
isolation; not a code regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(plan): lock 016/3 linear-fixture-and-extraction-coverage

* test(pm): plan 016/3 done — Linear fixture + extraction-coverage regression

Closes spec 016 with the regression net for the contract Plans 1+2
established. If Linear ever changes its Issue payload shape in a way
that loses inline images, the extraction-coverage test fails loudly
with a specific URL-missing message.

What landed:

- tests/fixtures/linear-issue-with-screenshot.json (new) — reconstructed
  Linear GraphQL Issue payload covering: extension-less uploads.linear.app
  URL in description, extensioned Linear URL with alt text, external URL
  with image/svg+xml MIME, non-image markdown link (must NOT be picked
  up), one comment with a pasted screenshot, one comment without, and
  three formal Attachment records (Slack/GitHub/Sentry link previews).
- tests/unit/pm/linear/extraction-coverage.test.ts (new) — 9 tests:
  description coverage with explicit expected-URL list, image/* sentinel
  for extension-less, concrete MIME for extensioned, image/svg+xml for
  external SVG, non-image link exclusion, comment coverage, comment
  source field, attachment-NOT-leaked rule, meta-test of regression net.
- src/integrations/README.md — new "Linear: GraphQL surface for inline
  images" subsection documenting the conclusion: Issue.description
  markdown is canonical for inline-pasted screenshots; Issue.attachments
  is for formal Attachment records (link previews) and is the wrong
  surface for inline images. Links to the fixture and the test.

No production code change — Plan 1's mimeTypeFromUrl + extractMarkdownImages
already cover the cases. This plan ships the regression armor.

CHANGELOG.md entry added.

Lint + typecheck clean. 9/9 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(spec): 016 done — pm-image-delivery-reliability, all plans complete

Closes the silent screenshot-drop bug class verified live on 2026-04-26
(ucho/MNG-357). Plan 1 added the Linear-extension-less MIME wildcard
sentinel + diagnostic log line; plan 2 made the runtime
cascade-tools pm read-work-item gadget actually deliver images on
disk; plan 3 captured a Linear GraphQL fixture and pinned extraction
coverage with a regression test.

CLAUDE.md untouched by this spec — already covered by spec 015's
broader silent-failure → diagnostic-line pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address code review concerns

* test(image-pipeline): supply urlsDetected in readWorkItemWithMedia mocks

The diagnostic-line assertion expected urlsDetected on the log payload, but
the mocked readWorkItemWithMedia return values omitted it, so the field
arrived as undefined and the toHaveBeenCalledWith match failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cascade Bot <bot@cascade.dev>

* fix(linear): drop comment-mention planning-state gate that prod payload never satisfies (#1210)

PR #1201 added a `currentStateId !== planningStateId` gate to the Linear
comment @mention trigger that read `data.issue.stateId` from the webhook
payload. Linear's Comment webhook does not ship `stateId` on the nested
issue (verified across four prod payloads on 2026-04-26 — 8cd0108a,
b93e4925, 6548cd14, 3d95b210). The gate therefore always evaluated to
true and silently dropped every legitimate bot @mention, including the
one on MNG-346 that motivated this fix.

The agent (respond-to-planning-comment) is now responsible for any
planning-only behavior; the trigger no longer gates on state and avoids
an extra Linear GraphQL round-trip per comment.

Also corrects `LinearWebhookCommentTriggerData.issue` to match what
Linear actually ships (six keys, no `stateId`, optional `team`) — the
old type lied and PR #1201 trusted it.

Tests pin a real prod-shape Comment payload as a regression. JIRA's
equivalent gate is unaffected (its `comment_created` payload does ship
`issue.fields.status.name`).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: aaight <aaight42@gmail.com>
Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants