Merged
29 commits
948f130
feat(alerting): add configurable PM results list for alerting agent +…
Apr 27, 2026
738cb81
fix(alerting): map PM provider to correct discovery capability in con…
Apr 27, 2026
6b5417f
fix(triggers): persist workItemId on respond-to-review and respond-to…
zbigniewsobiecki Apr 29, 2026
0354857
Merge pull request #1220 from mongrel-intelligence/fix/respond-to-rev…
zbigniewsobiecki Apr 29, 2026
e91a834
fix(alerting): fix PM container picker to fetch correct sub-container…
Apr 29, 2026
6abf45e
docs(spec): 017 router-silent-failure-hardening + plans
zbigniewsobiecki Apr 29, 2026
26717c3
chore(plan): 017/3 lock
zbigniewsobiecki Apr 29, 2026
ddb90ea
test: add coverage for buildExecutionPlan in secretOrchestrator
Apr 29, 2026
8a72616
fix(triggers): suppress redundant progress-comment DELETE after gadge…
zbigniewsobiecki Apr 29, 2026
c3f29d9
Merge pull request #1222 from mongrel-intelligence/fix/progress-comme…
zbigniewsobiecki Apr 29, 2026
b2ba6a3
chore(plan): 017/1 lock
zbigniewsobiecki Apr 29, 2026
cfb0afa
fix(router): consolidate PM-ack dispatch via manifest registry, resto…
zbigniewsobiecki Apr 29, 2026
71ac7c2
Merge pull request #1223 from mongrel-intelligence/fix/pm-ack-coverage
zbigniewsobiecki Apr 29, 2026
2638a37
chore(plan): 017/2 lock
zbigniewsobiecki Apr 29, 2026
532dda5
fix(router): wrap PM-source dispatch in PM-provider scope so capacity…
zbigniewsobiecki Apr 29, 2026
b03fb8e
chore(spec): 017 done — all three plans complete (router-side silent-…
zbigniewsobiecki Apr 29, 2026
515b30f
Merge pull request #1219 from mongrel-intelligence/feature/alerting-r…
zbigniewsobiecki Apr 29, 2026
a080cd1
fix(integration-tests): wrap implementation-trigger handle() in withP…
zbigniewsobiecki Apr 29, 2026
80421d4
Merge pull request #1224 from mongrel-intelligence/fix/capacity-gate-…
zbigniewsobiecki Apr 29, 2026
8e68ff9
fix(router): replace in-memory PM coalesce window with BullMQ delayed…
Apr 29, 2026
73e7e91
ci: add Redis service to integration-tests job
Apr 29, 2026
8f9e187
fix(router): address review feedback on PM coalesce deferred ack
Apr 29, 2026
6704274
chore(worker): remove now-unused biome complexity suppression
Apr 29, 2026
22cde8f
fix(router): address lock-leak and active-job-status issues in coales…
Apr 29, 2026
182b472
test(coverage): cover deferred-ack worker path + coalesce-config
Apr 29, 2026
a08b537
fix(router): capture coalesce-schedule failures to Sentry + rename ac…
zbigniewsobiecki Apr 29, 2026
8f9da02
Merge pull request #1226 from mongrel-intelligence/fix/pm-coalesce-bu…
zbigniewsobiecki Apr 29, 2026
dfd5271
fix(router): sanitize Docker-invalid chars in jobId when building wor…
zbigniewsobiecki Apr 29, 2026
47478b2
Merge pull request #1228 from mongrel-intelligence/fix/coalesce-conta…
zbigniewsobiecki Apr 29, 2026
11 changes: 11 additions & 0 deletions .github/workflows/ci.yml
@@ -92,6 +92,16 @@ jobs:
          --health-timeout 5s
          --health-retries 10

      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 2s
          --health-timeout 5s
          --health-retries 10

    steps:
      - uses: actions/checkout@v4

@@ -111,6 +121,7 @@ jobs:
        run: npm run test:integration
        env:
          TEST_DATABASE_URL: postgresql://cascade_test:cascade_test@localhost:5433/cascade_test
          REDIS_URL: redis://localhost:6379

  docker-build-check:
    name: Validate Docker builds
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,9 @@ All notable user-visible changes to CASCADE are documented here. The format is l

### Changed

- **Pipeline-capacity gate now enforces `maxInFlightItems` for PM `status-changed` triggers** (spec 017, plan 2 of 3). The gate at `src/triggers/shared/pipeline-capacity-gate.ts` is the hard cap on the active pipeline (TODO + IN_PROGRESS + IN_REVIEW work items) introduced after a prior incident where a human moved three cards into TODO simultaneously and three concurrent implementation runs fired against a project pinned to `maxInFlightItems: 1`. The gate calls `getPMProvider()` to count in-flight items, but for every PM `status-changed` trigger the call threw `No PMProvider in scope` because the three PM router adapters (`src/router/adapters/{linear,trello,jira}.ts`) wrapped trigger dispatch in their per-PM-type credential `AsyncLocalStorage` scope but NOT in PM-provider scope (the GitHub adapter at `src/router/adapters/github.ts:280` already had both wrappings). The gate fell through to its conservative branch (`WARN: pipeline-capacity-gate: PM provider unavailable, allowing run` and `return false`) — silently no-op for the only triggers that actually need it. 32 occurrences/day on cascade-router (verified 2026-04-29). The fix introduces a shared helper `withPMScopeForDispatch(project, dispatch)` at `src/router/adapters/_shared.ts` that the three PM router adapters consume, mirroring the GitHub adapter's correct shape. The gate's "PM provider unavailable" branch is converted from `WARN + return false` (allow) to ERROR-level + Sentry capture under stable tag `pipeline_capacity_gate_no_pm_provider` + `return true` (block) — once the routine path establishes scope, hitting that branch is a real `AsyncLocalStorage` scope leak operators need to investigate. A static-guard test at `tests/unit/integrations/pm-router-adapter-pm-scope.test.ts` enforces the wrapping invariant per adapter; CLAUDE.md gains a "Capacity-gate invariant" passage in the Architecture section. See [spec 017](docs/specs/017-router-silent-failure-hardening.md).
- **PM-ack dispatch consolidation: Linear-based PM-focused agents now post their PM-side ack comment** (spec 017, plan 1 of 3). PM-focused agents (e.g. `backlog-manager`) triggered from a GitHub webhook used to silently skip their PM-side ack on Linear projects: the router-adapter's local `postPMAck` helper had `if (pmType === 'trello')` / `if (pmType === 'jira')` branches but no Linear branch, so Linear-based projects fell through to a `WARN: Unknown PM type for PM-focused agent ack, skipping` and never saw the "🔧 On it" comment that Trello/JIRA projects got (24 silent skips per day on cascade-router, all from `ucho`, verified 2026-04-29). A near-identical helper at `src/triggers/shared/pm-ack.ts` already had the Linear branch — pure parallel-path drift. The fix introduces a single consolidated helper `dispatchPMAck` at `src/router/pm-ack-dispatch.ts` that indexes the manifest registry directly and invokes `manifest.platformClientFactory(projectId).postComment(...)` — no per-PM-type literal branching anywhere on the dispatch surface. Both legacy call sites delegate. The PM manifest conformance harness gains a per-provider `dispatchPMAck reaches this provider without throwing` assertion, and a static-guard test pins "no `pmType === '<literal>'` branching" against all three call sites; adding a future PM provider to the registry lands the dispatch path for free. Genuinely-unknown PM types (configuration error: project pinned to a deleted provider) now log at ERROR + capture to Sentry under stable tag `pm_ack_unknown_pm_type` instead of a silent WARN. See [spec 017](docs/specs/017-router-silent-failure-hardening.md).
- **Progress-comment lifecycle: post-agent cleanup hook now skips when an in-run gadget already deleted the comment** (spec 017, plan 3 of 3). The post-agent `deleteProgressCommentOnSuccess` hook used to read `sessionState.initialCommentId`, fall back to `result.agentInput.ackCommentId` when session state was empty, and issue a redundant DELETE — but "session state cleared by a gadget" was indistinguishable from "session state never populated", so the fallback fired and re-deleted comments that were already gone. GitHub returned 404 and `WARN: Failed to delete progress comment after agent success` was logged 72 times per day on cascade-router (live audit on 2026-04-29). Adds an explicit `initialCommentIdConsumed: boolean` flag on `SessionStateData`. Both `deleteInitialComment` (gadget-driven) and `clearInitialComment` (sidecar-driven) now set the flag to `true` after disposing of the comment. The post-agent hook checks the flag first and skips the entire deletion path — including the legacy `agentInput.ackCommentId` fallback — when consumed. As defense in depth, `githubClient.deletePRComment` now treats HTTP 404 as success (RFC-7231 idempotency) and logs at DEBUG instead of letting the error bubble as a WARN; other HTTP errors (5xx, 401, network) continue to throw. The legacy fallback to `agentInput.ackCommentId` continues to work for code paths that never populate session state. See [spec 017](docs/specs/017-router-silent-failure-hardening.md).
- **PM image delivery: Linear GraphQL fixture + extraction-coverage regression test** (spec 016, plan 3 of 3). Captures a reconstructed Linear `Issue` GraphQL payload at `tests/fixtures/linear-issue-with-screenshot.json` containing extension-less and extensioned inline-pasted images (description + comment bodies) plus formal Attachment records (Slack/GitHub/Sentry link previews) that must NOT be mistaken for inline images. The unit test at `tests/unit/pm/linear/extraction-coverage.test.ts` pins the contract and fails loudly with a specific URL-missing message if Linear ever changes its payload shape in a way that loses inline images. Documents the conclusion in `src/integrations/README.md`: `Issue.description` markdown is canonical for Linear inline images; `Issue.attachments` is the wrong surface (formal Attachment records, not pastes). No production code change — this plan ships the regression net for the contract Plans 1+2 established. See [spec 016](docs/specs/016-pm-image-delivery-reliability.md).
- **PM image delivery: runtime `cascade-tools pm read-work-item` gadget now delivers images on disk** (spec 016, plan 2 of 3). The runtime gadget that agents call mid-run for a work item used to return text only — its "Pre-fetched Images" section listed URL refs but no local file paths, so an agent that needed to re-read a work item (e.g. after a teammate added a screenshot) had no way to actually see the new image. After this plan, the gadget downloads any image media present and writes it to `.cascade/context/images/work-item-<id>-img-<index>.<ext>` (extension derived from the resolved Content-Type MIME), then returns text whose new "Local Image Files" section lists actual file paths the agent's file-read tool can consume. Failed downloads are surfaced in a "Failed Image Downloads" subsection so they're never silently dropped. Same diagnostic log line as the boot path (`[image-pipeline] work-item-fetch summary`) — operators see consistent shape across boot and runtime fetches. Closes the mid-run pickup gap. See [spec 016](docs/specs/016-pm-image-delivery-reliability.md).
- **PM image delivery: extension-less Linear pasted-image URLs are no longer dropped at the pre-download MIME filter** (spec 016, plan 1 of 3). Linear's `https://uploads.linear.app/<uuid>` URLs (with no file extension in the pathname) used to fall through `mimeTypeFromUrl` to `application/octet-stream` and were silently filtered out by `filterImageMedia` before the download loop ran. The fix introduces an `image/*` wildcard sentinel for trusted PM-provider upload hosts (allowlisted by hostname); `isImageMimeType` now accepts the wildcard, and the download response's `Content-Type` header resolves it to a concrete MIME (`image/png`, etc.) before any image is written. The shared `downloadAndPrepareImages` helper consolidates the per-provider download dispatch (jira/linear/trello) so both the boot-path and the runtime gadget (spec 016 plan 2) share one code path. Adds AC#5's grep-stable diagnostic line — `[image-pipeline] work-item-fetch summary` — emitted once per work-item-fetch with stable fields (`provider`, `workItemId`, `urlsDetected`, `urlsAfterFilter`, `urlsDownloaded`, `urlsFailed`, `urlsByMimeType`). Closes the silent screenshot-drop bug class verified live on 2026-04-26 (ucho/MNG-357). See [spec 016](docs/specs/016-pm-image-delivery-reliability.md).
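
The wildcard-sentinel flow described in the last entry above (spec 016, plan 1) can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function names `mimeTypeFromUrl` and `isImageMimeType` come from the changelog, but their bodies, the `resolveMimeType` helper, the extension table, and the contents of the allowlist are assumptions made for the example.

```typescript
// Assumed allowlist: only uploads.linear.app is named in the changelog entry.
const TRUSTED_UPLOAD_HOSTS = new Set<string>(["uploads.linear.app"]);
const IMAGE_WILDCARD = "image/*";
const FALLBACK_MIME = "application/octet-stream";

// Illustrative extension table, not the project's real one.
const EXTENSION_MIME = new Map<string, string>([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
]);
const CONCRETE_IMAGE_MIMES = new Set<string>([
  "image/png",
  "image/jpeg",
  "image/gif",
  "image/webp",
]);

// Guess a MIME type before download. Extension-less URLs on a trusted upload
// host get the wildcard sentinel instead of the octet-stream fallback.
function mimeTypeFromUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const ext = /\.([a-z0-9]+)$/i.exec(url.pathname)?.[1]?.toLowerCase();
  if (ext !== undefined) {
    return EXTENSION_MIME.get(ext) ?? FALLBACK_MIME;
  }
  return TRUSTED_UPLOAD_HOSTS.has(url.hostname) ? IMAGE_WILDCARD : FALLBACK_MIME;
}

// Pre-download filter: the wildcard passes alongside concrete image types.
function isImageMimeType(mime: string): boolean {
  return mime === IMAGE_WILDCARD || CONCRETE_IMAGE_MIMES.has(mime);
}

// After download, resolve the wildcard from the response Content-Type header.
// Returns null when the header does not name a concrete image type, so that
// nothing is written to disk under an unresolved wildcard.
function resolveMimeType(candidate: string, contentType: string | null): string | null {
  if (candidate !== IMAGE_WILDCARD) {
    return CONCRETE_IMAGE_MIMES.has(candidate) ? candidate : null;
  }
  const headerMime = contentType?.split(";")[0]?.trim().toLowerCase();
  return headerMime !== undefined && CONCRETE_IMAGE_MIMES.has(headerMime) ? headerMime : null;
}
```

Keeping the sentinel scoped to allowlisted hostnames means an extension-less URL on an arbitrary host still falls back to `application/octet-stream` and is filtered out before any download is attempted.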
4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -23,6 +23,8 @@ Three separate services, **no monolithic server mode**:

Flow: `PM/SCM/alerting webhook → Router → Redis → Worker → TriggerRegistry → Agent → Code → PR`.

**Capacity-gate invariant.** Every PM router adapter (`src/router/adapters/{linear,trello,jira}.ts`) must wrap `triggerRegistry.dispatch(ctx)` in PM-provider `AsyncLocalStorage` scope via the shared `withPMScopeForDispatch(fullProject, dispatch)` helper at `src/router/adapters/_shared.ts`, in addition to the per-PM-type credential scope (`withLinearCredentials` / `withTrelloCredentials` / `withJiraCredentials`). Without the PM-provider wrapping, the pipeline-capacity gate at `src/triggers/shared/pipeline-capacity-gate.ts` cannot resolve `getPMProvider()` and **fails closed** under the spec-017 policy: it blocks the run, logs at ERROR, and captures to Sentry under tag `pipeline_capacity_gate_no_pm_provider`. (Before spec 017 the same gap made the gate allow the run, so `maxInFlightItems` was silently disabled for the PM-source path.) Mirror the GitHub adapter's existing correct shape at `src/router/adapters/github.ts:dispatchWithCredentials`. The static guard at `tests/unit/integrations/pm-router-adapter-pm-scope.test.ts` enforces this at CI time; adding a new PM router adapter without the wrapping fails CI with a precise file path.
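
A minimal sketch of the scope-plus-gate interaction this invariant protects, assuming a simplified shape. The real helper takes the project as its first argument and looks up the provider itself; here the provider is passed directly, and the `PMProvider` and `Project` interfaces, `countInFlightItems`, and `isPipelineAtCapacity` are hypothetical names introduced only for illustration.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical stand-ins for the repository's types, which this diff does not show.
interface PMProvider {
  countInFlightItems(): number;
}
interface Project {
  id: string;
  maxInFlightItems: number;
}

const pmProviderScope = new AsyncLocalStorage<PMProvider>();

// Throws when called outside PM-provider scope, matching the error text quoted
// in the changelog.
function getPMProvider(): PMProvider {
  const provider = pmProviderScope.getStore();
  if (provider === undefined) {
    throw new Error("No PMProvider in scope");
  }
  return provider;
}

// Simplified wrapper: everything `dispatch` calls can resolve the provider.
function withPMScopeForDispatch<T>(provider: PMProvider, dispatch: () => T): T {
  return pmProviderScope.run(provider, dispatch);
}

// Returns true when the run must be blocked.
function isPipelineAtCapacity(project: Project): boolean {
  let provider: PMProvider;
  try {
    provider = getPMProvider();
  } catch {
    // Fail closed: a missing scope is treated as a bug to investigate, so the
    // run is blocked instead of being allowed through.
    console.error("pipeline-capacity-gate: PM provider unavailable, blocking run");
    return true;
  }
  return provider.countInFlightItems() >= project.maxInFlightItems;
}
```

Called outside `withPMScopeForDispatch`, the gate in this sketch blocks every run; called inside it, the gate compares the in-flight count with `maxInFlightItems`, which is the behaviour the static guard is meant to keep reachable for every PM adapter.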

Integration abstraction lives in `src/integrations/`. For **adding a new PM provider**, see @src/integrations/README.md — PM providers (Trello, JIRA, Linear) use the `PMProviderManifest` registry with a **behavioral conformance harness** (spec 009 — config round-trip, discovery shape, full lifecycle scenario, auth-header provenance, single-entrypoint invariant). Each provider owns its Zod config schema (`src/integrations/pm/<provider>/config-schema.ts`) as the single source of truth — the central `src/config/schema.ts` imports it. PM adapter method signatures use branded `StateId` / `LabelId` / `ContainerId` from `src/pm/ids.ts` to make state-name-vs-ID confusion a compile error at direct-adapter call sites. All runtime surfaces (router, worker, CLI, dashboard) register integrations through a single entrypoint at `src/integrations/entrypoint.ts`. **Spec 010 follow-ups** added generic `pm.discovery.createLabel` / `createCustomField` mutation endpoints + `currentUser` discovery capability + real shared React components for every `StandardStepKind` under `web/src/components/projects/pm-providers/steps/`. **Spec 011** migrated all three production providers (Trello, JIRA, Linear) onto those shared components, added a 7th `StandardStepKind: custom-field-mapping`, widened `container-pick` / `project-scope` / `webhook-url-display` with optional props, and deleted the three legacy `pm-wizard-{trello,jira,linear}-steps.tsx` files. **Spec 012** migrated each provider's webhook UX (programmatic create for Trello/JIRA, signing-secret + instructions for Linear) into per-provider manifest webhook adapters (Fragment compositions around the shared `WebhookUrlDisplayStep`); deleted the legacy `WebhookStep` + `LinearWebhookInfoPanel` + `useWebhookManagement` + `useLinearWebhookInfo`. Every PM wizard step now renders via the manifest path without exception. 
A new PM provider writes zero edits to shared orchestration (`pm-wizard.tsx`, `pm-wizard-common-steps.tsx`, `pm-wizard-hooks.ts`); provider-specific UI ships either as `kind: 'custom'` steps or as Fragment compositions inside the provider folder's wizard adapters. SCM (GitHub) and alerting (Sentry) still use the legacy `IntegrationModule` pattern via self-registration in `src/github/register.ts` + `src/sentry/register.ts`. Don't improvise; the README covers both patterns.

## PR checkout (worker) — gotcha
@@ -173,7 +175,7 @@ Optional:
- `CREDENTIAL_MASTER_KEY` — 64-char hex (AES-256 key) to encrypt project credentials at rest. Without it, credentials are stored as plaintext; both modes coexist.
- `GITHUB_WEBHOOK_SECRET` — opt-in HMAC verification; store as the `webhook_secret` role on the GitHub SCM integration.
- `SENTRY_DSN`, `SENTRY_ENVIRONMENT`, `SENTRY_RELEASE`, `SENTRY_TRACES_SAMPLE_RATE` — observability.
- `PM_CREATE_COALESCE_WINDOW_MS` — window (ms) the router waits after a PM `pm:status-changed` create trigger before enqueuing, so a follow-up `update` (same `${projectId}:${workItemId}`) can supersede it. Defaults to `2000`; `0` disables. Fixes JIRA's double-fire when an issue is created in a non-default workflow column (JIRA emits `issue_created` at the initial status, then `issue_updated` transitioning to the target).
- `PM_COALESCE_WINDOW_MS` — settle window (ms) for BullMQ delayed-job coalescing on `pm:status-changed` events. Any dispatch for the same `${projectId}:${workItemId}` within the window supersedes the prior pending dispatch, across agent types. Ack comment is deferred to job fire time to avoid orphaned comments on supersede. Defaults to `10000` (10 s); `0` disables. Fixes JIRA's double-fire when an issue is created in a non-default workflow column. The legacy name `PM_CREATE_COALESCE_WINDOW_MS` is still accepted as a fallback.
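
The window resolution and supersede semantics described for `PM_COALESCE_WINDOW_MS` could look roughly like the sketch below. It is an in-memory model for illustration only: the PR itself uses BullMQ delayed jobs, which are not reproduced here, and `resolveCoalesceWindowMs`, `coalesceKey`, and `PendingDispatches` are hypothetical names. Two behaviours are assumptions rather than documented facts: unparseable or negative values fall back to the default, and a superseding dispatch starts a fresh window.

```typescript
const DEFAULT_COALESCE_WINDOW_MS = 10_000;

type Env = Record<string, string | undefined>;

// Prefer the new variable name, accept the legacy one as a fallback, and
// default to 10 s. A value of 0 disables coalescing.
function resolveCoalesceWindowMs(env: Env): number {
  const raw = env["PM_COALESCE_WINDOW_MS"] ?? env["PM_CREATE_COALESCE_WINDOW_MS"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_COALESCE_WINDOW_MS;
  }
  const parsed = Number(raw);
  // Assumption: invalid or negative input falls back to the default.
  if (!Number.isFinite(parsed) || parsed < 0) {
    return DEFAULT_COALESCE_WINDOW_MS;
  }
  return Math.floor(parsed);
}

// Key shape taken from the documentation line: `${projectId}:${workItemId}`.
function coalesceKey(projectId: string, workItemId: string): string {
  return `${projectId}:${workItemId}`;
}

// In-memory stand-in for the delayed-job queue: one pending dispatch per key.
class PendingDispatches<T> {
  private readonly windowMs: number;
  private readonly pending = new Map<string, { payload: T; fireAt: number }>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Scheduling under an existing key replaces (supersedes) the pending entry.
  schedule(key: string, payload: T, nowMs: number): void {
    this.pending.set(key, { payload, fireAt: nowMs + this.windowMs });
  }

  // Remove and return every payload whose window has elapsed.
  takeDue(nowMs: number): T[] {
    const due: T[] = [];
    this.pending.forEach((entry, key) => {
      if (entry.fireAt <= nowMs) {
        due.push(entry.payload);
        this.pending.delete(key);
      }
    });
    return due;
  }
}
```

In this model a JIRA `issue_created` followed 400 ms later by `issue_updated` for the same work item yields a single dispatch, the later one. Deferring the ack comment to fire time, as the documentation line describes, means the superseded dispatch never posts a comment that would be left orphaned.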

**Project credentials (GitHub tokens, Trello/JIRA/Linear keys, LLM API keys) live in the `project_credentials` table.** The DB is the **sole source of truth** — there is no env var fallback for project-scoped secrets.
