From 949e0dfa20a2ac3dccc75483e31d620a590adc34 Mon Sep 17 00:00:00 2001 From: Zbigniew Sobiecki Date: Fri, 17 Apr 2026 06:47:29 +0000 Subject: [PATCH 1/7] docs(007): spec + plans for robust review dispatch Spec 007 addresses the silent review drop observed on MNG-122/PR-572: the work-item lock's total-concurrency cap (MAX_WORK_ITEM_CONCURRENCY=2) blocked review dispatch while other agents were enqueued for the same work item. Three intersecting fixes specified: per-agent-type locking, lock release timing, and a post-completion review hook. Two plans: - 007/1 (lock-infra): remove MAX_WORK_ITEM_CONCURRENCY total cap, keep per-type MAX_SAME_TYPE_PER_WORK_ITEM=1, enrich lock-skip log. - 007/2 (post-completion-review): deterministic review dispatch from the implementation pipeline after success + green CI. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../1-lock-infra.md | 149 +++++++++++++++ .../2-post-completion-review.md | 178 ++++++++++++++++++ .../007-robust-review-dispatch/_coverage.md | 28 +++ docs/specs/007-robust-review-dispatch.md | 120 ++++++++++++ 4 files changed, 475 insertions(+) create mode 100644 docs/plans/007-robust-review-dispatch/1-lock-infra.md create mode 100644 docs/plans/007-robust-review-dispatch/2-post-completion-review.md create mode 100644 docs/plans/007-robust-review-dispatch/_coverage.md create mode 100644 docs/specs/007-robust-review-dispatch.md diff --git a/docs/plans/007-robust-review-dispatch/1-lock-infra.md b/docs/plans/007-robust-review-dispatch/1-lock-infra.md new file mode 100644 index 00000000..c7b6becc --- /dev/null +++ b/docs/plans/007-robust-review-dispatch/1-lock-infra.md @@ -0,0 +1,149 @@ +--- +id: 007 +slug: robust-review-dispatch +plan: 1 +plan_slug: lock-infra +level: plan +parent_spec: docs/specs/007-robust-review-dispatch.md +depends_on: [] +status: pending +--- + +# 007/1: Lock infrastructure — remove false cross-type serialization + +> Part 1 of 2 in the 007-robust-review-dispatch plan. 
See [parent spec](../../specs/007-robust-review-dispatch.md). + +## Summary + +The work-item lock in `src/router/work-item-lock.ts` enforces two thresholds: `MAX_SAME_TYPE_PER_WORK_ITEM = 1` (per-agent-type) and `MAX_WORK_ITEM_CONCURRENCY = 2` (total across all types). The total cap is what blocked the review for MNG-122: two agents (implementation + one other) were enqueued for MNG-122, hitting the total cap of 2, so the review couldn't dispatch. + +This plan removes the `MAX_WORK_ITEM_CONCURRENCY` total cap from `isWorkItemLocked`, retaining only the per-type cap. Different agent types can then run concurrently on the same work item. The per-type cap (`MAX_SAME_TYPE_PER_WORK_ITEM = 1`) still prevents duplicate same-type runs — the invariant the lock was introduced to protect. + +This plan also enriches the lock-skip log with the lock-holder's agent type so operators can diagnose blocked dispatches. + +**Components delivered:** +- `src/router/work-item-lock.ts` — remove total-concurrency checks from `isWorkItemLocked` and `getInMemoryCounts`; delete `MAX_WORK_ITEM_CONCURRENCY` constant +- `src/router/webhook-processor.ts` — enrich lock-skip log with lock-holder agent type + trigger handler name +- `tests/unit/router/work-item-lock.test.ts` — update existing tests + add cross-type concurrency tests +- `tests/unit/router/webhook-processor.test.ts` — update lock-skip log assertion + +**Deferred to plan 2:** +- Post-completion hook that fires review deterministically after implementation success (the 30-second guarantee) +- Dedup coordination between the hook and the `check-suite-success` trigger + +--- + +## Spec ACs satisfied by this plan + +- **Spec AC #2** (review can dispatch during container cleanup) — **full**: removing the total cap means a review is never blocked by an implementation's in-memory entry. +- **Spec AC #3** (different agent types can run concurrently) — **full**: the total cap was the only thing preventing this. 
+- **Spec AC #4** (same agent type still deduplicated) — **full**: `MAX_SAME_TYPE_PER_WORK_ITEM = 1` is unchanged. +- **Spec AC #6** (structured lock-skip log) — **full**: log enriched with lock-holder agent type + trigger handler name. +- **Spec AC #7** (works identically for all PM providers) — **full**: no per-provider branching in the lock logic (was already true; this plan doesn't introduce any). +- **Spec AC #8** (router restart doesn't leave permanent stale locks) — **full**: 30-min TTL unchanged; removing the total cap doesn't affect TTL behaviour. +- **Spec AC #1** (review within 30s) — **partial**: this plan unblocks cross-type dispatch so `check-suite-success` webhooks that arrive during container cleanup are no longer blocked. But the 30s deterministic guarantee requires plan 2's post-completion hook. +- **Spec AC #5** (no double-enqueue with hook) — **deferred to plan 2**. + +--- + +## Depends On + +- None — this is the first plan. + +--- + +## Detailed Task List (TDD) + +### 1. Remove total-concurrency cap from `isWorkItemLocked` + +**Tests first** (`tests/unit/router/work-item-lock.test.ts`): + +- `allows a different agent type to enqueue when another type is already enqueued` — `markWorkItemEnqueued('p1', 'wi1', 'implementation')` then `isWorkItemLocked('p1', 'wi1', 'review')`: expect `{ locked: false }`. +- `still blocks same agent type from double-enqueuing` — `markWorkItemEnqueued('p1', 'wi1', 'review')` then `isWorkItemLocked('p1', 'wi1', 'review')`: expect `{ locked: true, reason: /same-type/ }`. +- `allows 3+ different agent types concurrently` — enqueue `implementation`, `review`, `respond-to-ci` for the same work item, then `isWorkItemLocked('p1', 'wi1', 'respond-to-review')`: expect `{ locked: false }`. +- Update existing tests that assert `MAX_WORK_ITEM_CONCURRENCY` behaviour — they should now pass without the total cap. +- DB-layer same-type check still works: mock `countActiveRuns` to return 1 for same-type, 0 otherwise; expect locked. 
**Implementation** (`src/router/work-item-lock.ts`):

- Delete `MAX_WORK_ITEM_CONCURRENCY` constant (line 16).
- In `getInMemoryCounts(projectId, workItemId, agentType)`: remove the `total` computation (iterating all keys). Return only `sameType`.
- In `isWorkItemLocked`: remove lines 94-99 (in-memory total check) and lines 118-124 (DB total check). Remove the `dbTotal` query at line 104 (saves one DB round-trip). Keep lines 88-93 (in-memory same-type) and lines 108-115 (effective same-type with DB).
- Export type change: `getInMemoryCounts` returns `{ sameType: number }` instead of `{ total: number; sameType: number }`.

### 2. Enrich lock-skip log

**Tests first** (`tests/unit/router/webhook-processor.test.ts` — or nearest existing test):

- `lock-skip log includes trigger handler name and lock reason` — mock `isWorkItemLocked` to return `{ locked: true, reason: 'same-type: 1 running' }`. Assert `logger.info` called with a message containing `projectId`, `workItemId`, `agentType`, `reason`, and `triggerHandler`.

**Implementation** (`src/router/webhook-processor.ts:154`):

- Add `triggerHandler: result.triggerHandler ?? 'unknown'` to the log object. (`result` is the `TriggerResult` — it already carries the handler name from the trigger match; verify the field name and thread it through if needed.)
- Log level: unchanged. It is already INFO, which is what the spec calls for; the fix is the added structured context, not the level.

### 3. Update existing tests

**Tests** (`tests/unit/router/work-item-lock.test.ts`):

- Remove or update any test that asserts the old total-concurrency behaviour (e.g., "returns locked when total exceeds MAX_WORK_ITEM_CONCURRENCY"). These tests should now assert the opposite: "allows cross-type enqueue even when multiple types are enqueued."
- Verify no other test files reference `MAX_WORK_ITEM_CONCURRENCY` — grep for it and update any assertions. 
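Taken together, tasks 1-3 leave an in-memory check that reduces to roughly the following. This is a sketch with assumed names — the real module also keeps the DB-backed same-type fallback via `countActiveRuns`, which is elided here:

```typescript
// Sketch only: the per-type lock check that remains after removing the total cap.
// Real names and the countActiveRuns DB fallback live in src/router/work-item-lock.ts.
const LOCK_TTL_MS = 30 * 60 * 1000; // 30-minute hard ceiling, unchanged by this plan

// One entry per (projectId, workItemId, agentType) enqueue.
const lockEntries = new Map<string, number>(); // key -> enqueuedAt (epoch ms)

const lockKey = (projectId: string, workItemId: string, agentType: string): string =>
  `${projectId}:${workItemId}:${agentType}`;

export function markWorkItemEnqueued(projectId: string, workItemId: string, agentType: string): void {
  lockEntries.set(lockKey(projectId, workItemId, agentType), Date.now());
}

export function isWorkItemLocked(
  projectId: string,
  workItemId: string,
  agentType: string,
): { locked: boolean; reason?: string } {
  const enqueuedAt = lockEntries.get(lockKey(projectId, workItemId, agentType));
  // No same-type entry, or a stale one past the TTL: not locked. An entry held
  // by a DIFFERENT agent type never reaches this check anymore.
  if (enqueuedAt === undefined || Date.now() - enqueuedAt > LOCK_TTL_MS) {
    return { locked: false };
  }
  return { locked: true, reason: 'same-type: 1 already enqueued' };
}
```

Cross-type lookups miss the map entirely, so `review` can dispatch while `implementation` still holds its entry; only a second enqueue of the same type is refused.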
+ +--- + +## Test Plan + +### Unit tests +- [ ] `tests/unit/router/work-item-lock.test.ts`: ~5 new + ~3 updated tests covering cross-type concurrency, same-type dedup, and DB fallback. +- [ ] `tests/unit/router/webhook-processor.test.ts`: ~1 updated test for enriched lock-skip log. + +### Acceptance tests +- [ ] Cross-type dispatch: implementation enqueued → review `isWorkItemLocked` returns false. +- [ ] Same-type dedup: implementation enqueued → implementation `isWorkItemLocked` returns true. +- [ ] Lock-skip log: structured context includes all required fields. + +--- + +## Acceptance Criteria (per-plan, testable) + +1. `isWorkItemLocked('p', 'wi', 'review')` returns `{ locked: false }` when only an `implementation` agent is enqueued for `('p', 'wi')`. +2. `isWorkItemLocked('p', 'wi', 'review')` returns `{ locked: true }` when a `review` agent is already enqueued for `('p', 'wi')`. +3. `MAX_WORK_ITEM_CONCURRENCY` constant no longer exists in the codebase. +4. Lock-skip log message includes: `projectId`, `workItemId`, `agentType`, `reason`, and `triggerHandler` (or equivalent structured context). +5. All new/modified code has corresponding tests. +6. `npm test` passes. +7. `npm run typecheck` passes. +8. `npm run lint` passes. +9. `CLAUDE.md` updated to document per-agent-type locking semantics under the "Agent triggers" section. + +--- + +## Documentation Impact (this plan only) + +| File | Change | +|---|---| +| `CLAUDE.md` | Add a note under "Agent triggers" that the work-item lock is per-agent-type: different agent types can run concurrently on the same work item; only same-type duplicate runs are prevented. | + +--- + +## Out of Scope (this plan) + +- Post-completion review dispatch hook (plan 2). +- Dedup coordination between the hook and `check-suite-success` (plan 2). +- Enabling `pr-opened` for review on any project (per-project config decision, out of spec scope). +- Lock persistence to Redis/DB (out of spec scope). 
+ +--- + +## Progress + + +- [ ] AC #1 +- [ ] AC #2 +- [ ] AC #3 +- [ ] AC #4 +- [ ] AC #5 +- [ ] AC #6 +- [ ] AC #7 +- [ ] AC #8 +- [ ] AC #9 diff --git a/docs/plans/007-robust-review-dispatch/2-post-completion-review.md b/docs/plans/007-robust-review-dispatch/2-post-completion-review.md new file mode 100644 index 00000000..921353b2 --- /dev/null +++ b/docs/plans/007-robust-review-dispatch/2-post-completion-review.md @@ -0,0 +1,178 @@ +--- +id: 007 +slug: robust-review-dispatch +plan: 2 +plan_slug: post-completion-review +level: plan +parent_spec: docs/specs/007-robust-review-dispatch.md +depends_on: [1-lock-infra.md] +status: pending +--- + +# 007/2: Post-completion review dispatch hook + +> Part 2 of 2 in the 007-robust-review-dispatch plan. See [parent spec](../../specs/007-robust-review-dispatch.md). + +## Summary + +Plan 1 removed the false cross-type serialization so `check-suite-success` webhooks can dispatch the review while the implementation container is still shutting down. But the review is still at the mercy of webhook timing: if all CI-completion webhooks arrive and are processed during the narrow window before the lock is released, there's no second chance. + +This plan adds a **deterministic** review dispatch: when the implementation agent completes successfully with a PR URL, the agent execution pipeline itself (running inside the worker container, before exit) checks whether the PR has green CI and no review already dispatched, and if so, enqueues a review job directly via BullMQ. This fires before the container exits, guaranteeing the review dispatch within seconds of implementation completion. + +The hook reuses the existing `claimReviewDispatch` dedup mechanism so a subsequent `check-suite-success` webhook doesn't double-enqueue a second review. 
+ +**Components delivered:** +- `src/triggers/shared/agent-execution.ts` — post-completion review-dispatch logic inside `runAgentExecutionPipeline`, after implementation success +- Helper to enqueue a review BullMQ job from within the worker container (new small module or inline in agent-execution) +- Dedup via `claimReviewDispatch` / `buildReviewDispatchKey` (reuse from `src/triggers/github/review-dispatch-dedup.ts`) +- CI-status check via `githubClient.getCheckSuiteStatus` (reuse from `src/triggers/github/pr-ready-to-merge.ts`) +- Tests for the new hook + +**Deferred (out of spec scope):** +- Generalizing the hook to other trigger chains (splitting → planning, etc.) + +--- + +## Spec ACs satisfied by this plan + +- **Spec AC #1** (review within 30s after implementation + green CI) — **full**: the hook fires from the execution pipeline before container exit, deterministically. +- **Spec AC #5** (no double-enqueue with hook + webhook) — **full**: `claimReviewDispatch` dedup prevents the `check-suite-success` webhook from enqueuing a second review after the hook already fired. +- **Spec AC #1** (partial from plan 1 → now **complete**): the 30s guarantee is satisfied by the hook, not by the lock fix alone. + +--- + +## Depends On + +- **Plan 1 (lock-infra)** — provides per-type locking so the review job enqueued by the hook is not blocked by the implementation's lock. + +--- + +## Detailed Task List (TDD) + +### 1. Post-completion review check + enqueue + +**Tests first** (`tests/unit/triggers/shared/agent-execution.test.ts` — extend the existing `runAgentExecutionPipeline` test suite): + +- `fires review dispatch after successful implementation with prUrl and green CI` — mock `agentType = 'implementation'`, `agentResult.success = true`, `agentResult.prUrl = 'https://...'`, mock `githubClient.getCheckSuiteStatus` to return `{ allPassing: true }`, mock `claimReviewDispatch` to return `true` (not yet dispatched). 
Assert: a review BullMQ job is enqueued with the correct `TriggerResult` shape. +- `does NOT fire review dispatch when agentType is not implementation` — `agentType = 'review'`, same success result. Assert: no review job enqueued. +- `does NOT fire review dispatch when implementation failed` — `agentResult.success = false`. Assert: no review job enqueued. +- `does NOT fire review dispatch when implementation has no prUrl` — `agentResult.prUrl = undefined`. Assert: no review job enqueued. +- `does NOT fire review dispatch when CI is not all green` — `getCheckSuiteStatus` returns `{ allPassing: false }`. Assert: no review job enqueued. +- `does NOT fire review dispatch when claimReviewDispatch returns false (already dispatched)` — mock `claimReviewDispatch` to return `false`. Assert: no review job enqueued, log message indicates "review already dispatched". +- `does NOT fire review dispatch when project has no repo` — `project.repo` is undefined. Assert: no review job enqueued. +- `swallows errors gracefully — does not break the implementation pipeline` — mock `getCheckSuiteStatus` to throw. Assert: implementation pipeline completes normally, error is logged as warn. + +**Implementation** (`src/triggers/shared/agent-execution.ts`): + +- In `runAgentExecutionPipeline`, after the `linkPRPostExecution` block (line ~465), add a new block: + +```ts +// Post-completion review dispatch: when an implementation agent succeeds +// with a PR, check CI and fire review deterministically. This guarantees +// review dispatch within seconds of completion, regardless of webhook timing. +if ( + agentType === 'implementation' && + agentResult.success && + agentResult.prUrl && + project.repo +) { + await tryDispatchPostCompletionReview(agentResult, project, config, executionConfig); +} +``` + +- New function `tryDispatchPostCompletionReview(agentResult, project, config, executionConfig)`: + 1. Extract `prNumber` from `agentResult.prUrl` via `extractPRNumber`. + 2. 
Parse `owner/repo` via `parseRepoFullName(project.repo)`. + 3. Get `headSha` from `githubClient.getPR(owner, repo, prNumber).headSha`. + 4. Check CI: `githubClient.getCheckSuiteStatus(owner, repo, headSha)`. If not `allPassing`, return (CI not yet green — the `check-suite-success` webhook will handle it when CI finishes). + 5. Dedup: `claimReviewDispatch(buildReviewDispatchKey(owner, repo, prNumber, headSha), 'post-completion-hook', { ... })`. If returns false, return (already dispatched via webhook). + 6. Build `TriggerResult` matching what `check-suite-success` would produce: `{ agentType: 'review', agentInput: { prNumber, prBranch, repoFullName, headSha, triggerType: 'post-completion', triggerEvent: 'scm:check-suite-success', workItemId }, prNumber, prUrl, prTitle, workItemId }`. + 7. Enqueue via the shared execution pipeline: call `runAgentWithCredentials(integration, result, project, config, executionConfig)` — or, if running inside the worker container where credentials are already in scope, directly call `runAgentExecutionPipeline(result, project, config, { ...executionConfig, skipPrepareForAgent: true })`. + + The exact enqueue mechanism depends on whether the worker container can dispatch a BullMQ job or must reuse the in-process pipeline. Investigate during implementation: if `runAgentExecutionPipeline` for review runs in-process (within the same container), this is simplest. If it must be a separate container, enqueue a BullMQ job via a new `Queue('cascade-jobs')` instance (the worker has `REDIS_URL`). + + Wrap the entire function in `try/catch` — log warn on failure but never break the implementation pipeline. + +### 2. Dedup integration + +**Tests first** (`tests/unit/triggers/shared/agent-execution.test.ts`): + +- `subsequent check-suite-success webhook does not enqueue review after post-completion hook already fired` — setup: call `runAgentExecutionPipeline` for implementation (which fires the hook and claims the dedup key). 
Then simulate a `check-suite-success` trigger for the same PR+SHA. Assert: the trigger's `claimReviewDispatch` returns false, no second review enqueued. + +**Implementation:** +- Import `buildReviewDispatchKey`, `claimReviewDispatch` from `src/triggers/github/review-dispatch-dedup.ts`. +- The dedup key format is already `${owner}/${repo}:${prNumber}:${headSha}` — same key regardless of whether claimed by the webhook trigger or the post-completion hook. + +### 3. Log the dispatch decision + +**Tests first** (`tests/unit/triggers/shared/agent-execution.test.ts`): + +- `logs the post-completion review dispatch decision at INFO` — when hook fires and enqueues, assert logger.info contains `'Post-completion review dispatch'` with `{ prNumber, workItemId, headSha }`. +- `logs skip reason when CI is not green` — assert logger.debug with `'Skipping post-completion review: CI not all passing'`. +- `logs skip reason when already dispatched` — assert logger.info with `'Skipping post-completion review: already dispatched'`. + +**Implementation** (`src/triggers/shared/agent-execution.ts`): +- Add structured log calls at each decision point inside `tryDispatchPostCompletionReview`. + +--- + +## Test Plan + +### Unit tests +- [ ] `tests/unit/triggers/shared/agent-execution.test.ts`: ~8 new tests covering the post-completion hook (fire conditions, skip conditions, dedup, error handling). + +### Acceptance tests +- [ ] Implementation success + green CI → review dispatched. +- [ ] Implementation success + CI not green → no review dispatched (webhook will handle later). +- [ ] Hook fires → subsequent webhook deduped. +- [ ] Hook failure → implementation pipeline completes normally. + +--- + +## Acceptance Criteria (per-plan, testable) + +1. When `agentType = 'implementation'`, `agentResult.success = true`, `agentResult.prUrl` is set, and `getCheckSuiteStatus` returns `allPassing: true`, the review agent is dispatched from the post-completion hook. +2. 
When `agentType` is not `'implementation'`, no review dispatch fires from the hook. +3. When CI is not all-passing, no review dispatch fires (deferred to `check-suite-success` webhook). +4. When `claimReviewDispatch` returns false (review already dispatched), no second review enqueues. +5. When the hook throws (GitHub API down, Redis error), the implementation pipeline completes normally — error is logged as warn. +6. The hook's dispatch uses the same dedup key format as `check-suite-success`, so the two cannot double-enqueue. +7. All new/modified code has corresponding tests. +8. `npm test` passes. +9. `npm run typecheck` passes. +10. `npm run lint` passes. +11. `CLAUDE.md` updated to document the post-completion review dispatch under "Agent triggers". + +--- + +## Documentation Impact (this plan only) + +| File | Change | +|---|---| +| `CLAUDE.md` | Add a note under "Agent triggers" describing the post-completion review dispatch: when an implementation agent succeeds with a PR, the execution pipeline checks CI status and fires review before the container exits. Note the dedup interaction with `check-suite-success`. | + +--- + +## Out of Scope (this plan) + +- Generalizing the post-completion hook to other trigger chains (splitting → planning, etc.) — per spec, only implementation → review is specified. +- `pr-opened` trigger enablement — per-project config decision. +- Lock persistence to Redis/DB — out of spec scope. +- Making the hook work without `REDIS_URL` in the worker container — all CASCADE worker containers have Redis access. 
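For reference, the first-writer-wins claim this plan leans on reduces to the following. This is a sketch only — the real `buildReviewDispatchKey` / `claimReviewDispatch` in `src/triggers/github/review-dispatch-dedup.ts` are backed by shared state and take extra metadata; a local `Set` stands in purely to illustrate the shared-key semantics:

```typescript
// Illustrative first-writer-wins dedup claim. The real helpers share state
// between the webhook trigger and the post-completion hook.
const claimedKeys = new Set<string>();

export const buildReviewDispatchKey = (owner: string, repo: string, prNumber: number, headSha: string): string =>
  `${owner}/${repo}:${prNumber}:${headSha}`; // same format for hook and webhook

export function claimReviewDispatch(key: string): boolean {
  if (claimedKeys.has(key)) return false; // someone (hook or webhook) already dispatched
  claimedKeys.add(key);
  return true;
}
```

Because both paths build the key from the same `(owner, repo, prNumber, headSha)` tuple, whichever fires first wins the claim and the other sees `false`.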
+ +--- + +## Progress + + +- [ ] AC #1 +- [ ] AC #2 +- [ ] AC #3 +- [ ] AC #4 +- [ ] AC #5 +- [ ] AC #6 +- [ ] AC #7 +- [ ] AC #8 +- [ ] AC #9 +- [ ] AC #10 +- [ ] AC #11 diff --git a/docs/plans/007-robust-review-dispatch/_coverage.md b/docs/plans/007-robust-review-dispatch/_coverage.md new file mode 100644 index 00000000..27abe0e4 --- /dev/null +++ b/docs/plans/007-robust-review-dispatch/_coverage.md @@ -0,0 +1,28 @@ +# Coverage map for spec 007-robust-review-dispatch + +Auto-generated by /plan. Tracks which plans satisfy which spec ACs. + +## Spec ACs + +| # | Spec AC (short) | Satisfied by | Status | +|---|---|---|---| +| 1 | Review dispatched within 30s after impl + green CI | plan 1 (lock-infra) partial + plan 2 (post-completion-review) full | partial chain | +| 2 | Review can dispatch during container cleanup | plan 1 (lock-infra) | full | +| 3 | Different agent types can run concurrently | plan 1 (lock-infra) | full | +| 4 | Same-type dedup preserved | plan 1 (lock-infra) | full | +| 5 | No double-enqueue between hook + webhook | plan 2 (post-completion-review) | full | +| 6 | Structured lock-skip log | plan 1 (lock-infra) | full | +| 7 | Works for all PM providers (no branching) | plan 1 (lock-infra) | full | +| 8 | Router restart safe (30-min TTL) | plan 1 (lock-infra) | full | + +## Coverage summary + +- **8 spec ACs** mapped to **2 plans** +- **7 plans** with full-coverage ACs (testable in isolation after plan 1) +- **1 spec AC** (#1) with partial-coverage (plan 1 removes the blocker; plan 2 adds the deterministic guarantee) + +## Plan dependency graph + +``` +1-lock-infra ──→ 2-post-completion-review +``` diff --git a/docs/specs/007-robust-review-dispatch.md b/docs/specs/007-robust-review-dispatch.md new file mode 100644 index 00000000..eeee9258 --- /dev/null +++ b/docs/specs/007-robust-review-dispatch.md @@ -0,0 +1,120 @@ +--- +id: 007 +slug: robust-review-dispatch +level: spec +title: Robust review dispatch after implementation completes +created: 
2026-04-17 +status: draft +--- + +# 007: Robust review dispatch after implementation completes + +## Problem & Motivation + +When an implementation agent finishes and the PR's CI goes green, the `check-suite-success` trigger fires the review agent. On 2026-04-16 (MNG-122 / PR #572 on llmist), the review was **silently dropped** — the trigger matched twice, both times hitting `Skipping github job — work item already locked`, and the review was never enqueued. The user saw a PR sitting with zero reviews for hours until they noticed manually. + +Three intersecting failures combined to produce the silent drop: + +1. **Lock release lags agent completion by 60+ seconds.** The work-item lock is released when the worker *container* exits, not when the agent finishes. Between the two: snapshot commit, log collection, container teardown — all blocking the lock while GitHub webhooks arrive and are discarded. On MNG-122, the `agent_runs` row was marked `completed` at 05:29:13 but the lock was still held at 05:30:15 and 05:31:59. + +2. **Blocked webhooks are silently discarded.** When `isWorkItemLocked` returns true, the router logs `Skipping github job` and drops the event with no retry mechanism. GitHub does not redeliver `check_suite` webhooks, so the review opportunity is permanently lost. + +3. **Lock doesn't distinguish agent types.** The lock key is `(projectId, workItemId)` — an implementation lock blocks review dispatch even though the two agents don't compete for the same resource. An implementation run holding a lock has no reason to prevent a review from being queued. + +The `pr-opened` trigger for review is intentionally disabled on llmist (project config) — `check-suite-success` was the sole path to review, making it a single point of failure when the lock blocked it. + +This is not a rare edge case. 
The implementation → PR-open → CI-green → review chain fires on every Linear-backed implementation run, and the timing window (container exit lags agent completion by 30-120s depending on snapshot size) overlaps the CI completion window on most repos with 2-3 minute CI jobs. The review will be silently dropped whenever CI finishes before the container fully exits. + +--- + +## Goals + +- Reviews are always dispatched after an implementation PR passes CI, regardless of lock timing +- The work-item lock accurately reflects what is actually competing — different agent types on the same work item are not competitors +- Lock release happens at the natural "work is done" boundary (DB completion), not at the infrastructure cleanup boundary (container exit) +- A post-completion hook on the implementation agent ensures review fires deterministically, without relying on webhook timing + +--- + +## Non-goals + +- Changing GitHub webhook retry behaviour (GitHub controls redelivery) +- Enabling `pr-opened` for review on llmist or other projects — that's a per-project config decision, not an architectural fix +- Making the work-item lock persistent across router restarts (the in-memory lock with TTL is fit for purpose once the timing and granularity issues are fixed) +- Generalizing the post-completion hook to all trigger types — only the implementation→review chain needs it today; other chains can be added incrementally + +--- + +## Constraints + +- The fix must not break existing Trello/JIRA projects that rely on the coarse lock to prevent duplicate implementation runs on the same card +- Container cleanup (snapshot commit, log collection) must still happen after agent completion — it just must not block the lock +- The BullMQ job queue must remain the sole execution path — no second queue, no cron, no polling loop +- Router restarts must not leave stale locks that block review indefinitely (the existing 30-min TTL is acceptable as a hard ceiling) + +--- + +## User stories / 
Requirements + +1. **As an operator**, when an implementation agent completes and CI goes green, I see a review agent dispatched within 30 seconds — regardless of how long the worker container takes to shut down. +2. **As an operator**, when I look at the router logs for a stuck PR, I can see exactly why the review was or wasn't dispatched — no silent drops with only a `Skipping` log at DEBUG level. +3. **As a developer adding a 4th PM provider**, the lock and dispatch behaviour works identically for Trello, JIRA, Linear, and the new provider — no per-provider special cases in the lock logic. +4. **As an operator**, if the post-completion hook somehow fails to fire the review, the next `check-suite-success` webhook (e.g. from a late-completing CI workflow) can still dispatch the review — the lock is no longer blocking it. + +--- + +## Research Notes + +- Internal investigation only — no external OSS or academic research is applicable. The problem is specific to CASCADE's in-memory work-item lock, container lifecycle, and webhook processing pipeline. +- The lock was introduced to prevent duplicate agent runs when rapid-fire webhooks (e.g. multiple `check_suite` events for the same PR) enqueue the same agent type multiple times. That invariant must be preserved. +- BullMQ supports delayed jobs natively, but the chosen approach (post-completion hook) is simpler and doesn't introduce timing sensitivity. + +--- + +## Open Source Decisions + +| Tool | Solves | Decision | Reason | +|------|--------|----------|--------| +| BullMQ delayed jobs | Retry after lock | Skip | Post-completion hook is simpler and deterministic; delayed jobs add timing guesswork | + +--- + +## Strategic decisions + +1. **Per-agent-type locking** — chose `(projectId, workItemId, agentType)` as the lock key over the current `(projectId, workItemId)`. Reason: implementation and review don't compete for the same resource; the coarse lock was a false serialization. 
The duplicate-prevention invariant (same agent type doesn't double-enqueue) is preserved because the lock still holds per-type. + +2. **Lock release at DB completion, not container exit** — chose to release the in-memory lock when `agent_runs.status` transitions to a terminal state (`completed`/`failed`/`timed_out`) over keeping it tied to container exit. Reason: closes the 60s+ gap where the lock falsely blocks new dispatch. Container cleanup (snapshot commit, log collection) continues in the background without holding the lock — those operations don't affect the work item's state. + +3. **Post-completion hook fires review** — chose a synchronous post-completion step in the implementation agent's lifecycle (after success, before container cleanup) that checks "does this work item have an open PR with green CI and no review dispatched yet?" and, if so, enqueues the review directly. Chose this over BullMQ delayed retry because it's deterministic (no timing guesswork), simpler (no new queue patterns), and fires regardless of whether a GitHub webhook arrived while the lock was held. + +4. **Log level for lock-skip raised to INFO with structured context** — the current `Skipping github job` log is at INFO but lacks enough context to diagnose which webhook was dropped and why. The spec requires the log to include: projectId, workItemId, agentType, the trigger handler name, and the lock holder's agent type. This turns the silent drop into a diagnosable event even when the post-completion hook rescues the dispatch. + +--- + +## Acceptance Criteria (outcome-level) + +1. When an implementation agent completes and the corresponding PR has green CI, the review agent is dispatched within 30 seconds — regardless of container exit timing. +2. The review agent can be dispatched while the implementation agent's container is still shutting down (snapshot commit, log collection in progress). +3. Two different agent types for the same work item can run concurrently (e.g. 
review starts while implementation's container cleanup is finishing). +4. Two runs of the SAME agent type for the same work item are still prevented by the lock (duplicate-prevention invariant preserved). +5. If the post-completion hook fires the review, a subsequent `check-suite-success` webhook does not double-enqueue a second review (deduplication still works). +6. Router logs show structured context when a webhook is blocked by a lock: projectId, workItemId, blocked agent type, lock holder agent type, trigger handler name. +7. The fix works identically for Trello, JIRA, and Linear projects — no per-provider branching in the lock or dispatch logic. +8. Router restart does not leave stale locks that permanently block review dispatch. (Existing 30-min TTL hard ceiling is acceptable.) + +--- + +## Documentation Impact (high-level) + +- `CLAUDE.md` — Update the "Agent triggers" section to note the post-completion review dispatch and per-agent-type locking. +- Integration README — No change needed (PM integration doc doesn't cover lock behaviour). + +--- + +## Out of Scope + +- Enabling `pr-opened` for review on llmist or other projects — that's a per-project operator decision. +- Persisting the work-item lock to Redis or DB — the in-memory lock with TTL is sufficient for the router's single-process deployment model. +- Generalizing the post-completion hook to all trigger chains (e.g. splitting → planning). Only implementation → review is specified. Other chains can be added incrementally. +- Refactoring the orphan-cleanup scan to also clear stale in-memory locks — the per-agent-type locking + DB-completion-release makes this unnecessary for the review use case. If a separate need arises, spec it separately. +- Making the lock distributed across multiple router instances — CASCADE runs a single router process per deployment. 
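Strategic decision 2's release boundary can be sketched as follows. This is illustrative only — the real transition happens wherever `agent_runs.status` is written, and the names here are assumptions:

```typescript
// Sketch of decision 2: release the per-type lock when agent_runs reaches a
// terminal status, not at container exit. Names are illustrative.
const TERMINAL_STATUSES = new Set(['completed', 'failed', 'timed_out']);

export function onAgentRunStatusChange(
  status: string,
  releaseLock: () => void, // clears the (projectId, workItemId, agentType) entry
): boolean {
  if (!TERMINAL_STATUSES.has(status)) return false;
  // Container cleanup (snapshot commit, log collection) continues in the
  // background without holding the lock.
  releaseLock();
  return true;
}
```

The 60s+ gap described in the problem statement closes because the lock drops at the DB boundary; the container's remaining teardown work no longer blocks new dispatch.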
From d5d7c2b6f473f029759aa7efecdd26d7fd56ad98 Mon Sep 17 00:00:00 2001 From: Zbigniew Sobiecki Date: Fri, 17 Apr 2026 06:48:43 +0000 Subject: [PATCH 2/7] chore(007): lock plan 007/1 as .wip --- .../{1-lock-infra.md => 1-lock-infra.md.wip} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename docs/plans/007-robust-review-dispatch/{1-lock-infra.md => 1-lock-infra.md.wip} (99%) diff --git a/docs/plans/007-robust-review-dispatch/1-lock-infra.md b/docs/plans/007-robust-review-dispatch/1-lock-infra.md.wip similarity index 99% rename from docs/plans/007-robust-review-dispatch/1-lock-infra.md rename to docs/plans/007-robust-review-dispatch/1-lock-infra.md.wip index c7b6becc..51c11674 100644 --- a/docs/plans/007-robust-review-dispatch/1-lock-infra.md +++ b/docs/plans/007-robust-review-dispatch/1-lock-infra.md.wip @@ -6,7 +6,7 @@ plan_slug: lock-infra level: plan parent_spec: docs/specs/007-robust-review-dispatch.md depends_on: [] -status: pending +status: wip --- # 007/1: Lock infrastructure — remove false cross-type serialization From 2cf4039338c9c8b88a2596c9c1f1856c9c174459 Mon Sep 17 00:00:00 2001 From: Zbigniew Sobiecki Date: Fri, 17 Apr 2026 06:56:51 +0000 Subject: [PATCH 3/7] =?UTF-8?q?feat(007/1):=20per-agent-type=20work-item?= =?UTF-8?q?=20lock=20=E2=80=94=20remove=20false=20cross-type=20serializati?= =?UTF-8?q?on?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Removes the MAX_WORK_ITEM_CONCURRENCY total cap from isWorkItemLocked. The total cap falsely serialized unrelated agent types: the review for MNG-122/PR-572 was silently dropped because 2 agents (implementation + backlog-manager) were already enqueued for the same work item, hitting the total limit of 2. Now only MAX_SAME_TYPE_PER_WORK_ITEM = 1 is enforced. Different agent types can run concurrently on the same work item (e.g. review starts while implementation's container is still cleaning up). Same-type duplicate prevention is preserved. 
Changes: - src/router/work-item-lock.ts — deleted MAX_WORK_ITEM_CONCURRENCY constant and the total-count checks (in-memory + DB). Simplified getInMemoryCounts → getInMemorySameTypeCount (no longer iterates all keys). Removed the dbTotal query (saves one DB round-trip per lock check). Deleted the unused keyPrefix helper. - src/router/webhook-processor.ts — enriched the lock-skip log with source (adapter type) and renamed agentType → blockedAgentType for clarity. - CLAUDE.md — added per-agent-type lock semantics note under "Agent triggers". Tests: updated 6 existing tests + added 2 new cross-type concurrency tests. All 7852 unit tests pass. Lint + typecheck clean. Co-Authored-By: Claude Opus 4.6 (1M context) --- CLAUDE.md | 2 + ...lock-infra.md.wip => 1-lock-infra.md.done} | 20 ++-- src/router/webhook-processor.ts | 3 +- src/router/work-item-lock.ts | 91 ++++++------------- tests/unit/router/work-item-lock.test.ts | 49 +++++----- 5 files changed, 66 insertions(+), 99 deletions(-) rename docs/plans/007-robust-review-dispatch/{1-lock-infra.md.wip => 1-lock-infra.md.done} (98%) diff --git a/CLAUDE.md b/CLAUDE.md index 20efcab7..a6bd7506 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -117,6 +117,8 @@ cascade projects trigger-set --agent --event --enabl Some triggers take params (e.g. `review` + `scm:check-suite-success` accepts `{"authorMode":"own"|"external"}`). Legacy configs on `project_integrations.triggers` are auto-migrated on merge to `dev`/`main`. +**Work-item concurrency lock** — the router prevents duplicate agent runs via a per-agent-type lock on `(projectId, workItemId, agentType)`. Only same-type duplicates are blocked; **different agent types can run concurrently** on the same work item (e.g. review starts while implementation's container is still cleaning up). The lock has a 30-minute TTL hard ceiling that auto-clears stale entries after router restart. 
+ ## Review agent — context shape (debugging) Review agent receives a **compact per-file diff context**, not full file contents. Each changed file is a `### (, +N -M)` section with a unified diff hunk. Budget: `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT` = 200k tokens, per-file cap 10%. diff --git a/docs/plans/007-robust-review-dispatch/1-lock-infra.md.wip b/docs/plans/007-robust-review-dispatch/1-lock-infra.md.done similarity index 98% rename from docs/plans/007-robust-review-dispatch/1-lock-infra.md.wip rename to docs/plans/007-robust-review-dispatch/1-lock-infra.md.done index 51c11674..56aabfa4 100644 --- a/docs/plans/007-robust-review-dispatch/1-lock-infra.md.wip +++ b/docs/plans/007-robust-review-dispatch/1-lock-infra.md.done @@ -6,7 +6,7 @@ plan_slug: lock-infra level: plan parent_spec: docs/specs/007-robust-review-dispatch.md depends_on: [] -status: wip +status: done --- # 007/1: Lock infrastructure — remove false cross-type serialization @@ -138,12 +138,12 @@ This plan also enriches the lock-skip log with the lock-holder's agent type so o ## Progress -- [ ] AC #1 -- [ ] AC #2 -- [ ] AC #3 -- [ ] AC #4 -- [ ] AC #5 -- [ ] AC #6 -- [ ] AC #7 -- [ ] AC #8 -- [ ] AC #9 +- [x] AC #1 +- [x] AC #2 +- [x] AC #3 +- [x] AC #4 +- [x] AC #5 +- [x] AC #6 +- [x] AC #7 +- [x] AC #8 +- [x] AC #9 diff --git a/src/router/webhook-processor.ts b/src/router/webhook-processor.ts index 46d8f16f..45860cd1 100644 --- a/src/router/webhook-processor.ts +++ b/src/router/webhook-processor.ts @@ -152,9 +152,10 @@ export async function processRouterWebhook( if (lockStatus.locked) { result.onBlocked?.(); logger.info(`Skipping ${adapter.type} job — work item already locked`, { + source: adapter.type, projectId: project.id, workItemId: result.workItemId, - agentType: result.agentType, + blockedAgentType: result.agentType, reason: lockStatus.reason, }); return { diff --git a/src/router/work-item-lock.ts b/src/router/work-item-lock.ts index 8da88a52..67d37ddc 100644 --- a/src/router/work-item-lock.ts 
+++ b/src/router/work-item-lock.ts @@ -1,8 +1,12 @@ /** * Work-item concurrency lock for the router. * - * Allows up to 2 agents per work item (e.g. implementation + review overlap), - * but only 1 agent of the same type per work item. + * Only 1 agent of the same type per work item. Different agent types can + * run concurrently (e.g. review starts while implementation's container is + * still cleaning up). The total-concurrency cap was removed in spec 007 + * because it falsely serialized unrelated agent types — the review for + * MNG-122/PR-572 was silently dropped because two agents were already + * enqueued for the same work item. * * Two layers: * 1. In-memory map — closes the race window between addJob() and worker createRun() @@ -13,7 +17,6 @@ import { countActiveRuns } from '../db/repositories/runsRepository.js'; import { logger } from '../utils/logging.js'; import { routerConfig } from './config.js'; -export const MAX_WORK_ITEM_CONCURRENCY = 2; export const MAX_SAME_TYPE_PER_WORK_ITEM = 1; const TTL_MS = 30 * 60 * 1000; // 30 minutes @@ -29,83 +32,56 @@ function makeKey(projectId: string, workItemId: string, agentType: string): stri return `${projectId}:${workItemId}:${agentType}`; } -function keyPrefix(projectId: string, workItemId: string): string { - return `${projectId}:${workItemId}:`; -} - /** - * Sum in-memory counts for all agent types on a given work item. - * Skips TTL-expired entries and cleans them up lazily. + * Get the in-memory enqueue count for a specific (projectId, workItemId, agentType). + * Cleans up TTL-expired entries lazily. 
*/ -function getInMemoryCounts( +function getInMemorySameTypeCount( projectId: string, workItemId: string, agentType: string, -): { total: number; sameType: number } { - const prefix = keyPrefix(projectId, workItemId); - const now = Date.now(); - let total = 0; - let sameType = 0; - - for (const [key, entry] of enqueuedMap) { - if (!key.startsWith(prefix)) continue; - if (now - entry.timestamp > TTL_MS) { - enqueuedMap.delete(key); - logger.info('[WorkItemLock] TTL expired, releasing in-memory lock', { - projectId, - workItemId, - }); - continue; - } - total += entry.count; - if (key === makeKey(projectId, workItemId, agentType)) { - sameType = entry.count; - } +): number { + const key = makeKey(projectId, workItemId, agentType); + const entry = enqueuedMap.get(key); + if (!entry) return 0; + if (Date.now() - entry.timestamp > TTL_MS) { + enqueuedMap.delete(key); + logger.info('[WorkItemLock] TTL expired, releasing in-memory lock', { + projectId, + workItemId, + agentType, + }); + return 0; } - - return { total, sameType }; + return entry.count; } /** * Check whether a work item is currently locked for the given agent type. * - * Locked when: - * - Same agent type already has MAX_SAME_TYPE_PER_WORK_ITEM agents running/enqueued - * - Total agents on this work item already at MAX_WORK_ITEM_CONCURRENCY + * Locked when the same agent type already has MAX_SAME_TYPE_PER_WORK_ITEM + * agents running or enqueued. Different agent types are NOT blocked — they + * can run concurrently on the same work item (spec 007). 
*/ export async function isWorkItemLocked( projectId: string, workItemId: string, agentType: string, ): Promise<{ locked: boolean; reason?: string }> { - const { total: inMemoryTotal, sameType: inMemorySameType } = getInMemoryCounts( - projectId, - workItemId, - agentType, - ); + const inMemorySameType = getInMemorySameTypeCount(projectId, workItemId, agentType); - // Short-circuit: in-memory alone proves locked + // Short-circuit: in-memory alone proves locked for same type if (inMemorySameType >= MAX_SAME_TYPE_PER_WORK_ITEM) { return { locked: true, reason: `in-memory same-type: ${inMemorySameType} enqueued (max ${MAX_SAME_TYPE_PER_WORK_ITEM} per type)`, }; } - if (inMemoryTotal >= MAX_WORK_ITEM_CONCURRENCY) { - return { - locked: true, - reason: `in-memory total: ${inMemoryTotal} enqueued (max ${MAX_WORK_ITEM_CONCURRENCY})`, - }; - } - // DB check — ignore runs older than 2× worker timeout (stale/orphaned) + // DB check — same-type only, ignore runs older than 2× worker timeout const maxAgeMs = 2 * routerConfig.workerTimeoutMs; - const [dbTotal, dbSameType] = await Promise.all([ - countActiveRuns({ projectId, workItemId, maxAgeMs }), - countActiveRuns({ projectId, workItemId, agentType, maxAgeMs }), - ]); + const dbSameType = await countActiveRuns({ projectId, workItemId, agentType, maxAgeMs }); - // Same-type check first (more specific) const effectiveSameType = Math.max(dbSameType, inMemorySameType); if (effectiveSameType >= MAX_SAME_TYPE_PER_WORK_ITEM) { return { @@ -114,15 +90,6 @@ export async function isWorkItemLocked( }; } - // Total work-item check - const effectiveTotal = Math.max(dbTotal, inMemoryTotal); - if (effectiveTotal >= MAX_WORK_ITEM_CONCURRENCY) { - return { - locked: true, - reason: `total: ${dbTotal} running, ${inMemoryTotal} enqueued (max ${MAX_WORK_ITEM_CONCURRENCY})`, - }; - } - return { locked: false }; } diff --git a/tests/unit/router/work-item-lock.test.ts b/tests/unit/router/work-item-lock.test.ts index e45af74f..19eb5c76 100644 --- 
a/tests/unit/router/work-item-lock.test.ts +++ b/tests/unit/router/work-item-lock.test.ts @@ -31,12 +31,8 @@ describe('work-item-lock', () => { const result = await isWorkItemLocked('proj1', 'card1', 'implementation'); expect(result).toEqual({ locked: false }); const maxAgeMs = 2 * 30 * 60 * 1000; - // Two parallel countActiveRuns calls: one for total (workItemId only) and one for same-type - expect(countActiveRuns).toHaveBeenCalledWith({ - projectId: 'proj1', - workItemId: 'card1', - maxAgeMs, - }); + // Only one DB query: same-type count (no total query — total cap removed) + expect(countActiveRuns).toHaveBeenCalledTimes(1); expect(countActiveRuns).toHaveBeenCalledWith({ projectId: 'proj1', workItemId: 'card1', @@ -45,7 +41,7 @@ describe('work-item-lock', () => { }); }); - it('1 enqueued agent does not lock (1 < MAX_WORK_ITEM_CONCURRENCY)', async () => { + it('1 enqueued agent of different type does not lock (per-type only)', async () => { markWorkItemEnqueued('proj1', 'card1', 'implementation'); const result = await isWorkItemLocked('proj1', 'card1', 'review'); expect(result.locked).toBe(false); @@ -58,12 +54,17 @@ describe('work-item-lock', () => { expect(result.reason).toContain('same-type'); }); - it('2 enqueued agents of different types locks (total = MAX_WORK_ITEM_CONCURRENCY)', async () => { + it('allows 3+ different agent types concurrently (no total cap)', async () => { markWorkItemEnqueued('proj1', 'card1', 'implementation'); markWorkItemEnqueued('proj1', 'card1', 'review'); const result = await isWorkItemLocked('proj1', 'card1', 'debug'); - expect(result.locked).toBe(true); - expect(result.reason).toContain('total'); + expect(result.locked).toBe(false); + }); + + it('allows review dispatch while implementation is enqueued', async () => { + markWorkItemEnqueued('proj1', 'card1', 'implementation'); + const result = await isWorkItemLocked('proj1', 'card1', 'review'); + expect(result.locked).toBe(false); }); it('clearWorkItemEnqueued decrements count, does 
not immediately delete', async () => { @@ -83,31 +84,28 @@ describe('work-item-lock', () => { }); it('DB count of 1 does not lock for different type', async () => { - // First call (total): 1, second call (same-type): 0 - vi.mocked(countActiveRuns).mockResolvedValueOnce(1).mockResolvedValueOnce(0); + // Single DB call: same-type count for 'review' is 0 + vi.mocked(countActiveRuns).mockResolvedValueOnce(0); const result = await isWorkItemLocked('proj1', 'card1', 'review'); expect(result.locked).toBe(false); }); - it('DB total count of 2 locks', async () => { - // First call (total): 2, second call (same-type): 0 - vi.mocked(countActiveRuns).mockResolvedValueOnce(2).mockResolvedValueOnce(0); + it('DB total count irrelevant for cross-type dispatch (total cap removed)', async () => { + // Only one DB call now: same-type. Return 0 for it. + vi.mocked(countActiveRuns).mockResolvedValueOnce(0); const result = await isWorkItemLocked('proj1', 'card1', 'review'); - expect(result.locked).toBe(true); - expect(result.reason).toContain('total'); + expect(result.locked).toBe(false); }); it('DB same-type count of 1 locks for same type', async () => { - // First call (total): 1, second call (same-type): 1 - vi.mocked(countActiveRuns).mockResolvedValueOnce(1).mockResolvedValueOnce(1); + vi.mocked(countActiveRuns).mockResolvedValueOnce(1); const result = await isWorkItemLocked('proj1', 'card1', 'implementation'); expect(result.locked).toBe(true); expect(result.reason).toContain('same-type'); }); - it('DB same-type count of 1 does not lock for different type when total < max', async () => { - // First call (total): 1, second call (same-type for 'review'): 0 - vi.mocked(countActiveRuns).mockResolvedValueOnce(1).mockResolvedValueOnce(0); + it('DB same-type count of 0 does not lock for different type', async () => { + vi.mocked(countActiveRuns).mockResolvedValueOnce(0); const result = await isWorkItemLocked('proj1', 'card1', 'review'); expect(result.locked).toBe(false); }); @@ -155,12 
+153,11 @@ describe('work-item-lock', () => { expect(countActiveRuns).not.toHaveBeenCalled(); }); - it('short-circuits on in-memory total without DB query', async () => { + it('does NOT short-circuit on in-memory total (total cap removed)', async () => { markWorkItemEnqueued('proj1', 'card1', 'implementation'); markWorkItemEnqueued('proj1', 'card1', 'review'); const result = await isWorkItemLocked('proj1', 'card1', 'debug'); - expect(result.locked).toBe(true); - expect(result.reason).toContain('in-memory total'); - expect(countActiveRuns).not.toHaveBeenCalled(); + // No total cap — 'debug' same-type is 0, so unlocked + expect(result.locked).toBe(false); }); }); From 6656ea89370f7ba84403dae0e1ca2f84c939ee6d Mon Sep 17 00:00:00 2001 From: Zbigniew Sobiecki Date: Fri, 17 Apr 2026 07:15:10 +0000 Subject: [PATCH 4/7] chore(007): lock plan 007/2 as .wip --- ...ost-completion-review.md => 2-post-completion-review.md.wip} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename docs/plans/007-robust-review-dispatch/{2-post-completion-review.md => 2-post-completion-review.md.wip} (99%) diff --git a/docs/plans/007-robust-review-dispatch/2-post-completion-review.md b/docs/plans/007-robust-review-dispatch/2-post-completion-review.md.wip similarity index 99% rename from docs/plans/007-robust-review-dispatch/2-post-completion-review.md rename to docs/plans/007-robust-review-dispatch/2-post-completion-review.md.wip index 921353b2..9122b76e 100644 --- a/docs/plans/007-robust-review-dispatch/2-post-completion-review.md +++ b/docs/plans/007-robust-review-dispatch/2-post-completion-review.md.wip @@ -6,7 +6,7 @@ plan_slug: post-completion-review level: plan parent_spec: docs/specs/007-robust-review-dispatch.md depends_on: [1-lock-infra.md] -status: pending +status: wip --- # 007/2: Post-completion review dispatch hook From d5ab86d01e3efaf4a92d1bd597187632385ace83 Mon Sep 17 00:00:00 2001 From: Zbigniew Sobiecki Date: Fri, 17 Apr 2026 07:24:01 +0000 Subject: [PATCH 5/7] 
=?UTF-8?q?feat(007/2):=20post-completion=20review=20d?= =?UTF-8?q?ispatch=20=E2=80=94=20deterministic=20review=20after=20implemen?= =?UTF-8?q?tation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When an implementation agent succeeds with a PR, the execution pipeline now checks CI status and fires the review agent before the container exits. This guarantees review dispatch within seconds of implementation completion, regardless of GitHub webhook timing. Uses the same recursive `runAgentExecutionPipeline` pattern as the splitting → backlog-manager chain. The review runs in the same container, same credential scope. Uses `claimReviewDispatch` with the same dedup key format as the `check-suite-success` trigger, so the two paths cannot double-enqueue. The hook is best-effort: GitHub API failures, Redis errors, or any exception is caught, logged as warn, and does NOT break the implementation pipeline. New function `tryDispatchPostCompletionReview` in agent-execution.ts: 1. Extracts prNumber from agentResult.prUrl 2. Fetches PR details (headSha, headRef) from GitHub 3. Checks CI status via getCheckSuiteStatus — if not allPassing, returns (check-suite-success webhook will handle it when CI finishes) 4. Claims the dedup key via claimReviewDispatch — if already claimed, returns (review was already dispatched by the webhook path) 5. Builds a review TriggerResult and calls runAgentExecutionPipeline recursively (same pattern as splitting → backlog-manager) Tests: +7 new tests covering: fires review on success + green CI, skips for non-implementation, skips on failure, skips when no prUrl, skips when CI not green, skips when already dispatched, swallows errors gracefully. All 7859 unit tests pass. Lint + typecheck clean. CLAUDE.md updated with post-completion review dispatch documentation. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- CLAUDE.md | 2 + ...d.wip => 2-post-completion-review.md.done} | 24 +-- src/triggers/shared/agent-execution.ts | 110 +++++++++++ .../triggers/shared/agent-execution.test.ts | 182 +++++++++++++++++- 4 files changed, 305 insertions(+), 13 deletions(-) rename docs/plans/007-robust-review-dispatch/{2-post-completion-review.md.wip => 2-post-completion-review.md.done} (98%) diff --git a/CLAUDE.md b/CLAUDE.md index a6bd7506..363e9ffd 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -119,6 +119,8 @@ Some triggers take params (e.g. `review` + `scm:check-suite-success` accepts `{" **Work-item concurrency lock** — the router prevents duplicate agent runs via a per-agent-type lock on `(projectId, workItemId, agentType)`. Only same-type duplicates are blocked; **different agent types can run concurrently** on the same work item (e.g. review starts while implementation's container is still cleaning up). The lock has a 30-minute TTL hard ceiling that auto-clears stale entries after router restart. +**Post-completion review dispatch** — when an implementation agent succeeds with a PR, the execution pipeline checks CI status and fires the review agent deterministically (before the container exits). This guarantees review dispatch within seconds of implementation completion, regardless of GitHub webhook timing. Uses the same `claimReviewDispatch` dedup key as the `check-suite-success` trigger, so the two paths cannot double-enqueue. + ## Review agent — context shape (debugging) Review agent receives a **compact per-file diff context**, not full file contents. Each changed file is a `### (, +N -M)` section with a unified diff hunk. Budget: `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT` = 200k tokens, per-file cap 10%. 
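The claim-once coordination between the webhook path and the post-completion hook can be sketched as follows — the `Map`-backed store and `claim` helper are illustrative stand-ins for the real `claimReviewDispatch`, with the key format taken from the test fixture (`acme/myapp:42:sha-abc123`):

```typescript
// First caller to claim a (repo, prNumber, headSha) key wins and dispatches
// the review; any later caller on either path no-ops, so the two dispatch
// paths cannot double-enqueue a review for the same head commit.
const claimed = new Map<string, string>(); // dedupKey -> claiming path

const buildKey = (owner: string, repo: string, prNumber: number, headSha: string): string =>
  `${owner}/${repo}:${prNumber}:${headSha}`;

function claim(dedupKey: string, source: string): boolean {
  if (claimed.has(dedupKey)) return false; // already dispatched by the other path
  claimed.set(dedupKey, source);
  return true;
}

const key = buildKey('acme', 'myapp', 42, 'sha-abc123');
console.log(claim(key, 'post-completion-hook')); // true — this path dispatches
console.log(claim(key, 'check-suite-success')); // false — webhook path no-ops
```

Because the key includes `headSha`, a new push to the PR produces a fresh key, so a later green CI run on new commits can still trigger a fresh review.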
diff --git a/docs/plans/007-robust-review-dispatch/2-post-completion-review.md.wip b/docs/plans/007-robust-review-dispatch/2-post-completion-review.md.done similarity index 98% rename from docs/plans/007-robust-review-dispatch/2-post-completion-review.md.wip rename to docs/plans/007-robust-review-dispatch/2-post-completion-review.md.done index 9122b76e..b1710c2e 100644 --- a/docs/plans/007-robust-review-dispatch/2-post-completion-review.md.wip +++ b/docs/plans/007-robust-review-dispatch/2-post-completion-review.md.done @@ -6,7 +6,7 @@ plan_slug: post-completion-review level: plan parent_spec: docs/specs/007-robust-review-dispatch.md depends_on: [1-lock-infra.md] -status: wip +status: done --- # 007/2: Post-completion review dispatch hook @@ -165,14 +165,14 @@ if ( ## Progress -- [ ] AC #1 -- [ ] AC #2 -- [ ] AC #3 -- [ ] AC #4 -- [ ] AC #5 -- [ ] AC #6 -- [ ] AC #7 -- [ ] AC #8 -- [ ] AC #9 -- [ ] AC #10 -- [ ] AC #11 +- [x] AC #1 +- [x] AC #2 +- [x] AC #3 +- [x] AC #4 +- [x] AC #5 +- [x] AC #6 +- [x] AC #7 +- [x] AC #8 +- [x] AC #9 +- [x] AC #10 +- [x] AC #11 diff --git a/src/triggers/shared/agent-execution.ts b/src/triggers/shared/agent-execution.ts index e6703d7c..c72796d1 100644 --- a/src/triggers/shared/agent-execution.ts +++ b/src/triggers/shared/agent-execution.ts @@ -10,10 +10,15 @@ import { PMLifecycleManager, resolveProjectPMConfig, } from '../../pm/index.js'; +import { + buildReviewDispatchKey, + claimReviewDispatch, +} from '../../triggers/github/review-dispatch-dedup.js'; import { checkTriggerEnabled } from '../../triggers/shared/trigger-check.js'; import type { AgentResult, CascadeConfig, ProjectConfig } from '../../types/index.js'; import { logger } from '../../utils/logging.js'; import { extractPRNumber } from '../../utils/prUrl.js'; +import { parseRepoFullName } from '../../utils/repo.js'; import type { TriggerResult } from '../types.js'; import { handleAgentResultArtifacts } from './agent-result-handler.js'; import { isPipelineAtCapacity } from 
'./backlog-check.js'; @@ -228,6 +233,96 @@ async function linkPRPostExecution( } } +/** + * Dispatch a review agent after a successful implementation run, if the PR's + * CI is green and no review has been dispatched yet. + * + * Uses `claimReviewDispatch` with the same dedup key format as the + * `check-suite-success` trigger, so the two paths cannot double-enqueue. + * If CI isn't green yet, does nothing — the webhook-triggered path will + * handle it when CI finishes. + * + * Runs inside the worker container, before exit. Uses the same recursive + * `runAgentExecutionPipeline` pattern as the splitting → backlog-manager chain. + * + * Best-effort: errors are logged as warn but never break the implementation + * pipeline. + */ +async function tryDispatchPostCompletionReview( + agentResult: AgentResult & { prUrl: string }, + project: ProjectConfig & { repo: string }, + workItemId: string | undefined, + config: CascadeConfig, + executionConfig: AgentExecutionConfig, +): Promise<void> { + try { + const prNumber = extractPRNumber(agentResult.prUrl); + if (!prNumber) return; + + const { owner, repo } = parseRepoFullName(project.repo); + const { githubClient } = await import('../../github/client.js'); + + const pr = await githubClient.getPR(owner, repo, prNumber); + const headSha = pr.headSha; + if (!headSha) return; + + const checkStatus = await githubClient.getCheckSuiteStatus(owner, repo, headSha); + if (!checkStatus.allPassing) { + logger.debug('Skipping post-completion review: CI not all passing', { + prNumber, + workItemId, + }); + return; + } + + const dedupKey = buildReviewDispatchKey(owner, repo, prNumber, headSha); + if (!claimReviewDispatch(dedupKey, 'post-completion-hook', { prNumber, headSha })) { + logger.info('Skipping post-completion review: already dispatched', { + prNumber, + workItemId, + dedupKey, + }); + return; + } + + logger.info('Post-completion review dispatch: firing review for implementation PR', { + prNumber, + workItemId, + headSha, + }); +
const reviewResult: TriggerResult = { + agentType: 'review', + agentInput: { + prNumber, + prBranch: pr.headRef, + repoFullName: project.repo, + headSha, + triggerType: 'ci-success', + triggerEvent: 'scm:check-suite-success', + workItemId, + }, + prNumber, + prUrl: agentResult.prUrl, + prTitle: pr.title, + workItemId, + }; + + await runAgentExecutionPipeline(reviewResult, project, config, { + ...executionConfig, + skipPrepareForAgent: true, + skipHandleFailure: true, + logLabel: 'review (post-completion)', + }); + } catch (err) { + logger.warn('Post-completion review dispatch failed (non-fatal)', { + prUrl: agentResult.prUrl, + workItemId, + error: String(err), + }); + } +} + /** * Post an agent summary to the PM work item after a successful agent run. * Cross-source concern: fires for all trigger types (GitHub, Trello, JIRA). @@ -506,6 +601,21 @@ export async function runAgentExecutionPipeline( await onFailure(result, agentResult); } + // Post-completion review dispatch: when an implementation agent succeeds + // with a PR, check CI and fire review deterministically. This guarantees + // review dispatch within seconds of completion, regardless of webhook + // timing (spec 007). Uses the same recursive pattern as the splitting → + // backlog-manager chain below. 
+ if (agentType === 'implementation' && agentResult.success && agentResult.prUrl && project.repo) { + await tryDispatchPostCompletionReview( + agentResult as AgentResult & { prUrl: string }, + project as ProjectConfig & { repo: string }, + workItemId, + config, + executionConfig, + ); + } + // After a successful splitting run, propagate auto label and optionally chain backlog-manager if (agentType === 'splitting' && agentResult.success && workItemId) { const chainResult = await propagateAutoLabelAfterSplitting(workItemId, project); diff --git a/tests/unit/triggers/shared/agent-execution.test.ts b/tests/unit/triggers/shared/agent-execution.test.ts index 69eaca38..172131d0 100644 --- a/tests/unit/triggers/shared/agent-execution.test.ts +++ b/tests/unit/triggers/shared/agent-execution.test.ts @@ -28,6 +28,8 @@ const { mockGithubClient, mockParseRepoFullName, mockGetAgentProfile, + mockClaimReviewDispatch, + mockBuildReviewDispatchKey, } = vi.hoisted(() => ({ mockRunAgent: vi.fn(), mockGetPMProvider: vi.fn(), @@ -71,9 +73,14 @@ const { t === 'respond-to-ci' || t === 'respond-to-review' || t === 'resolve-conflicts', ), mockLookupWorkItemForPR: vi.fn().mockResolvedValue(null), - mockGithubClient: { getPR: vi.fn().mockResolvedValue({ title: 'feat: test PR' }) }, + mockGithubClient: { + getPR: vi.fn().mockResolvedValue({ title: 'feat: test PR', headSha: 'abc123' }), + getCheckSuiteStatus: vi.fn().mockResolvedValue({ allPassing: false }), + }, mockParseRepoFullName: vi.fn().mockReturnValue({ owner: 'acme', repo: 'myapp' }), mockGetAgentProfile: vi.fn().mockResolvedValue({ lifecycleHooks: {} }), + mockClaimReviewDispatch: vi.fn().mockReturnValue(true), + mockBuildReviewDispatchKey: vi.fn().mockReturnValue('acme/myapp:42:abc123'), })); vi.mock('../../../../src/agents/registry.js', () => ({ @@ -162,6 +169,11 @@ vi.mock('../../../../src/agents/definitions/profiles.js', () => ({ getAgentProfile: mockGetAgentProfile, })); 
+vi.mock('../../../../src/triggers/github/review-dispatch-dedup.js', () => ({ + claimReviewDispatch: (...args: unknown[]) => mockClaimReviewDispatch(...args), + buildReviewDispatchKey: (...args: unknown[]) => mockBuildReviewDispatchKey(...args), +})); + import { linkPRToWorkItem } from '../../../../src/db/repositories/prWorkItemsRepository.js'; import { runAgentExecutionPipeline } from '../../../../src/triggers/shared/agent-execution.js'; @@ -922,3 +934,171 @@ describe('workItemId staleness recovery (via runAgentExecutionPipeline)', () => ); }); }); + +// --------------------------------------------------------------------------- +// Post-completion review dispatch (via runAgentExecutionPipeline) +// --------------------------------------------------------------------------- + +describe('post-completion review dispatch (via runAgentExecutionPipeline)', () => { + beforeEach(() => { + mockCreatePMProvider.mockReturnValue({}); + mockResolveProjectPMConfig.mockReturnValue(PM_CONFIG); + mockValidateIntegrations.mockResolvedValue({ valid: true, errors: [] }); + mockCheckBudgetExceeded.mockResolvedValue(null); + mockHandleAgentResultArtifacts.mockResolvedValue(undefined); + mockShouldTriggerDebug.mockResolvedValue(null); + mockGetSessionState.mockReturnValue({}); + mockParseRepoFullName.mockReturnValue({ owner: 'acme', repo: 'myapp' }); + mockGithubClient.getPR.mockResolvedValue({ + title: 'feat: test PR', + headSha: 'sha-abc123', + headRef: 'feat/test', + }); + mockGithubClient.getCheckSuiteStatus.mockResolvedValue({ allPassing: true }); + mockClaimReviewDispatch.mockReturnValue(true); + mockBuildReviewDispatchKey.mockReturnValue('acme/myapp:42:sha-abc123'); + }); + + it('fires review after successful implementation with prUrl and green CI', async () => { + mockRunAgent + .mockResolvedValueOnce({ + success: true, + output: '', + runId: 'run-impl', + prUrl: 'https://github.com/acme/myapp/pull/42', + }) + .mockResolvedValueOnce({ success: true, output: '', runId:
'run-review' }); + + await runAgentExecutionPipeline( + { agentType: 'implementation', agentInput: {}, workItemId: 'card-1' }, + PROJECT, + CONFIG, + ); + + // runAgent called twice: implementation + review + expect(mockRunAgent).toHaveBeenCalledTimes(2); + expect(mockRunAgent).toHaveBeenNthCalledWith( + 2, + 'review', + expect.objectContaining({ project: PROJECT }), + ); + expect(mockClaimReviewDispatch).toHaveBeenCalled(); + }); + + it('does NOT fire review when agentType is not implementation', async () => { + mockRunAgent.mockResolvedValueOnce({ + success: true, + output: '', + runId: 'run-review', + prUrl: 'https://github.com/acme/myapp/pull/42', + }); + + await runAgentExecutionPipeline( + { agentType: 'review', agentInput: {}, workItemId: 'card-1' }, + PROJECT, + CONFIG, + ); + + expect(mockRunAgent).toHaveBeenCalledTimes(1); + expect(mockClaimReviewDispatch).not.toHaveBeenCalled(); + }); + + it('does NOT fire review when implementation failed', async () => { + mockRunAgent.mockResolvedValueOnce({ + success: false, + output: '', + error: 'build failed', + }); + + await runAgentExecutionPipeline( + { agentType: 'implementation', agentInput: {}, workItemId: 'card-1' }, + PROJECT, + CONFIG, + ); + + expect(mockRunAgent).toHaveBeenCalledTimes(1); + expect(mockClaimReviewDispatch).not.toHaveBeenCalled(); + }); + + it('does NOT fire review when implementation has no prUrl', async () => { + mockRunAgent.mockResolvedValueOnce({ + success: true, + output: '', + runId: 'run-impl', + }); + + await runAgentExecutionPipeline( + { agentType: 'implementation', agentInput: {}, workItemId: 'card-1' }, + PROJECT, + CONFIG, + ); + + expect(mockRunAgent).toHaveBeenCalledTimes(1); + expect(mockGithubClient.getCheckSuiteStatus).not.toHaveBeenCalled(); + }); + + it('does NOT fire review when CI is not all green', async () => { + mockGithubClient.getCheckSuiteStatus.mockResolvedValueOnce({ allPassing: false }); + mockRunAgent.mockResolvedValueOnce({ + success: true, + output: '', + 
runId: 'run-impl', + prUrl: 'https://github.com/acme/myapp/pull/42', + }); + + await runAgentExecutionPipeline( + { agentType: 'implementation', agentInput: {}, workItemId: 'card-1' }, + PROJECT, + CONFIG, + ); + + expect(mockRunAgent).toHaveBeenCalledTimes(1); + expect(mockClaimReviewDispatch).not.toHaveBeenCalled(); + }); + + it('does NOT fire review when claimReviewDispatch returns false (already dispatched)', async () => { + mockClaimReviewDispatch.mockReturnValueOnce(false); + mockRunAgent.mockResolvedValueOnce({ + success: true, + output: '', + runId: 'run-impl', + prUrl: 'https://github.com/acme/myapp/pull/42', + }); + + await runAgentExecutionPipeline( + { agentType: 'implementation', agentInput: {}, workItemId: 'card-1' }, + PROJECT, + CONFIG, + ); + + expect(mockRunAgent).toHaveBeenCalledTimes(1); + expect(mockLogger.info).toHaveBeenCalledWith( + expect.stringContaining('already dispatched'), + expect.anything(), + ); + }); + + it('swallows errors gracefully — does not break the implementation pipeline', async () => { + mockGithubClient.getCheckSuiteStatus.mockRejectedValueOnce(new Error('GitHub API down')); + mockRunAgent.mockResolvedValueOnce({ + success: true, + output: '', + runId: 'run-impl', + prUrl: 'https://github.com/acme/myapp/pull/42', + }); + + // Pipeline should complete normally despite the hook failing + await expect( + runAgentExecutionPipeline( + { agentType: 'implementation', agentInput: {}, workItemId: 'card-1' }, + PROJECT, + CONFIG, + ), + ).resolves.not.toThrow(); + + expect(mockLogger.warn).toHaveBeenCalledWith( + expect.stringContaining('Post-completion review dispatch failed'), + expect.objectContaining({ error: expect.any(String) }), + ); + }); +}); From 4352689388425e4948d2b591a1d65eb94a6f6f4e Mon Sep 17 00:00:00 2001 From: Zbigniew Sobiecki Date: Fri, 17 Apr 2026 07:24:43 +0000 Subject: [PATCH 6/7] =?UTF-8?q?docs(007):=20spec=20complete=20=E2=80=94=20?= =?UTF-8?q?all=20plans=20done?= MIME-Version: 1.0 Content-Type: text/plain; 
charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...st-review-dispatch.md => 007-robust-review-dispatch.md.done} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename docs/specs/{007-robust-review-dispatch.md => 007-robust-review-dispatch.md.done} (99%) diff --git a/docs/specs/007-robust-review-dispatch.md b/docs/specs/007-robust-review-dispatch.md.done similarity index 99% rename from docs/specs/007-robust-review-dispatch.md rename to docs/specs/007-robust-review-dispatch.md.done index eeee9258..44428889 100644 --- a/docs/specs/007-robust-review-dispatch.md +++ b/docs/specs/007-robust-review-dispatch.md.done @@ -4,7 +4,7 @@ slug: robust-review-dispatch level: spec title: Robust review dispatch after implementation completes created: 2026-04-17 -status: draft +status: done --- # 007: Robust review dispatch after implementation completes From 1f94178525cd98d479c0cca0b0f9619066448682 Mon Sep 17 00:00:00 2001 From: Zbigniew Sobiecki Date: Fri, 17 Apr 2026 07:30:24 +0000 Subject: [PATCH 7/7] chore(deps): update protobufjs to fix critical CVE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit npm audit flagged protobufjs <7.5.5 for arbitrary code execution (GHSA-xq3m-2v4x-88gg). Updated via `npm update protobufjs` — lockfile only, no package.json change. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- package-lock.json | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/package-lock.json b/package-lock.json index 7769e382..5bbe5da4 100644 --- a/package-lock.json +++ b/package-lock.json @@ -95,13 +95,13 @@ } }, "node_modules/@anthropic-ai/claude-agent-sdk": { - "version": "0.2.91", - "resolved": "https://registry.npmjs.org/@anthropic-ai/claude-agent-sdk/-/claude-agent-sdk-0.2.91.tgz", - "integrity": "sha512-DCd5Ad5XKBbIIOMZ73L+c+e9azM6NtZzOtdKQAzykzRG/KxSCMraMAsMMQrJrIUMH3oTtHY7QuQimAiElVVVpA==", + "version": "0.2.112", + "resolved": "https://registry.npmjs.org/@anthropic-ai/claude-agent-sdk/-/claude-agent-sdk-0.2.112.tgz", + "integrity": "sha512-vMFoiDKlOive8p3tphpV1gQaaytOipwGJ+uw9mvvaLQUODSC2+fCdRDAY25i2Tsv+lOtxzXBKctmaDuWqZY7ig==", "license": "SEE LICENSE IN README.md", "dependencies": { - "@anthropic-ai/sdk": "^0.80.0", - "@modelcontextprotocol/sdk": "^1.27.1" + "@anthropic-ai/sdk": "^0.81.0", + "@modelcontextprotocol/sdk": "^1.29.0" }, "engines": { "node": ">=18.0.0" @@ -122,9 +122,9 @@ } }, "node_modules/@anthropic-ai/claude-agent-sdk/node_modules/@anthropic-ai/sdk": { - "version": "0.80.0", - "resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.80.0.tgz", - "integrity": "sha512-WeXLn7zNVk3yjeshn+xZHvld6AoFUOR3Sep6pSoHho5YbSi6HwcirqgPA5ccFuW8QTVJAAU7N8uQQC6Wa9TG+g==", + "version": "0.81.0", + "resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.81.0.tgz", + "integrity": "sha512-D4K5PvEV6wPiRtVlVsJHIUhHAmOZ6IT/I9rKlTf84gR7GyyAurPJK7z9BOf/AZqC5d1DhYQGJNKRmV+q8dGhgw==", "license": "MIT", "dependencies": { "json-schema-to-ts": "^3.1.1" @@ -9429,7 +9429,9 @@ "license": "MIT" }, "node_modules/protobufjs": { - "version": "7.5.4", + "version": "7.5.5", + "resolved": "https://registry.npmjs.org/protobufjs/-/protobufjs-7.5.5.tgz", + "integrity": "sha512-3wY1AxV+VBNW8Yypfd1yQY9pXnqTAN+KwQxL8iYm3/BjKYMNg4i0owhEe26PWDOMaIrzeeF98Lqd5NGz4omiIg==", 
"hasInstallScript": true, "license": "BSD-3-Clause", "dependencies": {