diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 00000000..a69641f5 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,13 @@ +# Changelog + +All notable user-visible changes to CASCADE are documented here. The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); the project does not use strict semver for releases. + +## Unreleased + +### Changed + +- **Review-agent context shape: compact diffs instead of full files.** The review agent's pre-fetched PR context now consists of compact per-file diffs (using GitHub's `file.patch`) rather than full file contents. Files that can't fit the budget — deleted, binary, oversized patch, or cumulative budget exhausted — are surfaced in a structured `SKIPPED FILES` injection that names each file with a reason and tells the agent how to fetch it on demand (`gh pr diff`, `Read`, `Grep`). This scales with PR size rather than repo size, mitigates LLM context rot, and ensures the agent is aware of (rather than blind to) the omissions. The context budget is `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT` (200k tokens), replacing the prior 25k full-file cap. The `SKIPPED FILES` injection is also delivered to the four other agents that share the PR context pipeline (`respond-to-ci`, `respond-to-pr-comment`, `respond-to-review`, `resolve-conflicts`); explicit prompt guidance is added in `review.yaml` only. (Spec [001](docs/specs/001-pr-review-correctness.md), plan [2/2](docs/plans/001-pr-review-correctness/2-context-rework.md.done).) + +### Fixed + +- **Review agent on external-fork and large PRs.** PR checkouts now use the canonical `refs/pull/N/head` ref, which works for same-repo branches and external-fork branches alike. Previously, a silent `git checkout <branch>` failure on fork PRs caused the worker to review the base branch (`dev`) while believing it was on the PR branch, producing confidently wrong reviews. Any git failure or HEAD-SHA mismatch now fails the run loudly rather than silently continuing.
Additionally, every paginated GitHub REST endpoint used in the review setup pipeline now paginates to completion, so PRs with more than 100 changed files are no longer truncated at the first page. (Spec [001](docs/specs/001-pr-review-correctness.md), plan [1/2](docs/plans/001-pr-review-correctness/1-checkout-and-pagination.md.done).) diff --git a/CLAUDE.md b/CLAUDE.md index b6cc7127..517e0073 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -41,6 +41,18 @@ Projects are configured in the PostgreSQL database (`projects` table). Each proj ## Development +### PR Checkout (worker) + +The worker checks out PRs via the canonical `refs/pull/N/head` ref — it works for both same-repo branches and external-fork branches. When `prNumber` is set on `AgentInput`, `setupRepository`: + +1. Fetches `+refs/pull/<N>/head:refs/remotes/pr/<N>` from `origin`. +2. Checks out `pr/<N>` in detached-HEAD state. +3. If `headSha` is also set on `AgentInput`, verifies `git rev-parse HEAD` matches. + +Any non-zero git exit code throws — there is no warn-and-continue in setup. Failed runs are marked failed in the dashboard rather than proceeding on a stale or wrong working tree. + +The legacy `prBranch` field is retained for human-readable logging but is **not** used to drive checkout (fork branches don't exist on `origin` and the by-name path silently 404s). + ### Testing > **For a full catalog of test helpers, factory functions, and mock objects**, see [`tests/README.md`](tests/README.md). @@ -759,6 +771,14 @@ CASCADE integrates llmist's resilience features to ensure reliable operation dur ## Debugging Production Sessions +### Review Agent — Context Shape + +The review agent receives a **compact per-file diff context** rather than full file contents. Each changed file appears as a `### <path> (<status>, +N -M)` section followed by a unified diff hunk. The budget is `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT` (200,000 tokens), with a per-file cap of 10% of that.
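To make the shape concrete, a hypothetical two-file context might look like the following (paths, statuses, and hunk contents are illustrative, not real CASCADE output):

```
### src/example/foo.ts (modified, +2 -1)
@@ -10,3 +10,4 @@
 const keep = true;
-const limit = 25_000;
+const limit = 200_000;
+export { limit };

### docs/example.md (added, +2 -0)
@@ -0,0 +1,2 @@
+# Example
+New file added by the PR.
```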
+ +Files that can't fit (deleted, binary, oversized patch, or budget exhausted) are surfaced via a separate `SKIPPED FILES` injection. The injection is self-documenting: it lists each filename + reason and instructs the agent to fetch on demand using `gh pr diff <N> -- <path>`, `Read <path>`, or `Grep <pattern>`. + +When debugging review-agent output that misses something, check the `PR context prepared` log entry for `included`/`skipped`/`skipReasons` to confirm whether the file was even visible to the agent. + +### Manual Session Download Download session logs and card data from a Trello card for debugging: diff --git a/docs/architecture/03-trigger-system.md b/docs/architecture/03-trigger-system.md index 4674433a..54b2ba70 100644 --- a/docs/architecture/03-trigger-system.md +++ b/docs/architecture/03-trigger-system.md @@ -145,7 +145,7 @@ Each trigger in a YAML agent definition can declare a `contextPipeline` — an o | `contextFiles` | Read key project files (README, etc.) | | `workItem` | Fetch work item details from PM tool | | `prepopulateTodos` | Pre-populate todo list from work item checklists | -| `prContext` | Fetch PR details, diff, reviews | +| `prContext` | Fetch PR details, compact per-file diffs, CI checks; emit a `SKIPPED FILES` injection when files are omitted (over budget, deleted, binary) | | `prConversation` | Fetch PR comments and review threads | | `pipelineSnapshot` | Fetch CI pipeline status | | `alertingIssue` | Fetch Sentry issue and event details | diff --git a/docs/plans/001-pr-review-correctness/1-checkout-and-pagination.md.done b/docs/plans/001-pr-review-correctness/1-checkout-and-pagination.md.done new file mode 100644 index 00000000..0ce14278 --- /dev/null +++ b/docs/plans/001-pr-review-correctness/1-checkout-and-pagination.md.done @@ -0,0 +1,245 @@ +--- +id: 001 +slug: pr-review-correctness +plan: 1 +plan_slug: checkout-and-pagination +level: plan +parent_spec: docs/specs/001-pr-review-correctness.md +depends_on: [] +status: done +--- + +# 001/1: Checkout
Correctness & Pagination + +> Part 1 of 2 in the 001-pr-review-correctness plan. See [parent spec](../../specs/001-pr-review-correctness.md). + +## Summary + +This plan delivers the foundation: the worker correctly places the working tree at the PR head commit for **every** PR (internal or external-fork), it surfaces every step failure as a failed run instead of silently continuing, and it enumerates **every** changed file in the PR rather than the first 100. It also paginates every other GitHub endpoint that the review setup pipeline touches, so we don't carry the same latent bug class forward. + +After this plan ships, the specific PR #1092 hallucination class is substantially blocked even before plan 2 lands: with the working tree on the correct commit, any `Read` the agent does to "verify" a claim returns the actual PR-branch content, and the file enumeration is no longer truncated. What plan 1 does **not** fix is the 25K-token full-file-contents cap on the agent's pre-fetched view — that's plan 2. + +**Components delivered:** +- New `fetchAndCheckoutPR` helper in `src/agents/shared/repository.ts` using `+refs/pull/N/head:refs/remotes/pr/N` fetch spec, detached checkout, and `git rev-parse HEAD` SHA verification against the PR's head SHA. +- `setupRepository` and `refreshSnapshotWorkspace` reworked to call the helper when `prNumber` is set; both now throw on any non-zero git exit code. +- `SetupRepositoryOptions` and `AgentInput` extended with `prNumber` and `prHeadSha` (`prBranch` retained for human-readable logging only). +- `AgentInput` populated with `prHeadSha` at every trigger handler that constructs it for review runs. +- All seven `per_page: 100` call sites in `src/github/client.ts` converted to paginate to completion. +- New structured log fields: `Fetched PR ref`, `Resolved HEAD SHA`, `Total changed files`. + +**Deferred to plan 2:** +- Pre-fetch swap from full file contents to compact diffs. +- Skipped-file structured contract with the agent. 
+- Agent prompt template updates for the new context shape. +- Logging for `included / skipped + per-skip reason` (plan 1 only logs total file count from pagination). + +--- + +## Spec ACs satisfied by this plan + +- Spec AC #1 (External-fork → working tree at PR head) — **full** +- Spec AC #2 (>100 files → all enumerated) — **full** +- Spec AC #3 (Setup failure → run failed) — **full** +- Spec AC #4 (HEAD SHA mismatch → run failed before review) — **full** +- Spec AC #7 (Other paginated endpoints read to completion) — **full** +- Spec AC #8 (Operator log visibility) — **partial** (this plan adds: ref fetched, resolved HEAD SHA, total changed-file count; plan 2 adds: included vs skipped counts with per-skip reasons) +- Spec AC #9 (PR #1092 reproduction case produces no fabrications) — **partial** (this plan delivers correct working tree + complete file enumeration; plan 2 delivers the diff-shape that prevents partial-context hallucinations on very large PRs) + +--- + +## Depends On + +- _None._ This is the foundation plan. + +--- + +## Detailed Task List (TDD) + +### 1. Type extensions + +**Divergence from initial draft**: `AgentInput.headSha` already exists in `src/types/index.ts` and is widely populated by trigger handlers from `payload.pull_request.head.sha`. We reuse it instead of introducing a redundant `prHeadSha` field. Inside the helper signature we name the parameter `prHeadSha` for clarity at call site, fed from `input.headSha`. + +**Tests first** (extend `tests/unit/agents/shared/repository.test.ts`): +- `SetupRepositoryOptions accepts prNumber and prHeadSha` — type-only test using `expectTypeOf` from vitest. + +**Implementation** (`src/agents/shared/repository.ts`): +- Extend `SetupRepositoryOptions` interface: add `prNumber?: number` and `prHeadSha?: string`. 
+ +**Implementation** (`src/types/index.ts`): +- Update the comment on `headSha` from "PR context fields for check-failure flow" to "PR's head SHA at trigger time — used for check-failure flow AND for post-checkout HEAD verification". No structural change. + +### 2. Helper: `fetchAndCheckoutPR` + +**Tests first** (`tests/unit/agents/shared/repository.test.ts` — extend existing 528-line file): +- `fetchAndCheckoutPR fetches refs/pull/N/head with the + force-update prefix` — mock `runCommand`; assert the first call's args are `['fetch', 'origin', '+refs/pull/1092/head:refs/remotes/pr/1092']`. +- `fetchAndCheckoutPR checks out detached pr/N after fetch` — assert second call: `['checkout', '--detach', 'pr/1092']`. +- `fetchAndCheckoutPR throws when fetch returns non-zero exit code` — mock fetch to return `{exitCode: 128, stderr: 'fatal: ...'}`; expect rejection with stderr substring in message. +- `fetchAndCheckoutPR throws when checkout returns non-zero exit code` — mock fetch ok, checkout fail; expect rejection. +- `fetchAndCheckoutPR verifies HEAD SHA matches expected when prHeadSha provided` — mock `git rev-parse HEAD` to return matching SHA; expect resolution. +- `fetchAndCheckoutPR throws on HEAD SHA mismatch` — `rev-parse` returns different SHA; expect rejection with both SHAs in message. +- `fetchAndCheckoutPR skips SHA check when prHeadSha is undefined` — no `rev-parse` call should occur; resolves cleanly. +- `fetchAndCheckoutPR logs fetched ref and resolved HEAD SHA on success` — assert `log.info` was called with both fields. + +**Implementation** (`src/agents/shared/repository.ts`): +- New `async function fetchAndCheckoutPR(repoDir: string, prNumber: number, prHeadSha: string | undefined, scmProvider: 'github' | 'gitlab' | undefined, log: AgentLogger): Promise<void>`. +- Sequence: `git fetch origin <refspec>` → `git checkout --detach pr/${prNumber}` → optionally `git rev-parse HEAD` → compare with `prHeadSha`.
+- The `<refspec>` is provider-aware: GitHub → `+refs/pull/${prNumber}/head:refs/remotes/pr/${prNumber}`. The GitLab branch is **out of scope for this plan** (depends on PR #1092 landing); for now, when `scmProvider !== 'github'`, throw an explicit `Error('fetchAndCheckoutPR: only GitHub is currently supported; GitLab support follows PR #1092 merge')`. The provider parameter is in the signature so the GitLab extension is a one-line follow-up. +- Each step: check `result.exitCode !== 0`, throw `new Error(...)` with stderr (last 500 chars). +- On HEAD SHA mismatch: throw `new Error(\`HEAD SHA mismatch after PR checkout: expected ${prHeadSha}, got ${actualSha}\`)`. +- On success: `log.info('PR checked out', { prNumber, ref: 'refs/pull/${prNumber}/head', headSha: prHeadSha ?? '(unverified)' })`. + +### 3. `setupRepository` cold-start path rework + +**Tests first** (extend `tests/unit/agents/shared/repository.test.ts`): +- `setupRepository calls fetchAndCheckoutPR when prNumber is set` — mock the helper; verify it's called with the right args. +- `setupRepository does not call git checkout <branch>` — verify no `runCommand` call with args matching `['checkout', 'claude/cranky-johnson']` or similar. +- `setupRepository propagates fetchAndCheckoutPR errors` — helper throws; expect `setupRepository` to reject. +- `setupRepository works when prNumber is not set (non-PR runs)` — uses base branch from clone, no PR-ref fetch. + +**Implementation** (`src/agents/shared/repository.ts:151-155`): +- Replace the existing `if (prBranch) { ... await runCommand('git', ['checkout', prBranch], ...) }` block with `if (prNumber) { await fetchAndCheckoutPR(repoDir, prNumber, prHeadSha, scmProvider, log) }`. +- Update destructuring at line 111: `const { project, log, agentType, prNumber, prHeadSha, warmTsCache } = options;` (drop `prBranch` from active use; can still log it earlier if present). + +### 4.
`refreshSnapshotWorkspace` (snapshot-reuse path) rework + +**Tests first** (extend `tests/unit/agents/shared/repository.test.ts`): +- `refreshSnapshotWorkspace fetches origin then PR ref when prNumber is set` — mock `runCommand`; assert sequence. +- `refreshSnapshotWorkspace throws on non-zero git fetch exit (no warn-and-continue)` — assert rejection, not warning. +- `refreshSnapshotWorkspace throws on non-zero git reset exit` — assert rejection. +- `refreshSnapshotWorkspace uses fetchAndCheckoutPR when prNumber is set instead of branch checkout` — assert helper is called. +- `refreshSnapshotWorkspace works for non-PR runs (uses baseBranch fetch + reset)` — assert legacy path. + +**Implementation** (`src/agents/shared/repository.ts:49-88`): +- Change all three "warn and continue" blocks (lines 62-67, 71-76, 80-85) to throw on non-zero exit code with the stderr in the error message. +- When `prNumber` is set, call `fetchAndCheckoutPR(repoDir, prNumber, prHeadSha, scmProvider, log)` instead of the existing `git reset --hard origin/${branch}` + `git checkout ${branch}` pair. +- When `prNumber` is not set (non-PR runs), keep existing behavior but with throw-on-failure semantics. +- Pass `prHeadSha` through `refreshSnapshotWorkspace`'s signature. + +### 5. Plumb `prNumber` + `prHeadSha` through `setupRepository` callers + +**Tests first** (extend `tests/unit/backends/adapter.test.ts` if it exists, else create): +- `executeWithEngine forwards prNumber and headSha from input to setupRepository as prHeadSha` — mock `setupRepository`; verify call args include `prNumber: input.prNumber` and `prHeadSha: input.headSha`. + +**Implementation** (`src/backends/adapter.ts:27-33`): +- Update the `setupRepository({ ... })` call to forward `prNumber: input.prNumber` and `prHeadSha: input.headSha`. + +### 6.
Verify `headSha` is populated on `AgentInput` at every PR trigger site + +**Divergence from initial draft**: since we reuse `headSha` (already populated by most PR triggers), this task narrows to a verification + gap-fill exercise rather than a wholesale field addition. + +**Tests first** (extend per-trigger test files in `tests/unit/triggers/github/`): +- For each PR-related trigger handler that constructs an `AgentInput`-shaped result: ensure the test asserts `result.agentInput.headSha === payload.pull_request.head.sha` (or the equivalent webhook field). +- Add a test only where the assertion is missing today. + +**Implementation**: +- `grep -rn "prNumber:" src/triggers/` — for every site that sets `prNumber` on an `AgentInput`-shaped object, confirm `headSha` is also set (sourced from the webhook payload's `pull_request.head.sha`, `check_suite.head_sha`, or equivalent). +- Add `headSha` where missing. +- Preliminary list (verify via grep during implementation): `src/triggers/github/check-suite-success.ts`, `src/triggers/github/pr-opened.ts`, `src/triggers/github/pr-review-submitted.ts`, `src/triggers/github/review-requested.ts`, `src/triggers/github/pr-comment-mention.ts`, `src/triggers/github/check-suite-failure.ts`. + +### 7. Pagination of all paginated GitHub client endpoints + +**Tests first** (`tests/unit/github/client.test.ts` — new file if absent, else extend): +- `getPRDiff paginates beyond 100 files` — stub Octokit to return 100 on page 1, 29 on page 2, `[]` on page 3; assert returned array has 129 entries. +- `getPRDiff stops when a page returns fewer than per_page` — stub to return 100, then 50; assert exactly 2 page requests, 150 total entries. +- `getPRReviewComments paginates to completion` — analogous test. +- `getPRIssueComments paginates to completion` — analogous test. +- `getPRReviews paginates to completion` — analogous test. +- `getCheckSuiteStatus paginates listWorkflowRunsForRepo` — stub multiple pages; assert all runs returned.
+- `getCheckSuiteStatus paginates listJobsForWorkflowRun for each run` — assert per-run jobs list is complete. +- `getFailedWorkflowJobs paginates listWorkflowRunsForRepo and listJobsForWorkflowRun` — analogous. + +**Implementation** (`src/github/client.ts`): +- Replace each `const { data } = await getClient().<method>({ ..., per_page: 100 })` with the Octokit `paginate` helper: `const data = await getClient().paginate(getClient().<method>, { ..., per_page: 100 })`. +- Affected lines (per Phase 5 reconnaissance): 151 (`getPRReviewComments`), 272 (`getPRIssueComments`), 295 + 305 (`getCheckSuiteStatus` — 2 sites), 342 (`getPRDiff`), 428 + 446 (`getFailedWorkflowJobs` — 2 sites). Also page through `getPRReviews` (around line 243; verify the exact line during implementation). +- For the `Promise.all(runs.map(...))` patterns at lines 299-308 and 439-450, the inner per-run jobs call also needs `paginate`. + +### 8. Structured logging for setup observability + +**Tests first** (extend `tests/unit/agents/shared/repository.test.ts`): +- `setupRepository logs fetched-ref, resolved-HEAD-SHA when PR checkout succeeds` — assert log.info call with structured fields. + +**Implementation** (`src/agents/shared/repository.ts`, `src/agents/definitions/contextSteps.ts`): +- After successful `fetchAndCheckoutPR`, log `Fetched PR ref` and `Resolved HEAD SHA` (already covered by helper in step 2). +- After `getPRDiff` returns the (now complete) file list at `contextSteps.ts:170`, log `Total changed files` with the count. Replace existing `Reading PR file contents { fileCount: prDiff.length }` log message — same field, but now after pagination the count is accurate.
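The read-to-completion contract the tests above encode can be sketched generically. This is an illustrative helper only — `paginateAll` and `fetchPage` are hypothetical names, and the real implementation should use Octokit's built-in `paginate` as described in the implementation notes:

```typescript
// Sketch of read-to-completion pagination, assuming the GitHub REST
// convention that a page shorter than `perPage` is the last page.
async function paginateAll<T>(
  fetchPage: (page: number) => Promise<T[]>,
  perPage = 100,
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; ; page++) {
    const items = await fetchPage(page);
    all.push(...items);
    // Short page means we are done — no extra request for an empty page.
    if (items.length < perPage) break;
  }
  return all;
}
```

With stubbed pages of 100 and then 29 items, this returns all 129 entries in exactly two requests — the boundary behavior the `getPRDiff` tests assert.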
+ +--- + +## Test Plan + +### Unit tests + +- [ ] `tests/unit/types/agentInput.test.ts`: 2 type-only tests +- [ ] `tests/unit/agents/shared/repository.test.ts`: ~14 new tests (`fetchAndCheckoutPR` helper + `setupRepository` + `refreshSnapshotWorkspace`) +- [ ] `tests/unit/backends/adapter.test.ts`: 1 new test (forwarding prNumber/prHeadSha) +- [ ] `tests/unit/github/client.test.ts`: ~9 new tests (one per paginated endpoint plus boundary cases) +- [ ] Per-trigger test files under `tests/unit/triggers/github/`: ~5-7 new tests (one per handler that builds `AgentInput` for PR runs) + +### Integration tests + +- [ ] _(Optional)_ `tests/integration/agents/external-pr-checkout.test.ts`: stand up a bare git repo with a `refs/pull/42/head` ref pointing to a commit not on any local branch; run `setupRepository` with `prNumber: 42`; assert the working tree HEAD matches the expected SHA. + +### Acceptance tests + +- [ ] Per-plan AC #1: external-fork PR test repo → checkout succeeds at PR head. +- [ ] Per-plan AC #2: 129-file fixture PR → `getPRDiff` returns 129 entries. +- [ ] Per-plan AC #3: stub `runCommand` to fail → run rejects with descriptive error. +- [ ] Per-plan AC #4: stub `git rev-parse` to return wrong SHA → rejects with both SHAs in error. +- [ ] Per-plan AC #5: every paginated endpoint test green. + +--- + +## Acceptance Criteria (per-plan, testable) + +1. Given an `AgentInput` with `prNumber: 1092` and `prHeadSha: '96f5136...'`, `setupRepository` produces a working tree whose `git rev-parse HEAD` equals `96f5136...`, regardless of whether `claude/cranky-johnson` exists on `origin`. +2. Given a PR with 129 changed files, `getPRDiff` returns an array of 129 entries. +3. Given any non-zero exit code from any git command in the setup pipeline, `setupRepository` rejects with an error message containing the failing command's stderr. +4. 
Given a `prHeadSha` that does not match the post-checkout `git rev-parse HEAD` output, `setupRepository` rejects with an error message containing both SHAs. +5. Each of the seven previously-truncated `per_page: 100` call sites in `src/github/client.ts` returns the full result set across multiple pages when stubbed to return non-trivial multi-page data. +6. The worker log for a successful PR review setup contains structured entries for: the fetched ref (e.g. `refs/pull/1092/head`), the resolved HEAD SHA, and the total changed-files count. +7. All new and modified code has corresponding tests. +8. `npm run build` passes. +9. `npm test` passes (unit suite). +10. `npm run lint` and `npm run typecheck` pass. +11. All documentation listed in Documentation Impact has been updated. + +--- + +## Documentation Impact (this plan only) + +| File | Change | +|---|---| +| `CLAUDE.md` | Add a paragraph in the "Development" section noting that PR review runs use `refs/pull/N/head` and verify HEAD SHA after checkout; document the fail-loud contract for setup errors. | +| `CHANGELOG.md` | Entry: `fix(review): correctly check out external-fork PRs and paginate all GitHub list endpoints (#001/1)`. Brief description of behavior change visible in run logs. | + +--- + +## Out of Scope (this plan) + +- **GitLab MR ref support in `fetchAndCheckoutPR`** — depends on PR #1092 (suda's GitLab SCM provider) landing. Once it merges, extending the helper to use `refs/merge-requests/N/head` for `scmProvider === 'gitlab'` is a one-line follow-up (the signature already accepts the parameter). +- Pre-fetch swap from full file contents to compact diffs (→ plan 2). +- Skipped-file structured contract with the agent (→ plan 2). +- Agent prompt template updates for the new context shape (→ plan 2). +- Logging of `included / skipped + per-skip reason` (→ plan 2). +- Ops-layer detection of historical runs where final HEAD ≠ PR head (spec OOS). +- Agentic multi-hop context gathering (spec OOS). 
+- Codebase-wide indexing (spec OOS). +- Running review against PR's merge commit instead of head (spec OOS). +- Multi-agent review orchestration (spec OOS). +- Changes to the review agent's model, prompt template, or output format (spec OOS — but plan 2 makes minor prompt changes that are scoped there). + +--- + +## Progress + + +- [x] AC #1 — refs/pull/N/head fetch + detached checkout verified by `setupRepository fetches and checks out PR via refs/pull/N/head when prNumber is provided` and `setupRepository — snapshot-reuse path uses fetchAndCheckoutPR (refs/pull/N/head) when prNumber is provided` +- [x] AC #2 — `getPRDiff uses octokit paginate (paginates beyond 100 files)` returns 129 +- [x] AC #3 — `setupRepository throws and stops when PR fetch fails (no silent continuation)`, snapshot-reuse `throws (no longer warns-and-continues) when git fetch/reset/checkout exits non-zero` +- [x] AC #4 — `setupRepository throws on HEAD SHA mismatch` (cold + snapshot paths) +- [x] AC #5 — `pagination — all paginated endpoints use octokit paginate` describe block (5 endpoints + 2 inner-loop calls) +- [x] AC #6 — helper logs `PR checked out` with `prNumber`, `ref`, `headSha`; `Total changed files in PR` log entry in contextSteps +- [x] AC #7 — TDD discipline; new tests for every behavior change +- [x] AC #8 — `npm run build` passes +- [x] AC #9 — `npm test` passes (7513 tests, 0 failures) +- [x] AC #10 — typecheck ✅; lint ✅ after `npm ci` synced biome to 2.4.10 (stale `node_modules` was holding 1.9.4 against a v2-syntax `biome.json`) +- [x] AC #11 — `CLAUDE.md` updated with new "PR Checkout (worker)" subsection; new `CHANGELOG.md` created with an Unreleased entry describing the fix diff --git a/docs/plans/001-pr-review-correctness/2-context-rework.md.done b/docs/plans/001-pr-review-correctness/2-context-rework.md.done new file mode 100644 index 00000000..98f0943d --- /dev/null +++ b/docs/plans/001-pr-review-correctness/2-context-rework.md.done @@ -0,0 +1,234 @@ +--- +id: 001 
+slug: pr-review-correctness +plan: 2 +plan_slug: context-rework +level: plan +parent_spec: docs/specs/001-pr-review-correctness.md +depends_on: [1-checkout-and-pagination.md] +status: done +--- + +# 001/2: Context Pre-Fetch Rework — Compact Diffs & Skipped-File Contract + +> Part 2 of 2 in the 001-pr-review-correctness plan. See [parent spec](../../specs/001-pr-review-correctness.md). + +## Summary + +This plan re-shapes the data the review agent receives. The current pre-fetch echoes **full file contents** of changed files up to a 25K-token cap; this plan replaces it with **compact per-file diffs** (using GitHub's `file.patch` from the now-paginated diff endpoint), which scales with PR size rather than repo size and aligns with industry best practice for LLM code review. + +It also delivers the **skipped-file contract**: whenever any changed file's diff or content cannot be included in the agent's context (over budget, deleted, binary, missing patch), the agent receives an explicit, structured list of those filenames with a per-file reason and prompt guidance to fetch them on demand via its existing `Read`, `Grep`, and `Bash` (`gh pr diff`) tools. + +After this plan ships, the spec is fully delivered: external-fork PRs of any size produce reviews that reflect the actual change set, and the agent has both the compact view it needs and clear awareness of what it doesn't have. + +**Components delivered:** +- `readPRFileContents` removed; replaced by `extractPRDiffs` that produces per-file compact diffs from `PRDiffFile.patch`. +- `REVIEW_FILE_CONTENT_TOKEN_LIMIT` (full-file budget) replaced by `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT` (compact-diff budget; larger ceiling appropriate for diff content). +- New `SkippedFile` type and structured "skipped files" injection format. +- `fetchPRContextStep` rewritten to inject diffs + skipped-file list (instead of the current full-file inclusion loop). 
+- `review.yaml` agent definition updated: prompt instructs the agent on the new context shape, names the `SKIPPED FILES` injection, and tells it to fetch those files via `Read` or `gh pr diff` when relevant. +- `formatPRDiff` enhanced to render compact per-file headers consistent with the new diff context. +- Logging: `included`, `skipped`, and per-skip `reason` surfaced in the run log. + +**Deferred to follow-up specs (already spec OOS):** +- Agentic multi-hop context gathering (reading dependent files automatically). +- Codebase-wide indexing. +- Adaptive diff-vs-full-file based on file size or change ratio. + +--- + +## Spec ACs satisfied by this plan + +- Spec AC #5 (Primary view = compact diffs) — **full** +- Spec AC #6 (Skipped files → structured list + guidance) — **full** +- Spec AC #8 (Operator log visibility) — **partial** (this plan adds: included vs skipped counts with per-skip reasons; plan 1 already added: ref, HEAD SHA, total changed-file count) +- Spec AC #9 (PR #1092 reproduction case produces no fabrications) — **partial** (this plan delivers the diff-shape that prevents partial-context hallucinations on very large PRs; plan 1 already delivered correct working tree + complete file enumeration) + +--- + +## Depends On + +- Plan 1 (`checkout-and-pagination`) — provides the complete, paginated changed-files list that this plan iterates over to produce compact diffs. + +--- + +## Detailed Task List (TDD) + +### 1. Configuration: replace token-limit constant + +**Tests first** (`tests/unit/config/reviewConfig.test.ts` — new file or extend): +- `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT exists and is a positive number` — sanity import test. +- `the constant is sized for diff content not full files` — assert it is at least 100_000 (diffs are dense, model is opus 4.6 with 1M-token window). + +**Implementation** (`src/config/reviewConfig.ts`): +- Remove `export const REVIEW_FILE_CONTENT_TOKEN_LIMIT = 25_000;`. 
+- Add `export const REVIEW_DIFF_CONTEXT_TOKEN_LIMIT = 200_000;` (number is illustrative; tune via load test in implementation; goal: comfortably fit median PR diff content). + +### 2. New types: `SkippedFile` and `PRDiffContext` + +**Tests first** (`tests/unit/agents/shared/prFormatting.test.ts` — extend existing 287-line file): +- `SkippedFile shape includes filename and reason` — type test using `expectTypeOf`. +- `PRDiffContext shape includes included diffs and skipped list` — type test. + +**Implementation** (`src/agents/shared/prFormatting.ts`): +- Add type: + ```typescript + export interface SkippedFile { + filename: string; + reason: 'over-budget' | 'no-patch' | 'binary' | 'deleted' | 'patch-too-large'; + } + export interface PRDiffContext { + included: Array<{ path: string; status: PRDiffFile['status']; diff: string }>; + skipped: SkippedFile[]; + } + ``` + +### 3. New function: `extractPRDiffs` + +**Tests first** (extend `tests/unit/agents/shared/prFormatting.test.ts`): +- `extractPRDiffs returns a diff for each PR file with a patch` — input: `[{filename: 'a.ts', patch: '@@ ...'}, {filename: 'b.ts', patch: '@@ ...'}]`; expect both in `included`. +- `extractPRDiffs marks files with status "removed" as skipped with reason "deleted"` — verify. +- `extractPRDiffs marks files without a patch (binary, large blob) as skipped with reason "no-patch"` — verify. +- `extractPRDiffs marks files with a patch larger than per-file cap as skipped with reason "patch-too-large"` — verify. +- `extractPRDiffs respects total-budget cap; once exceeded, remaining files go to skipped with reason "over-budget"` — verify ordering and cutoff. +- `extractPRDiffs returns deterministic ordering` — same input → same output (stable sort). +- `extractPRDiffs handles empty PR diff input` — returns `{included: [], skipped: []}`. + +**Implementation** (`src/agents/shared/prFormatting.ts`): +- Replace `readPRFileContents` with `extractPRDiffs(prDiff: PRDiffFile[]): PRDiffContext`. 
+- Iterate over `prDiff`, applying skip rules in order: deleted → no patch → patch over per-file cap → would exceed budget. Otherwise add to `included`. +- No filesystem reads — operates purely on the API response from `getPRDiff` (which already includes `patch` strings). +- Per-file compact diff format: a short header (`### {filename} ({status}, +N -M)`) followed by the unified diff hunk(s) from `file.patch`. + +### 4. Pre-fetch step rewrite + +**Tests first** (extend `tests/unit/agents/definitions/contextSteps.test.ts` if it exists, else create): +- `fetchPRContextStep injects compact diff context (not full files)` — mock `getPRDiff` to return PR-shaped data; assert one injection labeled "Pre-fetched PR diff context" containing per-file headers. +- `fetchPRContextStep injects a SKIPPED FILES section when files are skipped` — mock with files exceeding budget; assert injection labeled "Skipped files (fetch via Read or gh pr diff if relevant)" containing the filenames + reasons. +- `fetchPRContextStep does NOT inject the SKIPPED FILES section when nothing is skipped` — assert only the diff injection is present. +- `fetchPRContextStep logs included and skipped counts and per-skip reasons` — assert structured `log.info` call. + +**Implementation** (`src/agents/definitions/contextSteps.ts:198-204` and surrounding): +- Remove the `readPRFileContents` call and the loop at lines 206+ that pushes per-file injections. +- Call `extractPRDiffs(prDiff)` once. +- Push **one** injection containing the compact diff context (all included diffs in one block, with per-file headers). +- If `skipped.length > 0`, push a second injection labeled "Skipped files" that lists each filename + reason in a structured format the prompt template can reference. +- Log: `params.logWriter('INFO', 'PR context prepared', { included: ctx.included.length, skipped: ctx.skipped.length, skipReasons: countByReason(ctx.skipped) })`. + +### 5.
Agent prompt template updates + +**Tests first** (`tests/unit/agents/definitions/review.yaml.test.ts` — new file, or extend existing prompt-validation tests): +- `review.yaml prompt mentions compact-diff context` — load template, assert it references the diff context name. +- `review.yaml prompt mentions SKIPPED FILES section and how to fetch` — assert presence of guidance text. +- `review.yaml prompt does not reference the old "full file contents" pre-fetch` — assert absence. + +**Implementation** (`src/agents/definitions/review.yaml`): +- Update the agent's instructions section to: + - Describe the new context shape: "You will receive compact per-file diffs labeled `### <filename>`." + - Describe the skipped-file contract: "If a `SKIPPED FILES` injection is present, those files were omitted to keep the context compact. When any of them is relevant to your review, fetch the patch via `gh pr diff -- <path>` or read the post-PR file content with `Read`." + - Remove any text that references "full file contents" pre-fetch (if present). +- Keep model, output format, and tool list unchanged. + +### 6. Cleanup: remove the old `readPRFileContents` + +**Tests first**: +- `readPRFileContents is no longer exported from prFormatting` — type-only test asserts the export is gone. +- `tests/unit/agents/shared/prFormatting.test.ts` — delete or rewrite the existing tests that exercised `readPRFileContents`; replace with `extractPRDiffs` tests in step 3. + +**Implementation**: +- Remove `readPRFileContents` and `PRFileContents` from `src/agents/shared/prFormatting.ts`. +- `grep -rn "readPRFileContents\|PRFileContents" src/ tests/` — confirm no remaining references; if any, update them. + +### 7. Logging for observability completion + +**Tests first** (extend `tests/unit/agents/definitions/contextSteps.test.ts`): +- `fetchPRContextStep emits a structured log entry containing included, skipped, and skipReasons map` — assert exact structure. 
+ +**Implementation** (`src/agents/definitions/contextSteps.ts`): +- Replace `params.logWriter('INFO', 'File contents loaded', {...})` (the current line ~201) with the new `'PR context prepared'` log entry described in step 4. + +--- + +## Test Plan + +### Unit tests + +- [ ] `tests/unit/config/reviewConfig.test.ts`: 2 tests +- [ ] `tests/unit/agents/shared/prFormatting.test.ts`: ~10 new tests (`extractPRDiffs` happy path + each skip reason + budget cutoff + empty input + ordering) +- [ ] `tests/unit/agents/definitions/contextSteps.test.ts`: 4 new tests (diff injection, skipped-files injection, no-skipped path, logging) +- [ ] `tests/unit/agents/definitions/review.yaml.test.ts`: 3 prompt-template tests +- [ ] Existing `prFormatting.test.ts` tests for `readPRFileContents`: deleted (no longer applicable) + +### Integration tests + +- [ ] _(Optional)_ End-to-end against a fixture PR: paginated diff → `extractPRDiffs` → injection → assert agent context shape includes diff + skipped sections. + +### Acceptance tests + +- [ ] Per-plan AC #1: 200-file PR fixture → diff injection contains all files that fit; remaining listed as skipped with `over-budget` reason. +- [ ] Per-plan AC #2: PR with one binary file → skipped with `no-patch` reason. +- [ ] Per-plan AC #3: load `review.yaml`, assert prompt explicitly names the SKIPPED FILES injection. + +--- + +## Acceptance Criteria (per-plan, testable) + +1. Given a PR with files containing `patch` data, `extractPRDiffs` produces an `included` entry per file containing the patch text, and an empty `skipped` list (assuming the budget is not exceeded). +2. Given a PR file with `status: 'removed'`, `extractPRDiffs` places it in `skipped` with reason `deleted`. +3. Given a PR file with no `patch` field, `extractPRDiffs` places it in `skipped` with reason `no-patch`. +4. 
Given a sequence of files whose cumulative diff tokens exceed `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT`, `extractPRDiffs` places the overflow into `skipped` with reason `over-budget` in deterministic order. +5. `fetchPRContextStep` injects exactly one diff context block plus, if-and-only-if any files are skipped, a SKIPPED FILES injection containing each skipped filename and its reason. +6. `fetchPRContextStep` logs a single `PR context prepared` entry with `included`, `skipped`, and `skipReasons` fields. +7. `review.yaml`'s prompt text explicitly names the SKIPPED FILES injection and tells the agent to fetch those files via `Read` or `gh pr diff` when relevant. +8. The symbol `readPRFileContents` is no longer exported from `src/agents/shared/prFormatting.ts`, and no production code references `REVIEW_FILE_CONTENT_TOKEN_LIMIT` or `PRFileContents`. +9. All new and modified code has corresponding tests. +10. `npm run build` passes. +11. `npm test` passes (unit suite). +12. `npm run lint` and `npm run typecheck` pass. +13. All documentation listed in Documentation Impact has been updated. + +--- + +## Documentation Impact (this plan only) + +| File | Change | +|---|---| +| `CLAUDE.md` | Update the "Debugging Production Sessions" section: describe the new compact-diff context shape and the SKIPPED FILES injection. Add a note that the review agent will Read or `gh pr diff` skipped files on demand. | +| `docs/ARCHITECTURE.md` and/or `docs/architecture/` | Update the trigger → worker → agent context-flow narrative to reflect compact-diff pre-fetch (replacing the prior full-file-contents description). Include the skipped-file contract in the data-flow description. | +| `docs/adding-engines.md` | If the doc describes the context shape passed to a backend engine, update the section to reference compact diffs + skipped-file injections. If no such section exists, no change. 
| +| `CHANGELOG.md` | Entry: `feat(review): swap full-file pre-fetch for compact diffs and add SKIPPED FILES contract (#001/2)`. Brief description of agent-visible behavior change. | + +--- + +## Out of Scope (this plan) + +- Adaptive logic that selects diff-vs-full-content per file based on size or change ratio (potential future spec). +- Agentic multi-hop context gathering (spec OOS). +- Codebase-wide indexing (spec OOS). +- Ops-layer detection of historical runs where final HEAD ≠ PR head (spec OOS). +- Running review against PR's merge commit instead of head (spec OOS). +- Multi-agent review orchestration (spec OOS). +- Changes to the review agent's model or output format (spec OOS). +- Backporting the new context shape to other agent types (`implementation`, `respond-to-review`, etc.) — out of scope unless they share the literal `fetchPRContextStep` (to be verified during implementation; if shared, the cross-agent impact is documented but the prompt updates here apply only to `review.yaml`). 
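The ordered skip pipeline that per-plan acceptance criteria 1-4 pin down can be exercised with a minimal, self-contained sketch. Everything here is illustrative: `extractPRDiffsSketch`, the toy budget numbers, and the naive four-characters-per-token estimator stand in for the real `extractPRDiffs`, `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT`, and `estimateTokens` in `src/agents/shared/prFormatting.ts`.

```typescript
// Illustrative sketch only; not the production implementation.
type SkipReason = 'deleted' | 'no-patch' | 'patch-too-large' | 'over-budget';

interface FileDiff {
  filename: string;
  status: string;  // GitHub file status, e.g. 'modified' | 'added' | 'removed'
  patch?: string;  // unified diff hunk; absent for binary or oversized blobs
}

const TOKEN_LIMIT = 40;  // toy stand-in for the 200k total budget
const PER_FILE_CAP = 20; // toy stand-in for the 10% per-file cap

// Naive estimator: roughly 4 characters per token.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function extractPRDiffsSketch(files: FileDiff[]) {
  const included: Array<{ path: string; diff: string }> = [];
  const skipped: Array<{ filename: string; reason: SkipReason }> = [];
  let total = 0;
  for (const file of files) {
    // Rules apply in order; the first match wins.
    if (file.status === 'removed') {
      skipped.push({ filename: file.filename, reason: 'deleted' });
      continue;
    }
    if (!file.patch) {
      skipped.push({ filename: file.filename, reason: 'no-patch' });
      continue;
    }
    const tokens = estimateTokens(file.patch);
    if (tokens > PER_FILE_CAP) {
      skipped.push({ filename: file.filename, reason: 'patch-too-large' });
      continue;
    }
    if (total + tokens > TOKEN_LIMIT) {
      skipped.push({ filename: file.filename, reason: 'over-budget' });
      continue;
    }
    included.push({ path: file.filename, diff: file.patch });
    total += tokens; // only included diffs consume budget
  }
  return { included, skipped };
}
```

Note that skipped files never consume budget: a deleted file or an oversized patch early in the list does not reduce the room left for files after it, and input order is preserved, which keeps the output deterministic.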
+ +--- + +## Progress + + +- [x] AC #1 — `extractPRDiffs returns an included entry per file with a patch` +- [x] AC #2 — `marks deleted files as skipped with reason "deleted"` +- [x] AC #3 — `marks files without a patch as skipped with reason "no-patch"` +- [x] AC #4 — `respects total-budget cap; overflow files go to skipped with reason "over-budget"` +- [x] AC #5 — `injects compact diff context (not full files)` + `does NOT inject a SKIPPED FILES section when nothing is skipped` +- [x] AC #6 — `logs PR context prepared with included, skipped, and skipReasons map` +- [x] AC #7 — `review.yaml prompt contract` describe block (4 tests) +- [x] AC #8 — `readPRFileContents`, `PRFileContents`, `REVIEW_FILE_CONTENT_TOKEN_LIMIT` removed; grep confirms no `src/` references +- [x] AC #9 — TDD discipline; new tests for every behavior change +- [x] AC #10 — `npm run build` passes +- [x] AC #11 — `npm test` passes (7533 tests, 0 failures; +20 from plan 1's 7513) +- [x] AC #12 — `npm run lint` and `npm run typecheck` clean (3 pre-existing complexity warnings in unrelated files) +- [x] AC #13 — `CLAUDE.md` (Debugging Production Sessions), `CHANGELOG.md` (Unreleased entry), `docs/architecture/03-trigger-system.md` (prContext table row) updated + +**Plan-divergence notes from execution:** +- Cross-agent impact handled inline: the SKIPPED FILES injection payload is self-documenting (lists files + reasons + fetch guidance), so the four other agents that consume `prContext` (respond-to-ci, respond-to-pr-comment, respond-to-review, resolve-conflicts) get the contract automatically without per-YAML prompt edits. Only `review.yaml` was updated explicitly per AC #7. diff --git a/docs/plans/001-pr-review-correctness/_coverage.md b/docs/plans/001-pr-review-correctness/_coverage.md new file mode 100644 index 00000000..d92b94dd --- /dev/null +++ b/docs/plans/001-pr-review-correctness/_coverage.md @@ -0,0 +1,56 @@ +# Coverage map for spec 001-pr-review-correctness + +Auto-generated by /plan. 
Tracks which plans satisfy which spec ACs. + +## Spec ACs + +| # | Spec AC (short) | Satisfied by | Status | +|---|---|---|---| +| 1 | External-fork PR → working tree at PR head commit | plan 1 (checkout-and-pagination) | full | +| 2 | PR with >100 changed files → agent's enumeration includes every file | plan 1 (checkout-and-pagination) | full | +| 3 | Setup pipeline failure → run marked failed (no silent continuation) | plan 1 (checkout-and-pagination) | full | +| 4 | HEAD SHA mismatch after checkout → run failed before review begins | plan 1 (checkout-and-pagination) | full | +| 5 | Agent's primary view is compact per-file diffs (not full file contents) | plan 2 (context-rework) | full | +| 6 | Skipped files surfaced as structured list with per-file reason + fetch guidance | plan 2 (context-rework) | full | +| 7 | All paginated endpoints in review setup pipeline read to completion | plan 1 (checkout-and-pagination) | full | +| 8 | Operator can determine from logs: ref fetched, HEAD SHA, total file count, included/skipped counts with reasons | plan 1 (ref + SHA + total count) + plan 2 (included/skipped + reasons) | partial chain | +| 9 | PR #1092 reproduction case: external + 129 files → correct HEAD, all files enumerated, no fabrications | plan 1 (correct HEAD + complete enumeration) + plan 2 (correct context shape prevents fabrications on large PRs) | partial chain | + +## Coverage summary + +- **9 spec ACs** mapped to **2 plans** +- **7 plan-AC mappings** are full coverage (one plan delivers the AC end-to-end) +- **2 plan-AC mappings** are partial chains (AC #8 and AC #9 require both plans to be done before the spec AC is fully delivered) +- Every spec AC is mapped to at least one plan; no AC is dropped +- Both plans satisfy at least one full-coverage spec AC, so each is independently testable for forward progress against the spec + +## Plan dependency graph + +``` +1-checkout-and-pagination ──→ 2-context-rework +``` + +Linear DAG, no cycles. 
Plan 1 ships independent value (substantially mitigates the PR #1092 incident class) before plan 2 lands. Plan 2 cannot be implemented or tested in isolation — it depends on plan 1's complete file enumeration to produce a non-truncated diff context. + +## Strategic decisions traceability (from spec) + +| Spec strategic decision | Implemented in | +|---|---| +| 1. Canonical `refs/pull/N/head` for all PRs | plan 1, task 2 (`fetchAndCheckoutPR`) | +| 2. Fail-loud philosophy across setup | plan 1, tasks 2-4 | +| 3. Mandatory HEAD SHA verification | plan 1, task 2 | +| 4. Compact diffs as default agent view | plan 2, tasks 1-4 | +| 5. Structured skipped-file contract | plan 2, tasks 2-5 | +| 6. Pagination on every paginated endpoint | plan 1, task 7 | +| 7. Scope boundary (ops/agentic/indexing OOS) | both plans (carried through "Out of Scope" sections) | + +## Out-of-scope items (spec-level, repeated for clarity) + +These are not addressed by either plan: + +- Ops-layer detection of historical runs where final HEAD ≠ PR head +- Agentic multi-hop context gathering (e.g., following symbol references) +- Codebase-wide indexing (Greptile-style) +- Running review against PR's merge commit instead of head +- Multi-agent review orchestration +- Changes to the review agent's model or output format diff --git a/docs/specs/001-pr-review-correctness.md.done b/docs/specs/001-pr-review-correctness.md.done new file mode 100644 index 00000000..652e0378 --- /dev/null +++ b/docs/specs/001-pr-review-correctness.md.done @@ -0,0 +1,139 @@ +--- +id: 001 +slug: pr-review-correctness +level: spec +title: PR Review Correctness — External Forks, Large PRs, and Pre-Fetch Strategy +created: 2026-04-15 +status: done +--- + +# 001: PR Review Correctness — External Forks, Large PRs, and Pre-Fetch Strategy + +## Problem & Motivation + +On 2026-04-14, the automated review agent posted a `CHANGES_REQUESTED` review on PR #1092 (CASCADE's first external contribution, from @suda, adding GitLab SCM 
support). The review flagged **three blocking issues**, all of which were fabricated — the code the bot claimed was broken was in fact correct on the PR branch. The cost was $7.24, 6m 49s of compute, and real reputational damage: an external contributor's first PR was met with a confidently wrong review. + +Root-cause analysis revealed two independent, latent defects that compounded: + +1. **Silent working-tree mismatch.** The worker clones `origin` (our monorepo) on its configured base branch, then attempts to check out the PR branch by name. For internal PRs the branch exists on origin and the checkout succeeds. For external PRs the branch lives on the contributor's fork, not on origin — the checkout silently fails with a non-zero exit code that the surrounding code discards. The worker proceeds on the base branch believing it is on the PR branch, and every file read returns pre-PR content. +2. **Truncated context.** The review agent's pre-fetch loads **full file contents** of each changed PR file up to a 25,000-token cap, and fetches files via a single page of the GitHub files API (capped at 100). PR #1092 had 129 files; only 20 files' content fit under the token cap. The agent's view of "what changed" was a subset of what actually changed, and the files that would have refuted its false claims were among those skipped. + +This never manifested before because no prior PR was both (a) from an external fork **and** (b) large enough to overflow the pre-fetch budget. Both conditions will recur — every external contribution and every sufficiently large internal PR is at risk. + +Beyond the immediate correctness failures, the pre-fetch design is fundamentally mis-shaped. Industry best practice for LLM code review is to feed the model **compact diffs** (scaling with PR size) and let the agent fetch file content on demand through its existing tools. 
Our current design inverts this: we echo full files (scaling with repo size) and give the agent no signal when its view is incomplete. + +--- + +## Goals + +1. Review runs on external-fork PRs read and reason about the **contributor's actual changes**, not the base branch. +2. Review runs on large PRs see **all** files the PR modifies — no silent truncation of the changed-files list. +3. When the pre-fetch cannot include every file's content, the agent knows **which files were omitted** and has clear guidance to fetch them on demand. +4. When any step of the review-setup pipeline fails (clone, fetch, checkout, verification), the run is marked failed rather than proceeding on a degraded state. +5. The review agent's context budget is spent on high-signal content (diffs) rather than low-signal content (unchanged lines of a changed file). + +--- + +## Non-goals + +- Changing the review agent's output format, prompt style, or model. +- Introducing multi-hop / agentic context gathering beyond what the existing tool-call surface provides. +- Indexing the repository into a searchable graph (Greptile-style). +- Running the review agent against the PR's merge-commit (as opposed to the head commit). +- Changing how internal (same-origin) PR branches are referred to by users, dashboards, or logs. +- Altering any database schema or migration. + +--- + +## Constraints + +- **TDD-first.** Every behavior change must be preceded by a failing test that demonstrates the bug or the new requirement. +- **No hacks, no workarounds, no half-measures.** If a code path silently tolerates errors today, it must fail loudly going forward — no "keep the old path for safety" fallbacks. +- **No regression for the internal-PR happy path.** Runs triggered by same-origin PRs must continue to succeed with equivalent or better fidelity. 
+- **No breaking change to project credentials, DB schema, or configuration surface.** The contract with operators (environment variables, project config fields, stored credentials) is unchanged. +- **Observable via existing logging and run-tracking.** New failure modes must surface in the normal run logs and the dashboard's run status — no new telemetry pipeline. +- **Latency budget preserved.** The review run's end-to-end duration should not materially increase. Pagination and per-file diff extraction must add no more than a small constant to setup time. + +--- + +## Requirements + +1. **R1 — External-fork checkout.** The review worker must successfully place the working tree at the PR head commit when the PR originates from an external fork, regardless of whether the branch exists on the base-repo remote. +2. **R2 — Fail-loud setup.** Every git operation in the review setup path must surface non-zero exit codes as failed runs. There are no "warn and continue" paths in setup. +3. **R3 — HEAD verification.** After checkout completes, the worker must verify that the resulting HEAD commit matches the PR's head commit as reported by the GitHub API. A mismatch is a failed run. +4. **R4 — Complete changed-files enumeration.** The pre-fetch must enumerate **every** file the PR changes, not only the first page returned by the API. +5. **R5 — Diff-shaped context.** The pre-fetch must feed the agent compact per-file diffs, not full file contents, as the default representation of PR changes. +6. **R6 — Skipped-file transparency.** When content cannot be pre-fetched (e.g., over budget, deleted, binary), the agent receives a structured list of omitted files and explicit guidance to fetch them through its existing tools when relevant. +7. **R7 — Pagination elsewhere in the review pipeline.** Other paginated GitHub endpoints consumed during review setup (PR commits, PR comments, check runs, reviews) are read to completion, not truncated at the first page. +8. 
**R8 — Post-fix observability.** Run logs make visible: which ref was fetched, the resolved HEAD SHA, the count of changed files discovered, the count of files whose content was included vs skipped, and the reason for any skip. + +--- + +## Research Notes + +- GitHub exposes every pull request as a ref at `refs/pull/N/head` on the base repository, regardless of whether the PR originates from a same-repo branch or an external fork. This ref pattern is the canonical way to check out any PR. Reference: [GitHub Docs — Checking out pull requests locally](https://docs.github.com/articles/checking-out-pull-requests-locally). +- The `actions/checkout` GitHub Action uses exactly this pattern when handling PRs from forks, and serves as a reference implementation for a robust PR checkout sequence. Reference: [actions/checkout](https://github.com/actions/checkout). +- LLM code review quality degrades as input context grows, even when the window is not technically full — "context rot." Loading full file contents when only a handful of lines changed is both wasteful and counterproductive. Reference: [Morph LLM — Context Rot](https://www.morphllm.com/context-rot). +- Current best practice in LLM-driven code review (per Martin Fowler's context-engineering survey) is to use compact per-file diffs with explicit file-path headers, reserving full-file reads for the agent to perform on demand via its tool surface. Reference: [Martin Fowler — Context Engineering for Coding Agents](https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html). +- Commercially deployed AI code review tools split into two architectures: "diff-only" (CodeRabbit-class, faster, lower noise) and "codebase-indexed + multi-hop" (Greptile-class, higher bug-catch, more cost). CASCADE's review agent is diff-class today, with an existing tool surface (Read, Grep, Bash) that enables limited on-demand exploration. 
Moving to compact diffs fits this class well; adopting codebase indexing is a larger, separate effort. Reference: [Greptile vs CodeRabbit (2025)](https://www.greptile.com/greptile-vs-coderabbit). +- GitHub's `pulls/N/files` endpoint has a documented hard cap of 3000 files per PR and a per-page cap of 100; pagination is mandatory for any PR larger than 100 files. + +--- + +## Open Source Decisions + +| Tool | Solves | Decision | Reason | +|------|--------|----------|--------| +| [Octokit paginate-rest plugin](https://github.com/octokit/plugin-paginate-rest.js) | Iterating all pages of any paginated GitHub REST endpoint | **Use** | Already bundled with `@octokit/rest`; idiomatic, well-tested, replaces per-site hand-rolled pagination | +| [actions/checkout (as reference impl)](https://github.com/actions/checkout) | How a robust PR-fork checkout sequence is structured | **Copy pattern** | Not consumable as a library; but its `refs/pull/N/head` fetch sequence is the industry-standard recipe to mirror | +| `simple-git`, `nodegit`, and other JS git wrappers | Abstracting direct git-command invocation | **Skip** | The codebase already invokes git through a uniform subprocess helper; swapping to a library would be churn without clear benefit | +| Existing "compact diff" libraries (unified-diff renderers, etc.) | Formatting per-file patches for LLM consumption | **Skip, write in-house** | GitHub's PR files API already returns per-file patch strings; formatting is a thin wrapper around that data | + +--- + +## Strategic decisions + +1. **Checkout pattern.** Use GitHub's canonical `refs/pull/N/head` ref for every PR checkout regardless of origin (same-repo or external fork). Rejected: keeping the branch-name path for same-repo PRs with a branch on origin. Reason: one code path, one failure mode, one test surface; eliminates the entire "branch-not-on-origin" class of bug. +2. **Error-handling philosophy.** Fail-loud throughout the review setup pipeline. 
Any non-zero git exit, any failed API call, any unexpected state transitions the run to failed. Rejected: warn-and-continue with best-effort. Reason: silent partial success is what caused the incident; an honest failure surfaces the problem to operators and retries. +3. **HEAD verification.** Mandatory SHA equality check between `git rev-parse HEAD` and the PR's head commit as reported by the GitHub API, executed after checkout and before any review work. Rejected: optional / debug-gated. Reason: cheap final defense against any future bug that produces the wrong working tree. +4. **Context shape.** Default representation of PR changes fed to the agent is **compact per-file diffs**, not full file contents. Rejected: raising the full-file token cap; rejected: hybrid thresholds. Reason: aligns with industry practice, scales with PR size rather than repo size, mitigates context rot, and leverages the agent's existing tool surface for on-demand file reads. +5. **Skipped-file contract with the agent.** Whenever the pre-fetch omits a file's content or diff for any reason, the agent receives a structured, explicitly labeled list of omitted files with per-file reasons, and prompt guidance that instructs it to fetch omitted files via its tools when they are relevant to the review. Rejected: silent omission. Reason: the silent-omission pattern is precisely what enabled the PR #1092 incident. +6. **Pagination coverage.** Pagination is applied to every paginated GitHub endpoint consumed during review setup, not only the immediate `pulls/N/files` fix. Rejected: spot-fixing only the one endpoint that failed this time. Reason: the same latent bug class exists on neighboring endpoints; one comprehensive pass prevents a future repeat. +7. **Scope boundary.** In scope: git checkout correctness, pre-fetch rework to diffs, skipped-file contract, pagination coverage, HEAD verification, fail-loud setup. 
Out of scope: ops-layer detection of past `HEAD ≠ PR head` runs; agentic multi-hop context gathering; codebase indexing. Rejected: bundling ops tooling. Reason: different failure class, different owner, different test strategy, different deploy cadence. + +--- + +## Acceptance Criteria (outcome-level) + +1. When an external contributor (from a fork) opens a pull request that triggers a review, the review run's final working tree is at the PR's head commit. +2. When a pull request contains more than 100 changed files, the review agent's context includes every one of those files in its enumeration of what changed. +3. When any step of the review setup pipeline fails (clone, fetch, checkout, API call), the run is marked **failed** in the dashboard, with a log entry identifying the failing step. No such failure silently proceeds to a completed review. +4. When the working-tree SHA after setup differs from the PR's head SHA as reported by the GitHub API, the run is marked failed before any review work begins. +5. The agent's primary view of PR changes is compact per-file diffs. Full unchanged lines of changed files are not echoed into the agent's context by default. +6. When the pre-fetch omits any changed file's content or diff (for any reason), the agent receives an explicit, structured list of omitted filenames with a short reason for each, and prompt guidance telling it to fetch those files via its existing tools when relevant. +7. Other paginated GitHub endpoints consumed during review setup (PR commits, PR reviews, PR issue comments, check runs) are read to completion — the count returned to the agent matches the total the API reports. +8. An operator investigating a review run can determine from the run's log: which PR ref was fetched, the resolved HEAD SHA, the total number of changed files, and the count of included vs skipped content entries with per-skip reasons. +9. 
A review run on the reproduction case from PR #1092 (an external-fork PR with more than 100 changed files) produces a working tree at the correct head SHA, enumerates all changed files, and does not fabricate missing-export or wrong-import claims about files it has not read. + +--- + +## Documentation Impact (high-level) + +- `CLAUDE.md` — the "Development" and "Debugging Production Sessions" sections reference the review flow; they need updated notes on the new checkout behavior, diff-based pre-fetch, and skipped-file contract. +- `docs/architecture/` and `docs/ARCHITECTURE.md` — the trigger-to-agent data flow diagrams and descriptions need updating to reflect compact-diff context rather than full-file pre-fetch. +- `CHANGELOG.md` — entry describing the incident, the fix, and the behavior change visible to anyone inspecting review logs. +- `docs/adding-engines.md` — if engine onboarding docs describe the context shape passed to the agent, they need updating to describe diffs + skipped-file list rather than full file contents. +- `README.md` — brief mention in the "Dashboard" or "Running the Router" section only if the failure-mode change is visible to a fresh operator; otherwise no change. +- Per-file granular documentation edits are deferred to the downstream plan(s). + +--- + +## Out of Scope + +- Ops-layer detection of historical runs where the final HEAD did not match the PR head — a separate operational-monitoring concern. +- Agentic multi-hop context gathering, where the agent autonomously expands its context by following symbol references, git history, or related files — a larger architectural shift deferred to a future spec. +- Codebase-wide indexing (Greptile-style code graph) — a separate spec, requires distinct infrastructure. +- Running the review against the PR's generated merge commit rather than the head commit — distinct strategic choice with its own trade-offs, not driven by this incident. 
+- Multi-agent review orchestration (parallel specialized reviewers) — separate concern, future exploration. +- Changes to the review agent's model, prompt template, or output format. +- Changes to non-review agent types. This spec addresses the review agent specifically; if adjacent agent types (implementation, respond-to-review, etc.) share the vulnerable checkout code, those fixes are a natural side effect but their agent-level behavior is not being re-specified here. diff --git a/src/agents/definitions/contextSteps.ts b/src/agents/definitions/contextSteps.ts index 9c59e6b0..bf58b149 100644 --- a/src/agents/definitions/contextSteps.ts +++ b/src/agents/definitions/contextSteps.ts @@ -24,12 +24,15 @@ import type { AgentInput, ProjectConfig } from '../../types/index.js'; import { parseRepoFullName } from '../../utils/repo.js'; import type { ContextInjection, LogWriter } from '../contracts/index.js'; import { + countSkipsByReason, + extractPRDiffs, formatPRComments, formatPRDetails, formatPRDiff, + formatPRDiffContext, formatPRIssueComments, formatPRReviews, - readPRFileContents, + formatSkippedFilesInjection, } from '../shared/prFormatting.js'; import type { ContextFile } from '../utils/setup.js'; @@ -195,20 +198,36 @@ export async function fetchPRContextStep(params: FetchContextParams): Promise 0) { injections.push({ - toolName: 'ReadFile', - params: { comment: `Pre-fetching ${file.path} for review`, filePath: file.path }, - result: `path=${file.path}\n\n${file.content}`, - description: `Pre-fetched ${file.path}`, + toolName: 'SkippedFiles', + params: { + comment: 'PR files omitted from the compact context — fetch on demand if relevant', + prNumber, + }, + result: formatSkippedFilesInjection(diffContext.skipped, prNumber), + description: 'Skipped files', }); } diff --git a/src/agents/definitions/review.yaml b/src/agents/definitions/review.yaml index e1135f63..9839ebdb 100644 --- a/src/agents/definitions/review.yaml +++ b/src/agents/definitions/review.yaml @@ -62,6 
+62,27 @@ prompts: taskPrompt: | Review PR #<%= it.prNumber %>. + + ## How your context is shaped + + You receive a **compact per-file diff context** — each changed file appears + as a section headed `### <filename> (<status>, +N -M)` followed by a unified + diff hunk. Only the changed lines are in your context; unchanged parts of + a file are not echoed. Use `Read <path>` if you need surrounding code. + + If a **`SKIPPED FILES`** injection is present, those files were omitted from + the compact context (because they were deleted, binary, their patch was too + large, or the cumulative diff budget was reached). When any skipped file is + relevant to your review, fetch it on demand: + + - `gh pr diff <%= it.prNumber %> -- <path>` to read the file's patch + - `Read <path>` to read the post-PR file content + - `Grep <pattern> <path>` to search inside the file + + You are not expected to review every skipped file — only fetch when the PR + description or another file points you at it. + + ## Your task + Examine the code changes carefully and submit your review using CreatePRReview. 
hooks: diff --git a/src/agents/shared/prFormatting.ts b/src/agents/shared/prFormatting.ts index a9c4eb67..23752661 100644 --- a/src/agents/shared/prFormatting.ts +++ b/src/agents/shared/prFormatting.ts @@ -1,7 +1,4 @@ -import { readFile } from 'node:fs/promises'; -import { join } from 'node:path'; - -import { estimateTokens, REVIEW_FILE_CONTENT_TOKEN_LIMIT } from '../../config/reviewConfig.js'; +import { estimateTokens, REVIEW_DIFF_CONTEXT_TOKEN_LIMIT } from '../../config/reviewConfig.js'; import type { githubClient } from '../../github/client.js'; type PRDetails = Awaited>; @@ -105,48 +102,142 @@ export function formatPRIssueComments(prIssueComments: PRIssueComments): string } // ============================================================================ -// PR File Contents Reading +// PR Diff Context (compact per-file diffs for review agent) // ============================================================================ -export interface PRFileContents { - included: Array<{ path: string; content: string }>; - skipped: string[]; +/** + * Reason a PR file's diff was omitted from the compact context. + * + * - `deleted`: file was removed in the PR (no meaningful diff to read). + * - `no-patch`: GitHub didn't return a patch (typically binary blobs or files + * too large for the diff API). + * - `patch-too-large`: the individual file's patch would exceed the per-file cap. + * - `over-budget`: the cumulative context budget was reached before this file. + */ +export type SkipReason = 'deleted' | 'no-patch' | 'patch-too-large' | 'over-budget'; + +export interface SkippedFile { + filename: string; + reason: SkipReason; +} + +export interface PRDiffContext { + included: Array<{ path: string; status: PRDiff[number]['status']; diff: string }>; + skipped: SkippedFile[]; +} + +/** Per-file cap: any single file's patch over this is skipped as "patch-too-large". 
*/ +const PER_FILE_TOKEN_CAP = Math.floor(REVIEW_DIFF_CONTEXT_TOKEN_LIMIT / 10); + +/** + * Format a single file's compact diff section with a consistent header. + * Kept internal — callers consume `PRDiffContext.included[].diff` directly. + */ +function formatCompactDiff(file: PRDiff[number]): string { + const header = `### ${file.filename} (${file.status}, +${file.additions} -${file.deletions})`; + return `${header}\n\n\`\`\`diff\n${file.patch ?? ''}\n\`\`\``; } /** - * Read the full contents of changed PR files up to a token limit. + * Produce a compact-diff context for the review agent from the PR's changed-files + * list. Scales with PR size (not repo size). Files that can't be included are + * surfaced explicitly so the agent can decide whether to fetch them on demand + * via `Read` or `gh pr diff`. * - * Shared between the llmist review agent (agents/review.ts) and the - * Claude Code backend (backends/agent-profiles.ts). + * Skip rules, evaluated in order per file: + * 1. `status === 'removed'` → `skipped` with reason `deleted` + * 2. `patch` missing/empty → `skipped` with reason `no-patch` + * 3. patch token estimate > per-file cap → `skipped` with reason `patch-too-large` + * 4. cumulative budget would exceed `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT` → `skipped` with reason `over-budget` + * 5. otherwise → `included` * - * @param repoDir - The local repository directory - * @param prDiff - The PR diff file list from GitHub - * @returns Files that fit within the token limit and those that were skipped + * Input order is preserved — output ordering mirrors GitHub's response, which + * is stable for a given PR head SHA. 
*/ -export async function readPRFileContents(repoDir: string, prDiff: PRDiff): Promise { - const included: Array<{ path: string; content: string }> = []; - const skipped: string[] = []; +export function extractPRDiffs(prDiff: PRDiff): PRDiffContext { + const included: PRDiffContext['included'] = []; + const skipped: SkippedFile[] = []; let totalTokens = 0; for (const file of prDiff) { - // Skip deleted/binary files - if (file.status === 'removed' || !file.patch) continue; - - const filePath = join(repoDir, file.filename); - try { - const content = await readFile(filePath, 'utf-8'); - const tokens = estimateTokens(content); - - if (totalTokens + tokens <= REVIEW_FILE_CONTENT_TOKEN_LIMIT) { - included.push({ path: file.filename, content }); - totalTokens += tokens; - } else { - skipped.push(file.filename); - } - } catch { - // File might not exist (renamed from), skip + if (file.status === 'removed') { + skipped.push({ filename: file.filename, reason: 'deleted' }); + continue; } + if (!file.patch) { + skipped.push({ filename: file.filename, reason: 'no-patch' }); + continue; + } + const diff = formatCompactDiff(file); + const tokens = estimateTokens(diff); + if (tokens > PER_FILE_TOKEN_CAP) { + skipped.push({ filename: file.filename, reason: 'patch-too-large' }); + continue; + } + if (totalTokens + tokens > REVIEW_DIFF_CONTEXT_TOKEN_LIMIT) { + skipped.push({ filename: file.filename, reason: 'over-budget' }); + continue; + } + included.push({ path: file.filename, status: file.status, diff }); + totalTokens += tokens; } return { included, skipped }; } + +/** + * Render a `PRDiffContext` as a single string block suitable for a pre-fetch + * injection. Includes all per-file diffs back-to-back; skipped files are the + * responsibility of a separate, structured `SKIPPED FILES` injection. 
+ */
+export function formatPRDiffContext(ctx: PRDiffContext): string {
+  if (ctx.included.length === 0) {
+    return 'No file diffs available for this PR.';
+  }
+  return ctx.included.map((f) => f.diff).join('\n\n');
+}
+
+/**
+ * Render the skipped-file list as a self-documenting block — includes both the
+ * filenames/reasons AND the fetch guidance, so agents that receive this
+ * injection know what to do regardless of their per-agent prompt.
+ */
+export function formatSkippedFilesInjection(
+  skipped: SkippedFile[],
+  prNumber: number | undefined,
+): string {
+  if (skipped.length === 0) return '';
+
+  const header = [
+    `The following ${skipped.length} file(s) from the PR were omitted from the compact-diff context to keep the review focused:`,
+    '',
+  ];
+
+  const lines = skipped.map((s) => `- \`${s.filename}\` (${s.reason})`);
+
+  const guidance = [
+    '',
+    'If any of these files are relevant to your review, fetch them on demand:',
+    prNumber !== undefined
+      ? ` • \`gh pr diff ${prNumber} -- <path>\` to read the patch`
+      : ' • `gh pr diff <pr-number> -- <path>` to read the patch',
+    ' • `Read <path>` to read the post-PR file content',
+    ' • `Grep <pattern> <path>` to search inside the file',
+    '',
+    'You are not expected to review every skipped file — only fetch when the PR description or another file points you at it.',
+  ];
+
+  return [...header, ...lines, ...guidance].join('\n');
+}
+
+/** Group skipped files by reason into a `{ reason: count }` map for logging.
*/
+export function countSkipsByReason(skipped: SkippedFile[]): Record<SkipReason, number> {
+  const counts: Record<SkipReason, number> = {
+    deleted: 0,
+    'no-patch': 0,
+    'patch-too-large': 0,
+    'over-budget': 0,
+  };
+  for (const s of skipped) counts[s.reason]++;
+  return counts;
+}
diff --git a/src/agents/shared/repository.ts b/src/agents/shared/repository.ts
index 90b6809f..f0e56489 100644
--- a/src/agents/shared/repository.ts
+++ b/src/agents/shared/repository.ts
@@ -10,10 +10,84 @@ export interface SetupRepositoryOptions {
   project: ProjectConfig;
   log: AgentLogger;
   agentType: string;
+  /** PR number — when provided, drives `refs/pull/N/head` checkout instead of branch-name checkout. */
+  prNumber?: number;
+  /** Expected HEAD SHA — when provided, post-checkout verification compares `git rev-parse HEAD` against this. */
+  prHeadSha?: string;
+  /**
+   * @deprecated Retained for human-readable logging only. PR checkout uses `prNumber` via the canonical
+   * `refs/pull/N/head` ref. Branch names from forks are not on `origin` and cannot be checked out by name.
+   */
   prBranch?: string;
   warmTsCache?: boolean;
 }
+/**
+ * Fetch a PR via its canonical pull-request ref and check out the resulting commit
+ * on a detached HEAD. This works for PRs from same-repo branches AND from external
+ * forks — the fork's branch name is irrelevant because GitHub mirrors every PR head
+ * onto the base repo at `refs/pull/N/head`.
+ *
+ * Fail-loud: any non-zero git exit code throws. When `prHeadSha` is provided, the
+ * post-checkout HEAD is verified to match.
+ *
+ * Provider-aware signature: GitHub is implemented today. GitLab support is deferred
+ * until PR #1092 lands — the parameter is in the signature so the extension is a
+ * one-line follow-up.
+ */
+export async function fetchAndCheckoutPR(
+  repoDir: string,
+  prNumber: number,
+  prHeadSha: string | undefined,
+  scmProvider: 'github' | 'gitlab' | undefined,
+  log: AgentLogger,
+): Promise<void> {
+  const provider = scmProvider ??
'github'; + if (provider !== 'github') { + throw new Error( + `fetchAndCheckoutPR: only GitHub is currently supported; GitLab support follows PR #1092 merge`, + ); + } + + const ref = `refs/pull/${prNumber}/head`; + const refspec = `+${ref}:refs/remotes/pr/${prNumber}`; + log.info('Fetching PR ref', { prNumber, ref }); + + const fetchResult = await runCommand('git', ['fetch', 'origin', refspec], repoDir); + if (fetchResult.exitCode !== 0) { + throw new Error( + `git fetch PR ref failed (exit ${fetchResult.exitCode}): ${fetchResult.stderr.slice(-500)}`, + ); + } + + const checkoutResult = await runCommand( + 'git', + ['checkout', '--detach', `pr/${prNumber}`], + repoDir, + ); + if (checkoutResult.exitCode !== 0) { + throw new Error( + `git checkout PR failed (exit ${checkoutResult.exitCode}): ${checkoutResult.stderr.slice(-500)}`, + ); + } + + if (prHeadSha) { + const revParseResult = await runCommand('git', ['rev-parse', 'HEAD'], repoDir); + const actualSha = revParseResult.stdout.trim(); + if (actualSha !== prHeadSha) { + throw new Error( + `HEAD SHA mismatch after PR checkout: expected ${prHeadSha}, got ${actualSha}`, + ); + } + } + + log.info('PR checked out', { + prNumber, + ref, + headSha: prHeadSha ?? '(unverified)', + }); +} + /** * Resolve the path to the existing workspace directory for a snapshot-reuse run. * @@ -51,37 +125,39 @@ async function refreshSnapshotWorkspace( project: ProjectConfig, log: AgentLogger, agentType: string, - prBranch?: string, + prNumber?: number, + prHeadSha?: string, ): Promise { - const branch = prBranch ?? project.baseBranch ?? 'main'; + // PR-driven runs use the canonical refs/pull/N/head ref (works for forks). + if (prNumber) { + log.info('Refreshing snapshot workspace for PR', { repoDir, prNumber, agentType }); + await fetchAndCheckoutPR(repoDir, prNumber, prHeadSha, undefined, log); + return; + } + // Non-PR runs: fetch + reset + checkout the project's base branch. Fail-loud. + const branch = project.baseBranch ?? 
'main'; log.info('Refreshing snapshot workspace', { repoDir, branch, agentType }); - // Fetch latest refs from origin (tolerates transient network errors) const fetchResult = await runCommand('git', ['fetch', 'origin'], repoDir); if (fetchResult.exitCode !== 0) { - log.warn('git fetch exited with non-zero code (continuing)', { - exitCode: fetchResult.exitCode, - stderr: fetchResult.stderr.slice(-500), - }); + throw new Error( + `git fetch failed (exit ${fetchResult.exitCode}): ${fetchResult.stderr.slice(-500)}`, + ); } - // Reset to the remote tracking branch to discard any stale local changes const resetResult = await runCommand('git', ['reset', '--hard', `origin/${branch}`], repoDir); if (resetResult.exitCode !== 0) { - log.warn('git reset --hard exited with non-zero code (continuing)', { - exitCode: resetResult.exitCode, - stderr: resetResult.stderr.slice(-500), - }); + throw new Error( + `git reset --hard origin/${branch} failed (exit ${resetResult.exitCode}): ${resetResult.stderr.slice(-500)}`, + ); } - // Checkout the target branch (no-op when already on the correct branch) const checkoutResult = await runCommand('git', ['checkout', branch], repoDir); if (checkoutResult.exitCode !== 0) { - log.warn('git checkout exited with non-zero code (continuing)', { - exitCode: checkoutResult.exitCode, - stderr: checkoutResult.stderr.slice(-500), - }); + throw new Error( + `git checkout ${branch} failed (exit ${checkoutResult.exitCode}): ${checkoutResult.stderr.slice(-500)}`, + ); } log.info('Snapshot workspace refreshed', { repoDir, branch }); @@ -108,7 +184,7 @@ async function maybeWarmTsCache( } export async function setupRepository(options: SetupRepositoryOptions): Promise { - const { project, log, agentType, prBranch, warmTsCache } = options; + const { project, log, agentType, prNumber, prHeadSha, prBranch, warmTsCache } = options; // ── Snapshot-reuse path ──────────────────────────────────────────────────── // When CASCADE_SNAPSHOT_REUSE=true the container image 
already contains the @@ -122,7 +198,7 @@ export async function setupRepository(options: SetupRepositoryOptions): Promise< agentType, snapshotDir, }); - await refreshSnapshotWorkspace(snapshotDir, project, log, agentType, prBranch); + await refreshSnapshotWorkspace(snapshotDir, project, log, agentType, prNumber, prHeadSha); await maybeWarmTsCache(snapshotDir, warmTsCache, log); return snapshotDir; } @@ -148,10 +224,14 @@ export async function setupRepository(options: SetupRepositoryOptions): Promise< // Clone repo to temp directory await cloneRepo(project, repoDir); - // Checkout PR branch if provided - if (prBranch) { - log.info('Checking out PR branch', { prBranch }); - await runCommand('git', ['checkout', prBranch], repoDir); + // Checkout PR via canonical refs/pull/N/head ref (works for forks too) + if (prNumber) { + // scmProvider parameter reserved for future GitLab support (PR #1092 follow-up) + await fetchAndCheckoutPR(repoDir, prNumber, prHeadSha, undefined, log); + } else if (prBranch) { + // prBranch without prNumber is a no-op now; log for visibility so misconfigured + // callers can be diagnosed (the field is deprecated for checkout — pass prNumber). + log.info('prBranch provided without prNumber — skipping branch-name checkout', { prBranch }); } // Run project-specific setup script if it exists (handles dependency installation) diff --git a/src/api/routers/users.ts b/src/api/routers/users.ts index cb26009c..01d7c8a6 100644 --- a/src/api/routers/users.ts +++ b/src/api/routers/users.ts @@ -11,6 +11,54 @@ import { } from '../../db/repositories/usersRepository.js'; import { adminProcedure, router } from '../trpc.js'; +type Role = 'member' | 'admin' | 'superadmin'; +type ActorContext = { id: string; role: Role; effectiveOrgId: string }; +type TargetUser = { id: string; orgId: string; role: string }; + +/** + * Centralizes the permission rules for editing/deleting a user. 
Throws TRPCError + * with the appropriate code on any policy violation; returns silently if allowed. + * + * Rules (apply to both `update` and `delete`): + * 1. Cross-org access is hidden as `NOT_FOUND` (don't leak existence) unless + * the actor is a superadmin (who can act across orgs). + * 2. Only superadmins can act on a superadmin target user. + * + * Update-specific extras (when `input.role` is set): + * 3. No self-role-change (prevent self-demotion). + * 4. Only superadmins can assign or change the superadmin role. + */ +function assertUserAccessAllowed( + actor: ActorContext, + target: TargetUser, + options: { newRole?: Role; verb: 'edit' | 'delete' } = { verb: 'edit' }, +): void { + if (target.orgId !== actor.effectiveOrgId && actor.role !== 'superadmin') { + throw new TRPCError({ code: 'NOT_FOUND' }); + } + if (target.role === 'superadmin' && actor.role !== 'superadmin') { + throw new TRPCError({ + code: 'FORBIDDEN', + message: `Only superadmins can ${options.verb} superadmin users`, + }); + } + if (options.newRole !== undefined) { + if (actor.id === target.id) { + throw new TRPCError({ code: 'FORBIDDEN', message: 'Cannot change your own role' }); + } + const wouldElevate = options.newRole === 'superadmin'; + const wouldDemoteSuper = target.role === 'superadmin' && options.newRole !== 'superadmin'; + if ((wouldElevate || wouldDemoteSuper) && actor.role !== 'superadmin') { + throw new TRPCError({ + code: 'FORBIDDEN', + message: wouldElevate + ? 
'Only superadmins can assign superadmin role' + : 'Only superadmins can change a superadmin user role', + }); + } + } +} + export const usersRouter = router({ list: adminProcedure.query(async ({ ctx }) => { if (ctx.user.role === 'superadmin') { @@ -61,52 +109,20 @@ export const usersRouter = router({ }), ) .mutation(async ({ ctx, input }) => { - // Verify target user belongs to same org const targetUser = await getUserById(input.id); - if (!targetUser) { throw new TRPCError({ code: 'NOT_FOUND' }); } - if (targetUser.orgId !== ctx.effectiveOrgId && ctx.user.role !== 'superadmin') { - throw new TRPCError({ code: 'NOT_FOUND' }); - } - - // Non-superadmins cannot edit any field on a superadmin user - if (targetUser.role === 'superadmin' && ctx.user.role !== 'superadmin') { - throw new TRPCError({ - code: 'FORBIDDEN', - message: 'Only superadmins can edit superadmin users', - }); - } - - // Prevent self-demotion (can't change own role) - if (input.role !== undefined && ctx.user.id === input.id) { - throw new TRPCError({ - code: 'FORBIDDEN', - message: 'Cannot change your own role', - }); - } - - // Only superadmins can set role to superadmin - if (input.role === 'superadmin' && ctx.user.role !== 'superadmin') { - throw new TRPCError({ - code: 'FORBIDDEN', - message: 'Only superadmins can assign superadmin role', - }); - } - - // Only superadmins can change a superadmin's role - if ( - targetUser.role === 'superadmin' && - input.role !== 'superadmin' && - ctx.user.role !== 'superadmin' - ) { - throw new TRPCError({ - code: 'FORBIDDEN', - message: 'Only superadmins can change a superadmin user role', - }); - } + assertUserAccessAllowed( + { + id: ctx.user.id, + role: ctx.user.role as Role, + effectiveOrgId: ctx.effectiveOrgId, + }, + targetUser, + { newRole: input.role, verb: 'edit' }, + ); const updates: { name?: string; @@ -140,24 +156,16 @@ export const usersRouter = router({ }); } - // Verify org ownership const targetUser = await getUserById(input.id); - if 
(!targetUser) { throw new TRPCError({ code: 'NOT_FOUND' }); } - if (targetUser.orgId !== ctx.effectiveOrgId && ctx.user.role !== 'superadmin') { - throw new TRPCError({ code: 'NOT_FOUND' }); - } - - // Only superadmins can delete superadmin users - if (targetUser.role === 'superadmin' && ctx.user.role !== 'superadmin') { - throw new TRPCError({ - code: 'FORBIDDEN', - message: 'Only superadmins can delete superadmin users', - }); - } + assertUserAccessAllowed( + { id: ctx.user.id, role: ctx.user.role as Role, effectiveOrgId: ctx.effectiveOrgId }, + targetUser, + { verb: 'delete' }, + ); await deleteUser(input.id); }), diff --git a/src/api/routers/webhooks.ts b/src/api/routers/webhooks.ts index 4c4096ae..950f44a0 100644 --- a/src/api/routers/webhooks.ts +++ b/src/api/routers/webhooks.ts @@ -23,6 +23,103 @@ import type { export type { GitHubWebhook, JiraWebhookInfo, LinearWebhookInfo, SentryWebhookInfo, TrelloWebhook }; +type CreateInput = { + trelloOnly?: boolean; + githubOnly?: boolean; + jiraOnly?: boolean; +}; +type ProjectContext = Awaited>; + +/** + * Per-provider create-or-skip helpers. Each: + * - Decides whether this provider should run given the input toggles + context. + * - Detects existing webhooks at the canonical or legacy callback URL. + * - Returns the created webhook, the duplicate marker, or `undefined` to skip. + * + * Extracted from the `create` mutation to keep that handler within the cognitive + * complexity budget — the policy is identical, the per-provider field shapes + * differ only in detail (callbackURL vs url vs config.url). 
+ */ + +async function maybeCreateTrelloWebhook( + pctx: ProjectContext, + input: CreateInput, + baseUrl: string, +): Promise { + if (input.githubOnly || input.jiraOnly) return undefined; + if (!pctx.trelloApiKey || !pctx.trelloToken || !pctx.boardId) return undefined; + + const callbackUrl = `${baseUrl}/trello/webhook`; + const existing = await trelloListWebhooks(pctx); + const duplicate = existing.find( + (w) => w.callbackURL === callbackUrl || w.callbackURL === `${baseUrl}/webhook/trello`, + ); + if (duplicate) return `Already exists: ${duplicate.id}`; + return trelloCreateWebhook(pctx, callbackUrl); +} + +async function maybeCreateJiraWebhook( + pctx: ProjectContext, + input: CreateInput, + baseUrl: string, +): Promise<{ jira?: JiraWebhookInfo | string; labelsEnsured?: string[] }> { + if (input.trelloOnly || input.githubOnly) return {}; + if (!pctx.jiraEmail || !pctx.jiraApiToken || !pctx.jiraBaseUrl) return {}; + + const callbackUrl = `${baseUrl}/jira/webhook`; + const existing = await jiraListWebhooks(pctx); + const duplicate = existing.find( + (w) => w.url === callbackUrl || w.url === `${baseUrl}/webhook/jira`, + ); + const jira = duplicate + ? 
`Already exists: ${duplicate.id}` + : await jiraCreateWebhook(pctx, callbackUrl); + const labelsEnsured = await jiraEnsureLabels(pctx); + return { jira, labelsEnsured }; +} + +async function maybeCreateGitHubWebhook( + pctx: ProjectContext, + input: CreateInput, + baseUrl: string, +): Promise { + if (input.trelloOnly || input.jiraOnly) return undefined; + if (!pctx.githubToken) return undefined; + + const callbackUrl = `${baseUrl}/github/webhook`; + const existing = await githubListWebhooks(pctx); + const duplicate = existing.find( + (w) => w.config.url === callbackUrl || w.config.url === `${baseUrl}/webhook/github`, + ); + if (duplicate) return `Already exists: ${duplicate.id}`; + return githubCreateWebhook(pctx, callbackUrl); +} + +function buildSentryDisplayInfo( + pctx: ProjectContext, + projectId: string, + baseUrl: string, +): SentryWebhookInfo | undefined { + if (!pctx.sentryConfigured) return undefined; + return { + url: `${baseUrl}/sentry/webhook/${projectId}`, + webhookSecretSet: pctx.sentryWebhookSecretSet ?? false, + note: 'Configure this URL manually in your Sentry Internal Integration webhook settings.', + }; +} + +function buildLinearDisplayInfo( + pctx: ProjectContext, + baseUrl: string, +): LinearWebhookInfo | undefined { + if (pctx.pmType !== 'linear' || !pctx.linearApiKey) return undefined; + return { + url: `${baseUrl}/linear/webhook`, + webhookSecretSet: pctx.linearWebhookSecretSet ?? 
false, + note: 'Configure this URL manually in your Linear team settings under API > Webhooks.', + }; +} + export const webhooksRouter = router({ list: adminProcedure .input( @@ -94,6 +191,7 @@ export const webhooksRouter = router({ const pctx = await resolveProjectContext(input.projectId, ctx.effectiveOrgId); applyOneTimeTokens(pctx, input.oneTimeTokens); const baseUrl = input.callbackBaseUrl.replace(/\/$/, ''); + const results: { trello?: TrelloWebhook | string; github?: GitHubWebhook | string; @@ -103,84 +201,21 @@ export const webhooksRouter = router({ labelsEnsured?: string[]; } = {}; - // Trello webhook (skip for JIRA-only projects) - if ( - !input.githubOnly && - !input.jiraOnly && - pctx.trelloApiKey && - pctx.trelloToken && - pctx.boardId - ) { - const trelloCallbackUrl = `${baseUrl}/trello/webhook`; - const existing = await trelloListWebhooks(pctx); - const duplicate = existing.find( - (w) => - w.callbackURL === trelloCallbackUrl || w.callbackURL === `${baseUrl}/webhook/trello`, - ); + const trello = await maybeCreateTrelloWebhook(pctx, input, baseUrl); + if (trello !== undefined) results.trello = trello; - if (duplicate) { - results.trello = `Already exists: ${duplicate.id}`; - } else { - results.trello = await trelloCreateWebhook(pctx, trelloCallbackUrl); - } - } + const { jira, labelsEnsured } = await maybeCreateJiraWebhook(pctx, input, baseUrl); + if (jira !== undefined) results.jira = jira; + if (labelsEnsured !== undefined) results.labelsEnsured = labelsEnsured; - // JIRA webhook (skip for Trello-only projects) - if ( - !input.trelloOnly && - !input.githubOnly && - pctx.jiraEmail && - pctx.jiraApiToken && - pctx.jiraBaseUrl - ) { - const jiraCallbackUrl = `${baseUrl}/jira/webhook`; - const existing = await jiraListWebhooks(pctx); - const duplicate = existing.find( - (w) => w.url === jiraCallbackUrl || w.url === `${baseUrl}/webhook/jira`, - ); + const github = await maybeCreateGitHubWebhook(pctx, input, baseUrl); + if (github !== undefined) 
results.github = github; - if (duplicate) { - results.jira = `Already exists: ${duplicate.id}`; - } else { - results.jira = await jiraCreateWebhook(pctx, jiraCallbackUrl); - } - - // Seed CASCADE labels in JIRA autocomplete - results.labelsEnsured = await jiraEnsureLabels(pctx); - } + const sentry = buildSentryDisplayInfo(pctx, input.projectId, baseUrl); + if (sentry !== undefined) results.sentry = sentry; - // GitHub webhook - if (!input.trelloOnly && !input.jiraOnly && pctx.githubToken) { - const githubCallbackUrl = `${baseUrl}/github/webhook`; - const existing = await githubListWebhooks(pctx); - const duplicate = existing.find( - (w) => w.config.url === githubCallbackUrl || w.config.url === `${baseUrl}/webhook/github`, - ); - - if (duplicate) { - results.github = `Already exists: ${duplicate.id}`; - } else { - results.github = await githubCreateWebhook(pctx, githubCallbackUrl); - } - } - - // Sentry — display-only (cannot create programmatically) - if (pctx.sentryConfigured) { - results.sentry = { - url: `${baseUrl}/sentry/webhook/${input.projectId}`, - webhookSecretSet: pctx.sentryWebhookSecretSet ?? false, - note: 'Configure this URL manually in your Sentry Internal Integration webhook settings.', - }; - } - - // Linear — display-only (cannot create programmatically) - if (pctx.pmType === 'linear' && pctx.linearApiKey) { - results.linear = { - url: `${baseUrl}/linear/webhook`, - webhookSecretSet: pctx.linearWebhookSecretSet ?? 
false, - note: 'Configure this URL manually in your Linear team settings under API > Webhooks.', - }; - } + const linear = buildLinearDisplayInfo(pctx, baseUrl); + if (linear !== undefined) results.linear = linear; return results; }), diff --git a/src/backends/adapter.ts b/src/backends/adapter.ts index 6745ab65..a3523d51 100644 --- a/src/backends/adapter.ts +++ b/src/backends/adapter.ts @@ -28,6 +28,8 @@ async function resolveRepoDir( project: input.project, log, agentType, + prNumber: input.prNumber, + prHeadSha: input.headSha, prBranch: input.prBranch, warmTsCache: true, }); diff --git a/src/config/reviewConfig.ts b/src/config/reviewConfig.ts index 2a1de8a7..1e1afe64 100644 --- a/src/config/reviewConfig.ts +++ b/src/config/reviewConfig.ts @@ -1,9 +1,12 @@ /** - * Maximum estimated tokens for synthetic file content injection. - * Files are injected until this limit is reached. - * Remaining files are noted as "available on request". + * Maximum estimated tokens budgeted for the review agent's PR diff context. + * + * Applied to compact per-file diffs (not full file contents). Files whose + * cumulative diff tokens would push the context above this ceiling are + * surfaced to the agent in a structured `SKIPPED FILES` injection and can + * be fetched on demand via `Read` or `gh pr diff`. */ -export const REVIEW_FILE_CONTENT_TOKEN_LIMIT = 25_000; +export const REVIEW_DIFF_CONTEXT_TOKEN_LIMIT = 200_000; /** * Rough token estimation: ~4 characters per token. 
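The arithmetic behind these two constants can be sketched in isolation. This is a hypothetical snippet: the estimator body is assumed from the "~4 characters per token" comment, and the real `estimateTokens` may round differently.

```typescript
// Sketch of the review-context budget math (constant names mirror
// reviewConfig.ts; estimator body is an assumption, not the module's code).
const REVIEW_DIFF_CONTEXT_TOKEN_LIMIT = 200_000;
// Per-file cap is 10% of the budget (see prFormatting.ts): 20_000 tokens.
const PER_FILE_TOKEN_CAP = Math.floor(REVIEW_DIFF_CONTEXT_TOKEN_LIMIT / 10);

function estimateTokens(text: string): number {
  // "~4 characters per token" heuristic
  return Math.ceil(text.length / 4);
}

// A single patch of ~90k characters (~22.5k tokens) trips the per-file cap,
// so it would be skipped as "patch-too-large" rather than eating the budget.
const hugePatch = 'x'.repeat(90_000);
console.log(estimateTokens(hugePatch) > PER_FILE_TOKEN_CAP); // true
```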
diff --git a/src/github/client.ts b/src/github/client.ts index b82b31b1..00bacfe3 100644 --- a/src/github/client.ts +++ b/src/github/client.ts @@ -144,7 +144,8 @@ export const githubClient = { prNumber: number, ): Promise { logger.debug('Fetching PR review comments', { owner, repo, prNumber }); - const { data } = await getClient().pulls.listReviewComments({ + const client = getClient(); + const data = await client.paginate(client.pulls.listReviewComments, { owner, repo, pull_number: prNumber, @@ -242,10 +243,12 @@ export const githubClient = { async getPRReviews(owner: string, repo: string, prNumber: number): Promise { logger.debug('Fetching PR reviews', { owner, repo, prNumber }); - const { data } = await getClient().pulls.listReviews({ + const client = getClient(); + const data = await client.paginate(client.pulls.listReviews, { owner, repo, pull_number: prNumber, + per_page: 100, }); return data.map((r) => ({ id: r.id, @@ -265,7 +268,8 @@ export const githubClient = { prNumber: number, ): Promise { logger.debug('Fetching PR issue comments', { owner, repo, prNumber }); - const { data } = await getClient().issues.listComments({ + const client = getClient(); + const data = await client.paginate(client.issues.listComments, { owner, repo, issue_number: prNumber, @@ -288,7 +292,7 @@ export const githubClient = { // Use Actions API (workflow runs + jobs) instead of Checks API, // because fine-grained PATs cannot access the Checks API. 
- const { data: runsData } = await client.actions.listWorkflowRunsForRepo({ + const workflowRuns = await client.paginate(client.actions.listWorkflowRunsForRepo, { owner, repo, head_sha: ref, @@ -297,8 +301,8 @@ export const githubClient = { // Fetch jobs for each workflow run to get per-job granularity const jobResults = await Promise.all( - runsData.workflow_runs.map((run) => - client.actions.listJobsForWorkflowRun({ + workflowRuns.map((run) => + client.paginate(client.actions.listJobsForWorkflowRun, { owner, repo, run_id: run.id, @@ -307,8 +311,8 @@ export const githubClient = { ), ); - const checkRuns = jobResults.flatMap(({ data }) => - data.jobs.map((job) => ({ + const checkRuns = jobResults.flatMap((jobs) => + jobs.map((job) => ({ name: job.name, status: job.status, conclusion: job.conclusion, @@ -335,7 +339,8 @@ export const githubClient = { async getPRDiff(owner: string, repo: string, prNumber: number): Promise { logger.debug('Fetching PR diff', { owner, repo, prNumber }); - const { data } = await getClient().pulls.listFiles({ + const client = getClient(); + const data = await client.paginate(client.pulls.listFiles, { owner, repo, pull_number: prNumber, @@ -421,14 +426,14 @@ export const githubClient = { logger.debug('Fetching failed workflow run jobs', { owner, repo, ref }); const client = getClient(); - const { data: runsData } = await client.actions.listWorkflowRunsForRepo({ + const workflowRuns = await client.paginate(client.actions.listWorkflowRunsForRepo, { owner, repo, head_sha: ref, per_page: 100, }); - const failedRuns = runsData.workflow_runs.filter( + const failedRuns = workflowRuns.filter( (run) => run.conclusion === 'failure' || run.conclusion === 'timed_out', ); @@ -438,14 +443,14 @@ export const githubClient = { const jobResults = await Promise.all( failedRuns.map((run) => - client.actions - .listJobsForWorkflowRun({ + client + .paginate(client.actions.listJobsForWorkflowRun, { owner, repo, run_id: run.id, per_page: 100, }) - .then((res) => 
({ run, jobs: res.data.jobs })), + .then((jobs) => ({ run, jobs })), ), ); diff --git a/src/triggers/github/pr-comment-mention.ts b/src/triggers/github/pr-comment-mention.ts index 748af8b6..a97c6103 100644 --- a/src/triggers/github/pr-comment-mention.ts +++ b/src/triggers/github/pr-comment-mention.ts @@ -64,6 +64,7 @@ export class PRCommentMentionTrigger implements TriggerHandler { let commentAuthor: string; let prNumber: number; let prBranch: string; + let headSha: string; let prUrl: string; let prTitle: string; let repoFullName: string; @@ -82,6 +83,7 @@ export class PRCommentMentionTrigger implements TriggerHandler { const { owner, repo } = parseRepoFullName(repoFullName); const prDetails = await githubClient.getPR(owner, repo, prNumber); prBranch = prDetails.headRef; + headSha = prDetails.headSha; prUrl = prDetails.htmlUrl; prTitle = prDetails.title; } else if (isGitHubPRReviewCommentPayload(ctx.payload)) { @@ -93,6 +95,7 @@ export class PRCommentMentionTrigger implements TriggerHandler { commentAuthor = payload.comment.user.login; prNumber = payload.pull_request.number; prBranch = payload.pull_request.head.ref; + headSha = payload.pull_request.head.sha; prUrl = payload.pull_request.html_url; prTitle = payload.pull_request.title; repoFullName = payload.repository.full_name; @@ -129,6 +132,7 @@ export class PRCommentMentionTrigger implements TriggerHandler { prNumber, prBranch, repoFullName, + headSha, triggerEvent: 'scm:pr-comment-mention', triggerCommentId: commentId, triggerCommentBody: commentBody, diff --git a/src/triggers/github/pr-review-submitted.ts b/src/triggers/github/pr-review-submitted.ts index 6eede1a2..89ce36c3 100644 --- a/src/triggers/github/pr-review-submitted.ts +++ b/src/triggers/github/pr-review-submitted.ts @@ -72,6 +72,7 @@ export class PRReviewSubmittedTrigger implements TriggerHandler { prNumber, prBranch: reviewPayload.pull_request.head.ref, repoFullName: reviewPayload.repository.full_name, + headSha: 
reviewPayload.pull_request.head.sha, triggerEvent: 'scm:pr-review-submitted', triggerCommentId: reviewPayload.review.id, triggerCommentBody: reviewPayload.review.body || `Review: ${reviewPayload.review.state}`, diff --git a/src/types/index.ts b/src/types/index.ts index d9493116..3a07266a 100644 --- a/src/types/index.ts +++ b/src/types/index.ts @@ -10,9 +10,14 @@ export interface AgentInput { prNumber?: number; repoDir?: string; - // PR context fields for check-failure flow + // PR context fields — populated by trigger handlers from webhook payload prBranch?: string; repoFullName?: string; + /** + * The PR's head SHA at trigger time. Sourced from `pull_request.head.sha` (or + * `check_suite.head_sha` for check-failure events). Used both for the original + * check-failure flow AND for post-checkout HEAD verification in `setupRepository`. + */ headSha?: string; triggerType?: | 'check-failure' diff --git a/tests/unit/agents/definitions/contextSteps.test.ts b/tests/unit/agents/definitions/contextSteps.test.ts index c14d2566..3d0fd5cd 100644 --- a/tests/unit/agents/definitions/contextSteps.test.ts +++ b/tests/unit/agents/definitions/contextSteps.test.ts @@ -32,16 +32,29 @@ vi.mock('../../../../src/gadgets/pm/core/readWorkItem.js', () => ({ readWorkItemWithMedia: vi.fn(), })); +vi.mock('../../../../src/github/client.js', () => ({ + githubClient: { + getPR: vi.fn(), + getPRDiff: vi.fn(), + getCheckSuiteStatus: vi.fn(), + }, +})); + import type { FetchContextParams } from '../../../../src/agents/definitions/contextSteps.js'; import { + fetchPRContextStep, fetchWorkItemStep, prepopulateTodosStep, } from '../../../../src/agents/definitions/contextSteps.js'; import { readWorkItemWithMedia } from '../../../../src/gadgets/pm/core/readWorkItem.js'; import { initTodoSession, saveTodos } from '../../../../src/gadgets/todo/storage.js'; +import { githubClient } from '../../../../src/github/client.js'; import { getPMProviderOrNull } from '../../../../src/pm/index.js'; import type { 
AgentInput } from '../../../../src/types/index.js'; +const mockGetPR = vi.mocked(githubClient.getPR); +const mockGetPRDiff = vi.mocked(githubClient.getPRDiff); +const mockGetCheckSuiteStatus = vi.mocked(githubClient.getCheckSuiteStatus); const mockGetPMProviderOrNull = vi.mocked(getPMProviderOrNull); const mockReadWorkItemWithMedia = vi.mocked(readWorkItemWithMedia); const mockInitTodoSession = vi.mocked(initTodoSession); @@ -379,3 +392,155 @@ describe('fetchWorkItemStep', () => { expect(mockTrelloDownload).toHaveBeenCalledTimes(10); }); }); + +describe('fetchPRContextStep — compact diffs + SKIPPED FILES contract', () => { + beforeEach(() => { + mockGetPR.mockResolvedValue({ + number: 1092, + title: 'Test PR', + body: 'Description', + state: 'open', + htmlUrl: 'https://github.com/o/r/pull/1092', + headRef: 'claude/cranky', + headSha: 'abc123', + baseRef: 'dev', + merged: false, + mergeable: true, + user: { login: 'suda' }, + }); + mockGetCheckSuiteStatus.mockResolvedValue({ + totalCount: 0, + checkRuns: [], + allPassing: false, + }); + }); + + function makePRParams(): FetchContextParams { + return makeParams({ + prNumber: 1092, + repoFullName: 'o/r', + }); + } + + it('injects compact diff context (not full file contents)', async () => { + mockGetPRDiff.mockResolvedValue([ + { + filename: 'src/a.ts', + status: 'modified', + additions: 5, + deletions: 2, + changes: 7, + patch: '@@ -1,3 +1,5 @@\n context\n+added', + }, + { + filename: 'src/b.ts', + status: 'added', + additions: 10, + deletions: 0, + changes: 10, + patch: '@@ -0,0 +1,10 @@\n+new', + }, + ]); + + const injections = await fetchPRContextStep(makePRParams()); + + const diffInjection = injections.find((i) => i.description === 'Pre-fetched PR diff context'); + expect(diffInjection).toBeDefined(); + expect(diffInjection?.result as string).toContain('### src/a.ts (modified, +5 -2)'); + expect(diffInjection?.result as string).toContain('### src/b.ts (added, +10 -0)'); + // No per-file "Pre-fetched " injections 
— that was the old shape. + expect(injections.some((i) => (i.description as string).startsWith('Pre-fetched src/'))).toBe( + false, + ); + }); + + it('injects a SKIPPED FILES section when any files are skipped', async () => { + mockGetPRDiff.mockResolvedValue([ + { + filename: 'src/a.ts', + status: 'modified', + additions: 1, + deletions: 0, + changes: 1, + patch: '@@ -1 +1 @@\n+x', + }, + { + filename: 'legacy.ts', + status: 'removed', + additions: 0, + deletions: 42, + changes: 42, + patch: undefined, + }, + ]); + + const injections = await fetchPRContextStep(makePRParams()); + + const skipped = injections.find((i) => i.description === 'Skipped files'); + expect(skipped).toBeDefined(); + expect(skipped?.result as string).toContain('legacy.ts'); + expect(skipped?.result as string).toContain('deleted'); + expect(skipped?.result as string).toContain('gh pr diff 1092'); + expect(skipped?.result as string).toContain('Read'); + }); + + it('does NOT inject a SKIPPED FILES section when nothing is skipped', async () => { + mockGetPRDiff.mockResolvedValue([ + { + filename: 'src/a.ts', + status: 'modified', + additions: 1, + deletions: 0, + changes: 1, + patch: '@@ -1 +1 @@\n+x', + }, + ]); + + const injections = await fetchPRContextStep(makePRParams()); + + const skipped = injections.find((i) => i.description === 'Skipped files'); + expect(skipped).toBeUndefined(); + }); + + it('logs PR context prepared with included, skipped, and skipReasons map', async () => { + mockGetPRDiff.mockResolvedValue([ + { + filename: 'src/a.ts', + status: 'modified', + additions: 1, + deletions: 0, + changes: 1, + patch: '@@ -1 +1 @@\n+x', + }, + { + filename: 'removed.ts', + status: 'removed', + additions: 0, + deletions: 5, + changes: 5, + patch: undefined, + }, + { + filename: 'binary.png', + status: 'added', + additions: 0, + deletions: 0, + changes: 0, + patch: undefined, + }, + ]); + + const params = makePRParams(); + await fetchPRContextStep(params); + + 
expect(params.logWriter).toHaveBeenCalledWith( + 'INFO', + 'PR context prepared', + expect.objectContaining({ + included: 1, + skipped: 2, + skipReasons: expect.objectContaining({ deleted: 1, 'no-patch': 1 }), + }), + ); + }); +}); diff --git a/tests/unit/agents/definitions/review.yaml.test.ts b/tests/unit/agents/definitions/review.yaml.test.ts new file mode 100644 index 00000000..2acaea56 --- /dev/null +++ b/tests/unit/agents/definitions/review.yaml.test.ts @@ -0,0 +1,31 @@ +import { readFileSync } from 'node:fs'; +import { join } from 'node:path'; +import { describe, expect, it } from 'vitest'; + +/** + * The review agent's task prompt lives in `src/agents/definitions/review.yaml`. + * These tests guard the SKIPPED FILES contract: when the pre-fetch omits files, + * the agent needs to know how to fetch them on demand. + */ +describe('review.yaml prompt contract', () => { + const yamlPath = join(__dirname, '../../../../src/agents/definitions/review.yaml'); + const yamlText = readFileSync(yamlPath, 'utf-8'); + + it('names the SKIPPED FILES injection (AC #7)', () => { + expect(yamlText).toMatch(/SKIPPED FILES/i); + }); + + it('tells the agent to fetch skipped files via `gh pr diff` or `Read`', () => { + expect(yamlText).toMatch(/gh pr diff/); + expect(yamlText).toMatch(/\bRead\b/); + }); + + it('describes the compact-diff context shape', () => { + // Agent should know it will see per-file diffs (not full file contents). 
+ expect(yamlText.toLowerCase()).toMatch(/compact|per-file diff|diff\s+context/); + }); + + it('does not reference the old "full file contents" pre-fetch', () => { + expect(yamlText.toLowerCase()).not.toContain('full file contents'); + }); +}); diff --git a/tests/unit/agents/shared/prFormatting.test.ts b/tests/unit/agents/shared/prFormatting.test.ts index 0584b357..7268b088 100644 --- a/tests/unit/agents/shared/prFormatting.test.ts +++ b/tests/unit/agents/shared/prFormatting.test.ts @@ -1,12 +1,15 @@ import { describe, expect, it } from 'vitest'; - import { + extractPRDiffs, formatPRComments, formatPRDetails, formatPRDiff, formatPRIssueComments, formatPRReviews, + type PRDiff, + type SkippedFile, } from '../../../../src/agents/shared/prFormatting.js'; +import { REVIEW_DIFF_CONTEXT_TOKEN_LIMIT } from '../../../../src/config/reviewConfig.js'; describe('formatPRDetails', () => { it('formats PR details with all fields', () => { @@ -285,3 +288,134 @@ describe('formatPRIssueComments', () => { expect(result).toContain('Comment #2 by @bob'); }); }); + +// ============================================================================ +// extractPRDiffs +// ============================================================================ + +function makePRFile(overrides: Partial<PRDiff[number]>): PRDiff[number] { + return { + filename: 'src/example.ts', + status: 'modified', + additions: 10, + deletions: 5, + changes: 15, + patch: '@@ -1,5 +1,10 @@\n context\n+added\n-removed', + ...overrides, + } as PRDiff[number]; +} + +describe('extractPRDiffs', () => { + it('returns an included entry per file with a patch', () => { + const prDiff: PRDiff = [ + makePRFile({ filename: 'a.ts', patch: '@@ -1 +1 @@' }), + makePRFile({ filename: 'b.ts', patch: '@@ -2 +2 @@' }), + ]; + + const result = extractPRDiffs(prDiff); + + expect(result.included).toHaveLength(2); + expect(result.included[0].path).toBe('a.ts'); + expect(result.included[1].path).toBe('b.ts'); + expect(result.skipped).toHaveLength(0); + }); + +
it('marks deleted files as skipped with reason "deleted"', () => { + const prDiff: PRDiff = [makePRFile({ filename: 'gone.ts', status: 'removed' })]; + + const result = extractPRDiffs(prDiff); + + expect(result.included).toHaveLength(0); + expect(result.skipped).toEqual([{ filename: 'gone.ts', reason: 'deleted' }]); + }); + + it('marks files without a patch as skipped with reason "no-patch"', () => { + const prDiff: PRDiff = [makePRFile({ filename: 'binary.png', patch: undefined })]; + + const result = extractPRDiffs(prDiff); + + expect(result.included).toHaveLength(0); + expect(result.skipped).toEqual([{ filename: 'binary.png', reason: 'no-patch' }]); + }); + + it('marks files with a patch over the per-file cap as skipped with reason "patch-too-large"', () => { + // per-file cap is 10% of total budget by default — 20k tokens → 80k chars + // Construct a patch beyond that cap. + const hugePatch = 'x'.repeat(100_000); // 25k tokens + const prDiff: PRDiff = [makePRFile({ filename: 'huge.ts', patch: hugePatch })]; + + const result = extractPRDiffs(prDiff); + + expect(result.included).toHaveLength(0); + expect(result.skipped).toEqual([ + { filename: 'huge.ts', reason: 'patch-too-large' }, + ]); + }); + + it('respects total-budget cap; overflow files go to skipped with reason "over-budget"', () => { + // Per-file cap is 10% of total budget (20k tokens). Use ~19k-token patches + // (76k chars) — each individually fits, but 11 of them exceed the 200k cap. + const mediumPatch = 'x'.repeat(76_000); + const prDiff: PRDiff = Array.from({ length: 12 }, (_, i) => + makePRFile({ filename: `file-${i}.ts`, patch: mediumPatch }), + ); + + const result = extractPRDiffs(prDiff); + + // Expect some to be included, remainder over-budget, all reasons correct. 
+ expect(result.included.length).toBeGreaterThan(0); + expect(result.skipped.length).toBeGreaterThan(0); + expect(result.included.length + result.skipped.length).toBe(12); + for (const s of result.skipped) { + expect(s.reason).toBe('over-budget'); + } + // The first included file is the first input; overflow preserves input order. + expect(result.included[0].path).toBe('file-0.ts'); + }); + + it('returns deterministic ordering for same input', () => { + const prDiff: PRDiff = [ + makePRFile({ filename: 'c.ts' }), + makePRFile({ filename: 'a.ts' }), + makePRFile({ filename: 'b.ts' }), + ]; + + const r1 = extractPRDiffs(prDiff); + const r2 = extractPRDiffs(prDiff); + + expect(r1.included.map((f) => f.path)).toEqual(r2.included.map((f) => f.path)); + // Preserves input order (not alphabetical) — stable w.r.t. GitHub's returned order + expect(r1.included.map((f) => f.path)).toEqual(['c.ts', 'a.ts', 'b.ts']); + }); + + it('handles empty PR diff input', () => { + const result = extractPRDiffs([]); + + expect(result).toEqual({ included: [], skipped: [] }); + }); + + it('per-file diff contains a header with filename, status, and line-change counts', () => { + const prDiff: PRDiff = [ + makePRFile({ filename: 'src/a.ts', status: 'modified', additions: 12, deletions: 3 }), + ]; + + const result = extractPRDiffs(prDiff); + + expect(result.included[0].diff).toContain('### src/a.ts'); + expect(result.included[0].diff).toContain('modified'); + expect(result.included[0].diff).toMatch(/\+12.*-3/); + }); + + it('applies skip rules in the documented order (deleted before no-patch)', () => { + // A removed file with no patch should be reported as "deleted", not "no-patch" + const prDiff: PRDiff = [makePRFile({ filename: 'x.ts', status: 'removed', patch: undefined })]; + + const result = extractPRDiffs(prDiff); + + expect(result.skipped[0].reason).toBe('deleted'); + }); + + it('sanity: budget constant matches imported value (guards against drift)', () => { + 
expect(REVIEW_DIFF_CONTEXT_TOKEN_LIMIT).toBeGreaterThan(0); + }); +}); diff --git a/tests/unit/agents/shared/repository.test.ts b/tests/unit/agents/shared/repository.test.ts index de02283c..52b39c23 100644 --- a/tests/unit/agents/shared/repository.test.ts +++ b/tests/unit/agents/shared/repository.test.ts @@ -20,6 +20,7 @@ vi.mock('node:fs', () => ({ import { existsSync, readdirSync } from 'node:fs'; import { + fetchAndCheckoutPR, findSnapshotWorkspaceDir, setupRepository, } from '../../../../src/agents/shared/repository.js'; @@ -76,6 +77,153 @@ afterEach(() => { delete process.env.CASCADE_SNAPSHOT_REUSE; }); +// ── fetchAndCheckoutPR ──────────────────────────────────────────────────────── + +describe('fetchAndCheckoutPR', () => { + const repoDir = '/workspace/cascade-test-project-12345'; + + it('fetches refs/pull/N/head with the + force-update prefix on origin (github)', async () => { + const log = makeLog(); + mockRunCommand.mockResolvedValue({ stdout: 'abc123\n', stderr: '', exitCode: 0 }); + + await fetchAndCheckoutPR(repoDir, 1092, undefined, 'github', log); + + expect(mockRunCommand).toHaveBeenNthCalledWith( + 1, + 'git', + ['fetch', 'origin', '+refs/pull/1092/head:refs/remotes/pr/1092'], + repoDir, + ); + }); + + it('checks out detached pr/N after fetch', async () => { + const log = makeLog(); + mockRunCommand.mockResolvedValue({ stdout: 'abc123\n', stderr: '', exitCode: 0 }); + + await fetchAndCheckoutPR(repoDir, 1092, undefined, 'github', log); + + expect(mockRunCommand).toHaveBeenNthCalledWith( + 2, + 'git', + ['checkout', '--detach', 'pr/1092'], + repoDir, + ); + }); + + it('throws when git fetch returns non-zero exit code', async () => { + const log = makeLog(); + mockRunCommand.mockResolvedValueOnce({ + stdout: '', + stderr: 'fatal: couldn’t find remote ref refs/pull/1092/head', + exitCode: 128, + }); + + await expect(fetchAndCheckoutPR(repoDir, 1092, undefined, 'github', log)).rejects.toThrow( + /git fetch PR ref failed.*128.*couldn.t find remote 
ref/s, + ); + }); + + it('throws when git checkout returns non-zero exit code', async () => { + const log = makeLog(); + mockRunCommand + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // fetch ok + .mockResolvedValueOnce({ stdout: '', stderr: 'unmerged paths', exitCode: 1 }); // checkout fail + + await expect(fetchAndCheckoutPR(repoDir, 1092, undefined, 'github', log)).rejects.toThrow( + /git checkout PR failed.*1.*unmerged paths/s, + ); + }); + + it('verifies HEAD SHA matches expected when prHeadSha is provided', async () => { + const log = makeLog(); + const expectedSha = '96f5136213d7a435e4b6e27b3d868f7b622b3dc0'; + mockRunCommand + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // fetch + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // checkout + .mockResolvedValueOnce({ stdout: `${expectedSha}\n`, stderr: '', exitCode: 0 }); // rev-parse + + await expect( + fetchAndCheckoutPR(repoDir, 1092, expectedSha, 'github', log), + ).resolves.toBeUndefined(); + + expect(mockRunCommand).toHaveBeenNthCalledWith(3, 'git', ['rev-parse', 'HEAD'], repoDir); + }); + + it('throws on HEAD SHA mismatch with both SHAs in the message', async () => { + const log = makeLog(); + const expectedSha = '96f5136213d7a435e4b6e27b3d868f7b622b3dc0'; + const actualSha = 'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef'; + mockRunCommand + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) + .mockResolvedValueOnce({ stdout: `${actualSha}\n`, stderr: '', exitCode: 0 }); + + await expect(fetchAndCheckoutPR(repoDir, 1092, expectedSha, 'github', log)).rejects.toThrow( + new RegExp(`HEAD SHA mismatch.*${expectedSha}.*${actualSha}`), + ); + }); + + it('skips SHA check when prHeadSha is undefined', async () => { + const log = makeLog(); + mockRunCommand.mockResolvedValue({ stdout: '', stderr: '', exitCode: 0 }); + + await fetchAndCheckoutPR(repoDir, 1092, undefined, 'github', 
log); + + // Only fetch + checkout — no rev-parse call + expect(mockRunCommand).toHaveBeenCalledTimes(2); + expect(mockRunCommand).not.toHaveBeenCalledWith( + 'git', + ['rev-parse', 'HEAD'], + expect.any(String), + ); + }); + + it('logs fetched ref and resolved HEAD on success', async () => { + const log = makeLog(); + const expectedSha = '96f5136213d7a435e4b6e27b3d868f7b622b3dc0'; + mockRunCommand + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) + .mockResolvedValueOnce({ stdout: `${expectedSha}\n`, stderr: '', exitCode: 0 }); + + await fetchAndCheckoutPR(repoDir, 1092, expectedSha, 'github', log); + + expect(log.info).toHaveBeenCalledWith( + 'PR checked out', + expect.objectContaining({ + prNumber: 1092, + ref: 'refs/pull/1092/head', + headSha: expectedSha, + }), + ); + }); + + it('throws when scmProvider is gitlab (not yet supported)', async () => { + const log = makeLog(); + + await expect(fetchAndCheckoutPR(repoDir, 1092, undefined, 'gitlab', log)).rejects.toThrow( + /only GitHub is currently supported.*GitLab support follows PR #1092/, + ); + // No git commands run + expect(mockRunCommand).not.toHaveBeenCalled(); + }); + + it('defaults to github when scmProvider is undefined', async () => { + const log = makeLog(); + mockRunCommand.mockResolvedValue({ stdout: '', stderr: '', exitCode: 0 }); + + await fetchAndCheckoutPR(repoDir, 1092, undefined, undefined, log); + + // First call should be the GitHub-style fetch + expect(mockRunCommand).toHaveBeenNthCalledWith( + 1, + 'git', + ['fetch', 'origin', '+refs/pull/1092/head:refs/remotes/pr/1092'], + repoDir, + ); + }); +}); + // ── findSnapshotWorkspaceDir ─────────────────────────────────────────────────── describe('findSnapshotWorkspaceDir', () => { @@ -200,26 +348,111 @@ describe('setupRepository', () => { expect(mockRunCommand).not.toHaveBeenCalled(); }); - it('checks out PR branch when prBranch option is provided', async () => { 
+ it('fetches and checks out PR via refs/pull/N/head when prNumber is provided', async () => { const project = makeProject(); const log = makeLog(); - await setupRepository({ project, log, agentType: 'coder', prBranch: 'feature/my-branch' }); + await setupRepository({ project, log, agentType: 'coder', prNumber: 1092 }); + // fetch refs/pull/N/head, then detached checkout pr/N + expect(mockRunCommand).toHaveBeenCalledWith( + 'git', + ['fetch', 'origin', '+refs/pull/1092/head:refs/remotes/pr/1092'], + '/tmp/cascade-test-project-12345', + ); expect(mockRunCommand).toHaveBeenCalledWith( 'git', - ['checkout', 'feature/my-branch'], + ['checkout', '--detach', 'pr/1092'], '/tmp/cascade-test-project-12345', ); }); - it('does not call git checkout when prBranch is not provided', async () => { + it('verifies HEAD SHA when both prNumber and prHeadSha are provided', async () => { + const project = makeProject(); + const log = makeLog(); + const sha = '96f5136213d7a435e4b6e27b3d868f7b622b3dc0'; + mockRunCommand + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // fetch + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // checkout + .mockResolvedValueOnce({ stdout: `${sha}\n`, stderr: '', exitCode: 0 }); // rev-parse + + await setupRepository({ project, log, agentType: 'coder', prNumber: 1092, prHeadSha: sha }); + + expect(mockRunCommand).toHaveBeenCalledWith( + 'git', + ['rev-parse', 'HEAD'], + '/tmp/cascade-test-project-12345', + ); + }); + + it('throws and stops when PR fetch fails (no silent continuation)', async () => { + const project = makeProject(); + const log = makeLog(); + mockRunCommand.mockResolvedValueOnce({ + stdout: '', + stderr: 'fatal: couldn’t find ref', + exitCode: 128, + }); + + await expect( + setupRepository({ project, log, agentType: 'coder', prNumber: 1092 }), + ).rejects.toThrow(/git fetch PR ref failed/); + }); + + it('throws on HEAD SHA mismatch', async () => { + const project = makeProject(); + const log = makeLog(); + 
mockRunCommand + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) + .mockResolvedValueOnce({ stdout: 'wrongsha\n', stderr: '', exitCode: 0 }); + + await expect( + setupRepository({ + project, + log, + agentType: 'coder', + prNumber: 1092, + prHeadSha: 'expectedsha', + }), + ).rejects.toThrow(/HEAD SHA mismatch/); + }); + + it('does not invoke git when prNumber is not provided (non-PR runs)', async () => { const project = makeProject(); const log = makeLog(); await setupRepository({ project, log, agentType: 'coder' }); - expect(mockRunCommand).not.toHaveBeenCalledWith('git', expect.any(Array), expect.any(String)); + expect(mockRunCommand).not.toHaveBeenCalledWith( + 'git', + expect.arrayContaining(['fetch']), + expect.any(String), + ); + expect(mockRunCommand).not.toHaveBeenCalledWith( + 'git', + expect.arrayContaining(['checkout']), + expect.any(String), + ); + }); + + it('ignores prBranch when prNumber is not provided (legacy field, log-only)', async () => { + const project = makeProject(); + const log = makeLog(); + + await setupRepository({ + project, + log, + agentType: 'coder', + prBranch: 'feature/legacy-branch', + }); + + // Should NOT do `git checkout feature/legacy-branch` — that path is removed. 
+ expect(mockRunCommand).not.toHaveBeenCalledWith( + 'git', + ['checkout', 'feature/legacy-branch'], + expect.any(String), + ); }); it('runs .cascade/setup.sh when it exists', async () => { @@ -385,7 +618,7 @@ describe('setupRepository — snapshot-reuse path', () => { ); }); - it('checks out prBranch instead of baseBranch when prBranch is provided', async () => { + it('uses fetchAndCheckoutPR (refs/pull/N/head) when prNumber is provided, instead of base-branch fetch+reset', async () => { mockReaddirSync.mockReturnValue(['cascade-test-project-99999'] as never); const project = makeProject({ baseBranch: 'main' }); const log = makeLog(); @@ -394,17 +627,50 @@ describe('setupRepository — snapshot-reuse path', () => { project, log, agentType: 'coder', - prBranch: 'feature/my-branch', + prNumber: 1092, }); + // PR ref fetch + detached checkout expect(mockRunCommand).toHaveBeenCalledWith( 'git', - ['reset', '--hard', 'origin/feature/my-branch'], + ['fetch', 'origin', '+refs/pull/1092/head:refs/remotes/pr/1092'], '/workspace/cascade-test-project-99999', ); expect(mockRunCommand).toHaveBeenCalledWith( 'git', - ['checkout', 'feature/my-branch'], + ['checkout', '--detach', 'pr/1092'], + '/workspace/cascade-test-project-99999', + ); + // Should NOT do the base-branch fetch+reset+checkout when PR is set + expect(mockRunCommand).not.toHaveBeenCalledWith('git', ['fetch', 'origin'], expect.any(String)); + expect(mockRunCommand).not.toHaveBeenCalledWith( + 'git', + ['reset', '--hard', 'origin/main'], + expect.any(String), + ); + }); + + it('verifies HEAD SHA on snapshot-reuse path when prNumber + prHeadSha provided', async () => { + mockReaddirSync.mockReturnValue(['cascade-test-project-99999'] as never); + const project = makeProject({ baseBranch: 'main' }); + const log = makeLog(); + const sha = '96f5136213d7a435e4b6e27b3d868f7b622b3dc0'; + mockRunCommand + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // fetch + .mockResolvedValueOnce({ stdout: '', stderr: '', 
exitCode: 0 }) // checkout + .mockResolvedValueOnce({ stdout: `${sha}\n`, stderr: '', exitCode: 0 }); // rev-parse + + await setupRepository({ + project, + log, + agentType: 'coder', + prNumber: 1092, + prHeadSha: sha, + }); + + expect(mockRunCommand).toHaveBeenCalledWith( + 'git', + ['rev-parse', 'HEAD'], '/workspace/cascade-test-project-99999', ); }); @@ -491,38 +757,45 @@ describe('setupRepository — snapshot-reuse path', () => { expect(mockCreateTempDir).toHaveBeenCalled(); }); - it('continues gracefully when git fetch exits non-zero', async () => { + it('throws (no longer warns-and-continues) when git fetch exits non-zero on snapshot-reuse path', async () => { mockReaddirSync.mockReturnValue(['cascade-test-project-99999'] as never); - mockRunCommand - .mockResolvedValueOnce({ stdout: '', stderr: 'network error', exitCode: 128 }) // fetch fails - .mockResolvedValue({ stdout: '', stderr: '', exitCode: 0 }); // reset+checkout succeed + mockRunCommand.mockResolvedValueOnce({ + stdout: '', + stderr: 'network error', + exitCode: 128, + }); const project = makeProject(); const log = makeLog(); - const result = await setupRepository({ project, log, agentType: 'coder' }); - - expect(result).toBe('/workspace/cascade-test-project-99999'); - expect(log.warn).toHaveBeenCalledWith( - 'git fetch exited with non-zero code (continuing)', - expect.objectContaining({ exitCode: 128 }), + await expect(setupRepository({ project, log, agentType: 'coder' })).rejects.toThrow( + /git fetch.*128.*network error/s, ); }); - it('continues gracefully when git reset --hard exits non-zero', async () => { + it('throws (no longer warns-and-continues) when git reset --hard exits non-zero on snapshot-reuse path', async () => { mockReaddirSync.mockReturnValue(['cascade-test-project-99999'] as never); mockRunCommand .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // fetch ok - .mockResolvedValueOnce({ stdout: '', stderr: 'conflict', exitCode: 1 }) // reset fails - 
.mockResolvedValue({ stdout: '', stderr: '', exitCode: 0 }); // checkout ok + .mockResolvedValueOnce({ stdout: '', stderr: 'conflict', exitCode: 1 }); // reset fails const project = makeProject(); const log = makeLog(); - const result = await setupRepository({ project, log, agentType: 'coder' }); + await expect(setupRepository({ project, log, agentType: 'coder' })).rejects.toThrow( + /git reset.*1.*conflict/s, + ); + }); - expect(result).toBe('/workspace/cascade-test-project-99999'); - expect(log.warn).toHaveBeenCalledWith( - 'git reset --hard exited with non-zero code (continuing)', - expect.objectContaining({ exitCode: 1 }), + it('throws when git checkout exits non-zero on snapshot-reuse path', async () => { + mockReaddirSync.mockReturnValue(['cascade-test-project-99999'] as never); + mockRunCommand + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // fetch ok + .mockResolvedValueOnce({ stdout: '', stderr: '', exitCode: 0 }) // reset ok + .mockResolvedValueOnce({ stdout: '', stderr: 'pathspec', exitCode: 128 }); // checkout fails + const project = makeProject(); + const log = makeLog(); + + await expect(setupRepository({ project, log, agentType: 'coder' })).rejects.toThrow( + /git checkout.*128.*pathspec/s, ); }); }); diff --git a/tests/unit/backends/adapter.test.ts b/tests/unit/backends/adapter.test.ts index 420bb3fc..2bc97e30 100644 --- a/tests/unit/backends/adapter.test.ts +++ b/tests/unit/backends/adapter.test.ts @@ -335,6 +335,22 @@ describe('executeWithEngine', () => { expect(mockSetupRepository).not.toHaveBeenCalled(); }); + it('forwards prNumber and headSha (as prHeadSha) from input to setupRepository', async () => { + setupMocks(); + const engine = makeMockBackend(); + const sha = '96f5136213d7a435e4b6e27b3d868f7b622b3dc0'; + const input = makeInput({ prNumber: 1092, headSha: sha }); + + await executeWithEngine(engine, 'implementation', input); + + expect(mockSetupRepository).toHaveBeenCalledWith( + expect.objectContaining({ + prNumber: 
1092, + prHeadSha: sha, + }), + ); + }); + it('cleans up resources in finally block', async () => { setupMocks(); const engine = makeMockBackend(); diff --git a/tests/unit/backends/agent-profiles.test.ts b/tests/unit/backends/agent-profiles.test.ts index ddede2d7..97b2857b 100644 --- a/tests/unit/backends/agent-profiles.test.ts +++ b/tests/unit/backends/agent-profiles.test.ts @@ -8,11 +8,19 @@ vi.mock('../../../src/agents/shared/prFormatting.js', () => ({ formatPRComments: vi.fn(() => 'formatted-pr-comments'), formatPRReviews: vi.fn(() => 'formatted-pr-reviews'), formatPRIssueComments: vi.fn(() => 'formatted-pr-issue-comments'), - readPRFileContents: vi.fn(() => Promise.resolve({ included: [], skipped: [] })), + formatPRDiffContext: vi.fn(() => 'formatted-diff-context'), + formatSkippedFilesInjection: vi.fn(() => ''), + countSkipsByReason: vi.fn(() => ({ + deleted: 0, + 'no-patch': 0, + 'patch-too-large': 0, + 'over-budget': 0, + })), + extractPRDiffs: vi.fn(() => ({ included: [], skipped: [] })), })); vi.mock('../../../src/config/reviewConfig.js', () => ({ - REVIEW_FILE_CONTENT_TOKEN_LIMIT: 50000, + REVIEW_DIFF_CONTEXT_TOKEN_LIMIT: 200_000, estimateTokens: vi.fn(() => 100), })); @@ -158,7 +166,6 @@ import { formatPRDiff, formatPRIssueComments, formatPRReviews, - readPRFileContents, } from '../../../src/agents/shared/prFormatting.js'; import { readWorkItem, readWorkItemWithMedia } from '../../../src/gadgets/pm/core/readWorkItem.js'; import { githubClient } from '../../../src/github/client.js'; @@ -708,7 +715,6 @@ describe('fetchReviewContext', () => { mockGithub.getPR.mockResolvedValue({ headSha: 'sha123' } as never); mockGithub.getPRDiff.mockResolvedValue([]); mockGithub.getCheckSuiteStatus.mockResolvedValue({ checks: [] } as never); - vi.mocked(readPRFileContents).mockResolvedValue({ included: [], skipped: [] }); }); it('includes PR injections (Details, Diff, Checks)', async () => { @@ -763,11 +769,7 @@ describe('fetchReviewContext', () => { 
expect(mockReadWorkItemWithMedia).not.toHaveBeenCalled(); }); - it('includes file content injections for included PR files', async () => { - vi.mocked(readPRFileContents).mockResolvedValue({ - included: [{ path: 'src/index.ts', content: 'file content' }], - skipped: [], - }); + it('includes a compact GetPRDiffContext injection', async () => { const profile = await getAgentProfile('review'); const params = makeContextParams({ triggerEvent: 'scm:check-suite-success', @@ -779,13 +781,8 @@ describe('fetchReviewContext', () => { params as Parameters[0], ); - const fileInjections = injections.filter( - (i) => - i.toolName === 'ReadFile' && - typeof i.result === 'string' && - i.result.includes('src/index.ts'), - ); - expect(fileInjections).toHaveLength(1); + const diffContextInjections = injections.filter((i) => i.toolName === 'GetPRDiffContext'); + expect(diffContextInjections).toHaveLength(1); }); it('calls formatting functions', async () => { @@ -808,7 +805,6 @@ describe('fetchCIContext', () => { mockGithub.getPR.mockResolvedValue({ headSha: 'sha456' } as never); mockGithub.getPRDiff.mockResolvedValue([]); mockGithub.getCheckSuiteStatus.mockResolvedValue({ checks: [] } as never); - vi.mocked(readPRFileContents).mockResolvedValue({ included: [], skipped: [] }); }); it('includes PR injections, dirListing, contextFiles, and workItem', async () => { @@ -861,7 +857,6 @@ describe('fetchPRCommentResponseContext', () => { mockGithub.getPRReviewComments.mockResolvedValue([] as never); mockGithub.getPRReviews.mockResolvedValue([] as never); mockGithub.getPRIssueComments.mockResolvedValue([] as never); - vi.mocked(readPRFileContents).mockResolvedValue({ included: [], skipped: [] }); }); it('includes PR injections and 3 conversation injections', async () => { diff --git a/tests/unit/config/reviewConfig.test.ts b/tests/unit/config/reviewConfig.test.ts index c544f38b..d4e768a9 100644 --- a/tests/unit/config/reviewConfig.test.ts +++ b/tests/unit/config/reviewConfig.test.ts @@ -2,21 
+2,30 @@ import { describe, expect, it } from 'vitest'; import { estimateTokens, - REVIEW_FILE_CONTENT_TOKEN_LIMIT, + REVIEW_DIFF_CONTEXT_TOKEN_LIMIT, } from '../../../src/config/reviewConfig.js'; describe.concurrent('config/reviewConfig', () => { - describe('REVIEW_FILE_CONTENT_TOKEN_LIMIT', () => { + describe('REVIEW_DIFF_CONTEXT_TOKEN_LIMIT', () => { it('is defined as a number', () => { - expect(typeof REVIEW_FILE_CONTENT_TOKEN_LIMIT).toBe('number'); + expect(typeof REVIEW_DIFF_CONTEXT_TOKEN_LIMIT).toBe('number'); }); - it('is set to 25000 tokens', () => { - expect(REVIEW_FILE_CONTENT_TOKEN_LIMIT).toBe(25_000); + it('is a positive value', () => { + expect(REVIEW_DIFF_CONTEXT_TOKEN_LIMIT).toBeGreaterThan(0); }); - it('is a positive value', () => { - expect(REVIEW_FILE_CONTENT_TOKEN_LIMIT).toBeGreaterThan(0); + it('is sized for diff content (not full files) — at least 100k tokens', () => { + // Diffs are dense (only changed lines). Combined with claude-opus-4-6's + // 1M-token window, the budget should comfortably fit the median PR's + // full diff content without forcing skip-for-budget. + expect(REVIEW_DIFF_CONTEXT_TOKEN_LIMIT).toBeGreaterThanOrEqual(100_000); + }); + + it('is not so large that it overwhelms the agent context (< 500k tokens)', () => { + // Soft sanity cap: reviews consume additional context (instructions, + // conversation history, PR metadata). Leave headroom in the 1M window. 
+ expect(REVIEW_DIFF_CONTEXT_TOKEN_LIMIT).toBeLessThan(500_000); }); }); @@ -127,34 +136,17 @@ This is line 3`; expect(longTokens).toBe(shortTokens * 10); }); - - it('approximates typical file within limit', () => { - // A file with ~100k characters should be ~25k tokens - const largeFile = 'x'.repeat(100_000); - const tokens = estimateTokens(largeFile); - - expect(tokens).toBe(25_000); - expect(tokens).toBe(REVIEW_FILE_CONTENT_TOKEN_LIMIT); - }); }); describe('integration', () => { - it('can use estimateTokens to check against limit', () => { - const smallFile = 'a'.repeat(50_000); // ~12.5k tokens - const largeFile = 'a'.repeat(150_000); // ~37.5k tokens - - expect(estimateTokens(smallFile)).toBeLessThan(REVIEW_FILE_CONTENT_TOKEN_LIMIT); - expect(estimateTokens(largeFile)).toBeGreaterThan(REVIEW_FILE_CONTENT_TOKEN_LIMIT); - }); - - it('limit allows for reasonable amount of file content', () => { - // 25k tokens * 4 chars = 100k characters - // This is enough for ~3-5 medium TypeScript files - const estimatedChars = REVIEW_FILE_CONTENT_TOKEN_LIMIT * 4; - - expect(estimatedChars).toBe(100_000); - expect(estimatedChars).toBeGreaterThan(50_000); // Minimum reasonable - expect(estimatedChars).toBeLessThan(200_000); // Maximum to avoid context overflow + it('can use estimateTokens to check against the diff-context limit', () => { + // A 400k-char diff ≈ 100k tokens — should fit comfortably under the limit. + const mediumDiff = 'a'.repeat(400_000); + // A 2M-char diff ≈ 500k tokens — should exceed the 200k limit. 
+ const hugeDiff = 'a'.repeat(2_000_000); + + expect(estimateTokens(mediumDiff)).toBeLessThan(REVIEW_DIFF_CONTEXT_TOKEN_LIMIT); + expect(estimateTokens(hugeDiff)).toBeGreaterThan(REVIEW_DIFF_CONTEXT_TOKEN_LIMIT); }); }); }); diff --git a/tests/unit/github/client.test.ts b/tests/unit/github/client.test.ts index d434d47d..a6e8423d 100644 --- a/tests/unit/github/client.test.ts +++ b/tests/unit/github/client.test.ts @@ -33,6 +33,24 @@ const mockUsers = { getAuthenticated: vi.fn(), }; +// By default, mockPaginate routes calls through to the underlying mocked method +// and unwraps the response shape octokit's real `paginate` would produce. +// Tests can override per-call via mockPaginate.mockResolvedValue([...]) when +// they want to assert paginate was invoked directly with specific data. +const mockPaginate = vi.fn( + async (method: (params: unknown) => Promise<unknown>, params: unknown) => { + const res = (await method(params)) as { data?: unknown }; + const data = res?.data; + if (data && typeof data === 'object') { + // listWorkflowRunsForRepo wraps results in { workflow_runs: [...] } + if ('workflow_runs' in data) return (data as { workflow_runs: unknown[] }).workflow_runs; + // listJobsForWorkflowRun wraps in { jobs: [...] } + if ('jobs' in data) return (data as { jobs: unknown[] }).jobs; + } + return Array.isArray(data) ? 
data : []; + }, +); + vi.mock('@octokit/rest', () => ({ Octokit: vi.fn().mockImplementation(() => ({ pulls: mockPulls, @@ -40,6 +58,7 @@ vi.mock('@octokit/rest', () => ({ checks: mockChecks, actions: mockActions, users: mockUsers, + paginate: mockPaginate, })), })); @@ -542,6 +561,118 @@ describe('githubClient', () => { patch: '@@ -1,5 +1,10 @@', }); }); + + it('uses octokit paginate (paginates beyond 100 files)', async () => { + // Override default routing — assert paginate was called and return 129 entries + const allFiles = Array.from({ length: 129 }, (_, i) => ({ + filename: `file-${i}.ts`, + status: 'modified', + additions: 1, + deletions: 0, + changes: 1, + patch: '@@', + })); + mockPaginate.mockResolvedValueOnce(allFiles); + + const result = await withGitHubToken('test-token', () => + githubClient.getPRDiff('owner', 'repo', 1092), + ); + + expect(result).toHaveLength(129); + expect(mockPaginate).toHaveBeenCalledWith(mockPulls.listFiles, { + owner: 'owner', + repo: 'repo', + pull_number: 1092, + per_page: 100, + }); + }); + }); + + describe('pagination — all paginated endpoints use octokit paginate', () => { + it('getPRReviewComments uses paginate', async () => { + mockPaginate.mockResolvedValueOnce([]); + await withGitHubToken('test-token', () => + githubClient.getPRReviewComments('owner', 'repo', 1), + ); + expect(mockPaginate).toHaveBeenCalledWith( + mockPulls.listReviewComments, + expect.objectContaining({ per_page: 100 }), + ); + }); + + it('getPRReviews uses paginate', async () => { + mockPaginate.mockResolvedValueOnce([]); + await withGitHubToken('test-token', () => githubClient.getPRReviews('owner', 'repo', 1)); + expect(mockPaginate).toHaveBeenCalledWith( + mockPulls.listReviews, + expect.objectContaining({ per_page: 100 }), + ); + }); + + it('getPRIssueComments uses paginate', async () => { + mockPaginate.mockResolvedValueOnce([]); + await withGitHubToken('test-token', () => + githubClient.getPRIssueComments('owner', 'repo', 1), + ); + 
expect(mockPaginate).toHaveBeenCalledWith( + mockIssues.listComments, + expect.objectContaining({ per_page: 100 }), + ); + }); + + it('getCheckSuiteStatus uses paginate for both workflow runs and jobs', async () => { + mockPaginate.mockResolvedValueOnce([{ id: 1 }, { id: 2 }]); // 2 runs + mockPaginate.mockResolvedValueOnce([ + { name: 'job1', status: 'completed', conclusion: 'success' }, + ]); + mockPaginate.mockResolvedValueOnce([ + { name: 'job2', status: 'completed', conclusion: 'success' }, + ]); + + const result = await withGitHubToken('test-token', () => + githubClient.getCheckSuiteStatus('owner', 'repo', 'sha123'), + ); + + expect(mockPaginate).toHaveBeenCalledWith( + mockActions.listWorkflowRunsForRepo, + expect.objectContaining({ per_page: 100 }), + ); + expect(mockPaginate).toHaveBeenCalledWith( + mockActions.listJobsForWorkflowRun, + expect.objectContaining({ run_id: 1, per_page: 100 }), + ); + expect(mockPaginate).toHaveBeenCalledWith( + mockActions.listJobsForWorkflowRun, + expect.objectContaining({ run_id: 2, per_page: 100 }), + ); + expect(result.totalCount).toBe(2); + }); + + it('getFailedWorkflowRunJobs uses paginate for both workflow runs and jobs', async () => { + mockPaginate.mockResolvedValueOnce([ + { id: 1, name: 'CI', conclusion: 'failure' }, + { id: 2, name: 'OK', conclusion: 'success' }, + ]); + mockPaginate.mockResolvedValueOnce([ + { name: 'lint', conclusion: 'failure', steps: [{ name: 'eslint', conclusion: 'failure' }] }, + ]); + + const result = await withGitHubToken('test-token', () => + githubClient.getFailedWorkflowRunJobs('owner', 'repo', 'sha123'), + ); + + expect(mockPaginate).toHaveBeenCalledWith( + mockActions.listWorkflowRunsForRepo, + expect.objectContaining({ per_page: 100 }), + ); + // Only failed run (id:1) is paginated for jobs + expect(mockPaginate).toHaveBeenCalledWith( + mockActions.listJobsForWorkflowRun, + expect.objectContaining({ run_id: 1, per_page: 100 }), + ); + expect(result.failedJobs).toHaveLength(1); + 
expect(result.failedJobs[0].jobName).toBe('lint'); + }); }); describe('createPRReview', () => { diff --git a/tests/unit/triggers/github-pr-comment-mention.test.ts b/tests/unit/triggers/github-pr-comment-mention.test.ts index 02749b94..2ea38bb7 100644 --- a/tests/unit/triggers/github-pr-comment-mention.test.ts +++ b/tests/unit/triggers/github-pr-comment-mention.test.ts @@ -289,6 +289,17 @@ describe('PRCommentMentionTrigger', () => { expect(result).not.toBeNull(); }); + + it('includes headSha from fetched PR details (for post-checkout HEAD verification)', async () => { + mockGetPR.mockResolvedValue({ + headRef: 'feature/my-feature', + headSha: 'cafebabecafebabecafebabecafebabecafebabe', + }); + + const result = await trigger.handle(buildCtx()); + + expect(result?.agentInput.headSha).toBe('cafebabecafebabecafebabecafebabecafebabe'); + }); }); describe('handle — review_comment path', () => { @@ -332,5 +343,12 @@ describe('PRCommentMentionTrigger', () => { expect(result).toBeNull(); }); + + it('includes headSha from the review-comment payload (for post-checkout HEAD verification)', async () => { + // buildReviewCommentPayload sets head.sha = 'abc123' + const result = await trigger.handle(buildCtx({ payload: buildReviewCommentPayload() })); + + expect(result?.agentInput.headSha).toBe('abc123'); + }); }); }); diff --git a/tests/unit/triggers/pr-review-submitted.test.ts b/tests/unit/triggers/pr-review-submitted.test.ts index 8cc91c29..8f1817cb 100644 --- a/tests/unit/triggers/pr-review-submitted.test.ts +++ b/tests/unit/triggers/pr-review-submitted.test.ts @@ -163,6 +163,7 @@ describe('PRReviewSubmittedTrigger', () => { prNumber: 42, prBranch: 'feature/test', repoFullName: 'owner/repo', + headSha: 'sha123', triggerCommentId: 100, triggerCommentBody: 'Please fix the bug', triggerCommentPath: '', @@ -299,6 +300,7 @@ describe('PRReviewSubmittedTrigger', () => { prNumber: 42, prBranch: 'feature/test', repoFullName: 'owner/repo', + headSha: 'sha123', triggerCommentId: 200, 
triggerCommentBody: 'Left some inline comments', triggerCommentPath: '', diff --git a/tests/unit/web/pm-wizard-state.test.ts b/tests/unit/web/pm-wizard-state.test.ts index c7ff9fce..5cdc532c 100644 --- a/tests/unit/web/pm-wizard-state.test.ts +++ b/tests/unit/web/pm-wizard-state.test.ts @@ -7,6 +7,7 @@ import { areCredentialsReady, buildEditState, createInitialState, + deriveActiveWebhooks, INITIAL_JIRA_LABELS, isStep1Complete, isStep2Complete, @@ -44,6 +45,7 @@ describe('createInitialState', () => { expect(state.jiraLabels).toEqual(INITIAL_JIRA_LABELS); expect(state.jiraCostFieldId).toBe(''); expect(state.isEditing).toBe(false); + expect(state.previousProvider).toBeUndefined(); }); }); @@ -60,7 +62,7 @@ describe('wizardReducer', () => { return wizardReducer(state, action); } - it('SET_PROVIDER resets to initial state with new provider', () => { + it('SET_PROVIDER (not editing) resets to initial state with new provider', () => { const state = { ...initialState(), trelloApiKey: 'my-api-key', @@ -71,6 +73,57 @@ describe('wizardReducer', () => { // Should have been reset expect(next.trelloApiKey).toBe(''); expect(next.trelloBoardId).toBe(''); + expect(next.isEditing).toBe(false); + expect(next.previousProvider).toBeUndefined(); + }); + + it('SET_PROVIDER (editing) preserves isEditing + previousProvider and clears provider-specific fields', () => { + const state: WizardState = { + ...initialState(), + provider: 'trello', + isEditing: true, + previousProvider: 'trello', + hasStoredCredentials: true, + trelloApiKey: 'key', + trelloToken: 'tok', + trelloBoardId: 'board-1', + trelloListMappings: { todo: 'list-1' }, + trelloLabelMappings: { processing: 'label-1' }, + trelloCostFieldId: 'cf-1', + verificationResult: { provider: 'trello', display: '@user' }, + verifyError: null, + }; + const next = dispatch(state, { type: 'SET_PROVIDER', provider: 'linear' }); + + // New provider is set, edit mode carries over + expect(next.provider).toBe('linear'); + 
expect(next.isEditing).toBe(true); + expect(next.previousProvider).toBe('trello'); + + // Credentials + verification + hasStoredCredentials are cleared + expect(next.trelloApiKey).toBe(''); + expect(next.trelloToken).toBe(''); + expect(next.linearApiKey).toBe(''); + expect(next.verificationResult).toBeNull(); + expect(next.verifyError).toBeNull(); + expect(next.hasStoredCredentials).toBe(false); + + // Provider-specific fields cleared + expect(next.trelloBoardId).toBe(''); + expect(next.trelloListMappings).toEqual({}); + expect(next.trelloLabelMappings).toEqual({}); + expect(next.trelloCostFieldId).toBe(''); + }); + + it('SET_PROVIDER (editing) with no previousProvider set leaves previousProvider undefined', () => { + const state: WizardState = { + ...initialState(), + provider: 'trello', + isEditing: true, + }; + const next = dispatch(state, { type: 'SET_PROVIDER', provider: 'jira' }); + expect(next.isEditing).toBe(true); + expect(next.previousProvider).toBeUndefined(); }); it('SET_TRELLO_API_KEY clears verification', () => { @@ -278,6 +331,15 @@ describe('wizardReducer', () => { expect(next.jiraBaseUrl).toBe('https://example.atlassian.net'); }); + it('INIT_EDIT records previousProvider matching the loaded provider', () => { + const state = initialState(); + const next = dispatch(state, { + type: 'INIT_EDIT', + state: { provider: 'trello', trelloBoardId: 'board-1' }, + }); + expect(next.previousProvider).toBe('trello'); + }); + it('ADD_TRELLO_BOARD_LABEL appends a label to trelloBoardDetails.labels', () => { const state = { ...initialState(), @@ -645,3 +707,54 @@ describe('buildEditState', () => { expect(Object.keys(result).length).toBe(1); }); }); + +// ============================================================================ +// deriveActiveWebhooks +// ============================================================================ + +describe('deriveActiveWebhooks', () => { + it('maps Trello webhooks via callbackURL + active', () => { + const result = 
deriveActiveWebhooks('trello', { + trello: [ + { id: 1, callbackURL: 'https://hook/trello', active: true }, + { id: 2, callbackURL: 'https://hook/trello-2', active: false }, + ], + }); + + expect(result).toEqual([ + { id: '1', url: 'https://hook/trello', active: true }, + { id: '2', url: 'https://hook/trello-2', active: false }, + ]); + }); + + it('maps JIRA webhooks via url + enabled (renamed to active)', () => { + const result = deriveActiveWebhooks('jira', { + jira: [{ id: 'abc', url: 'https://hook/jira', enabled: true }], + }); + + expect(result).toEqual([{ id: 'abc', url: 'https://hook/jira', active: true }]); + }); + + it('returns empty array for Linear (manual webhook setup)', () => { + const result = deriveActiveWebhooks('linear', { + trello: [{ id: 1, callbackURL: 'irrelevant', active: true }], + jira: [{ id: 1, url: 'irrelevant', enabled: true }], + }); + + expect(result).toEqual([]); + }); + + it('returns empty array when webhooksData is undefined', () => { + expect(deriveActiveWebhooks('trello', undefined)).toEqual([]); + expect(deriveActiveWebhooks('jira', undefined)).toEqual([]); + expect(deriveActiveWebhooks('linear', undefined)).toEqual([]); + }); + + it('coerces numeric IDs to strings', () => { + const result = deriveActiveWebhooks('trello', { + trello: [{ id: 42, callbackURL: 'u', active: true }], + }); + expect(result[0].id).toBe('42'); + expect(typeof result[0].id).toBe('string'); + }); +}); diff --git a/web/src/components/projects/pm-wizard-hooks.ts b/web/src/components/projects/pm-wizard-hooks.ts index 836f1709..73353ee6 100644 --- a/web/src/components/projects/pm-wizard-hooks.ts +++ b/web/src/components/projects/pm-wizard-hooks.ts @@ -7,6 +7,7 @@ import { useMutation, useQueryClient } from '@tanstack/react-query'; import { useEffect } from 'react'; import { API_URL } from '@/lib/api.js'; import { trpc, trpcClient } from '@/lib/trpc.js'; +import { getCredentialRoles } from '../../../../src/config/integrationRoles.js'; import type { 
LinearTeamDetails, LinearTeamOption, @@ -677,6 +678,16 @@ export function useSaveMutation(projectId: string, state: WizardState) { }); } + // If the user switched provider mid-edit, clean up the old provider's credentials. + if (state.previousProvider && state.previousProvider !== state.provider) { + const oldKeys = getCredentialRoles(state.previousProvider).map((r) => r.envVarKey); + await Promise.all( + oldKeys.map((envVarKey) => + trpcClient.projects.credentials.delete.mutate({ projectId, envVarKey }), + ), + ); + } + return result; }, onSuccess: () => { diff --git a/web/src/components/projects/pm-wizard-state.ts b/web/src/components/projects/pm-wizard-state.ts index 1a6bbe3d..0214728e 100644 --- a/web/src/components/projects/pm-wizard-state.ts +++ b/web/src/components/projects/pm-wizard-state.ts @@ -81,6 +81,12 @@ export interface WizardState { // Editing mode isEditing: boolean; hasStoredCredentials: boolean; // true in edit mode when provider credentials exist in project_credentials + /** + * Provider that was loaded from the server at INIT_EDIT time. Used by the save flow + * to clean up the prior provider's credentials when the user switches provider + * mid-edit. Undefined on first-time setup. + */ + previousProvider?: Provider; } export type WizardAction = @@ -183,9 +189,13 @@ export function createInitialState(): WizardState { export const wizardReducer: Reducer<WizardState, WizardAction> = (state, action) => { switch (action.type) { case 'SET_PROVIDER': + // Preserve edit-mode flags so a provider switch on an existing integration + // still knows which provider to clean up at save time. 
return { ...createInitialState(), provider: action.provider, + isEditing: state.isEditing, + previousProvider: state.previousProvider, }; case 'SET_TRELLO_API_KEY': return { @@ -302,8 +312,12 @@ export const wizardReducer: Reducer<WizardState, WizardAction> = (state, action) ...state, linearLabels: { ...state.linearLabels, [action.key]: action.value }, }; - case 'INIT_EDIT': - return { ...state, ...action.state, isEditing: true }; + case 'INIT_EDIT': { + const merged = { ...state, ...action.state, isEditing: true }; + // Snapshot the loaded provider so a later SET_PROVIDER knows what to clean up. + merged.previousProvider = merged.provider; + return merged; + } case 'ADD_TRELLO_BOARD_LABEL': if (!state.trelloBoardDetails) return state; return { @@ -441,3 +455,35 @@ export function areCredentialsReady(state: WizardState): boolean { return !!(state.jiraEmail && state.jiraApiToken && state.jiraBaseUrl); return !!state.linearApiKey; } + +/** + * Map the provider's webhook listing into the shape expected by `WebhookStep`. + * Linear webhooks are configured manually outside the wizard; Trello/JIRA come + * from the corresponding API listing. + */ +export function deriveActiveWebhooks( + provider: Provider, + webhooksData: + | { + trello?: ReadonlyArray<{ id: string | number; callbackURL: string; active: boolean }>; + jira?: ReadonlyArray<{ id: string | number; url: string; enabled: boolean }>; + } + | undefined, +): Array<{ id: string; url: string; active: boolean }> { + if (provider === 'trello') { + return (webhooksData?.trello ?? []).map((w) => ({ + id: String(w.id), + url: w.callbackURL, + active: w.active, + })); + } + if (provider === 'jira') { + return (webhooksData?.jira ?? 
[]).map((w) => ({ + id: String(w.id), + url: w.url, + active: w.enabled, + })); + } + // Linear: webhooks are configured manually + return []; +} diff --git a/web/src/components/projects/pm-wizard.tsx b/web/src/components/projects/pm-wizard.tsx index 7a9c6c49..40be1bed 100644 --- a/web/src/components/projects/pm-wizard.tsx +++ b/web/src/components/projects/pm-wizard.tsx @@ -30,6 +30,7 @@ import { areCredentialsReady, buildEditState, createInitialState, + deriveActiveWebhooks, isStep1Complete, isStep2Complete, isStep3Complete, @@ -57,6 +58,23 @@ const STEP_TITLES = [ 'Save', ] as const; +const PROVIDER_LABELS: Record<'trello' | 'jira' | 'linear', string> = { + trello: 'Trello', + jira: 'JIRA', + linear: 'Linear', +}; + +function confirmProviderSwitch( + from: 'trello' | 'jira' | 'linear', + to: 'trello' | 'jira' | 'linear', +): boolean { + return window.confirm( + `Switch PM provider from ${PROVIDER_LABELS[from]} to ${PROVIDER_LABELS[to]}?\n\n` + + `You'll need to re-enter credentials and re-map fields for ${PROVIDER_LABELS[to]}. ` + + `The old provider's credentials will be deleted when you save.`, + ); +} + // ============================================================================ // Main PMWizard Component // ============================================================================ @@ -205,20 +223,7 @@ export function PMWizard({ } // ---- Active webhooks for this provider ---- - const activeWebhooks = - state.provider === 'trello' - ? (webhooksQuery.data?.trello ?? []).map((w) => ({ - id: String(w.id), - url: w.callbackURL, - active: w.active, - })) - : state.provider === 'jira' - ? (webhooksQuery.data?.jira ?? 
[]).map((w) => ({ - id: String(w.id), - url: w.url, - active: w.enabled, - })) - : []; // Linear: webhooks are configured manually + const activeWebhooks = deriveActiveWebhooks(state.provider, webhooksQuery.data); // ---- Render ---- @@ -239,8 +244,9 @@ export function PMWizard({ ))} diff --git a/web/tsconfig.json b/web/tsconfig.json index f5e26c89..4ba2276c 100644 --- a/web/tsconfig.json +++ b/web/tsconfig.json @@ -17,5 +17,11 @@ "@/*": ["./src/*"] } }, - "include": ["src", "../src/api/**/*", "../src/db/**/*", "../src/openrouter/**/*"] + "include": [ + "src", + "../src/api/**/*", + "../src/config/**/*", + "../src/db/**/*", + "../src/openrouter/**/*" + ] }