feat(code-reviews): incremental reviews on follow-up pushes#927
feat(code-reviews): incremental reviews on follow-up pushes#927alex-alecu merged 25 commits intomainfrom
Conversation
Code Review SummaryStatus: 2 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)CRITICAL
WARNING
Other Observations (not in diff)Issues found in unchanged code that cannot receive inline comments: None. Files Reviewed (15 files)
Fix these issues in Kilo Cloud Reviewed by gpt-5.4-20260305 · 2,478,044 tokens |
84d0d37 to
f787f24
Compare
| // 4. Generate auth token for cloud agent with bot identifier | ||
| // 4. Check for previous completed review (incremental review optimization) | ||
| let previousHeadSha: string | null = null; | ||
| const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW); |
There was a problem hiding this comment.
WARNING: Incremental-review rollout is evaluated against a shared PostHog bucket
isFeatureFlagEnabled() defaults distinctId to 'server-config-fetch', so this call ignores the current review owner. Any percentage rollout or targeted rule for code-review-incremental will therefore turn incremental mode on for everyone or no one instead of the intended subset. Pass the same distinct ID you already use for code-review flags here.
| const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW); | |
| const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW, owner.userId); |
There was a problem hiding this comment.
I couldn't apply this fix automatically this time.
The auto-fix run failed before it could push an updated commit.
Next steps:
- Retry with more guidance, or apply the fix manually.
- Session ID:
agent_4ab0cb7c-0039-49da-a9ea-b4bd6a84015e
Technical details
Network connection lost.
There was a problem hiding this comment.
Implemented the requested fix around src/lib/code-reviews/triggers/prepare-review-payload.ts:318 and pushed 622ed7e3.
When a follow-up push occurs on a PR that already has a completed review, the new review now receives the previous review's head SHA. The prompt includes an incremental review strategy section that instructs the agent to: - Focus on the incremental diff (git diff <previous_sha>..HEAD) - Check if previously flagged issues were fixed - Only flag new issues in new/modified code - Re-check touched files holistically - Update the summary to reflect current state Changes: - Add findLastCompletedReviewForPR() DB query - Thread previousReviewHeadSha through dispatch → payload → prompt - Add incremental review context section to generated prompt - Add 4 tests for incremental review prompt generation Refs: #927
Allow follow-up executions to override the stored callback URL. The DO updates its metadata before executing, so completion notifications route to the new review's endpoint.
Add findPreviousCompletedReviewSession query that returns the cloud-agent session ID from the most recent completed review for the same PR. Include previousCloudAgentSessionId in the CodeReviewPayload so the orchestrator can reuse it.
On follow-up pushes, reuse the existing cloud-agent session via sendMessageV2 instead of creating a fresh one. This preserves conversation context and avoids re-cloning the repo. Falls back to prepareSession + initiate if sendMessageV2 fails (session expired, sandbox evicted, execution in progress, etc).
# Conflicts: # src/lib/code-reviews/db/code-reviews.ts
callbackTarget on the protectedProcedure sendMessageV2 endpoint allowed any authenticated user to override the session callback URL, creating an SSRF vector. Move callback setup to the internal-only updateSession endpoint (requires x-internal-api-key). The orchestrator now calls updateSession before sendMessageV2 to set the callback securely. Also fixes merge conflicts: duplicate isFeatureFlagEnabled import and missing platform field in test helper.
| let previousCloudAgentSessionId: string | undefined; | ||
| if (incrementalEnabled) { | ||
| try { | ||
| const previousSession = await findPreviousCompletedReviewSession( |
There was a problem hiding this comment.
WARNING: Session continuation can target a different review than the incremental diff
findPreviousCompletedReview() above picks the newest completed review, but this second lookup can fall back to an older row that merely has a session_id (including legacy v1 sessions). That means the prompt diffs against one SHA while sendMessageV2 resumes a different session, so the reused workspace/conversation no longer matches the incremental context. Derive both values from the same review row, or only enable continuation when the exact review chosen for previousHeadSha has a resumable v2 session.
…ame review row Previously findPreviousCompletedReview and findPreviousCompletedReviewSession were separate DB queries that could return different review rows when the most recent completed review lacked a session_id. This caused the incremental diff to target one SHA while session continuation resumed a workspace from a different, older review. Consolidate into a single findPreviousCompletedReview that returns both head_sha and session_id (nullable), ensuring both values always come from the same row.
…into worker-utils The code-review orchestrator Durable Object used raw fetch() calls to cloud-agent-next tRPC endpoints, duplicating request/response parsing that the Next.js CloudAgentNextClient already handled. Any API shape change required updating both codebases independently. Add a lightweight, fetch-based createCloudAgentNextFetchClient() to @kilocode/worker-utils with typed methods for prepareSession, initiateFromPreparedSession, updateSession, sendMessageV2, and interruptSession. Refactor the orchestrator to use the shared client, eliminating ~120 lines of manual tRPC envelope parsing.
| }, | ||
| }; | ||
|
|
||
| await client.updateSession(internalHeaders, { |
There was a problem hiding this comment.
CRITICAL: Callback target is reassigned before the reused session is known to be idle
updateSession() runs before sendMessageV2(). If another follow-up review is already using previousSessionId, this call can successfully rewrite that shared session's callbackTarget to the new reviewId, then sendMessageV2() returns 409 and we fall back to a fresh session. When the in-flight execution on the reused session finishes, its terminal callback is delivered to the wrong review record. Please avoid mutating callbackTarget until you've confirmed this session can actually start the follow-up, or restore the old target on fallback.
E2E Test ResultsPublic repo PR https://github.com/alex-alecu/stories-canvas/pull/18 Test 1: Full Review (PR Opened) —
|
| Metric | Value |
|---|---|
| Review ID | 43eae168-9c1e-4116-8273-71e45263d59f |
| Status | completed |
| Head SHA | ed1afc8e |
| Session ID | agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd |
| CLI Session | ses_32303b14fffeBCURYFPSIeAtmo |
| Tokens In | 222,994 |
| Tokens Out | 9,295 |
| Cost (musd) | 350,744 |
| Duration | ~3 min 26s |
Test 2a: Follow-up Review Without Incremental Flag — fresh session, no session continuation
| Metric | Value |
|---|---|
| Review ID | 71838edd-39fa-4d29-9522-598e3a54e0bd |
| Status | completed |
| Head SHA | b90f8e64 |
| Session ID | agent_dfddb931-14d2-4d13-a50e-c3a3c483473e (fresh) |
| CLI Session | ses_322ffd210ffeW08ttZfQf0HrIu |
| Tokens In | 433,133 |
| Tokens Out | 8,009 |
| Cost (musd) | 412,396 |
This was a full review from scratch (no incremental mode), showing the old behavior: the follow-up review used ~2x the tokens of the initial review.
Test 2b: Follow-up Review With Incremental Flag ON — session continuation attempted
| Metric | Value |
|---|---|
| Review ID | bac2a9a8-e486-4d46-9358-3f8371dfa0f6 |
| Status | failed (due to callback race bug) |
| Head SHA | b90f8e64 |
| Session ID (attempted) | agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd (reuse) |
| Fallback Session ID | agent_0a2976af-efb2-4150-89b6-32c62c790a12 |
Key Findings
1. Incremental Review Detection Works:
prepareReviewPayloadcorrectly found the previous completed review'shead_shaandsession_id- Log:
[prepareReviewPayload] Found previous completed review for incremental mode { previousHeadSha: 'ed1afc8e', previousCloudAgentSessionId: 'agent_9ce034ba-...' }
2. Session Continuation Attempted:
- The orchestrator correctly routed to
sendMessageV2with the previous session ID - Log:
[CodeReviewOrchestrator] Attempting session continuation via sendMessageV2 { previousCloudAgentSessionId: 'agent_9ce034ba-...' }
3. Fallback Path Worked (partially):
sendMessageV2failed with "Cold-start session restore failed" (expected local Docker limitation - snapshots aren't persisted between container lifecycles in Wrangler dev)- The orchestrator correctly fell back to
prepareSession+initiateFromKilocodeSessionV2with a fresh session
4. Bug Found: Callback target race condition:
This is the same CRITICAL issue the kilo-code-bot review identified on the PR. When updateSession is called before sendMessageV2, it rewrites the callback target on the old session. When sendMessageV2 fails, the old session's failure callback gets delivered to the new review (not the old one), setting the new review's status to failed before the fallback session can update it. The fallback session's running status is then rejected because the review is already in a terminal state.
5. Token Usage Comparison (for completed reviews):
| Initial Review | Follow-up (no incremental) | |
|---|---|---|
| Tokens In | 222,994 | 433,133 (+94%) |
| Tokens Out | 9,295 | 8,009 |
| Cost (musd) | 350,744 | 412,396 (+18%) |
The follow-up review without incremental mode used nearly 2x the input tokens of the initial review, confirming the value of the incremental approach for token savings.
When sendMessageV2 fails during session continuation (e.g. workspace restore), executeDirectly was enqueuing a failure callback using the already-updated callback target — pointing at the new review. The orchestrator catches this error synchronously and falls back to a fresh session, but the stale failure callback races ahead and marks the new review as failed permanently. Thread a suppressCallbackOnError flag through executeDirectly → failExecution → updateExecutionStatus so that followup-path failures still persist terminal state in the DO but skip the callback enqueue. The initiate and initiatePrepared paths are unchanged since they have no synchronous caller.
Defense-in-depth guard in the code-review-status handler: before accepting a status update, check whether the callback's sessionId differs from the review's current session_id. If both are non-null and don't match, the callback belongs to a superseded session and is silently discarded. This prevents race conditions where a failure callback from an old session corrupts a review that has already been picked up by a new fallback session.
E2E Test ResultsTarget PR: alex-alecu/stories-canvas#18 Test 1: Full Review (PR Opened)
Test 2: Follow-up Review (PR Synchronize) — Incremental Mode
Comparison: Incremental vs Non-Incremental Follow-up
Key Findings
Cleanup
|
## Summary Follow-up for #927 The incremental review feature flag was evaluated with a hardcoded `'server-config-fetch'` distinctId instead of the actual owner's userId, making per-user/org targeting in PostHog non-functional. This meant the flag always resolved the same way regardless of which user or org triggered the review. Additionally, the entire incremental review decision path had zero logging, making it impossible to debug from Axiom why a review used full mode instead of incremental. This PR passes `owner.userId` to the flag evaluation (matching the pattern already used for the PR gate flag in the same file) and adds log lines at each decision point: flag evaluation result, previous review lookup outcome, and the prompt workflow gate that selects incremental vs full mode. ## Verification - [x] `pnpm typecheck` — passes cleanly across all workspace projects ## Visual Changes N/A ## Reviewer Notes - The flag fix on line 325 mirrors the existing correct pattern at line 410 (`isFeatureFlagEnabled('code-review-pr-gate', owner.userId)`). - `owner.userId` is always present — it's a required field in both variants of the `OwnerSchema` discriminated union. For orgs, it's a synthetic bot ID like `'bot-code-review-{orgId}'`. - Logging uses `logExceptInTest` consistently and truncates SHAs to 8 characters, matching existing conventions. - Issue #3 from the investigation (rapid push cancellation preventing incremental base) is intentionally not addressed here — it's an architectural design constraint that needs product discussion.
Summary
On follow-up pushes, code reviews were starting completely from scratch — re-reading every file, re-analyzing everything. This PR adds incremental mode: when a previous completed review exists for the same PR, the agent gets the old summary + inline comments + previous HEAD SHA so it can
git diffonly what changed.This feature is gated behind the
code-review-incrementalPostHog feature flag (checked inprepareReviewPayload). When the flag is off, all follow-up pushes continue to trigger full reviews — no behavior change for existing users.findPreviousCompletedReviewto find the last completed review SHA and session ID (single row, avoids mismatch)incrementalReviewWorkflowprompt section in both GitHub and GitLab templatesgenerateReviewPromptswaps in the incremental workflow when conditions are metprepareReviewPayloadchecks thecode-review-incrementalfeature flag, looks up the previous SHA, and passes it throughsendMessageV2with fallback to fresh sessionsuppressCallbackOnErrorflag to prevent callback race during fallbackcreateCloudAgentNextFetchClient()in worker-utils to eliminate duplicated tRPC parsingVerification
sendMessageV2, fell back to fresh session (expected local Docker limitation — snapshots not persisted in wrangler dev)suppressCallbackOnErrorprevents stale failure callback from corrupting new review record during fallbackhead_shaandsession_idfrom the same DB rowfindPreviousCompletedReview,findPreviousCompletedReviewSession, and incremental prompt generationVisual Changes
N/A
Reviewer Notes
code-review-incrementalfeature flag (PostHog,isFeatureFlagEnabledatsrc/lib/code-reviews/triggers/prepare-review-payload.ts:325) controls the entire incremental path. With the flag off, the code is completely inert.sendMessageV2will only work in production (Cloudflare durable storage for snapshots). Locally it always falls back to a fresh session with the incremental prompt, which still achieves the token savings.suppressCallbackOnErrorflag threaded throughexecuteDirectly.