feat(code-reviews): incremental reviews on follow-up pushes by alex-alecu · Pull Request #927 · Kilo-Org/cloud

alex-alecu · 2026-03-09T08:32:46Z

Summary

On follow-up pushes, code reviews were starting completely from scratch — re-reading every file, re-analyzing everything. This PR adds incremental mode: when a previous completed review exists for the same PR, the agent gets the old summary + inline comments + previous HEAD SHA so it can git diff only what changed.

This feature is gated behind the code-review-incremental PostHog feature flag (checked in prepareReviewPayload). When the flag is off, all follow-up pushes continue to trigger full reviews — no behavior change for existing users.

New DB query findPreviousCompletedReview to find the last completed review SHA and session ID (single row, avoids mismatch)
New incrementalReviewWorkflow prompt section in both GitHub and GitLab templates
generateReviewPrompt swaps in the incremental workflow when conditions are met
prepareReviewPayload checks the code-review-incremental feature flag, looks up the previous SHA, and passes it through
Session continuation via sendMessageV2 with fallback to fresh session
suppressCallbackOnError flag to prevent callback race during fallback
Shared createCloudAgentNextFetchClient() in worker-utils to eliminate duplicated tRPC parsing
Falls back to full review when: flag is off, no prior completed review, no summary comment, or force push broke history

Verification

E2E test: PR opened (full review) — completed, 222,994 tokens in, 350,744 musd cost
E2E test: PR synchronize (incremental review, flag ON) — completed, 31,390 tokens in (-86%), 66,204 musd cost (-81%)
E2E test: PR synchronize (non-incremental, flag OFF) — completed, 372,368 tokens in (+67% vs initial), confirming old behavior unchanged
Session continuation attempted via sendMessageV2, fell back to fresh session (expected local Docker limitation — snapshots not persisted in wrangler dev)
No callback race bug — suppressCallbackOnError prevents stale failure callback from corrupting new review record during fallback
Incremental detection correctly finds previous completed review's head_sha and session_id from the same DB row
Unit tests pass for findPreviousCompletedReview, findPreviousCompletedReviewSession, and incremental prompt generation

Visual Changes

N/A

Reviewer Notes

The code-review-incremental feature flag (PostHog, isFeatureFlagEnabled at src/lib/code-reviews/triggers/prepare-review-payload.ts:325) controls the entire incremental path. With the flag off, the code is completely inert.
Session continuation via sendMessageV2 will only work in production (Cloudflare durable storage for snapshots). Locally it always falls back to a fresh session with the incremental prompt, which still achieves the token savings.
The CRITICAL issue from bot review (callback target reassigned before session confirmed idle) is addressed by the suppressCallbackOnError flag threaded through executeDirectly.
Token savings on incremental follow-up: 86% fewer input tokens, 81% lower cost compared to the initial full review.

kilo-code-bot · 2026-03-09T08:36:19Z

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
CRITICAL	1
WARNING	1
SUGGESTION	0

Issue Details (click to expand)

CRITICAL

File	Line	Issue
`cloudflare-code-review-infra/src/code-review-orchestrator.ts`	726	Reassigning the reused session's callback target before the session is known to be idle can still route a stale callback into the new review.

WARNING

File	Line	Issue
`src/lib/code-reviews/triggers/prepare-review-payload.ts`	325	Incremental-review rollout still uses the shared PostHog bucket instead of an owner-specific distinct ID, so enablement can flip globally across unrelated reviews.

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

None.

Files Reviewed (15 files)

cloud-agent-next/src/persistence/CloudAgentSession.ts - 0 issues
cloudflare-code-review-infra/src/code-review-orchestrator.ts - 1 issue
cloudflare-code-review-infra/src/index.ts - 0 issues
cloudflare-code-review-infra/src/types.ts - 0 issues
packages/worker-utils/src/cloud-agent-next-client.ts - 0 issues
packages/worker-utils/src/index.ts - 0 issues
src/app/api/internal/code-review-status/[reviewId]/route.ts - 0 issues
src/lib/code-reviews/core/constants.ts - 0 issues
src/lib/code-reviews/db/code-reviews.test.ts - 0 issues
src/lib/code-reviews/db/code-reviews.ts - 0 issues
src/lib/code-reviews/prompts/default-prompt-template-gitlab.json - 0 issues
src/lib/code-reviews/prompts/default-prompt-template.json - 0 issues
src/lib/code-reviews/prompts/generate-prompt.test.ts - 0 issues
src/lib/code-reviews/prompts/generate-prompt.ts - 0 issues
src/lib/code-reviews/triggers/prepare-review-payload.ts - 1 issue

Fix these issues in Kilo Cloud

_{Reviewed by gpt-5.4-20260305 · 2,478,044 tokens}

…its it

kilo-code-bot · 2026-03-09T11:56:54Z

-    // 4. Generate auth token for cloud agent with bot identifier
+    // 4. Check for previous completed review (incremental review optimization)
+    let previousHeadSha: string | null = null;
+    const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW);


WARNING: Incremental-review rollout is evaluated against a shared PostHog bucket

isFeatureFlagEnabled() defaults distinctId to 'server-config-fetch', so this call ignores the current review owner. Any percentage rollout or targeted rule for code-review-incremental will therefore turn incremental mode on for everyone or no one instead of the intended subset. Pass the same distinct ID you already use for code-review flags here.

Suggested change

const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW);

const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW, owner.userId);

@Kilo fix it

I couldn't apply this fix automatically this time.

The auto-fix run failed before it could push an updated commit.

Next steps:

Retry with more guidance, or apply the fix manually.

Session ID: agent_4ab0cb7c-0039-49da-a9ea-b4bd6a84015e

Technical details

Network connection lost.

@Kilo fix it

Implemented the requested fix around src/lib/code-reviews/triggers/prepare-review-payload.ts:318 and pushed 622ed7e3.

When a follow-up push occurs on a PR that already has a completed review, the new review now receives the previous review's head SHA. The prompt includes an incremental review strategy section that instructs the agent to: - Focus on the incremental diff (git diff <previous_sha>..HEAD) - Check if previously flagged issues were fixed - Only flag new issues in new/modified code - Re-check touched files holistically - Update the summary to reflect current state Changes: - Add findLastCompletedReviewForPR() DB query - Thread previousReviewHeadSha through dispatch → payload → prompt - Add incremental review context section to generated prompt - Add 4 tests for incremental review prompt generation Refs: #927

Allow follow-up executions to override the stored callback URL. The DO updates its metadata before executing, so completion notifications route to the new review's endpoint.

Add findPreviousCompletedReviewSession query that returns the cloud-agent session ID from the most recent completed review for the same PR. Include previousCloudAgentSessionId in the CodeReviewPayload so the orchestrator can reuse it.

On follow-up pushes, reuse the existing cloud-agent session via sendMessageV2 instead of creating a fresh one. This preserves conversation context and avoids re-cloning the repo. Falls back to prepareSession + initiate if sendMessageV2 fails (session expired, sandbox evicted, execution in progress, etc).

# Conflicts: # src/lib/code-reviews/db/code-reviews.ts

callbackTarget on the protectedProcedure sendMessageV2 endpoint allowed any authenticated user to override the session callback URL, creating an SSRF vector. Move callback setup to the internal-only updateSession endpoint (requires x-internal-api-key). The orchestrator now calls updateSession before sendMessageV2 to set the callback securely. Also fixes merge conflicts: duplicate isFeatureFlagEnabled import and missing platform field in test helper.

kilo-code-bot · 2026-03-11T10:18:14Z

+    let previousCloudAgentSessionId: string | undefined;
+    if (incrementalEnabled) {
+      try {
+        const previousSession = await findPreviousCompletedReviewSession(


WARNING: Session continuation can target a different review than the incremental diff

findPreviousCompletedReview() above picks the newest completed review, but this second lookup can fall back to an older row that merely has a session_id (including legacy v1 sessions). That means the prompt diffs against one SHA while sendMessageV2 resumes a different session, so the reused workspace/conversation no longer matches the incremental context. Derive both values from the same review row, or only enable continuation when the exact review chosen for previousHeadSha has a resumable v2 session.

…ame review row Previously findPreviousCompletedReview and findPreviousCompletedReviewSession were separate DB queries that could return different review rows when the most recent completed review lacked a session_id. This caused the incremental diff to target one SHA while session continuation resumed a workspace from a different, older review. Consolidate into a single findPreviousCompletedReview that returns both head_sha and session_id (nullable), ensuring both values always come from the same row.

…into worker-utils The code-review orchestrator Durable Object used raw fetch() calls to cloud-agent-next tRPC endpoints, duplicating request/response parsing that the Next.js CloudAgentNextClient already handled. Any API shape change required updating both codebases independently. Add a lightweight, fetch-based createCloudAgentNextFetchClient() to @kilocode/worker-utils with typed methods for prepareSession, initiateFromPreparedSession, updateSession, sendMessageV2, and interruptSession. Refactor the orchestrator to use the shared client, eliminating ~120 lines of manual tRPC envelope parsing.

kilo-code-bot · 2026-03-11T12:25:28Z

+        },
+      };
+
+      await client.updateSession(internalHeaders, {


CRITICAL: Callback target is reassigned before the reused session is known to be idle

updateSession() runs before sendMessageV2(). If another follow-up review is already using previousSessionId, this call can successfully rewrite that shared session's callbackTarget to the new reviewId, then sendMessageV2() returns 409 and we fall back to a fresh session. When the in-flight execution on the reused session finishes, its terminal callback is delivered to the wrong review record. Please avoid mutating callbackTarget until you've confirmed this session can actually start the follow-up, or restore the old target on fallback.

alex-alecu · 2026-03-11T13:21:42Z

E2E Test Results

Public repo PR https://github.com/alex-alecu/stories-canvas/pull/18

Test 1: Full Review (PR Opened) — `code-review-incremental` flag OFF

Metric	Value
Review ID	`43eae168-9c1e-4116-8273-71e45263d59f`
Status	completed
Head SHA	`ed1afc8e`
Session ID	`agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd`
CLI Session	`ses_32303b14fffeBCURYFPSIeAtmo`
Tokens In	222,994
Tokens Out	9,295
Cost (musd)	350,744
Duration	~3 min 26s

Test 2a: Follow-up Review Without Incremental Flag — fresh session, no session continuation

Metric	Value
Review ID	`71838edd-39fa-4d29-9522-598e3a54e0bd`
Status	completed
Head SHA	`b90f8e64`
Session ID	`agent_dfddb931-14d2-4d13-a50e-c3a3c483473e` (fresh)
CLI Session	`ses_322ffd210ffeW08ttZfQf0HrIu`
Tokens In	433,133
Tokens Out	8,009
Cost (musd)	412,396

This was a full review from scratch (no incremental mode), showing the old behavior: the follow-up review used ~2x the tokens of the initial review.

Test 2b: Follow-up Review With Incremental Flag ON — session continuation attempted

Metric	Value
Review ID	`bac2a9a8-e486-4d46-9358-3f8371dfa0f6`
Status	failed (due to callback race bug)
Head SHA	`b90f8e64`
Session ID (attempted)	`agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd` (reuse)
Fallback Session ID	`agent_0a2976af-efb2-4150-89b6-32c62c790a12`

Key Findings

1. Incremental Review Detection Works:

prepareReviewPayload correctly found the previous completed review's head_sha and session_id
Log: [prepareReviewPayload] Found previous completed review for incremental mode { previousHeadSha: 'ed1afc8e', previousCloudAgentSessionId: 'agent_9ce034ba-...' }

2. Session Continuation Attempted:

The orchestrator correctly routed to sendMessageV2 with the previous session ID
Log: [CodeReviewOrchestrator] Attempting session continuation via sendMessageV2 { previousCloudAgentSessionId: 'agent_9ce034ba-...' }

3. Fallback Path Worked (partially):

sendMessageV2 failed with "Cold-start session restore failed" (expected local Docker limitation - snapshots aren't persisted between container lifecycles in Wrangler dev)
The orchestrator correctly fell back to prepareSession + initiateFromKilocodeSessionV2 with a fresh session

4. Bug Found: Callback target race condition:
This is the same CRITICAL issue the kilo-code-bot review identified on the PR. When updateSession is called before sendMessageV2, it rewrites the callback target on the old session. When sendMessageV2 fails, the old session's failure callback gets delivered to the new review (not the old one), setting the new review's status to failed before the fallback session can update it. The fallback session's running status is then rejected because the review is already in a terminal state.

5. Token Usage Comparison (for completed reviews):

	Initial Review	Follow-up (no incremental)
Tokens In	222,994	433,133 (+94%)
Tokens Out	9,295	8,009
Cost (musd)	350,744	412,396 (+18%)

The follow-up review without incremental mode used nearly 2x the input tokens of the initial review, confirming the value of the incremental approach for token savings.

When sendMessageV2 fails during session continuation (e.g. workspace restore), executeDirectly was enqueuing a failure callback using the already-updated callback target — pointing at the new review. The orchestrator catches this error synchronously and falls back to a fresh session, but the stale failure callback races ahead and marks the new review as failed permanently. Thread a suppressCallbackOnError flag through executeDirectly → failExecution → updateExecutionStatus so that followup-path failures still persist terminal state in the DO but skip the callback enqueue. The initiate and initiatePrepared paths are unchanged since they have no synchronous caller.

Defense-in-depth guard in the code-review-status handler: before accepting a status update, check whether the callback's sessionId differs from the review's current session_id. If both are non-null and don't match, the callback belongs to a superseded session and is silently discarded. This prevents race conditions where a failure callback from an old session corrupts a review that has already been picked up by a new fallback session.

alex-alecu · 2026-03-11T15:26:02Z

E2E Test Results

Target PR: alex-alecu/stories-canvas#18

Test 1: Full Review (PR Opened)

Metric	Value
Review ID	`43eae168-9c1e-4116-8273-71e45263d59f`
Status	completed
Head SHA	`ed1afc8e`
Session ID	`agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd`
Tokens In	222,994
Tokens Out	9,295
Cost (musd)	350,744
Duration	~3m 26s

Test 2: Follow-up Review (PR Synchronize) — Incremental Mode

Metric	Value
Review ID	`affbd625-a59d-448f-a9af-e8c5c7bce261`
Status	completed
Head SHA	`b90f8e64`
Session ID	`agent_74979a28-2f7f-4b1e-bdcd-2d0bf8ef2ad8` (fresh, via fallback)
Previous Session Attempted	`agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd`
Tokens In	31,390
Tokens Out	892
Cost (musd)	66,204
Duration	~40s

Comparison: Incremental vs Non-Incremental Follow-up

Metric	Full Review (opened)	Non-Incremental Follow-up (deleted)	Incremental Follow-up
Tokens In	222,994	372,368 (+67%)	31,390 (-86%)
Tokens Out	9,295	11,138	892 (-90%)
Cost (musd)	350,744	434,826 (+24%)	66,204 (-81%)

Key Findings

Incremental detection works: prepareReviewPayload correctly found the previous completed review's head_sha and session_id.
Session continuation attempted: The orchestrator correctly routed to sendMessageV2 with the previous session ID.
Expected local fallback: sendMessageV2 failed with "Cold-start session restore failed" (local Docker limitation — container snapshots aren't persisted in wrangler dev). The orchestrator correctly fell back to a fresh session with the incremental prompt.
Massive token savings: The incremental follow-up review used 86% fewer input tokens and 81% less cost compared to the initial full review. The non-incremental follow-up (old behavior) used 67% more tokens than the initial review, confirming the value proposition.
No callback race bug: The suppressCallbackOnError fix from the latest commits prevented the failure callback from corrupting the new review record during fallback.

Cleanup

All services stopped (ports 3000, 8787, 8789, 8794 freed)
No sandbox Docker containers remaining (only dev-postgres-1 running)
Temporary FORCE_INCREMENTAL_REVIEW code change reverted

## Summary Follow-up for #927 The incremental review feature flag was evaluated with a hardcoded `'server-config-fetch'` distinctId instead of the actual owner's userId, making per-user/org targeting in PostHog non-functional. This meant the flag always resolved the same way regardless of which user or org triggered the review. Additionally, the entire incremental review decision path had zero logging, making it impossible to debug from Axiom why a review used full mode instead of incremental. This PR passes `owner.userId` to the flag evaluation (matching the pattern already used for the PR gate flag in the same file) and adds log lines at each decision point: flag evaluation result, previous review lookup outcome, and the prompt workflow gate that selects incremental vs full mode. ## Verification - [x] `pnpm typecheck` — passes cleanly across all workspace projects ## Visual Changes N/A ## Reviewer Notes - The flag fix on line 325 mirrors the existing correct pattern at line 410 (`isFeatureFlagEnabled('code-review-pr-gate', owner.userId)`). - `owner.userId` is always present — it's a required field in both variants of the `OwnerSchema` discriminated union. For orgs, it's a synthetic bot ID like `'bot-code-review-{orgId}'`. - Logging uses `logExceptInTest` consistently and truncates SHAs to 8 characters, matching existing conventions. - Issue #3 from the investigation (rapid push cancellation preventing incremental base) is intentionally not addressed here — it's an architectural design constraint that needs product discussion.

kilo-code-bot Bot reviewed Mar 9, 2026

View reviewed changes

Comment thread src/lib/code-reviews/prompts/generate-prompt.ts

Comment thread src/lib/code-reviews/db/code-reviews.ts Outdated

kilo-code-bot Bot reviewed Mar 9, 2026

View reviewed changes

Comment thread src/lib/code-reviews/db/code-reviews.ts Outdated

alex-alecu added 7 commits March 9, 2026 10:48

add findPreviousCompletedReview DB query

42c6d01

add incremental workflow to prompt templates

5d502ab

add feature flag constant for incremental reviews

4263356

refactor generateReviewPrompt to use options object

ea839cd

add incremental workflow to prompt generation

02e0b11

wire up incremental review in payload preparation

d909df0

add tests for incremental review prompt generation

f787f24

alex-alecu force-pushed the feat/code-review-on-push branch 2 times, most recently from 84d0d37 to f787f24 Compare March 9, 2026 08:51

kilo-code-bot Bot reviewed Mar 9, 2026

View reviewed changes

Comment thread src/lib/code-reviews/prompts/default-prompt-template.json Outdated

Comment thread src/lib/code-reviews/prompts/default-prompt-template-gitlab.json Outdated

Comment thread src/lib/code-reviews/triggers/prepare-review-payload.ts Outdated

alex-alecu added 7 commits March 9, 2026 10:59

backfill incrementalReviewWorkflow from local template when remote om…

b2bff33

…its it

filter by platform in findPreviousCompletedReview

9ab2dbb

order by created_at instead of completed_at in previous review lookup

8940acd

use git pull instead of git fetch in incremental workflow

48f8455

exclude live PR head SHA instead of stored review SHA

2f1a7b7

fix duplicate step numbering in prepareReviewPayload

ce6cc9d

format prepare-review-payload.ts

622ed7e

kilo-code-bot Bot reviewed Mar 9, 2026

View reviewed changes

alex-alecu mentioned this pull request Mar 9, 2026

fix(auto-fix): guard create-pr endpoint against review-comment tickets #946

Merged

3 tasks

alex-alecu added 4 commits March 9, 2026 19:23

accept callbackTarget in sendMessageV2

8ed952f

Allow follow-up executions to override the stored callback URL. The DO updates its metadata before executing, so completion notifications route to the new review's endpoint.

add tests for findPreviousCompletedReviewSession

aaaea85

kilo-code-bot Bot reviewed Mar 9, 2026

View reviewed changes

Comment thread cloud-agent-next/src/router/handlers/session-execution.ts Outdated

alex-alecu added 2 commits March 11, 2026 12:00

Merge remote-tracking branch 'origin/main' into feat/code-review-on-push

e1f9639

# Conflicts: # src/lib/code-reviews/db/code-reviews.ts

Kilo-Org deleted a comment from kilo-code-bot Bot Mar 11, 2026

kilo-code-bot Bot reviewed Mar 11, 2026

View reviewed changes

eshurakov reviewed Mar 11, 2026

View reviewed changes

Comment thread cloudflare-code-review-infra/src/code-review-orchestrator.ts Outdated

eshurakov approved these changes Mar 11, 2026

View reviewed changes

alex-alecu added 3 commits March 11, 2026 14:08

fix: resolve lint errors in cloud-agent-next client and orchestrator

8c85efe

kilo-code-bot Bot reviewed Mar 11, 2026

View reviewed changes

alex-alecu added 2 commits March 11, 2026 15:59

alex-alecu merged commit 24cb496 into main Mar 11, 2026
18 checks passed

alex-alecu deleted the feat/code-review-on-push branch March 11, 2026 15:35

alex-alecu mentioned this pull request Mar 18, 2026

fix(code-reviews): fix incremental review flag and add logging #1207

Merged

1 task

	const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW);
	const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW, owner.userId);

Conversation

alex-alecu commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Visual Changes

Reviewer Notes

Uh oh!

Uh oh!

Uh oh!

kilo-code-bot Bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Summary

Overview

CRITICAL

WARNING

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kilo-code-bot Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

alex-alecu Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

alex-alecu Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kilo-code-bot Bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kilo-code-bot Bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

alex-alecu commented Mar 11, 2026

E2E Test Results

Test 1: Full Review (PR Opened) — code-review-incremental flag OFF

Test 2a: Follow-up Review Without Incremental Flag — fresh session, no session continuation

Test 2b: Follow-up Review With Incremental Flag ON — session continuation attempted

Key Findings

Uh oh!

alex-alecu commented Mar 11, 2026

E2E Test Results

Test 1: Full Review (PR Opened)

Test 2: Follow-up Review (PR Synchronize) — Incremental Mode

Comparison: Incremental vs Non-Incremental Follow-up

Key Findings

Cleanup

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alex-alecu commented Mar 9, 2026 •

edited

Loading

kilo-code-bot Bot commented Mar 9, 2026 •

edited

Loading

Test 1: Full Review (PR Opened) — `code-review-incremental` flag OFF