Skip to content

feat(code-reviews): incremental reviews on follow-up pushes#927

Merged
alex-alecu merged 25 commits intomainfrom
feat/code-review-on-push
Mar 11, 2026
Merged

feat(code-reviews): incremental reviews on follow-up pushes#927
alex-alecu merged 25 commits intomainfrom
feat/code-review-on-push

Conversation

@alex-alecu
Copy link
Copy Markdown
Contributor

@alex-alecu alex-alecu commented Mar 9, 2026

Summary

On follow-up pushes, code reviews were starting completely from scratch — re-reading every file, re-analyzing everything. This PR adds incremental mode: when a previous completed review exists for the same PR, the agent gets the old summary + inline comments + previous HEAD SHA so it can git diff only what changed.

This feature is gated behind the code-review-incremental PostHog feature flag (checked in prepareReviewPayload). When the flag is off, all follow-up pushes continue to trigger full reviews — no behavior change for existing users.

  • New DB query findPreviousCompletedReview to find the last completed review SHA and session ID (single row, avoids mismatch)
  • New incrementalReviewWorkflow prompt section in both GitHub and GitLab templates
  • generateReviewPrompt swaps in the incremental workflow when conditions are met
  • prepareReviewPayload checks the code-review-incremental feature flag, looks up the previous SHA, and passes it through
  • Session continuation via sendMessageV2 with fallback to fresh session
  • suppressCallbackOnError flag to prevent callback race during fallback
  • Shared createCloudAgentNextFetchClient() in worker-utils to eliminate duplicated tRPC parsing
  • Falls back to full review when: flag is off, no prior completed review, no summary comment, or force push broke history

Verification

  • E2E test: PR opened (full review) — completed, 222,994 tokens in, 350,744 musd cost
  • E2E test: PR synchronize (incremental review, flag ON) — completed, 31,390 tokens in (-86%), 66,204 musd cost (-81%)
  • E2E test: PR synchronize (non-incremental, flag OFF) — completed, 372,368 tokens in (+67% vs initial), confirming old behavior unchanged
  • Session continuation attempted via sendMessageV2, fell back to fresh session (expected local Docker limitation — snapshots not persisted in wrangler dev)
  • No callback race bug — suppressCallbackOnError prevents stale failure callback from corrupting new review record during fallback
  • Incremental detection correctly finds previous completed review's head_sha and session_id from the same DB row
  • Unit tests pass for findPreviousCompletedReview, findPreviousCompletedReviewSession, and incremental prompt generation

Visual Changes

N/A

Reviewer Notes

  • The code-review-incremental feature flag (PostHog, isFeatureFlagEnabled at src/lib/code-reviews/triggers/prepare-review-payload.ts:325) controls the entire incremental path. With the flag off, the code is completely inert.
  • Session continuation via sendMessageV2 will only work in production (Cloudflare durable storage for snapshots). Locally it always falls back to a fresh session with the incremental prompt, which still achieves the token savings.
  • The CRITICAL issue from bot review (callback target reassigned before session confirmed idle) is addressed by the suppressCallbackOnError flag threaded through executeDirectly.
  • Token savings on incremental follow-up: 86% fewer input tokens, 81% lower cost compared to the initial full review.

Comment thread src/lib/code-reviews/prompts/generate-prompt.ts
Comment thread src/lib/code-reviews/db/code-reviews.ts Outdated
@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented Mar 9, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 1
WARNING 1
SUGGESTION 0
Issue Details (click to expand)

CRITICAL

File Line Issue
cloudflare-code-review-infra/src/code-review-orchestrator.ts 726 Reassigning the reused session's callback target before the session is known to be idle can still route a stale callback into the new review.

WARNING

File Line Issue
src/lib/code-reviews/triggers/prepare-review-payload.ts 325 Incremental-review rollout still uses the shared PostHog bucket instead of an owner-specific distinct ID, so enablement can flip globally across unrelated reviews.
Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

None.

Files Reviewed (15 files)
  • cloud-agent-next/src/persistence/CloudAgentSession.ts - 0 issues
  • cloudflare-code-review-infra/src/code-review-orchestrator.ts - 1 issue
  • cloudflare-code-review-infra/src/index.ts - 0 issues
  • cloudflare-code-review-infra/src/types.ts - 0 issues
  • packages/worker-utils/src/cloud-agent-next-client.ts - 0 issues
  • packages/worker-utils/src/index.ts - 0 issues
  • src/app/api/internal/code-review-status/[reviewId]/route.ts - 0 issues
  • src/lib/code-reviews/core/constants.ts - 0 issues
  • src/lib/code-reviews/db/code-reviews.test.ts - 0 issues
  • src/lib/code-reviews/db/code-reviews.ts - 0 issues
  • src/lib/code-reviews/prompts/default-prompt-template-gitlab.json - 0 issues
  • src/lib/code-reviews/prompts/default-prompt-template.json - 0 issues
  • src/lib/code-reviews/prompts/generate-prompt.test.ts - 0 issues
  • src/lib/code-reviews/prompts/generate-prompt.ts - 0 issues
  • src/lib/code-reviews/triggers/prepare-review-payload.ts - 1 issue

Fix these issues in Kilo Cloud


Reviewed by gpt-5.4-20260305 · 2,478,044 tokens

Comment thread src/lib/code-reviews/db/code-reviews.ts Outdated
@alex-alecu alex-alecu force-pushed the feat/code-review-on-push branch 2 times, most recently from 84d0d37 to f787f24 Compare March 9, 2026 08:51
Comment thread src/lib/code-reviews/prompts/default-prompt-template.json Outdated
Comment thread src/lib/code-reviews/prompts/default-prompt-template-gitlab.json Outdated
Comment thread src/lib/code-reviews/triggers/prepare-review-payload.ts Outdated
// 4. Generate auth token for cloud agent with bot identifier
// 4. Check for previous completed review (incremental review optimization)
let previousHeadSha: string | null = null;
const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Incremental-review rollout is evaluated against a shared PostHog bucket

isFeatureFlagEnabled() defaults distinctId to 'server-config-fetch', so this call ignores the current review owner. Any percentage rollout or targeted rule for code-review-incremental will therefore turn incremental mode on for everyone or no one instead of the intended subset. Pass the same distinct ID you already use for code-review flags here.

Suggested change
const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW);
const incrementalEnabled = await isFeatureFlagEnabled(FEATURE_FLAG_INCREMENTAL_REVIEW, owner.userId);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kilo fix it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't apply this fix automatically this time.

The auto-fix run failed before it could push an updated commit.

Next steps:

  • Retry with more guidance, or apply the fix manually.
  • Session ID: agent_4ab0cb7c-0039-49da-a9ea-b4bd6a84015e
Technical details
Network connection lost.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kilo fix it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the requested fix around src/lib/code-reviews/triggers/prepare-review-payload.ts:318 and pushed 622ed7e3.

kilo-code-bot Bot added a commit that referenced this pull request Mar 9, 2026
When a follow-up push occurs on a PR that already has a completed review,
the new review now receives the previous review's head SHA. The prompt
includes an incremental review strategy section that instructs the agent to:

- Focus on the incremental diff (git diff <previous_sha>..HEAD)
- Check if previously flagged issues were fixed
- Only flag new issues in new/modified code
- Re-check touched files holistically
- Update the summary to reflect current state

Changes:
- Add findLastCompletedReviewForPR() DB query
- Thread previousReviewHeadSha through dispatch → payload → prompt
- Add incremental review context section to generated prompt
- Add 4 tests for incremental review prompt generation

Refs: #927
Allow follow-up executions to override the stored callback URL.
The DO updates its metadata before executing, so completion
notifications route to the new review's endpoint.
Add findPreviousCompletedReviewSession query that returns the
cloud-agent session ID from the most recent completed review
for the same PR. Include previousCloudAgentSessionId in the
CodeReviewPayload so the orchestrator can reuse it.
On follow-up pushes, reuse the existing cloud-agent session via
sendMessageV2 instead of creating a fresh one. This preserves
conversation context and avoids re-cloning the repo.

Falls back to prepareSession + initiate if sendMessageV2 fails
(session expired, sandbox evicted, execution in progress, etc).
Comment thread cloud-agent-next/src/router/handlers/session-execution.ts Outdated
# Conflicts:
#	src/lib/code-reviews/db/code-reviews.ts
callbackTarget on the protectedProcedure sendMessageV2 endpoint allowed
any authenticated user to override the session callback URL, creating an
SSRF vector. Move callback setup to the internal-only updateSession
endpoint (requires x-internal-api-key). The orchestrator now calls
updateSession before sendMessageV2 to set the callback securely.

Also fixes merge conflicts: duplicate isFeatureFlagEnabled import and
missing platform field in test helper.
@Kilo-Org Kilo-Org deleted a comment from kilo-code-bot Bot Mar 11, 2026
let previousCloudAgentSessionId: string | undefined;
if (incrementalEnabled) {
try {
const previousSession = await findPreviousCompletedReviewSession(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Session continuation can target a different review than the incremental diff

findPreviousCompletedReview() above picks the newest completed review, but this second lookup can fall back to an older row that merely has a session_id (including legacy v1 sessions). That means the prompt diffs against one SHA while sendMessageV2 resumes a different session, so the reused workspace/conversation no longer matches the incremental context. Derive both values from the same review row, or only enable continuation when the exact review chosen for previousHeadSha has a resumable v2 session.

Comment thread cloudflare-code-review-infra/src/code-review-orchestrator.ts Outdated
…ame review row

Previously findPreviousCompletedReview and findPreviousCompletedReviewSession
were separate DB queries that could return different review rows when the most
recent completed review lacked a session_id. This caused the incremental diff
to target one SHA while session continuation resumed a workspace from a
different, older review.

Consolidate into a single findPreviousCompletedReview that returns both
head_sha and session_id (nullable), ensuring both values always come from
the same row.
…into worker-utils

The code-review orchestrator Durable Object used raw fetch() calls to
cloud-agent-next tRPC endpoints, duplicating request/response parsing
that the Next.js CloudAgentNextClient already handled. Any API shape
change required updating both codebases independently.

Add a lightweight, fetch-based createCloudAgentNextFetchClient() to
@kilocode/worker-utils with typed methods for prepareSession,
initiateFromPreparedSession, updateSession, sendMessageV2, and
interruptSession. Refactor the orchestrator to use the shared client,
eliminating ~120 lines of manual tRPC envelope parsing.
},
};

await client.updateSession(internalHeaders, {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRITICAL: Callback target is reassigned before the reused session is known to be idle

updateSession() runs before sendMessageV2(). If another follow-up review is already using previousSessionId, this call can successfully rewrite that shared session's callbackTarget to the new reviewId, then sendMessageV2() returns 409 and we fall back to a fresh session. When the in-flight execution on the reused session finishes, its terminal callback is delivered to the wrong review record. Please avoid mutating callbackTarget until you've confirmed this session can actually start the follow-up, or restore the old target on fallback.

@alex-alecu
Copy link
Copy Markdown
Contributor Author

E2E Test Results

Public repo PR https://github.com/alex-alecu/stories-canvas/pull/18

Test 1: Full Review (PR Opened) — code-review-incremental flag OFF

Metric Value
Review ID 43eae168-9c1e-4116-8273-71e45263d59f
Status completed
Head SHA ed1afc8e
Session ID agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd
CLI Session ses_32303b14fffeBCURYFPSIeAtmo
Tokens In 222,994
Tokens Out 9,295
Cost (musd) 350,744
Duration ~3 min 26s

Test 2a: Follow-up Review Without Incremental Flag — fresh session, no session continuation

Metric Value
Review ID 71838edd-39fa-4d29-9522-598e3a54e0bd
Status completed
Head SHA b90f8e64
Session ID agent_dfddb931-14d2-4d13-a50e-c3a3c483473e (fresh)
CLI Session ses_322ffd210ffeW08ttZfQf0HrIu
Tokens In 433,133
Tokens Out 8,009
Cost (musd) 412,396

This was a full review from scratch (no incremental mode), showing the old behavior: the follow-up review used ~2x the tokens of the initial review.

Test 2b: Follow-up Review With Incremental Flag ON — session continuation attempted

Metric Value
Review ID bac2a9a8-e486-4d46-9358-3f8371dfa0f6
Status failed (due to callback race bug)
Head SHA b90f8e64
Session ID (attempted) agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd (reuse)
Fallback Session ID agent_0a2976af-efb2-4150-89b6-32c62c790a12

Key Findings

1. Incremental Review Detection Works:

  • prepareReviewPayload correctly found the previous completed review's head_sha and session_id
  • Log: [prepareReviewPayload] Found previous completed review for incremental mode { previousHeadSha: 'ed1afc8e', previousCloudAgentSessionId: 'agent_9ce034ba-...' }

2. Session Continuation Attempted:

  • The orchestrator correctly routed to sendMessageV2 with the previous session ID
  • Log: [CodeReviewOrchestrator] Attempting session continuation via sendMessageV2 { previousCloudAgentSessionId: 'agent_9ce034ba-...' }

3. Fallback Path Worked (partially):

  • sendMessageV2 failed with "Cold-start session restore failed" (expected local Docker limitation - snapshots aren't persisted between container lifecycles in Wrangler dev)
  • The orchestrator correctly fell back to prepareSession + initiateFromKilocodeSessionV2 with a fresh session

4. Bug Found: Callback target race condition:
This is the same CRITICAL issue the kilo-code-bot review identified on the PR. When updateSession is called before sendMessageV2, it rewrites the callback target on the old session. When sendMessageV2 fails, the old session's failure callback gets delivered to the new review (not the old one), setting the new review's status to failed before the fallback session can update it. The fallback session's running status is then rejected because the review is already in a terminal state.

5. Token Usage Comparison (for completed reviews):

Initial Review Follow-up (no incremental)
Tokens In 222,994 433,133 (+94%)
Tokens Out 9,295 8,009
Cost (musd) 350,744 412,396 (+18%)

The follow-up review without incremental mode used nearly 2x the input tokens of the initial review, confirming the value of the incremental approach for token savings.

When sendMessageV2 fails during session continuation (e.g. workspace
restore), executeDirectly was enqueuing a failure callback using the
already-updated callback target — pointing at the new review.  The
orchestrator catches this error synchronously and falls back to a fresh
session, but the stale failure callback races ahead and marks the new
review as failed permanently.

Thread a suppressCallbackOnError flag through executeDirectly →
failExecution → updateExecutionStatus so that followup-path failures
still persist terminal state in the DO but skip the callback enqueue.
The initiate and initiatePrepared paths are unchanged since they have
no synchronous caller.
Defense-in-depth guard in the code-review-status handler: before
accepting a status update, check whether the callback's sessionId
differs from the review's current session_id.  If both are non-null
and don't match, the callback belongs to a superseded session and is
silently discarded.  This prevents race conditions where a failure
callback from an old session corrupts a review that has already been
picked up by a new fallback session.
@alex-alecu
Copy link
Copy Markdown
Contributor Author

E2E Test Results

Target PR: alex-alecu/stories-canvas#18

Test 1: Full Review (PR Opened)

Metric Value
Review ID 43eae168-9c1e-4116-8273-71e45263d59f
Status completed
Head SHA ed1afc8e
Session ID agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd
Tokens In 222,994
Tokens Out 9,295
Cost (musd) 350,744
Duration ~3m 26s

Test 2: Follow-up Review (PR Synchronize) — Incremental Mode

Metric Value
Review ID affbd625-a59d-448f-a9af-e8c5c7bce261
Status completed
Head SHA b90f8e64
Session ID agent_74979a28-2f7f-4b1e-bdcd-2d0bf8ef2ad8 (fresh, via fallback)
Previous Session Attempted agent_9ce034ba-9f35-4b41-a334-ff5213fc52bd
Tokens In 31,390
Tokens Out 892
Cost (musd) 66,204
Duration ~40s

Comparison: Incremental vs Non-Incremental Follow-up

Metric Full Review (opened) Non-Incremental Follow-up (deleted) Incremental Follow-up
Tokens In 222,994 372,368 (+67%) 31,390 (-86%)
Tokens Out 9,295 11,138 892 (-90%)
Cost (musd) 350,744 434,826 (+24%) 66,204 (-81%)

Key Findings

  1. Incremental detection works: prepareReviewPayload correctly found the previous completed review's head_sha and session_id.

  2. Session continuation attempted: The orchestrator correctly routed to sendMessageV2 with the previous session ID.

  3. Expected local fallback: sendMessageV2 failed with "Cold-start session restore failed" (local Docker limitation — container snapshots aren't persisted in wrangler dev). The orchestrator correctly fell back to a fresh session with the incremental prompt.

  4. Massive token savings: The incremental follow-up review used 86% fewer input tokens and 81% less cost compared to the initial full review. The non-incremental follow-up (old behavior) used 67% more tokens than the initial review, confirming the value proposition.

  5. No callback race bug: The suppressCallbackOnError fix from the latest commits prevented the failure callback from corrupting the new review record during fallback.

Cleanup

  • All services stopped (ports 3000, 8787, 8789, 8794 freed)
  • No sandbox Docker containers remaining (only dev-postgres-1 running)
  • Temporary FORCE_INCREMENTAL_REVIEW code change reverted

@alex-alecu alex-alecu merged commit 24cb496 into main Mar 11, 2026
18 checks passed
@alex-alecu alex-alecu deleted the feat/code-review-on-push branch March 11, 2026 15:35
alex-alecu added a commit that referenced this pull request Mar 18, 2026
## Summary

Follow-up for #927

The incremental review feature flag was evaluated with a hardcoded
`'server-config-fetch'` distinctId instead of the actual owner's userId,
making per-user/org targeting in PostHog non-functional. This meant the
flag always resolved the same way regardless of which user or org
triggered the review. Additionally, the entire incremental review
decision path had zero logging, making it impossible to debug from Axiom
why a review used full mode instead of incremental.

This PR passes `owner.userId` to the flag evaluation (matching the
pattern already used for the PR gate flag in the same file) and adds log
lines at each decision point: flag evaluation result, previous review
lookup outcome, and the prompt workflow gate that selects incremental vs
full mode.

## Verification

- [x] `pnpm typecheck` — passes cleanly across all workspace projects

## Visual Changes

N/A

## Reviewer Notes

- The flag fix on line 325 mirrors the existing correct pattern at line
410 (`isFeatureFlagEnabled('code-review-pr-gate', owner.userId)`).
- `owner.userId` is always present — it's a required field in both
variants of the `OwnerSchema` discriminated union. For orgs, it's a
synthetic bot ID like `'bot-code-review-{orgId}'`.
- Logging uses `logExceptInTest` consistently and truncates SHAs to 8
characters, matching existing conventions.
- Issue #3 from the investigation (rapid push cancellation preventing
incremental base) is intentionally not addressed here — it's an
architectural design constraint that needs product discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants