Conversation
…resurrection ClearProcessingState is called by CompleteResponse, AbortSessionAsync, and error recovery paths to reset all processing state. Previously it did NOT increment ProcessingGeneration, which meant any InvokeOnUI callback that captured the generation before ClearProcessingState ran would see a matching generation and could re-set IsProcessing=true on a completed session. This was the root cause of the resurrection bug found in PR #600 (took 3 in the InvokeOnUI lambda. This change closes the entire class of bugs at the source: now the generation guard itself catches completed turns, and There are 4 InvokeOnUI-with-generation-guard sites today (sibling reconnect, checks, so this is a no-op for current code — but it protects any future callback that follows the capture-gen → InvokeOnUI → check-gen pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Multi-Model Code Review — PR #612 (Final Re-Review)fix: ClearProcessingState increments ProcessingGeneration to prevent resurrection
Previous Review Findings — All Addressed
New Questions From This RoundIs the missing overflow guard on STEER-ERROR/MCP-reload a problem?No (3/3 reviewers agree). Is
|
…yFailure + add resurrection test 1. ClearProcessingStateForRecoveryFailure now increments ProcessingGeneration before IsProcessing=false, matching ClearProcessingState. This closes the same resurrection gap for the 3 permission-recovery failure paths. 2. Added GenerationGuard_PreventsResurrection_AfterCompletion test that demonstrates the actual resurrection scenario: captures gen before completion, completes the turn, verifies the stale callback would see a generation mismatch. All 3477 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/review |
|
✅ Expert Code Review completed successfully! |
This comment has been minimized.
This comment has been minimized.
… guard overflow Addresses expert review findings: 1. STEER-ERROR path (~line 4484) and MCP-reload path (~line 4764) both set IsProcessing=false inline without incrementing ProcessingGeneration. Added the increment before IsProcessing=false in both paths. 2. All 4 increment sites now guard against long.MaxValue overflow on orphaned states (ProcessingGeneration set to long.MaxValue during orphaning). Incrementing would wrap to long.MinValue — unreachable in practice but worth guarding defensively. All 3477 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/review |
|
✅ Expert Code Review completed successfully! |
🟡 Design-Level Findings (Outside Diff)Two 1.
|
There was a problem hiding this comment.
Expert Code Review — PR #612
Methodology: 3 independent reviewers with adversarial consensus. Findings required 2/3+ agreement to be included. Single-reviewer findings were escalated to the other 2 for follow-up before inclusion/discard.
CI Status
Tests pass (66/66 in ChatExperienceSafetyTests). Build succeeds.
Findings Summary
| # | Severity | Finding | Consensus | Location |
|---|---|---|---|---|
| 1 | 🟡 MODERATE | STEER-ERROR path missing long.MaxValue overflow guard |
3/3 | CopilotService.cs:4486 (in diff) |
| 2 | 🟡 MODERATE | MCP-reload path missing long.MaxValue overflow guard |
3/3 | CopilotService.cs:4766 (in diff) |
| 3 | 🟡 MODERATE | SessionErrorEvent handler missing generation increment |
3/3 | Events.cs:1189 (outside diff) |
| 4 | 🟡 MODERATE | Permission recovery missing generation increment | 3/3 | Events.cs:3261 (outside diff) |
| 5 | 🟢 MINOR | TOCTOU window in read-then-increment pattern | 2/3 | CopilotService.cs:794 (in diff) |
| 6 | 🟢 MINOR | Second test verifies arithmetic, not behavior; 3/4 paths untested | 2/3 | ChatExperienceSafetyTests.cs:335 (in diff) |
Discarded: Missing ClearFlushedReplayDedup in STEER-ERROR (1/3 only — follow-up reviewer confirmed FlushedResponse.Clear() prevents the issue).
Assessment
The core fix is correct and well-motivated — incrementing ProcessingGeneration before IsProcessing = false eliminates the fragile single-point-of-failure where only the !IsProcessing check prevented resurrection.
Blocking issues (1–2): The two bare Interlocked.Increment calls in the STEER-ERROR and MCP-reload paths are inconsistent with the guarded pattern used everywhere else. If a concurrent reconnect orphans the session, these will overflow long.MaxValue to long.MinValue, breaking the monotonicity invariant. The fix is a one-line change each.
Recommended improvements (3–4): Two IsProcessing = false paths outside the diff (SessionErrorEvent and permission recovery) were missed. They suffer from the exact same vulnerability this PR fixes. Adding the increment there would complete the defense-in-depth story.
Non-blocking (5–6): The TOCTOU window and test quality are real observations but low-risk and acceptable as follow-up items.
Generated by Expert Code Review for issue #612
| state.IsReconnectedSend = false; // INV-1 item 8: prevent stale 35s timeout on next watchdog start | ||
| Interlocked.Exchange(ref state.SendingFlag, 0); | ||
| // Clear IsProcessing BEFORE completing TCS (INV-O3) | ||
| Interlocked.Increment(ref state.ProcessingGeneration); // Invalidate stale callbacks |
There was a problem hiding this comment.
🟡 MODERATE — Missing long.MaxValue overflow guard (Flagged by: 3/3 reviewers)
This bare Interlocked.Increment lacks the long.MaxValue sentinel check that ClearProcessingState (line 794) and ClearProcessingStateForRecoveryFailure (line 3120) both have. If a concurrent reconnect orphans this session (Exchange(..., long.MaxValue)) while the STEER-ERROR catch is in-flight, the increment wraps to long.MinValue, breaking the monotonicity invariant for all future generation comparisons.
Suggested fix:
if (Interlocked.Read(ref state.ProcessingGeneration) != long.MaxValue)
Interlocked.Increment(ref state.ProcessingGeneration); // Invalidate stale callbacks| await InvokeOnUIAsync(() => | ||
| { | ||
| FlushCurrentResponse(state); | ||
| Interlocked.Increment(ref state.ProcessingGeneration); // Invalidate stale callbacks |
There was a problem hiding this comment.
🟡 MODERATE — Missing long.MaxValue overflow guard (Flagged by: 3/3 reviewers)
Same issue as the STEER-ERROR path. This unconditional Interlocked.Increment can wrap to long.MinValue if a concurrent reconnect has already set the sentinel. The session is orphaned at line 4789, after this block runs — so a concurrent reconnect at lines 3966/4000/4011 can set long.MaxValue between the IsProcessing check (line 4760) and this increment.
Suggested fix:
if (Interlocked.Read(ref state.ProcessingGeneration) != long.MaxValue)
Interlocked.Increment(ref state.ProcessingGeneration); // Invalidate stale callbacks| // bail out — preventing resurrection of a completed turn. Without this, | ||
| // the generation guard passes and only the !IsProcessing check saves us. | ||
| // Skip if orphaned (long.MaxValue) — incrementing would overflow to long.MinValue. | ||
| if (Interlocked.Read(ref state.ProcessingGeneration) != long.MaxValue) |
There was a problem hiding this comment.
🟢 MINOR — TOCTOU window in read-then-increment (Flagged by: 2/3 reviewers)
Theoretical race between Interlocked.Read and Interlocked.Increment: a background thread could Exchange(..., long.MaxValue) between them. Some ClearProcessingState callers (SendPromptAsync catch blocks, AbortSessionAsync) may run on thread-pool threads, making the window wider than "UI thread only" implies.
Practically low-risk. A CompareExchange loop would close it definitively. Not blocking — acceptable as follow-up.
| /// generation before CompleteResponse ran must see a mismatch and bail out, | ||
| /// rather than re-setting IsProcessing=true on a completed session. | ||
| /// </summary> | ||
| [Fact] |
There was a problem hiding this comment.
🟢 MINOR — Test verifies arithmetic, not behavior (Flagged by: 2/3 reviewers)
This test checks that 6 ≠ 5 after incrementing — tautologically true regardless of whether the production guard works. A stronger test would create a closure simulating a stale callback, complete the session, execute the closure, and assert IsProcessing was NOT changed.
Also: only the ClearProcessingState path (via CompleteResponse) is tested. The other 3 modified paths have zero generation increment verification.
Not blocking — the first test (ClearProcessingState_IncrementsGeneration) is solid.
…d paths Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssion history 7 new behavioral tests for GetLastEventType (the static method used by the TurnEnd fallback's last-event-type check): - tool.execution_start as last event → detected - session.idle as terminal event → detected - file doesn't exist → returns null - empty file → returns null - invalid JSON → returns null (fail-open) - trailing blank lines → ignored - FileShare.ReadWrite → concurrent read works Update regression history: add #612 and #619. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What
ClearProcessingStateis called byCompleteResponse,AbortSessionAsync, and error recovery paths to reset all processing state. Previously it did not incrementProcessingGeneration, which meant anyInvokeOnUIcallback that captured the generation beforeClearProcessingStateran would see a matching generation and could re-setIsProcessing=trueon a completed session — a "resurrection" bug.Why
This was the root cause of the resurrection bug found in PR #600 (took 3 review rounds to identify). The fix there was a manual
!IsProcessingguard in theInvokeOnUIlambda. This change closes the entire class of bugs at the source.Before (fragile)
After (defense in depth)
Scope
ClearProcessingState:Interlocked.Increment(ref state.ProcessingGeneration)Testing
All 3476 tests pass.