Conversation
6f17995 to
9cadec4
Compare
Owner
Author
🔍 Multi-Model Code Review — PR #600 (Final)fix: preserve IsProcessing through sibling reconnect instead of force-completing
All 16 Findings — Final Status
🏁 Recommendation: ✅ ApproveAll 16 findings resolved. 3475 tests pass. The PR:
|
…-completing When a reconnect triggers client recreation, the sibling re-resume loop was unconditionally force-completing ALL IsProcessing siblings. This killed in-flight responses — the CLI continued working but all events got EVT-REARM-SKIP'd because IsProcessing was already false. Fix: instead of force-completing, capture siblingWasProcessing before re-resume and restore IsProcessing + start a fresh watchdog on the new connection after successful re-resume. The new event handler picks up CLI events on the new connection. Fixes #599 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review findings addressed: 1. IsProcessing mutations now wrapped in InvokeOnUI (INV-2 thread-safety) 2. NotifyStateChangedCoalesced added for UI re-render 3. HasUsedToolsThisTurn + ResponseCompletion set BEFORE handler registration to close the race window where early events could trigger premature CompleteResponse 4. IsOrphaned guard inside InvokeOnUI lambda Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The InvokeOnUI callback for sibling reconnect processing restoration was missing the generation capture/check required by INV-12. A concurrent SendPromptAsync could race with the posted callback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
If CompleteResponse runs before the InvokeOnUI callback (both serialized on the UI thread), IsProcessing is already false. Without this guard, the callback would set IsProcessing=true on a session whose turn already finished, leaving it stuck for 600s until the watchdog fires. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. ProcessingPhase: capture actual phase before reconnect instead of hardcoding 3 2. Failed re-resume catch: clear IsProcessing via ClearProcessingState when siblingWasProcessing was true and re-resume throws, preventing stuck 'Thinking...' on dead connections 3. DiffParser path traversal tests: use Path.DirectorySeparatorChar instead of hardcoded backslashes (fixes macOS test failures) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. XML doc comment on ReconnectLoop test updated to describe preservation behavior (was describing the old force-complete behavior this PR reverses) 3. DiffParser path traversal tests now consistently use Path.Combine instead of mixed interpolation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
9b594cf to
6af3540
Compare
1. TCS coupling (#2): complete old TCS with partial response via TrySetResult instead of TrySetCanceled, so orchestrator workers collect content gracefully instead of hitting the exception retry path 2. Race window (#4): set IsProcessing/IsResumed/ProcessingPhase on siblingState BEFORE handler registration (safe because state isn't published to _sessions yet), eliminating the ~16ms InvokeOnUI scheduling window 3. Behavioral test (#6): added SiblingReconnect_PreservesProcessingState_OnNewState 4. SessionStabilityTests: fixed comment-matching false positive in TryUpdate ordering assertion Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…onse, fix comment
Review round 6 findings:
1. TrySetResult('') breaks orchestrator: empty result triggers revival which orphans the healthy reconnected state. TrySetCanceled is correct — the OperationCanceledException catch re-acquires the new state and re-awaits its TCS.
2. FlushedResponse from old state now copied to siblingState so partial mid-turn content isn't lost.
3. Safety comment corrected: writes are safe because they're idempotent on already-true fields, not because the state is unpublished.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The test now sets up FlushedResponse content on the old state and verifies the state contract that the reconnect code depends on, including siblingWasProcessing capture and content preservation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the manual state-setup test with two behavioral tests: 1. SiblingReconnect_PreservesProcessingState_OnNewState — simulates the actual reconnect state transfer (TCS cancellation, FlushedResponse carry-forward, IsProcessing/IsResumed preservation) and verifies the full state contract. 2. SiblingReconnect_ResurrectionGuard_PreventsStuckSession — verifies completed during the reconnect window (the fix from round 3). All 3475 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen
added a commit
that referenced
this pull request
Apr 18, 2026
…resurrection ClearProcessingState is called by CompleteResponse, AbortSessionAsync, and error recovery paths to reset all processing state. Previously it did NOT increment ProcessingGeneration, which meant any InvokeOnUI callback that captured the generation before ClearProcessingState ran would see a matching generation and could re-set IsProcessing=true on a completed session. This was the root cause of the resurrection bug found in PR #600 (took 3 in the InvokeOnUI lambda. This change closes the entire class of bugs at the source: now the generation guard itself catches completed turns, and There are 4 InvokeOnUI-with-generation-guard sites today (sibling reconnect, checks, so this is a no-op for current code — but it protects any future callback that follows the capture-gen → InvokeOnUI → check-gen pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
When a reconnect triggers client recreation, the sibling re-resume loop was unconditionally force-completing ALL
IsProcessingsiblings viaForceCompleteProcessingAsync. This killed in-flight responses — the CLI continued working for minutes afterward but all events gotEVT-REARM-SKIP'd becauseIsProcessingwas alreadyfalse.Real-world impact (session 527, April 17)
The response was on disk the whole time but invisible until the user sent "please continue."
Fix
Instead of force-completing, preserve
IsProcessingthrough the re-resume:siblingWasProcessingbefore the re-resumeIsProcessing=trueon the new state + start a fresh watchdogThe new event handler registered on the re-resumed session picks up CLI events on the new connection. The watchdog (600s tool tier) handles genuinely dead sessions.
Testing
ConnectionRecoveryTestsandMultiAgentRegressionTeststo verify the new behaviorFixes #599