Skip to content

Fix parallel session crash: coalesce state notifications + dispose Process handles#337

Merged
PureWeen merged 2 commits intoPureWeen:mainfrom
btessiau:fix/parallel-session-stability
Mar 10, 2026
Merged

Fix parallel session crash: coalesce state notifications + dispose Process handles#337
PureWeen merged 2 commits intoPureWeen:mainfrom
btessiau:fix/parallel-session-stability

Conversation

@btessiau
Copy link
Copy Markdown
Contributor

Problem

When running many CLI sessions in parallel (5+), the app progressively freezes and crashes.

Root causes

  1. Render flood — Each session fires OnStateChanged on every SDK event. With N parallel sessions, the WebView WebContent process gets starved by rapid Blazor re-renders (241 updateFreezerStatus errors in 30min).

  2. Process handle leaks — ServerManager never disposed Process objects. OS handles accumulated across reconnect cycles (468 unobserved exceptions in crash log).

Fix

Coalesced state notifications

Added NotifyStateChangedCoalesced() — a 150ms timer-based coalescer for high-frequency, non-critical state updates. Applied to:

  • ToolExecutionStartEvent phase change (Working)
  • AssistantTurnStartEvent phase change (Thinking)
  • ToolExecutionCompleteEvent permission denial notification
  • WsBridgeClient.OnStateChanged remote session sync

Critical events remain immediate: session completion, errors, session switches, watchdog.

Process handle disposal

  • ServerManager.StartServerAsync: dispose Process after stdout/stderr streams close
  • ServerManager.StopServer: dispose after stopping
  • DevTunnelService.Stop/TryHostTunnelAsync: dispose before nulling

Testing

Build: 0 errors. Tests: 2378 passed (4 pre-existing failures).

Fixes #336

…ocess handles

Under heavy parallel session load (5+ sessions with tool calls), the app
freezes due to two compounding issues:

1. Render flood: Each session fires OnStateChanged on every SDK event
   (tool start, phase change, turn start). With N parallel sessions,
   the WebView WebContent process gets starved by rapid Blazor re-renders.

   Fix: Add NotifyStateChangedCoalesced() with a 150ms timer that batches
   rapid-fire state notifications. Applied to high-frequency, non-critical
   events (ToolExecutionStart phase change, AssistantTurnStart phase change,
   ToolExecutionComplete permission check, WsBridge state sync). Critical
   events (completion, errors, session switches) remain immediate.

2. Process handle leaks: ServerManager never disposed Process objects from
   StartServerAsync/StopServer. OS handles accumulated across reconnect
   cycles, contributing to 'No process is associated' exceptions.

   Fix: Dispose Process after stdout/stderr streams close in StartServerAsync.
   Add process.Dispose() in StopServer and DevTunnelService.Stop/TryHost.

Fixes PureWeen#336

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 PR #337 Review — Fix parallel session crash: coalesce state notifications + dispose Process handles

CI Status: ⚠️ No checks configured
Prior reviews: None
Files changed: 5 (CopilotService.cs, CopilotService.Events.cs, CopilotService.Bridge.cs, ServerManager.cs, DevTunnelService.cs)


Findings

🟡 MODERATE — ServerManager.cs:100-104 — Sequential stdout/stderr draining can deadlock

The original code used two separate Task.Run calls draining stdout and stderr in parallel:

// BEFORE — parallel (correct)
_ = Task.Run(async () => { try { while (await process.StandardOutput.ReadLineAsync() != null) { } } catch { } });
_ = Task.Run(async () => { try { while (await process.StandardError.ReadLineAsync() != null) { } } catch { } });

The new code drains them sequentially in a single task:

// AFTER — sequential (regression)
_ = Task.Run(async () =>
{
    try { while (await process.StandardOutput.ReadLineAsync() != null) { } } catch { }
    try { while (await process.StandardError.ReadLineAsync() != null) { } } catch { }
    process.Dispose();
});

For a long-running server process, ReadLineAsync() on stdout blocks until the process exits (stdout never closes while the server is alive). This means the stderr drain loop never starts while the server is running. If the server writes more than the OS pipe buffer (~64KB) to stderr — common under error conditions or verbose logging — the process blocks waiting for the pipe to drain, the drain task is stuck on stdout, and you get a deadlock.

Fix: Keep two separate tasks, use Task.WhenAll + .ContinueWith to dispose after both complete:

var t1 = Task.Run(async () => { try { while (await process.StandardOutput.ReadLineAsync() != null) { } } catch { } });
var t2 = Task.Run(async () => { try { while (await process.StandardError.ReadLineAsync() != null) { } } catch { } });
_ = Task.WhenAll(t1, t2).ContinueWith(_ => process.Dispose());

What looks correct ✅

Coalescing logic (CopilotService.cs:344-361): The _stateChangedPending / Interlocked.CompareExchange pattern is correct. Only the first caller in a burst arms the timer; subsequent callers see _stateChangedPending=1 and do nothing. FireCoalescedStateChanged resets the flag before calling InvokeOnUI, so events arriving during dispatch get a new timer cycle. ObjectDisposedException is properly caught. Timer disposal is in DisposeAsync alongside the other debounce timers — consistent with existing pattern.

Coalesced call sites: The three Events.cs sites (tool start phase change, turn start phase change, permission denial) and the bridge sync are all genuinely high-frequency, non-critical updates. Critical completions, errors, session switches, and watchdog notifications correctly remain on the immediate OnStateChanged?.Invoke() path.

DevTunnelService.cs process disposal: Disposing before nulling in TryHostTunnelAsync is correct. The case where the process has already exited (previously not cleaned up) is now also handled. In Stop(), Dispose() is inside the try block with _hostProcess = null outside — if Dispose throws it's caught and the null assignment still runs.

ServerManager.cs StopServer(): Disposing the process handle after Kill() at line 142 is correct — Process.GetProcessById creates a new handle object that should be disposed.


Test coverage

The coalescer has no dedicated unit test. A test asserting that N rapid calls to NotifyStateChangedCoalesced() result in at most 1 OnStateChanged invocation within 150ms would be valuable (similar to RenderThrottleTests.cs). Not a blocker, but worth adding.


Recommendation: ⚠️ Request changes

One targeted fix needed: restore parallel stream draining in ServerManager.StartServerAsync so stderr is drained concurrently with stdout. The dispose can then be chained with Task.WhenAll. Everything else in this PR is correct and well-structured.

Restore parallel stdout/stderr draining in ServerManager.StartServerAsync.
Sequential draining deadlocks if stderr fills its OS pipe buffer (~64KB)
while the stdout drain blocks waiting for the long-running server to exit.
Use Task.WhenAll + ContinueWith to dispose after both streams close.

Add StateChangeCoalescerTests (4 tests):
- Rapid calls coalesce into single notification
- Single call fires exactly once
- Separate bursts fire separately
- Immediate NotifyStateChanged still works synchronously

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 1a5f808:

🟡 Sequential stream drain deadlock → Fixed

Restored parallel stdout/stderr draining with Task.WhenAll + ContinueWith for disposal:

var t1 = Task.Run(async () => { try { while (await process.StandardOutput.ReadLineAsync() != null) { } } catch { } });
var t2 = Task.Run(async () => { try { while (await process.StandardError.ReadLineAsync() != null) { } } catch { } });
_ = Task.WhenAll(t1, t2).ContinueWith(_ => process.Dispose());

Coalescer tests → Added

4 tests in StateChangeCoalescerTests.cs:

  • 20 rapid calls coalesce into 1-3 notifications (not 20)
  • Single call fires exactly once
  • Separate bursts fire separately
  • Immediate NotifyStateChanged() still works synchronously

Build: 0 errors. Tests: all pass.

Copy link
Copy Markdown
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔄 PR #337 Re-Review (Round 2) — Fix parallel session crash: coalesce state notifications + dispose Process handles

CI Status: ⚠️ No checks configured (draft branch)
Previous reviews: 2 (Round 1 flagged sequential drain deadlock; author self-commented fix in 1a5f808)


Previous Findings Status

# Finding Status
1 🟡 MODERATE — ServerManager.cs:100-104 sequential stdout/stderr drain causes deadlock FIXED

Finding #1 — FIXED

ServerManager.cs:102-104 now uses exactly the recommended pattern:

var t1 = Task.Run(async () => { try { while (await process.StandardOutput.ReadLineAsync() != null) { } } catch { } });
var t2 = Task.Run(async () => { try { while (await process.StandardError.ReadLineAsync() != null) { } } catch { } });
_ = Task.WhenAll(t1, t2).ContinueWith(_ => process.Dispose());

Both streams drain in parallel; Dispose() is chained only after both complete. Deadlock path eliminated.


New Analysis

Coalescer implementation — ✅ Correct

CopilotService.cs:344-361 — The Interlocked.CompareExchange/Exchange pattern is correct:

  • Only the first caller in a burst arms the timer (CAS 0→1); subsequent calls no-op
  • FireCoalescedStateChanged resets _stateChangedPending to 0 before calling InvokeOnUI, so events arriving during dispatch get a fresh timer cycle — no notifications lost
  • ObjectDisposedException on Timer.Change() is properly caught
  • DisposeAsync (line ~3406) disposes and nulls the timer, consistent with the existing debounce timer pattern

Call sites — ✅ Correct thread safety

  • Events.cs:311NotifyStateChangedCoalesced() called directly on the SDK event thread (outside Invoke()). Safe: CompareExchange and Timer.Change() are both thread-safe, and FireCoalescedStateChanged uses InvokeOnUI for the final dispatch
  • Events.cs:422, 452 — Called inside Invoke() blocks (UI thread). Safe
  • Bridge.cs:52 — Called from bridge background thread. Safe for same reason as above

DevTunnelService.cs — ✅ Correct

  • TryHostTunnelAsync (~line 282-285): Dispose() now happens after the if-block, covering both the HasExited and !HasExited paths. Previously a process that had already exited was never disposed. _hostProcess = null follows immediately
  • Stop() (~line 462-468): Dispose() is inside the try block; _hostProcess = null is outside at line 468, so it runs regardless of whether Dispose() throws. Correct

ServerManager.cs StopServer() — ✅ Correct

process.Dispose() after process.Kill() at line 141 is correct — Process.GetProcessById allocates a new handle object that should be freed.


Test Coverage

StateChangeCoalescerTests.cs adds 4 tests covering:

  • 20 rapid calls coalesce into 1–3 notifications (not 20) — Assert.InRange(fireCount, 1, 3) is appropriately loose for timer-based timing
  • Single call fires exactly once
  • Separate bursts fire separately (~1 notification each)
  • NotifyStateChanged() (immediate path) still works synchronously

Tests are structurally sound. One note: test instances don't call DisposeAsync(), leaving the coalesce timer pending — harmless in practice but inconsistent with test hygiene. Not a blocker.

No tests for the process disposal changes (DevTunnelService, ServerManager), but these are straightforward resource cleanup and the fix is clearly correct.


Verdict: ✅ Approve

Round 1 critical finding is fixed correctly. All new code paths are logically sound:

  • Coalescer correctly reduces render flood for high-frequency events while preserving immediate dispatch for critical events
  • Process handle leaks are plugged in all affected code paths
  • Thread safety of the coalescer is correct

No blocking issues remain.

@btessiau btessiau marked this pull request as ready for review March 10, 2026 16:29
@PureWeen PureWeen merged commit 740a956 into PureWeen:main Mar 10, 2026
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…ocess handles (PureWeen#337)

## Problem

When running many CLI sessions in parallel (5+), the app progressively
freezes and crashes.

### Root causes

1. **Render flood** — Each session fires OnStateChanged on every SDK
event. With N parallel sessions, the WebView WebContent process gets
starved by rapid Blazor re-renders (241 updateFreezerStatus errors in
30min).

2. **Process handle leaks** — ServerManager never disposed Process
objects. OS handles accumulated across reconnect cycles (468 unobserved
exceptions in crash log).

## Fix

### Coalesced state notifications
Added NotifyStateChangedCoalesced() — a 150ms timer-based coalescer for
high-frequency, non-critical state updates. Applied to:
- ToolExecutionStartEvent phase change (Working)
- AssistantTurnStartEvent phase change (Thinking)
- ToolExecutionCompleteEvent permission denial notification
- WsBridgeClient.OnStateChanged remote session sync

Critical events remain immediate: session completion, errors, session
switches, watchdog.

### Process handle disposal
- ServerManager.StartServerAsync: dispose Process after stdout/stderr
streams close
- ServerManager.StopServer: dispose after stopping
- DevTunnelService.Stop/TryHostTunnelAsync: dispose before nulling

## Testing
Build: 0 errors. Tests: 2378 passed (4 pre-existing failures).

Fixes PureWeen#336

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

App freezes and crashes under heavy parallel session load

2 participants