Skip to content

fix: restore MCP servers and skills when recreating expired session#330

Merged
PureWeen merged 2 commits intomainfrom
fix/session-fix-sdk-json-message-can-you-tel-20260309-1714
Mar 9, 2026
Merged

fix: restore MCP servers and skills when recreating expired session#330
PureWeen merged 2 commits intomainfrom
fix/session-fix-sdk-json-message-can-you-tel-20260309-1714

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

@PureWeen PureWeen commented Mar 9, 2026

Problem

When a session's JSON-RPC connection is lost and the server reports "Session not found" (session expired), the reconnect path in SendPromptAsync creates a fresh SessionConfig that was missing McpServers and SkillDirectories. This caused the "environment to keep going away" — MCP tools and skills disappeared after reconnection.

Root Cause

In CopilotService.cs, the freshConfig in the "Session not found" catch block only set Model, WorkingDirectory, Tools, and OnPermissionRequest. The original CreateSessionAsync() also includes McpServers and SkillDirectories loaded from settings.

Fix

Updated the reconnect path's freshConfig to load McpServers and SkillDirectories from settings, matching the original CreateSessionAsync behavior.

Tests

Added 3 structural regression tests in ConnectionRecoveryTests.cs:

  • SendPromptAsync_FreshSessionConfig_IncludesMcpServers
  • SendPromptAsync_FreshSessionConfig_IncludesSkillDirectories
  • SendPromptAsync_FreshSessionConfig_MatchesCreateSessionFields

All 34 ConnectionRecoveryTests pass. Build succeeds on Mac Catalyst.

When a JSON-RPC connection is lost and the server-side session has expired
('Session not found'), SendPromptAsync creates a fresh session. Previously,
the freshConfig was missing McpServers and SkillDirectories, causing the
session's environment (MCP tools, skills) to disappear after reconnection.

Now the reconnect path loads McpServers and SkillDirectories from settings,
matching the original CreateSessionAsync behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PureWeen commented Mar 9, 2026

Code Review — PR #330 (fix: restore MCP servers and skills when recreating expired session)

CI: ⚠️ No checks on branch (pre-existing gap)
Prior reviews: None


🟡 MODERATE — SystemMessage still missing from freshConfig

CopilotService.cs:2456 (5/5 models)

The PR comment at line 2450 explicitly states the freshConfig is "matching CreateSessionAsync" with "full config (MCP servers, skills, system message)", but SystemMessage is not set in freshConfig. The original CreateSessionAsync (line 1681) sets SystemMessage = new SystemMessageConfig { Mode = SystemMessageMode.Append, Content = ... }. A session recreated via this reconnect path silently loses the system prompt, causing behavioral divergence from the original session.

Fix: Add SystemMessage to freshConfig using the same derivation as CreateSessionAsync. If the system message depends on session-specific context (e.g. state.Info.WorkingDirectory vs. ProjectDir), apply the same conditional logic.


🟡 MODERATE — Structural tests don't verify the property assignments, only the load calls

ConnectionRecoveryTests.cs:287, 303 (4/5 models)

Assert.Contains("McpServers", afterNotFound) passes because "McpServers" is a substring of "LoadMcpServers" — which appears in the variable declaration line, not the SessionConfig property assignment. The 800-char window ends before McpServers = freshMcpServers inside the config initializer. This means:

  • The test would still PASS if someone kept the LoadMcpServers() call but removed McpServers = freshMcpServers from freshConfig
  • The regression these tests claim to prevent would slip through undetected

Fix: Either expand the window to cover the full freshConfig block (the 1500-char window in the third test does reach the property assignments), or anchor the assertion to the property assignment specifically:

// Instead of relying on substring coincidence:
Assert.Contains("McpServers = ", afterNotFound);
Assert.Contains("SkillDirectories = ", afterNotFound);

🟢 MINOR — MatchesCreateSessionFields omits SystemMessage from required fields

ConnectionRecoveryTests.cs:322 (2/5 models)

The test named "MatchesCreateSessionFields" checks that the reconnect path includes Model, WorkingDirectory, McpServers, SkillDirectories, Tools, OnPermissionRequest — but SystemMessage is absent from the requiredFields array. This means the test cannot catch the MODERATE finding above. Once SystemMessage is added to freshConfig, add it here too.


✅ What's correct

  • Adding McpServers and SkillDirectories to freshConfig is the right fix for the core bug
  • _currentSettings ?? ConnectionSettings.Load() pattern is consistent with the existing reconnect path (line 2393)
  • LoadSkillDirectories has its own internal try/catch — safe in exception handler context
  • Cancellation token threading is correct

Verdict

⚠️ Request changes — the core MCP/skills fix is correct, but SystemMessage is still missing despite the PR explicitly claiming parity with CreateSessionAsync. The structural tests also have a false-positive issue that undermines their regression value. Both are easy to fix together.

Address PR review feedback:
- Add SystemMessage with relaunch instructions to freshConfig, matching
  CreateSessionAsync's conditional logic for ProjectDir sessions
- Anchor structural tests on 'freshConfig = new SessionConfig' instead
  of 'Session not found' for reliable window sizing
- Assert property assignments ('McpServers = ') not just identifier names
- Add dedicated test for SystemMessage presence
- Include SystemMessage in MatchesCreateSessionFields required fields

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PureWeen commented Mar 9, 2026

Round 2 Re-review

5-model consensus review (claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex)


Previous Findings Status

# Finding Status
F1 SystemMessage missing from freshConfig FIXED (5/5)
F2 Tests false-positive — Assert.Contains("McpServers") matched variable name, not assignment FIXED (5/5)
F3 MatchesCreateSessionFields omitted SystemMessage from requiredAssignments FIXED (5/5)

All three Round 1 findings are addressed. The new freshConfig block correctly mirrors CreateSessionAsync with McpServers, SkillDirectories, and SystemMessage (including the conditional relaunch-instruction content for PolyPilot-specific sessions). The 600/800-char test windows cover all required property assignments with margin.


New Findings (consensus-filtered)

No new bugs reached the 2+ model consensus threshold.


Single-Model Observation (below consensus, verified manually)

🟢 CopilotService.cs:2477reconnectModel ?? DefaultModel silently passes "" when the model is unset

NormalizeToSlug returns "" (not null) for null/whitespace input. When state.Info.Model is unset (legacy persisted session), reconnectModel is "" and the ?? never fires — the SDK receives an empty model string. Every other call site guards this explicitly:

// Lines 1334–1335 and 1512–1513
var model = NormalizeToSlug(src ?? fallback ?? DefaultModel);
if (string.IsNullOrEmpty(model)) model = DefaultModel;

Suggested fix:

Model = string.IsNullOrEmpty(reconnectModel) ? DefaultModel : reconnectModel,

This was flagged by 1/5 models and does not block merge, but is worth a quick fix for consistency.


CI Status

⚠️ CI results not available (checks not run / pre-existing infra state)

Recommended Action

Approve — All Round 1 findings resolved, no new consensus bugs. Optional: apply the one-line string.IsNullOrEmpty guard at line 2477 for consistency with the rest of the codebase.

@PureWeen PureWeen merged commit cd027d2 into main Mar 9, 2026
@PureWeen PureWeen deleted the fix/session-fix-sdk-json-message-can-you-tel-20260309-1714 branch March 9, 2026 19:40
PureWeen added a commit that referenced this pull request Mar 9, 2026
Adds ChatExperienceSafetyTests.cs with 41 tests (40 active, 1 pending PR #330)
covering the invariants documented in processing-state-safety skill:

- INV-1: All 8 termination paths clear state correctly (CompleteResponse,
  SessionErrorEvent, AbortSessionAsync, watchdog)
- INV-2: State mutations marshaled to UI thread via InvokeOnUI
- INV-3: ProcessingGeneration guard prevents stale IDLE from killing new turns
- INV-5: HasUsedToolsThisTurn protects sessions between tool rounds (not just
  while tools are active)
- INV-9: IsMultiAgentSession set before StartProcessingWatchdog in both
  SendPromptAsync and RestoreSingleSessionAsync paths

Behavioral tests (demo mode integration):
- Multi-turn message preservation (5 sequential turns, all history retained)
- Abort clears all 9 INV-1 fields, fires OnSessionComplete
- Post-abort send succeeds without deadlock (SendingFlag cleared)
- Session isolation (stuck session doesn't block others)
- WatchdogToolExecutionTimeoutSeconds > WatchdogInactivityTimeoutSeconds
- WatchdogMaxProcessingTimeSeconds >= 30 minutes

Source-code assertion tests (regression guards against future refactors):
- useToolTimeout formula has all 4 conditions (INV-5)
- TurnEnd fallback checks HasUsedToolsThisTurn before firing CompleteResponse
- FlushCurrentResponse called at AssistantTurnEndEvent (content persistence fix)
- FlushCurrentResponse dedup guard prevents SDK-replay duplicates
- CompleteResponse cancels watchdog before cleanup
- Reconnect path carries forward IsMultiAgentSession + HasUsedToolsThisTurn

These tests are designed to catch the class of regressions documented in
regression-history.md (PRs #141-#284).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen added a commit that referenced this pull request Mar 9, 2026
Adds ChatExperienceSafetyTests.cs with 41 tests (40 active, 1 pending PR #330)
covering the invariants documented in processing-state-safety skill:

- INV-1: All 8 termination paths clear state correctly (CompleteResponse,
  SessionErrorEvent, AbortSessionAsync, watchdog)
- INV-2: State mutations marshaled to UI thread via InvokeOnUI
- INV-3: ProcessingGeneration guard prevents stale IDLE from killing new turns
- INV-5: HasUsedToolsThisTurn protects sessions between tool rounds (not just
  while tools are active)
- INV-9: IsMultiAgentSession set before StartProcessingWatchdog in both
  SendPromptAsync and RestoreSingleSessionAsync paths

Behavioral tests (demo mode integration):
- Multi-turn message preservation (5 sequential turns, all history retained)
- Abort clears all 9 INV-1 fields, fires OnSessionComplete
- Post-abort send succeeds without deadlock (SendingFlag cleared)
- Session isolation (stuck session doesn't block others)
- WatchdogToolExecutionTimeoutSeconds > WatchdogInactivityTimeoutSeconds
- WatchdogMaxProcessingTimeSeconds >= 30 minutes

Source-code assertion tests (regression guards against future refactors):
- useToolTimeout formula has all 4 conditions (INV-5)
- TurnEnd fallback checks HasUsedToolsThisTurn before firing CompleteResponse
- FlushCurrentResponse called at AssistantTurnEndEvent (content persistence fix)
- FlushCurrentResponse dedup guard prevents SDK-replay duplicates
- CompleteResponse cancels watchdog before cleanup
- Reconnect path carries forward IsMultiAgentSession + HasUsedToolsThisTurn

These tests are designed to catch the class of regressions documented in
regression-history.md (PRs #141-#284).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen added a commit that referenced this pull request Mar 9, 2026
Adds ChatExperienceSafetyTests.cs with 41 tests (40 active, 1 pending PR #330)
covering the invariants documented in processing-state-safety skill:

- INV-1: All 8 termination paths clear state correctly (CompleteResponse,
  SessionErrorEvent, AbortSessionAsync, watchdog)
- INV-2: State mutations marshaled to UI thread via InvokeOnUI
- INV-3: ProcessingGeneration guard prevents stale IDLE from killing new turns
- INV-5: HasUsedToolsThisTurn protects sessions between tool rounds (not just
  while tools are active)
- INV-9: IsMultiAgentSession set before StartProcessingWatchdog in both
  SendPromptAsync and RestoreSingleSessionAsync paths

Behavioral tests (demo mode integration):
- Multi-turn message preservation (5 sequential turns, all history retained)
- Abort clears all 9 INV-1 fields, fires OnSessionComplete
- Post-abort send succeeds without deadlock (SendingFlag cleared)
- Session isolation (stuck session doesn't block others)
- WatchdogToolExecutionTimeoutSeconds > WatchdogInactivityTimeoutSeconds
- WatchdogMaxProcessingTimeSeconds >= 30 minutes

Source-code assertion tests (regression guards against future refactors):
- useToolTimeout formula has all 4 conditions (INV-5)
- TurnEnd fallback checks HasUsedToolsThisTurn before firing CompleteResponse
- FlushCurrentResponse called at AssistantTurnEndEvent (content persistence fix)
- FlushCurrentResponse dedup guard prevents SDK-replay duplicates
- CompleteResponse cancels watchdog before cleanup
- Reconnect path carries forward IsMultiAgentSession + HasUsedToolsThisTurn

These tests are designed to catch the class of regressions documented in
regression-history.md (PRs #141-#284).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…ureWeen#330)

## Problem
When a session's JSON-RPC connection is lost and the server reports
"Session not found" (session expired), the reconnect path in
`SendPromptAsync` creates a fresh `SessionConfig` that was missing
`McpServers` and `SkillDirectories`. This caused the "environment to
keep going away" — MCP tools and skills disappeared after reconnection.

## Root Cause
In `CopilotService.cs`, the `freshConfig` in the "Session not found"
catch block only set `Model`, `WorkingDirectory`, `Tools`, and
`OnPermissionRequest`. The original `CreateSessionAsync()` also includes
`McpServers` and `SkillDirectories` loaded from settings.

## Fix
Updated the reconnect path's `freshConfig` to load `McpServers` and
`SkillDirectories` from settings, matching the original
`CreateSessionAsync` behavior.

## Tests
Added 3 structural regression tests in `ConnectionRecoveryTests.cs`:
- `SendPromptAsync_FreshSessionConfig_IncludesMcpServers`
- `SendPromptAsync_FreshSessionConfig_IncludesSkillDirectories`
- `SendPromptAsync_FreshSessionConfig_MatchesCreateSessionFields`

All 34 ConnectionRecoveryTests pass. Build succeeds on Mac Catalyst.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant