Skip to content

fix: phantom sessions on reconnect + ForceComplete consolidation + health monitoring skill#540

Merged
PureWeen merged 3 commits intomainfrom
fix/reliability-continued
Apr 7, 2026
Merged

fix: phantom sessions on reconnect + ForceComplete consolidation + health monitoring skill#540
PureWeen merged 3 commits intomainfrom
fix/reliability-continued

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

@PureWeen PureWeen commented Apr 7, 2026

What

Two reliability fixes discovered during live monitoring, plus a new health-monitoring skill.

1. Phantom (previous) sessions on reconnect

When ResumeSessionAsync returns a different session ID (server assigned a new one), the old ID was not added to _closedSessionIds. The merge in SaveActiveSessionsToDisk would find both the old and new entries and rename the old one to (previous).

Fix: Mark old session ID as closed before updating to the new one.

2. ForceCompleteProcessingAsync uses ClearProcessingState

Last remaining site that manually cleared 15+ fields instead of calling the atomic ClearProcessingState helper. This was the code path used during reconnect bystander session cleanup.

3. Health monitoring skill

New .claude/skills/health-monitoring/SKILL.md for continuous PolyPilot session monitoring — covers single sessions, multi-agent, IDLE-DEFER, reconnect recovery audits, mobile/bridge/DevTunnel diagnostics.

Testing

  • 3309/3309 tests pass
  • Live validated: 2.5 hours of monitoring, 4 reconnects all handled correctly, zero new phantoms after fix

PureWeen and others added 3 commits April 7, 2026 13:06
…earProcessingState

Two fixes discovered during live monitoring:

1. When a reconnect changes a session's ID (ResumeSessionAsync returns a
   different ID), the old ID was not added to _closedSessionIds. The merge
   in SaveActiveSessionsToDisk would find both the old and new entries and
   rename the old one to '(previous)'. Now the old ID is marked closed
   before updating to the new ID.

2. ForceCompleteProcessingAsync had the same manual field-clearing anti-
   pattern — 15+ fields cleared individually instead of calling
   ClearProcessingState. Refactored to use the atomic helper.

Also adds health-monitoring skill for continuous PolyPilot session monitoring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Covers DevTunnel chain, WsBridgeServer diagnostics, common mobile issues
(blank sessions, stuck Working state, disconnects), bridge-specific
diagnostic tags, and DevTunnel management commands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All 54 Console.WriteLine calls in WsBridgeServer now go through
BridgeLog() which writes to event-diagnostics.log via
CopilotService.LogBridgeDiagnostic(). This enables monitoring of:
- Mobile client connect/disconnect
- Bridge prompt dispatch and errors
- WebSocket proxy lifecycle
- Authentication events
- State sync activity

All messages use [BRIDGE] tag prefix for filtering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PureWeen commented Apr 7, 2026

🔍 Multi-Model Code Review — PR #540

PR: fix: phantom sessions on reconnect + ForceComplete consolidation + health monitoring skill
Branch: fix/reliability-continuedmain
Changes: +420 / -35 across 4 files (2 production, 1 test, 1 skill doc)
CI Status: ⚠️ No CI checks reported on this branch
Prior Reviews: None


Findings: None (consensus)

Three independent reviewers examined the full diff. Several single-reviewer findings were raised but none achieved consensus:


✅ Verified Correct

  • Reconnect phantom fix_closedSessionIds[state.Info.SessionId] = 0 correctly added before state.Info.SessionId = actualId. No null check needed (both IDs validated by surrounding guard). _closedSessionIds is ConcurrentDictionary (thread-safe). FlushSaveActiveSessionsToDisk() called after both assignments.
  • ForceCompleteProcessingAsync consolidation — response captured before ClearProcessingState clears buffers. API time accumulated manually before accumulateApiTime: false. AllowTurnStartRearm = false set after (matching 5 other approved call sites). WatchdogCaseA/B resets correctly left outside ClearProcessingState (force-complete-specific, not general processing state).
  • Field completenessClearProcessingState additionally clears 5 fields the old code missed (ToolHealthStaleChecks, EventCountThisTurn, TurnEndReceivedAtTicks, IsReconnectedSend, LastUpdatedAt). Net improvement.
  • Tests — structural tests correctly updated to verify delegation to ClearProcessingState instead of individual fields.
  • Health monitoring skill — comprehensive 395-line skill doc; documentation-only, no production code impact.

🏁 Recommendation: ✅ Approve

Clean, focused reliability fix. The reconnect phantom session fix closes a real gap. ForceComplete consolidation follows the established ClearProcessingState pattern from PR #531. No consensus issues found.

@PureWeen PureWeen merged commit 0129522 into main Apr 7, 2026
@PureWeen PureWeen deleted the fix/reliability-continued branch April 7, 2026 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant