Problem
Issue #299 identifies that some Copilot CLI sessions never emit session.idle after a turn completes. PolyPilot has solid mitigations (TurnEnd→Idle fallback timer, watchdog, content flush), but none of them records why the idle event went missing. This makes it difficult to file an accurate upstream SDK issue or confirm whether the root cause is in the CLI, the JSON-RPC transport, or PolyPilot itself.
The upstream analysis in copilot-sdk#794 identified two theoretical vulnerabilities but has not reproduced either in the real CLI. We need better diagnostic data from PolyPilot to narrow the root cause.
Proposed Solution: Zero-Idle Capture Bundle
When PolyPilot detects a missing session.idle event (i.e., the TurnEnd→Idle fallback fires), automatically capture a diagnostic snapshot that can be used to analyze what happened.
1. Capture bundle on fallback trigger
When the IDLE-FALLBACK fires (lines 519-524 in CopilotService.Events.cs), write a JSON capture file to ~/.polypilot/zero-idle-captures/:
~/.polypilot/zero-idle-captures/
capture_2026-03-11T22-45-00_sess-abc123.json
Each capture contains:
- Session metadata: session ID, name, model, history size, group
- Processing state snapshot: IsProcessing, ProcessingPhase, ActiveToolCallCount, HasUsedToolsThisTurn, ProcessingGeneration, LastEventAtTicks age
- Event sequence: Last 50 events from events.jsonl (parsed via existing ParseEventLogFile())
- Timing: When TurnEnd was received, how long the fallback waited, whether tools were used
- Concurrency context: Total active sessions, total sessions with IsProcessing=true
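
As a rough sketch of what the capture payload and writer could look like, assuming System.Text.Json is available: the ZeroIdleCapture record, its property names and types, and the ZeroIdleCaptureWriter helper are all hypothetical and only mirror the fields listed above. They are not existing PolyPilot types.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.Json;

// Hypothetical capture shape; property names and types are assumptions
// chosen to mirror the field list in this proposal.
public sealed record ZeroIdleCapture(
    string SessionId,
    string? SessionName,
    string? Model,
    int HistorySize,
    string? Group,
    bool IsProcessing,
    string ProcessingPhase,
    int ActiveToolCallCount,
    bool HasUsedToolsThisTurn,
    long ProcessingGeneration,
    double LastEventAgeSeconds,
    int EventCountThisTurn,
    IReadOnlyList<string> LastEvents,   // up to 50 entries from events.jsonl
    DateTimeOffset TurnEndReceivedAt,
    double FallbackWaitSeconds,
    int ActiveSessionCount,
    int ProcessingSessionCount);

public static class ZeroIdleCaptureWriter
{
    private static readonly JsonSerializerOptions Options = new() { WriteIndented = true };

    // Writes <directory>/capture_<timestamp>_<sessionId>.json and returns the full path.
    // Assumes the session ID contains only characters that are valid in a file name.
    public static string Write(ZeroIdleCapture capture, string directory, DateTimeOffset now)
    {
        Directory.CreateDirectory(directory);

        // Use '-' between time parts because ':' is not allowed in Windows file names.
        string stamp = now.ToString("yyyy-MM-dd'T'HH-mm-ss", CultureInfo.InvariantCulture);
        string path = Path.Combine(directory, $"capture_{stamp}_{capture.SessionId}.json");

        File.WriteAllText(path, JsonSerializer.Serialize(capture, Options));
        return path;
    }
}
```

Passing the timestamp in as a parameter, rather than reading the clock inside the writer, keeps the file name deterministic in unit tests.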
2. All-events tracing to diagnostics log
Currently, HandleSessionEvent logs only four event types (TurnStart, TurnEnd, Idle, Error) to event-diagnostics.log. For zero-idle investigation, knowing the exact last event before silence is critical.
Add a diagnostic setting EnableVerboseEventTracing (default: false) that, when enabled, logs every SDK event type to the diagnostics log. This reveals whether the last event was ToolExecutionComplete, AssistantMessage, SessionCompactionComplete, etc.
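
One way the gate could be expressed, as a minimal sketch: TracingSettings and EventTraceFilter are hypothetical stand-ins for ConnectionSettings and the logging path in HandleSessionEvent, and event types are modelled as plain strings purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the toggle proposed for ConnectionSettings.
public sealed class TracingSettings
{
    public bool EnableVerboseEventTracing { get; set; } // defaults to false
}

public static class EventTraceFilter
{
    // The four event types that are logged today, per this proposal.
    private static readonly HashSet<string> AlwaysLogged = new(StringComparer.Ordinal)
    {
        "TurnStart", "TurnEnd", "Idle", "Error"
    };

    // True when the event should be written to the diagnostics log:
    // always for the four baseline types, and for every type when verbose tracing is on.
    public static bool ShouldLog(string eventType, TracingSettings settings) =>
        settings.EnableVerboseEventTracing || AlwaysLogged.Contains(eventType);
}
```

With the toggle off, ShouldLog("ToolExecutionComplete", settings) returns false and the existing behaviour is unchanged. With it on, every event type passes the filter.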
3. Per-session event counter
Track a simple EventCountThisTurn counter on SessionState that increments on every event in HandleSessionEvent. When the fallback fires, include this count in the capture. This answers: "Did the session receive 3 events or 300 before going silent?"
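
A minimal sketch of such a counter, assuming SDK events may arrive on a background thread and that the count resets when a new turn starts (the reset point is an assumption; the proposal only names the field). TurnEventCounter is a hypothetical helper, and in practice the field would presumably live directly on SessionState.

```csharp
using System.Threading;

// Hypothetical helper illustrating the per-turn counter.
public sealed class TurnEventCounter
{
    private int _eventCountThisTurn;

    // Current count, safe to read from the fallback timer's thread.
    public int EventCountThisTurn => Volatile.Read(ref _eventCountThisTurn);

    // Call once per event at the top of the event handler; returns the new count.
    public int OnEvent() => Interlocked.Increment(ref _eventCountThisTurn);

    // Assumed reset point: the start of a new turn.
    public void OnTurnStart() => Interlocked.Exchange(ref _eventCountThisTurn, 0);
}
```

Interlocked keeps the increment atomic without a lock, so the cost per event stays at a single instruction-level operation.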
Why This Helps
| What we learn | How it helps |
|---|---|
| Last event type before silence | Narrows which CLI code path failed to emit idle |
| events.jsonl final entries | Shows what the CLI wrote vs what PP received (transport drop?) |
| Tool activity at fallback time | Correlates with CLI's processQueuedItems post-loop code |
| History size / concurrent sessions | Identifies environmental triggers (load, memory pressure) |
| Frequency data | Is it 1% of turns? 20%? Specific to certain models? |
Implementation Details
New/Modified Files
- CopilotService.Events.cs: Capture logic at fallback firing points + all-events tracing
- CopilotService.cs: EventCountThisTurn field on SessionState, capture writer method
- ConnectionSettings.cs: EnableVerboseEventTracing toggle
- PolyPilot.Tests/ZeroIdleCaptureTests.cs: Unit tests for capture format and field population
Storage
- Location: ~/.polypilot/zero-idle-captures/
- Format: JSON (one file per capture, human-readable)
- Retention: Keep last 100 captures, auto-prune on startup
- Size: ~5-10 KB per capture (50 event lines + metadata)
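
The retention rule could be sketched roughly as follows. ZeroIdleCaptureRetention is a hypothetical helper, and ordering by last write time is an assumption; sorting by the timestamp embedded in the file name would work equally well.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical startup pruning helper for the capture directory.
public static class ZeroIdleCaptureRetention
{
    // Keeps the newest `keep` capture files and tries to delete the rest.
    // Returns the number of files selected for deletion.
    public static int Prune(string directory, int keep = 100)
    {
        if (!Directory.Exists(directory)) return 0;

        var stale = new DirectoryInfo(directory)
            .EnumerateFiles("capture_*.json")
            .OrderByDescending(f => f.LastWriteTimeUtc)
            .Skip(keep)
            .ToList();

        foreach (var file in stale)
        {
            try
            {
                file.Delete();
            }
            catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
            {
                // Best effort: a locked or read-only file is left for the next startup.
            }
        }

        return stale.Count;
    }
}
```

Swallowing delete failures keeps a stuck file from blocking application startup, at the cost of occasionally exceeding the 100-file cap until the next run.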
Performance Impact
- Near-zero cost in normal operation: Capture only fires when the fallback triggers (rare); the per-turn counter adds one increment per event
- Verbose tracing: Opt-in via settings toggle, adds one Debug() call per event (negligible)
Acceptance Criteria
References