Fix premature idle: check BackgroundTasks before completing#399
The SDK's SessionIdleEvent includes a BackgroundTasks payload with agents[] and shells[] arrays. When background tasks are active, session.idle means 'foreground quiesced, background still running', NOT true completion.

Previously, PolyPilot unconditionally called CompleteResponse on every SessionIdleEvent, then tried to repair the damage with EVT-REARM, PrematureIdleSignal, and recovery heuristics. This caused 90% of multi-agent worker results to be truncated (44/49 recoveries failed to collect full content; the median gap to the next TurnStart was 51.8s vs the 15s freshness window).

The fix: inspect idle.Data.BackgroundTasks via reflection before calling CompleteResponse. If agents or shells are active, flush accumulated text and wait for the real idle (no background tasks). This eliminates the root cause rather than patching symptoms. The existing safety nets (EVT-REARM, watchdog, recovery) are kept as defense-in-depth for edge cases where BackgroundTasks is null.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review feedback: the SDK types are public, so reflection was unnecessary overhead (~1000x slower) with no compile-time safety. Also adds a test for the Data==null edge case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR #399 Review — Fix premature idle: check BackgroundTasks before completing
✅ What's right
The core fix is sound: inspecting
🟡 MODERATE (5/5 consensus):
| Concern | Verdict |
|---|---|
| Infinite defer: stuck forever | Not a bug (watchdog absolute cap is the circuit breaker) |
| FlushCurrentResponse without IsProcessing guard | Not a bug (AbortSessionAsync clears CurrentResponse first; flush is a no-op) |
| Newline injection from multiple deferred flushes | Not a bug (same pattern as TurnEnd flush path; newlines between segments are expected) |
Test Coverage
✅ 6/6 new BackgroundTasksIdleTests pass. Covers: null, empty, agents-only, shells-only, both, default SessionIdleEvent.
Gap: No test for idle.Data == null. The reflection guard handles it, but once replaced with typed access (idle.Data?.BackgroundTasks), a test documenting this null-safe behavior would complete the contract.
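A sketch of the cases such a contract would pin down, including the missing `idle.Data == null` one. The record types below are hypothetical stand-ins for the SDK types (only the member names `Data`, `BackgroundTasks`, `Agents`, and `Shells` come from this PR; the element type `string` is an assumption), and the checks are plain C# rather than whatever test framework `BackgroundTasksIdleTests` actually uses.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for the SDK types; the real definitions may differ.
public sealed record BackgroundTasks(string[]? Agents, string[]? Shells);
public sealed record SessionIdleData(BackgroundTasks? BackgroundTasks);
public sealed record SessionIdleEvent(SessionIdleData? Data);

public static class IdleContract
{
    // Typed, null-safe check in the shape the review asks for.
    public static bool HasActiveBackgroundTasks(SessionIdleEvent idle)
    {
        var bt = idle.Data?.BackgroundTasks;
        if (bt == null) return false;
        return (bt.Agents is { Length: > 0 }) || (bt.Shells is { Length: > 0 });
    }

    // Returns the names of the cases that do not match; empty means the contract holds.
    public static string[] FailingCases()
    {
        var none = Array.Empty<string>();
        var one = new[] { "task-1" };

        var cases = new (string Name, SessionIdleEvent Idle, bool Expected)[]
        {
            ("Data == null", new SessionIdleEvent(null), false),
            ("BackgroundTasks == null", new SessionIdleEvent(new SessionIdleData(null)), false),
            ("null arrays", new SessionIdleEvent(new SessionIdleData(new BackgroundTasks(null, null))), false),
            ("empty arrays", new SessionIdleEvent(new SessionIdleData(new BackgroundTasks(none, none))), false),
            ("agents only", new SessionIdleEvent(new SessionIdleData(new BackgroundTasks(one, none))), true),
            ("shells only", new SessionIdleEvent(new SessionIdleData(new BackgroundTasks(none, one))), true),
            ("both", new SessionIdleEvent(new SessionIdleData(new BackgroundTasks(one, one))), true),
        };

        var failed = Array.FindAll(cases, c => HasActiveBackgroundTasks(c.Idle) != c.Expected);
        return Array.ConvertAll(failed, c => c.Name);
    }
}
```

Every "no payload" shape maps to `false`, so a missing `Data` or `BackgroundTasks` falls through to normal completion, which is the safe default described in the PR.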
CI Status
RenderThrottleTests.CompleteResponse_OnSessionComplete_FiresBeforeOnStateChanged — confirmed not caused by this PR.
Verdict: ⚠️ Request Changes
One actionable change before merging: replace the reflection-based HasActiveBackgroundTasks with direct typed access. The risk isn't about the current SDK version (which works fine) — it's that a silent regression back to 90% truncation is far worse than an explicit compile failure. The fix is a 4-line simplification.
The MINOR watchdog concern is informational and does not block merge.
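To illustrate the silent-regression risk, here is a minimal sketch of a reflection-based check run against a hypothetical future SDK shape in which `BackgroundTasks` has been renamed. This is not the PR's original reflection code; `RenamedIdleData`, `TasksPayload`, and `PendingTasks` are invented for the illustration.

```csharp
#nullable enable
using System;

// Hypothetical payload type, invented for this illustration.
public sealed class TasksPayload
{
    public string[] Agents { get; set; } = Array.Empty<string>();
    public string[] Shells { get; set; } = Array.Empty<string>();
}

// Hypothetical "future SDK" data type where BackgroundTasks was renamed.
public sealed class RenamedIdleData
{
    public TasksPayload? PendingTasks { get; set; }
}

public static class ReflectionPitfall
{
    // Looks members up by string name, so a rename cannot break the build.
    public static bool HasActiveBackgroundTasksViaReflection(object? data)
    {
        try
        {
            var bt = data?.GetType().GetProperty("BackgroundTasks")?.GetValue(data);
            if (bt == null) return false; // also taken when the property no longer exists

            var agents = bt.GetType().GetProperty("Agents")?.GetValue(bt) as Array;
            var shells = bt.GetType().GetProperty("Shells")?.GetValue(bt) as Array;
            return (agents?.Length ?? 0) > 0 || (shells?.Length ?? 0) > 0;
        }
        catch
        {
            return false; // "graceful" fallback that hides the breakage
        }
    }
}
```

With an agent still running in `PendingTasks`, the reflection check reports no active tasks, so the handler would complete the response early again without any error. The equivalent typed expression `idle.Data?.BackgroundTasks` would simply stop compiling after such a rename.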
- multi-agent-orchestration SKILL.md: 4→5 phase lifecycle, new IDLE-DEFER section, fix INV-O3 ordering, new INV-O15, mark premature idle bug as FIXED
- processing-state-safety SKILL.md: 10→16 paths table with tags, new INV-18 for BackgroundTasks, IDLE-DEFER in stuck session table, note EVT-REARM is now secondary defense, add PRs #373/#375/#399 to regression history
- copilot-instructions.md: update SDK Event Flow step 9 for BackgroundTasks check, add [IDLE-DEFER] diagnostic tag, fix stale path count (8→15+), add BackgroundTasksIdleTests to test list

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
✅ Round 2 Review: PR #399
All 3 previous findings resolved in commits
Previous Findings Status
What changed
var bt = idle.Data?.BackgroundTasks;
if (bt == null) return false;
return (bt.Agents is { Length: > 0 }) || (bt.Shells is { Length: > 0 });
Clean, typed, compile-time safe. A future SDK rename now fails at build time rather than silently disabling the fix.
Tests: 7/7 pass (added
✅ Approve: ready to merge
…#399)

## Problem

Multi-agent workers consistently return truncated responses (90% failure rate). The root cause: `SessionIdleEvent` fires mid-turn when background tasks (sub-agents, shells) are still running, and PolyPilot unconditionally calls `CompleteResponse`, truncating the response.

Diagnostic data from 49 observed premature idle events:

- **44/49 (90%)** recoveries gave up with truncated content
- Median gap between premature idle and next TurnStart: **51.8 seconds** (vs 15s freshness threshold)
- 4 cases had EVT-REARM fire AFTER recovery already finalized

## Root Cause

The SDK's `SessionIdleEvent` has a `Data.BackgroundTasks` payload with `agents[]` and `shells[]` arrays. When background tasks are present, `session.idle` means "foreground quiesced, background still running", NOT true completion.

PolyPilot was treating every `session.idle` as terminal, then trying to repair with EVT-REARM, PrematureIdleSignal, and recovery heuristics: a "hack chain" that failed 90% of the time.

## Fix

In the `SessionIdleEvent` handler, inspect `idle.Data.BackgroundTasks` via reflection:

- **If agents/shells active** → flush accumulated text, log `[IDLE-DEFER]`, keep `IsProcessing=true`, wait for real idle
- **If no background tasks** → proceed with normal `CompleteResponse`

This eliminates the root cause (premature completion) rather than patching symptoms (post-hoc recovery).
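The defer-or-complete branch can be sketched as follows. This is a simplified model, not the actual `CopilotService.Events.cs` handler: the record types are hypothetical stand-ins for the SDK types (the `string` element type is an assumption), `SessionSketch` and its members are invented for illustration, and the check uses the typed access adopted in the Round 2 review rather than reflection.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical stand-ins for the SDK types; only the member names
// Data, BackgroundTasks, Agents, and Shells are taken from this PR.
public sealed record BackgroundTasks(string[]? Agents, string[]? Shells);
public sealed record SessionIdleData(BackgroundTasks? BackgroundTasks);
public sealed record SessionIdleEvent(SessionIdleData? Data);

// Simplified session state, invented for this sketch.
public sealed class SessionSketch
{
    private readonly StringBuilder _current = new(); // text accumulated since the last flush
    private readonly StringBuilder _flushed = new(); // text already flushed

    public bool IsProcessing { get; private set; } = true;
    public string Flushed => _flushed.ToString();

    public void AppendText(string text) => _current.Append(text);

    public static bool HasActiveBackgroundTasks(SessionIdleEvent idle)
    {
        var bt = idle.Data?.BackgroundTasks;
        if (bt == null) return false; // payload omitted: treat as a real idle
        return (bt.Agents is { Length: > 0 }) || (bt.Shells is { Length: > 0 });
    }

    public void OnSessionIdle(SessionIdleEvent idle)
    {
        if (HasActiveBackgroundTasks(idle))
        {
            // Foreground quiesced, background still running: keep what we have and wait.
            FlushCurrentResponse();
            Console.WriteLine("[IDLE-DEFER] background tasks active, waiting for real idle");
            return; // IsProcessing stays true
        }

        CompleteResponse();
    }

    private void FlushCurrentResponse()
    {
        if (_current.Length == 0) return; // nothing accumulated: no-op
        if (_flushed.Length > 0) _flushed.Append('\n'); // newline between flushed segments
        _flushed.Append(_current.ToString());
        _current.Clear();
    }

    private void CompleteResponse()
    {
        FlushCurrentResponse();
        IsProcessing = false;
    }
}
```

A first idle that still lists an agent leaves `IsProcessing` true and only flushes; a later idle with empty `Agents` and `Shells` completes the response with both segments intact.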
## Safety

- Existing safety nets (EVT-REARM, watchdog, recovery) are **kept** as defense-in-depth
- If SDK omits `BackgroundTasks` (null), falls through to normal completion (safe default)
- Reflection-based access with try/catch: gracefully falls back if SDK changes
- All 2709 existing tests pass + 6 new tests for `HasActiveBackgroundTasks`

## Files Changed

- `CopilotService.Events.cs`: `SessionIdleEvent` handler + `HasActiveBackgroundTasks` helper
- `BackgroundTasksIdleTests.cs`: 6 tests covering null, empty, agents, shells, both, default

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>