Skip to content

Fix premature idle: check BackgroundTasks before completing#399

Merged
PureWeen merged 3 commits intomainfrom
fix/idle-background-tasks
Mar 17, 2026
Merged

Fix premature idle: check BackgroundTasks before completing#399
PureWeen merged 3 commits intomainfrom
fix/idle-background-tasks

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

Problem

Multi-agent workers consistently return truncated responses (90% failure rate). The root cause: SessionIdleEvent fires mid-turn when background tasks (sub-agents, shells) are still running, and PolyPilot unconditionally calls CompleteResponse, truncating the response.

Diagnostic data from 49 observed premature idle events:

  • 44/49 (90%) recoveries gave up with truncated content
  • Median gap between premature idle and next TurnStart: 51.8 seconds (vs 15s freshness threshold)
  • 4 cases had EVT-REARM fire AFTER recovery already finalized

Root Cause

The SDK's SessionIdleEvent has a Data.BackgroundTasks payload with agents[] and shells[] arrays. When background tasks are present, session.idle means "foreground quiesced, background still running" — NOT true completion.

PolyPilot was treating every session.idle as terminal, then trying to repair with EVT-REARM, PrematureIdleSignal, and recovery heuristics — a "hack chain" that failed 90% of the time.

Fix

In the SessionIdleEvent handler, inspect idle.Data.BackgroundTasks via reflection:

  • If agents/shells active → flush accumulated text, log [IDLE-DEFER], keep IsProcessing=true, wait for real idle
  • If no background tasks → proceed with normal CompleteResponse

This eliminates the root cause (premature completion) rather than patching symptoms (post-hoc recovery).

Safety

  • Existing safety nets (EVT-REARM, watchdog, recovery) are kept as defense-in-depth
  • If SDK omits BackgroundTasks (null), falls through to normal completion (safe default)
  • Reflection-based access with try/catch — gracefully falls back if SDK changes
  • All 2709 existing tests pass + 6 new tests for HasActiveBackgroundTasks

Files Changed

  • CopilotService.Events.csSessionIdleEvent handler + HasActiveBackgroundTasks helper
  • BackgroundTasksIdleTests.cs — 6 tests covering null, empty, agents, shells, both, default

PureWeen and others added 2 commits March 16, 2026 18:39
The SDK's SessionIdleEvent includes a BackgroundTasks payload with
agents[] and shells[] arrays. When background tasks are active,
session.idle means 'foreground quiesced, background still running' —
NOT true completion.

Previously, PolyPilot unconditionally called CompleteResponse on every
SessionIdleEvent, then tried to repair the damage with EVT-REARM,
PrematureIdleSignal, and recovery heuristics. This caused 90% of
multi-agent worker results to be truncated (44/49 recoveries failed
to collect full content, median gap to next TurnStart was 51.8s
vs the 15s freshness window).

The fix: inspect idle.Data.BackgroundTasks via reflection before
calling CompleteResponse. If agents or shells are active, flush
accumulated text and wait for the real idle (no background tasks).
This eliminates the root cause rather than patching symptoms.

The existing safety nets (EVT-REARM, watchdog, recovery) are kept
as defense-in-depth for edge cases where BackgroundTasks is null.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review feedback: the SDK types are public, so reflection was unnecessary
overhead (~1000x slower) with no compile-time safety. Also adds a test
for the Data==null edge case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PR #399 Review — Fix premature idle: check BackgroundTasks before completing

5-model consensus review (claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex). No prior review comments.


✅ What's right

The core fix is sound: inspecting idle.Data.BackgroundTasks to distinguish "foreground quiesced" from "turn complete" directly addresses the root cause of the 90% worker truncation rate. The logic, the deferred-flush path, and the test coverage are all correct. The PR's claims about the BackgroundTasks payload semantics match the SDK behavior.


🟡 MODERATE (5/5 consensus): HasActiveBackgroundTasks uses reflection on publicly accessible SDK types

CopilotService.Events.cs:1857–1881

HasActiveBackgroundTasks navigates Data → BackgroundTasks → Agents/Shells entirely via reflection with a bare catch { return false; }. Yet the tests in BackgroundTasksIdleTests.cs directly construct SessionIdleData, SessionIdleDataBackgroundTasks, SessionIdleDataBackgroundTasksAgentsItem, and SessionIdleDataBackgroundTasksShellsItem using typed SDK access — proving all properties are publicly accessible. <TrimmerRootAssembly Include="GitHub.Copilot.SDK" RootMode="All" /> already protects SDK types from linker trimming.

The failure mode asymmetry is the problem: if the SDK renames BackgroundTasks to anything else, the tests break at compile time (immediately caught), but production silently returns false at runtimeHasActiveBackgroundTasks never defers, and the fix silently reverts to the 90%-truncation bug with no log trace, no exception, and no indication anything is wrong.

Recommended fix — replace the reflection chain with what the tests already validate:

internal static bool HasActiveBackgroundTasks(SessionIdleEvent idle)
{
    var bt = idle.Data?.BackgroundTasks;
    if (bt == null) return false;
    return (bt.Agents?.Length > 0) || (bt.Shells?.Length > 0);
}

This is shorter, faster, trimmer-safe (already guaranteed), and breaks at compile time on any future SDK rename.


🟢 MINOR (2/5 models): Potential long stuck-session window in pathological defer loop

CopilotService.Events.cs:624

CancelTurnEndFallback(state) fires unconditionally before the background-task check. When deferring, no replacement timer is scheduled. Each incoming SessionIdleEvent resets LastEventAtTicks (line ~229), which means the 120s inactivity watchdog branch never fires during an active background-task defer loop. The only remaining circuit breaker is the absolute WatchdogMaxProcessingTimeSeconds cap (~600–3600s depending on path).

3/5 models explicitly dismissed this as not a bug (subsequent AssistantTurnEndEvents re-arm the fallback, and the watchdog absolute cap is adequate). Flagging for awareness: if the SDK enters a pathological loop of idle-with-background-tasks with no AssistantTurnEndEvent, the apparent hang can last up to the absolute watchdog cap before auto-recovery.


Dismissed (not bugs)

Concern Verdict
Infinite defer — stuck forever Not a bug (watchdog absolute cap is the circuit breaker)
FlushCurrentResponse without IsProcessing guard Not a bug (AbortSessionAsync clears CurrentResponse first; flush is a no-op)
Newline injection from multiple deferred flushes Not a bug (same pattern as TurnEnd flush path; newlines between segments are expected)

Test Coverage

✅ 6/6 new BackgroundTasksIdleTests pass. Covers: null, empty, agents-only, shells-only, both, default SessionIdleEvent.

Gap: No test for idle.Data == null. The reflection guard handles it, but once replaced with typed access (idle.Data?.BackgroundTasks), a test documenting this null-safe behavior would complete the contract.


CI Status

⚠️ Pre-existing: RenderThrottleTests.CompleteResponse_OnSessionComplete_FiresBeforeOnStateChanged — confirmed not caused by this PR.


Verdict: ⚠️ Request Changes

One actionable change before merging: replace the reflection-based HasActiveBackgroundTasks with direct typed access. The risk isn't about the current SDK version (which works fine) — it's that a silent regression back to 90% truncation is far worse than an explicit compile failure. The fix is a 4-line simplification.

The MINOR watchdog concern is informational and does not block merge.

- multi-agent-orchestration SKILL.md: 4→5 phase lifecycle, new
  IDLE-DEFER section, fix INV-O3 ordering, new INV-O15, mark
  premature idle bug as FIXED
- processing-state-safety SKILL.md: 10→16 paths table with tags,
  new INV-18 for BackgroundTasks, IDLE-DEFER in stuck session table,
  note EVT-REARM is now secondary defense, add PRs #373/#375/#399
  to regression history
- copilot-instructions.md: update SDK Event Flow step 9 for
  BackgroundTasks check, add [IDLE-DEFER] diagnostic tag, fix
  stale path count (8→15+), add BackgroundTasksIdleTests to test list

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

✅ Round 2 Review — PR #399

All 3 previous findings resolved in commits ca9e6f5f and 69216ce5.

Previous Findings Status

# Finding Status
F1 🟡 HasActiveBackgroundTasks used reflection on public SDK types (silent catch → silent regression) FIXED — replaced with direct typed access: idle.Data?.BackgroundTasks
F2 🟡 CancelTurnEndFallback before background-task check (long hang window) N/A — 3/5 models dismissed this as acceptable; watchdog absolute cap is sufficient
F3 🟢 Missing Data=null edge case test FIXEDHasActiveBackgroundTasks_DataNull_ReturnsFalse test added

What changed

HasActiveBackgroundTasks (Events.cs) — reflection replaced with:

var bt = idle.Data?.BackgroundTasks;
if (bt == null) return false;
return (bt.Agents is { Length: > 0 }) || (bt.Shells is { Length: > 0 });

Clean, typed, compile-time safe. A future SDK rename now fails at build time rather than silently disabling the fix.

Tests — 7/7 pass (added Data=null test).

✅ Approve — Ready to merge

@PureWeen PureWeen merged commit 4ad3f26 into main Mar 17, 2026
@PureWeen PureWeen deleted the fix/idle-background-tasks branch March 17, 2026 03:59
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…#399)

## Problem

Multi-agent workers consistently return truncated responses (90% failure
rate). The root cause: `SessionIdleEvent` fires mid-turn when background
tasks (sub-agents, shells) are still running, and PolyPilot
unconditionally calls `CompleteResponse`, truncating the response.

Diagnostic data from 49 observed premature idle events:
- **44/49 (90%)** recoveries gave up with truncated content
- Median gap between premature idle and next TurnStart: **51.8 seconds**
(vs 15s freshness threshold)
- 4 cases had EVT-REARM fire AFTER recovery already finalized

## Root Cause

The SDK's `SessionIdleEvent` has a `Data.BackgroundTasks` payload with
`agents[]` and `shells[]` arrays. When background tasks are present,
`session.idle` means "foreground quiesced, background still running" —
NOT true completion.

PolyPilot was treating every `session.idle` as terminal, then trying to
repair with EVT-REARM, PrematureIdleSignal, and recovery heuristics — a
"hack chain" that failed 90% of the time.

## Fix

In the `SessionIdleEvent` handler, inspect `idle.Data.BackgroundTasks`
via reflection:
- **If agents/shells active** → flush accumulated text, log
`[IDLE-DEFER]`, keep `IsProcessing=true`, wait for real idle
- **If no background tasks** → proceed with normal `CompleteResponse`

This eliminates the root cause (premature completion) rather than
patching symptoms (post-hoc recovery).

## Safety

- Existing safety nets (EVT-REARM, watchdog, recovery) are **kept** as
defense-in-depth
- If SDK omits `BackgroundTasks` (null), falls through to normal
completion (safe default)
- Reflection-based access with try/catch — gracefully falls back if SDK
changes
- All 2709 existing tests pass + 6 new tests for
`HasActiveBackgroundTasks`

## Files Changed

- `CopilotService.Events.cs` — `SessionIdleEvent` handler +
`HasActiveBackgroundTasks` helper
- `BackgroundTasksIdleTests.cs` — 6 tests covering null, empty, agents,
shells, both, default

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant