
feat: Intercept SDK task tool for multi-agent orchestration (#229) #318

Closed
PureWeen wants to merge 34 commits into main from feature/intercept-task-tool-229

Conversation


@PureWeen PureWeen commented Mar 9, 2026

Closes #229

What

Replaces fragile @worker: text-parsing with structured tool-call dispatch for multi-agent orchestration.

Changes

  • Block SDK built-in task tool via ExcludedTools=["task"]
  • Register custom task AIFunction that routes to workers via ExecuteWorkerAsync
  • Thread-safe result collection (WorkerDelegationContext)
  • Tool-dispatch path with @worker: text fallback
  • Reconnect handling (clear _reflectToolConfigured)
  • Codespace client support (GetClientForGroup)
  • 14 unit tests

Testing

  • 2114+ tests passing
  • Use Implement & Challenge preset (OrchestratorReflect mode) to verify tool-dispatch
  • Fallback to @worker: text parsing works when tool setup fails

@PureWeen PureWeen force-pushed the feature/intercept-task-tool-229 branch 2 times, most recently from b5ba48c to c8441b6 on March 9, 2026 at 23:22
PureWeen and others added 27 commits March 9, 2026 20:26
- Add WorkerDelegationContext and WorkerDelegationTool to intercept
  the model built-in task sub-agent calls in OrchestratorReflect mode
- Resume orchestrator session with ExcludedTools + custom
  delegation AIFunction that routes calls to PolyPilot workers
- Add BuildOrchestratorReflectToolPrompt for tool-based planning prompts
- Add EnsureOrchestratorReflectToolsAsync to configure orchestrators
- Detect tool-dispatch results and skip ParseTaskAssignments fallback
- Add WorkerDelegationToolTests covering context and tool behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When an orchestrator session reconnects, SendPromptAsync resumes with
only ShowImageTool -- no ExcludedTools or custom delegation function.
Previously _reflectToolConfigured stayed set, so EnsureOrchestratorReflectToolsAsync
would skip re-configuration, leaving the orchestrator without the delegation
tool after reconnect.

Fix: TryRemove the session from _reflectToolConfigured after the reconnect
so EnsureOrchestratorReflectToolsAsync re-runs on the next reflect iteration.

Adds regression test ReconnectState_ShouldClearReflectToolConfigured.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bug 1 - Thread safety: WorkerDelegationContext._results was a plain List<T>
that the SDK could mutate concurrently via parallel tool calls. Added
_resultsLock; AddResult and Reset lock on it; DispatchedResults returns a
ToList() snapshot so callers see a consistent view.
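The Bug 1 locking pattern can be sketched as follows. This is a minimal illustration of the described fix, not the actual PolyPilot source; the class shape and member types here are assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of the thread-safety fix: a private lock guards the
// results list, and DispatchedResults hands out a snapshot copy so callers
// iterate a consistent view while parallel tool calls keep appending.
public sealed class WorkerDelegationContext
{
    private readonly object _resultsLock = new();
    private readonly List<string> _results = new();

    public void AddResult(string result)
    {
        lock (_resultsLock) { _results.Add(result); }
    }

    public void Reset()
    {
        lock (_resultsLock) { _results.Clear(); }
    }

    // ToList() inside the lock: the returned list is a point-in-time copy,
    // never the live collection.
    public IReadOnlyList<string> DispatchedResults
    {
        get { lock (_resultsLock) { return _results.ToList(); } }
    }
}
```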

Bug 2 - Wrong nudge format: with usingToolDispatch=true, when the model
skipped the task tool the code fell through to ParseTaskAssignments and sent
a @worker: text-format nudge. Added BuildToolDispatchNudgePrompt and now
handle the no-tool-call case inside the tool-dispatch if-block.

Bug 3 - False GoalMet: at iteration > 1 with usingToolDispatch=true and
no tool calls, the text-path declared GoalMet with no workers dispatched.
The new tool-dispatch branch handles this by nudging the model up to 3
times before stalling, never declaring GoalMet without actual dispatches.

Bug 4 - Stale usingToolDispatch: computed once before the loop; a
mid-loop reconnect cleared _reflectToolConfigured but usingToolDispatch
stayed true. Now re-checked at the top of each iteration: if the flag was
cleared, EnsureOrchestratorReflectToolsAsync re-registers the tool on the
new session handle and refreshes usingToolDispatch.

Added 5 regression tests: Context_ConcurrentAddResult_DoesNotCorrupt,
Context_DispatchedResults_ReturnsConcurrentSnapshot,
WorkerDelegationContext_ResultsList_IsThreadSafe,
ToolDispatch_ShouldNudgeWhenModelSkipsTaskTool,
ToolDispatch_ShouldRefreshUsingToolDispatchEachIteration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… codespace support (#229)

Now that PR #308 (codespace support) is merged, an OrchestratorReflect group
can have its sessions running in a codespace via a tunnel client stored in
_codespaceClients[groupId] — not in the local _client field.

EnsureOrchestratorReflectToolsAsync was calling _client.ResumeSessionAsync
directly, which for a codespace orchestrator would use the wrong (local) client,
causing ResumeSessionAsync to fail and silently fall back to @worker: text
dispatch — negating the entire #229 feature for codespace groups.

Fix: resolve the correct CopilotClient via GetClientForGroup(orchestratorGroupId),
matching the pattern used by CreateSessionAsync, SendPromptAsync, and all other
session operations throughout CopilotService.

Regression test added: EnsureOrchestratorReflectTools_UsesGetClientForGroup
verifies the source uses GetClientForGroup and not _client.ResumeSessionAsync.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…estratorToolsAsync (#229)

- EnsureOrchestratorReflectToolsAsync renamed to EnsureOrchestratorToolsAsync (not reflect-specific)
- _reflectToolConfigured renamed to _toolDispatchConfigured throughout
- SendViaOrchestratorAsync now calls EnsureOrchestratorToolsAsync and handles tool-dispatch results,
  letting regular Orchestrator groups use task tool calls instead of @worker: text blocks
- BuildOrchestratorPlanningPrompt gains useToolDispatch param: emits task-tool instructions
  instead of @worker:/@EnD blocks when tool-dispatch is active
- All regression tests updated to reflect new names and expanded test window

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The SDK validates that external tools with names matching built-in tools must
explicitly declare is_override=true in AdditionalProperties. Use the
AIFunctionFactoryOptions overload of AIFunctionFactory.Create to set Name,
Description, and AdditionalProperties together.
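The registration described above can be sketched with the Microsoft.Extensions.AI factory API the commit names. The delegate body, class name, and parameter shapes are illustrative assumptions; only AIFunctionFactory.Create, AIFunctionFactoryOptions, and the is_override key come from the commit text:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Sketch of creating a custom "task" tool that overrides the SDK built-in.
public static class DelegationToolFactory
{
    public static AIFunction CreateTaskTool(
        Func<string, string, Task<string>> executeWorkerAsync)
    {
        return AIFunctionFactory.Create(
            executeWorkerAsync,
            new AIFunctionFactoryOptions
            {
                Name = "task",          // same name as the SDK built-in tool
                Description = "Delegate a task to a PolyPilot worker session.",
                AdditionalProperties = new Dictionary<string, object?>
                {
                    // Required when shadowing a built-in tool name.
                    ["is_override"] = true,
                },
            });
    }
}
```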

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a reflect loop is cancelled (e.g., user sends a new message) while a
worker is still running its task tool invocation, the new loop's dispatch
immediately fails with 'Session is already processing a request'.

ExecuteWorkerAsync now checks if the target worker is still processing and
waits on its ResponseCompletion TCS (up to 10 min) before dispatching the
new task. This mirrors the busy-wait pattern used elsewhere and prevents
the 'elapsed=0.0s FAILED' pattern seen in production logs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tool dispatch (#229)

When the orchestrator calls our custom task tool, the SDK may fire
SessionIdleEvent before the long-running worker callback completes.
Fix: increment ActiveToolCallCount on tool dispatch start, decrement
on completion (in finally block). The existing guard in CompleteResponse
(line 476) already skips completion when ActiveToolCallCount > 0.

Also removes ExcludedTools=[task] from ResumeSessionConfig since
is_override=true on the custom tool handles the replacement, and
updates stale log messages.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…stom tool

The SDK (0.1.32) fires ToolExecutionStartEvent for is_override custom tools
but never fires ToolExecutionCompleteEvent or SessionIdleEvent after the
callback returns. This leaves ActiveToolCallCount permanently elevated and
the orchestrator stuck until the 600s watchdog fires.

Two-part fix:
1. SessionIdleEvent handler: guard against premature completion when
   ActiveToolCallCount > 0 (tools still in-flight)
2. OnToolDispatchEnd: after 8s grace period, if SDK hasn't produced events,
   manually reset ActiveToolCallCount and trigger CompleteResponse

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…losure

EnsureOrchestratorToolsAsync replaces the SessionState during ResumeSessionAsync.
The OnToolDispatchStart/End closures captured the old state reference, so the
manual CompleteResponse workaround was operating on a stale object — its
ProcessingGeneration was 0 and its TCS had no awaiters.

Fix: look up the current state from _sessions dictionary at callback time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cache the delegation AIFunction in _delegationFunctions dictionary so
the reconnect handler in SendPromptAsync can include it in the
ResumeSessionConfig.Tools list. This prevents iteration 2+ from
stalling because the reconnected session only had ShowImageTool.

Also handles the fresh-session fallback path (session expired).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ning concurrent tools

The unconditional Exchange(ref ActiveToolCallCount, 0) could corrupt
tool accounting if a new tool started during the 8s grace period.
Now uses CompareExchange to only reset when exactly 1 tool remains,
and skips manual CompleteResponse if other tools are still in-flight.
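The CompareExchange guard described above can be sketched like this (field and method names are illustrative): the reset succeeds only when exactly one tool call remains, so a tool that started during the grace period leaves the count untouched.

```csharp
using System.Threading;

// Sketch of the guarded force-reset: store 0 only if the current value is
// exactly 1; otherwise another tool is in-flight and we must not complete.
public sealed class ToolAccounting
{
    public int ActiveToolCallCount;

    // Returns true if this call performed the 1 -> 0 reset, meaning the
    // caller may safely trigger the manual CompleteResponse.
    public bool TryResetLastTool()
        => Interlocked.CompareExchange(ref ActiveToolCallCount, 0, 1) == 1;
}
```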

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…1600ms)

The SDK either fires events within milliseconds or not at all.
Polling with early bail-out reduces worst-case latency from 8s to ~3s
while still giving the SDK a fair chance at each interval.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aching

ResumeSessionAsync on a session stuck in 'tool executing' state produces
a dead session (no events). Instead of including the delegation tool in
the reconnect config (which doesn't help), always clear the flag and
add a post-send re-ensure check: after SendPromptAndWaitAsync returns,
if tools were lost (reconnect cleared the flag), immediately recreate
a clean session via EnsureOrchestratorToolsAsync and retry the prompt.
This avoids the 600s watchdog wait on each reconnect.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ResumeSessionAsync on a session stuck in 'tool executing' state produces
a dead session that never fires events. For orchestrator sessions with
tool dispatch, skip the retry entirely and complete with empty response.
The reflection loop's post-send re-ensure detects tools were lost,
creates a clean session via EnsureOrchestratorToolsAsync, and retries
immediately — no 600s watchdog wait.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sh session fallback

When ResumeSessionAsync throws 'Session not found' (expired server-side),
create a fresh session via CreateSessionAsync with the delegation tool.
Also moved _delegationFunctions storage to after successful session
creation and clean up on failure to prevent stale entries that cause
the reconnect handler to incorrectly skip retry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The SDK fires ToolExecutionStartEvent for ALL parallel tool calls but
only invokes our is_override callback for ONE of them. The un-invoked
calls leave phantom ActiveToolCallCount entries that prevent manual
CompleteResponse from firing. Force-reset to 0 since our
OnToolDispatchStart/End callbacks are the real source of truth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ator mode

- Add post-send re-ensure after planning prompt (same as OrchestratorReflect)
- Clear delegation state before synthesis prompt so reconnect retries normally
  instead of skip-retry (synthesis doesn't use the task tool)
- Fixes dead-session hang during PR Review Squad synthesis phase

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After manual CompleteResponse for is_override tool dispatch, the server-side
session is stuck in 'tool executing' state. ResumeSessionAsync on it creates
a dead session. Instead, create a completely fresh session via CreateSessionAsync
for the synthesis phase (which doesn't need the delegation tool).

Also adds post-send re-ensure for the planning prompt in Orchestrator mode,
matching the pattern already used in OrchestratorReflect.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…allel

The SDK (0.1.32) fires ToolExecutionStartEvent for all parallel is_override
tool calls but only invokes the callback for a subset. This causes phantom
dispatches where some workers never execute. Instructing the model to call
task one-at-a-time works around this SDK limitation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The SDK only executes one is_override tool callback per turn, then our
manual CompleteResponse ends the turn. For multi-worker scenarios (e.g.,
review 3 PRs), Orchestrator mode now loops: each iteration dispatches
one worker via a fresh session with continuation prompt, accumulating
results until all workers are dispatched. Then a final synthesis prompt
combines all results. Capped at 10 iterations for safety.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…urns

After manual CompleteResponse, the server-side session is stuck in
'tool executing' state. ResumeSessionAsync creates a dead session.
Added forceCreate parameter to EnsureOrchestratorToolsAsync that uses
CreateSessionAsync instead, giving each continuation turn a clean session.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-robin

Reset() with all workers + round-robin always picked worker-1.
Now passes only un-dispatched workers so each continuation iteration
dispatches to a different worker.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
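The un-dispatched-workers fix above can be sketched as a queue-backed pool (illustrative names, not the actual PolyPilot types): seeding each continuation iteration with only the workers not yet dispatched guarantees successive picks rotate through every worker instead of always resolving to index 0.

```csharp
using System.Collections.Generic;

// Sketch: a pool holding only un-dispatched workers. Each pick removes the
// worker, so round-robin degenerates into a simple dequeue.
public sealed class WorkerPool
{
    private readonly Queue<string> _undispatched;

    public WorkerPool(IEnumerable<string> undispatchedWorkers)
        => _undispatched = new Queue<string>(undispatchedWorkers);

    public bool TryPickNext(out string worker)
        => _undispatched.TryDequeue(out worker!);
}
```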
Check ActiveToolCallCount before completing on reconnect. If > 0, a tool
callback is still executing (worker is running). Let it finish naturally —
OnToolDispatchEnd will trigger CompleteResponse. Prevents premature empty
response when worker takes longer than the SDK reconnect interval.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The continuation prompt told the model 'workers still need tasks' which caused
it to dispatch ALL available workers even when the user's request was fulfilled.
Now explicitly tells the model to stop dispatching when all requested work is
done, and that it does NOT need to use all available workers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When RestorePreviousSessionsAsync encounters a 'session file corrupted'
error (e.g., events.jsonl locked by another copilot process), fall back
to CreateSessionAsync instead of silently dropping the session. This
prevents multi-agent orchestrator sessions from vanishing after restart.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a session's events.jsonl is locked by another copilot process (e.g.,
Copilot CLI terminal), the SDK reports 'session file corrupted'. Updated
the UI error message to explain the actual cause (CLI lock) and suggest
closing the CLI session first.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Changed the dispatch loop from sequential (await each worker before
dispatching the next) to parallel (fire-and-forget each worker, await
all at the end). The SDK's one-callback-per-turn limitation means we
still loop to dispatch (~3s per worker), but workers now run in parallel
instead of serially.

Before: dispatch W1 (wait 43s) → W2 (wait 199s) → W3 (wait 531s) = 773s
After:  dispatch all 3 (~9s) → await longest (531s) = ~540s

Key changes:
- WorkerDelegationTool.InvokeAsync is now non-blocking: starts worker
  in background via Task.Run, returns 'dispatched' immediately
- WorkerDelegationContext gains PendingTasks dict and AwaitAllPendingAsync()
- FullReset() clears pending tasks; Reset() preserves them across iterations
- Dispatch loop collects dispatched names (not results), sends lightweight
  continuation prompts, then awaits all pending tasks after loop completes
- Updated prompts to explain parallel execution to the model

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
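The fire-and-forget dispatch described above can be sketched as follows. Names mirror the commit message (PendingTasks, AwaitAllPendingAsync), but the shapes are illustrative assumptions, not the real implementation:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch: Dispatch starts the worker in the background and returns
// immediately; the dispatch loop awaits all pending tasks once every
// worker has been started, so workers run in parallel.
public sealed class PendingDispatches
{
    private readonly ConcurrentDictionary<string, Task<string>> _pendingTasks = new();

    public string Dispatch(string workerName, Func<Task<string>> runWorker)
    {
        _pendingTasks[workerName] = Task.Run(runWorker); // fire-and-forget
        return $"dispatched {workerName}";
    }

    public async Task<IReadOnlyList<string>> AwaitAllPendingAsync()
    {
        var results = await Task.WhenAll(_pendingTasks.Values);
        _pendingTasks.Clear();
        return results;
    }
}
```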
@PureWeen PureWeen force-pushed the feature/intercept-task-tool-229 branch from 975d8de to a6e8b9f on March 10, 2026 at 01:27
PureWeen added a commit that referenced this pull request Mar 10, 2026
- Fix OrchestratorReflect round-robin: add ResetResults() that preserves
  round-robin index across iterations so workers alternate correctly
  (worker-1 → worker-2 → worker-1) instead of always picking index 0
- Fix critical processing-state cleanup violation: reconnect skip-retry
  path now clears all 9 companion fields matching CompleteResponse pattern
  (was only clearing IsProcessing + watchdog, leaving stale API time,
  tool counts, phase, and resumed flag)
- Fix placeholder Success=true: add Dispatched flag to ToolDispatchedResult
  so placeholders are semantically distinct from completed results
- Add ObserveAllPending() for safe cleanup of unobserved task exceptions
  when dispatch loop crashes before AwaitAllPendingAsync()
- Replace Interlocked.CompareExchange read-only checks with Volatile.Read
  for ActiveToolCallCount (idiomatic and clearer intent)
- Normalize MSBuild path separators (backslash → forward slash)
- Improve session error message to cover both locked and corrupted cases
- Add 3 new tests: AwaitAllPendingAsync with throwing task,
  ResetResults preserves round-robin, ObserveAllPending suppresses exceptions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen force-pushed the feature/intercept-task-tool-229 branch from d5fd38a to 45b34b3 on March 10, 2026 at 03:08
PureWeen and others added 3 commits March 9, 2026 22:58
When the model responds with text (planning) instead of calling the task
tool on the first turn, the Orchestrator mode dispatch branch was skipped
entirely — falling through to text-parsing and never dispatching workers
in parallel.

Fix: mirror the OrchestratorReflect nudge pattern (lines 2604-2630).
If DispatchedResults is empty after the first turn, send a
BuildToolDispatchNudgePrompt on a fresh session before falling through
to text-parsing. This gives the model a second chance to use the task
tool.

Also adds 7 new unit tests covering:
- ResetResults/Reset preserve pending tasks across iterations
- FullReset clears pending tasks
- PendingTasks accumulate across context resets
- Multiple dispatches round-robin workers
- Nudge scenario (empty results then dispatch)
- ToolDispatchedResult.Dispatched flag defaults

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ndingAsync

Two fixes for zombie worker sessions after SDK reconnect:

1. Worker revival: when SendPromptAndWaitAsync returns an empty response
   (dead session — SDK lost event stream after reconnect), automatically
   send a 'please continue' steering message. The server-side session may
   still be alive; a new SendAsync can kick it back into action.

2. Per-worker timeout on AwaitAllPendingAsync: if a worker task doesn't
   complete within 5 minutes (configurable), return a timeout error instead
   of blocking the entire dispatch forever. Previously, a single dead worker
   with a stuck watchdog (600s tool timeout) blocked synthesis for 10+ min.

Root cause analysis: workers that reconnect during SDK event streaming get
a new session object, but the server-side session sometimes goes silent
after 1-2 turns. The watchdog eventually fires but uses the 600s tool
timeout (isMultiAgentSession=true triggers Case B at 600s instead of 120s).
This is a pre-existing SDK issue, not caused by our changes.

Also adds test: Context_AwaitAllPendingAsync_TimesOutDeadWorker

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
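The per-worker timeout described above can be sketched with a Task.WhenAny race. The 5-minute default is the value quoted in the commit; the method name and error string are illustrative assumptions:

```csharp
using System;
using System.Threading.Tasks;

// Sketch: race the worker task against a delay so a single dead worker
// yields a timeout marker instead of blocking the whole dispatch.
public static class WorkerAwaiter
{
    public static async Task<string> AwaitWithTimeoutAsync(
        Task<string> workerTask,
        TimeSpan? timeout = null)
    {
        var limit = timeout ?? TimeSpan.FromMinutes(5);
        var completed = await Task.WhenAny(workerTask, Task.Delay(limit));
        return completed == workerTask
            ? await workerTask                 // propagate result or fault
            : "ERROR: worker timed out";       // dispatch continues without it
    }
}
```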
When the initial ResumeSession produces a dead session, the nudge code
creates a fresh session via CreateSessionAsync. The fresh session has
no conversation history, so sending only the nudge text ('use the task
tool') left the model without context about being an orchestrator.

Fix: re-send the full planningPrompt (with appended tool-use instruction)
on the fresh session so the model has all the context it needs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen force-pushed the feature/intercept-task-tool-229 branch from 618b920 to 7ebf792 on March 10, 2026 at 04:58
Two fixes for zombie worker sessions:

1. Serialize client reconnect (thundering herd fix): When multiple
   workers hit IsConnectionError concurrently, each one would dispose
   and recreate _client, destroying the previous worker's SSE stream.
   Only the last worker to reconnect survived. Fix: add
   _clientReconnectLock (SemaphoreSlim) to serialize the path. The
   first worker recreates _client; subsequent workers reuse it.

2. Fresh-session revival: When a worker returns empty (dead SSE stream),
   instead of sending a steering message to the same dead session,
   create a brand new session via CreateSessionAsync and replay the
   full worker prompt. The old session is disposed, the new SessionId
   is stored in Info, and _sessions[name] is replaced with fresh state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
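The thundering-herd fix above can be sketched as follows. The client type, generation counter, and method names are illustrative stand-ins for the real CopilotClient handling; only the SemaphoreSlim serialization idea comes from the commit:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: a SemaphoreSlim serializes the reconnect path. Each waiter
// re-checks whether an earlier waiter already recreated the client and,
// if so, reuses it instead of disposing and destroying its stream.
public sealed class ReconnectingClientHolder
{
    private readonly SemaphoreSlim _clientReconnectLock = new(1, 1);
    private object? _client;
    private int _generation;

    public async Task<object> ReconnectAsync(int observedGeneration, Func<object> createClient)
    {
        await _clientReconnectLock.WaitAsync();
        try
        {
            // Recreate only if nobody reconnected while we waited.
            if (_generation == observedGeneration || _client is null)
            {
                _client = createClient();
                _generation++;
            }
            return _client;
        }
        finally { _clientReconnectLock.Release(); }
    }
}
```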
@PureWeen PureWeen force-pushed the feature/intercept-task-tool-229 branch from 7ebf792 to b457532 on March 10, 2026 at 05:00
@PureWeen (Owner, Author) commented:

PR #318 — 5-Model Consensus Review

Title: feat: Intercept SDK task tool for multi-agent orchestration
CI: ⚠️ No checks configured on feature/intercept-task-tool-229
Prior reviews: None
Models: claude-opus-4.6 ×2, gemini-3-pro-preview, gpt-5.3-codex (4/5 complete; 1 timed out — findings below reflect 4-model consensus)


🔴 CRITICAL — Reconnect-skip path leaves SendingFlag set → session deadlock (4/4 consensus)

PolyPilot/Services/CopilotService.cs — reconnect-skip block

The new orchestrator reconnect-skip path sets IsProcessing = false but does not call Interlocked.Exchange(ref state.SendingFlag, 0).

SendPromptAsync sets SendingFlag = 1 at entry. All other IsProcessing = false paths release it (either via CompleteResponse or explicit reset). This path does neither. The next call to SendPromptAsync for this session will fail the CompareExchange(ref state.SendingFlag, 1, 0) check and throw "Session is already processing a request" — permanently deadlocking the session until the app restarts.

Additional missing INV-1 cleanup in the same block (4/4 consensus):

  • FlushCurrentResponse(state) not called before IsProcessing = false → any streaming content accumulated before reconnect is silently dropped
  • ClearPermissionDenials() not called → stale permission denials persist into next turn
  • OnSessionComplete not fired → external subscribers never unblocked
// Missing from the reconnect-skip block:
FlushCurrentResponse(state);                             // before IsProcessing = false
Interlocked.Exchange(ref state.SendingFlag, 0);          // ← CRITICAL: prevents deadlock
state.Info.ClearPermissionDenials();
OnSessionComplete?.Invoke(sessionName);
// Also missing diagnostic log tag: [RECONNECT-SKIP] or similar

🟡 MODERATE — OnStateChanged?.Invoke() in reconnect-skip path fires on background thread (4/4 consensus)

PolyPilot/Services/CopilotService.cs — same reconnect-skip block

OnStateChanged?.Invoke() is called directly without InvokeOnUI(). This code runs in SendPromptAsync's catch block, which for worker sessions executes on a Task.Run background thread. Project invariant INV-2 requires ALL IsProcessing state mutations and OnStateChanged notifications to be marshaled to the UI thread. Every other termination path uses InvokeOnUI() or _syncContext.Post(). This will cause cross-thread Blazor render exceptions under load.

Fix: Wrap the entire block in InvokeOnUI(() => { ... }).


🟡 MODERATE — Escalating poll calls CompleteResponse on background thread when _syncContext == null (4/4 consensus)

PolyPilot/Services/CopilotService.Organization.cs — EnsureOrchestratorToolsAsync, OnToolDispatchEnd lambda

if (_syncContext != null)
    _syncContext.Post(_ => CompleteResponse(cur, endGen), null);
else
    CompleteResponse(cur, endGen);  // ← called on Task.Run thread

The else branch calls CompleteResponse directly on the thread pool. CompleteResponse mutates IsProcessing, fires OnStateChanged, and clears SendingFlag — all must run on the UI thread per INV-2. The watchdog uses InvokeOnUI() (which handles null _syncContext safely); this code should too. In production MAUI _syncContext is typically non-null, but the fallback path is a correctness bug.

Fix: Replace else CompleteResponse(cur, endGen) with else InvokeOnUI(() => CompleteResponse(cur, endGen)).


🟡 MODERATE — AddPendingTask silently overwrites prior task for same worker name (3/4 consensus)

PolyPilot/Services/WorkerDelegationTool.cs — AddPendingTask

internal void AddPendingTask(string workerName, Task<ToolDispatchedResult> task)
{
    _pendingTasks[workerName] = task;  // overwrites silently
}

The round-robin index wraps around when all workers have been dispatched. With 2 workers and 3 dispatches, the sequence is w1→w2→w1. The second w1 dispatch overwrites the first task in _pendingTasks. The first w1 worker continues running in the background but its result is permanently lost — AwaitAllPendingAsync only collects the second task. This also produces an unobserved task exception if the first task faults.

Fix: Use a list-of-tasks keyed by worker name, or append a dispatch counter to the key.
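The dispatch-counter variant of the suggested fix can be sketched like this (names are illustrative; this is one possible shape, not the PR's code):

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Sketch: append a monotonic counter to the key so a second dispatch to the
// same worker cannot overwrite (and orphan) the first pending task.
public sealed class PendingTaskMap
{
    private readonly ConcurrentDictionary<string, Task<string>> _pendingTasks = new();
    private int _dispatchCounter;

    public void AddPendingTask(string workerName, Task<string> task)
    {
        var key = $"{workerName}#{Interlocked.Increment(ref _dispatchCounter)}";
        _pendingTasks[key] = task; // unique key: no silent overwrite
    }

    public Task<string[]> AwaitAllPendingAsync()
        => Task.WhenAll(_pendingTasks.Values.ToArray());
}
```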


🟡 MODERATE — Stale pending tasks from interrupted dispatch survive into new context (3/4 consensus)

PolyPilot/Services/CopilotService.Organization.cs — EnsureOrchestratorToolsAsync

var context = _delegationContexts.GetOrAdd(orchestratorName, _ => new WorkerDelegationContext());

GetOrAdd reuses the existing WorkerDelegationContext on reconnect. The orchestration loops call FullReset at the start of a new cycle (which clears _pendingTasks), but the reflect loop only calls Reset/ResetResults — which deliberately preserve pending tasks. After a reconnect mid-reflect-loop, the new iteration's EnsureOrchestratorToolsAsync reuses the context with surviving pending tasks from the interrupted dispatch. Those tasks are then awaited by AwaitAllPendingAsync in the new iteration, mixing stale results from the old dispatch with new ones.


🟡 MODERATE — NRE in synthesis session creation when _client is null (2/4 consensus)

PolyPilot/Services/CopilotService.Organization.cs — synthesis path

GitHub.Copilot.SDK.CopilotClient synthClient;
try { synthClient = GetClientForGroup(orchestratorGroupId2); }
catch { synthClient = _client; }
var freshSession = await synthClient.CreateSessionAsync(...);  // NRE if _client is null

If GetClientForGroup throws (codespace tunnel disconnected) AND _client is null (after a failed reconnect set _client = null; IsInitialized = false), synthClient is null and the next line throws NullReferenceException, crashing the synthesis phase with an opaque error.

Fix: catch { synthClient = _client ?? throw new InvalidOperationException("No client available for synthesis session"); }


Test Coverage Assessment

The 14 new tests in WorkerDelegationToolTests.cs are comprehensive for WorkerDelegationContext behavior (concurrency, timeouts, round-robin, nudge scenarios). The 6 new tests in MultiAgentRegressionTests.cs use the source-text-as-spec pattern (read source file, assert string patterns exist) which is fragile but consistent with the existing regression test style.

Missing test coverage for new code paths:

  • Reconnect-skip path: no test verifying SendingFlag is cleared, FlushCurrentResponse called, and OnStateChanged fires on UI thread
  • AddPendingTask overwrite: the existing Context_ConcurrentAddResult_DoesNotCorrupt test doesn't cover the overwrite scenario (same worker dispatched twice → second overwrites first)
  • Stale pending task contamination after reconnect: no test for GetOrAdd reuse with surviving tasks

Suggested test cases:

  1. ReconnectSkipPath_ClearsSendingFlag — verify SendingFlag == 0 after reconnect-skip fires
  2. AddPendingTask_SameWorkerTwice_BothTasksAwaited — currently only the second task is collected
  3. AwaitAllPendingAsync_StaleTasksFromPreviousCycle_AreNotIncluded

Summary

The thundering-herd fix (_clientReconnectLock) and GetClientForGroup usage in EnsureOrchestratorToolsAsync are correct. The WorkerDelegationContext thread-safety model is sound. The core issue is that the new reconnect-skip path introduces a new IsProcessing = false path that violates 4 of the project's 12 required cleanup steps — most critically, SendingFlag is not released, which will deadlock sessions.

⚠️ Request changes — address the CRITICAL SendingFlag issue + the two threading violations before merge.


4-model review: claude-opus-4.6 ×2, gemini-3-pro-preview, gpt-5.3-codex

@PureWeen (Owner, Author) commented:

PR #318 Review — feat: Intercept SDK task tool for multi-agent orchestration

CI Status: ⚠️ No CI checks configured on this branch
Existing reviews: None
Models: claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex (5/5)
Consensus threshold: ≥2 models


🟡 MODERATE — Missing McpServers, SkillDirectories, SystemMessage in all fresh session configs (5/5 consensus)

Every SessionConfig created fresh in this PR omits MCP servers, skill directories, and system message. This affects five code paths:

Location — Affected path
CopilotService.Organization.cs — EnsureOrchestratorToolsAsync, forceCreate=true
CopilotService.Organization.cs — EnsureOrchestratorToolsAsync, resume path
CopilotService.Organization.cs — EnsureOrchestratorToolsAsync, "session not found" fallback
CopilotService.Organization.cs — SendViaOrchestratorAsync, synthesis session
CopilotService.Organization.cs — ExecuteWorkerAsync, dead-worker revival

Compare with the existing reconnect handler (CopilotService.cs:~2561) which correctly calls LoadMcpServers() and LoadSkillDirectories(). PR #330 fixed this exact gap for the standard reconnect path; this PR reintroduces it for five new paths.

The most impactful case is worker revival: a revived worker loses all external tool access — file editing, running tests, reading code — making the revival useless for any real task. The orchestrator forceCreate paths are less critical (orchestrators primarily call the task tool) but still lose custom system prompts and MCP context.

Fix: Copy the MCP/skills/system-message restoration pattern from the existing reconnect handler into each fresh SessionConfig construction.


🟡 MODERATE (disputed 2/5) — CompleteResponse called on threadpool when _syncContext is null

In OnToolDispatchEnd's background Task.Run, the fallback branch:

if (_syncContext != null)
    _syncContext.Post(_ => CompleteResponse(cur, endGen), null);
else
    CompleteResponse(cur, endGen);  // runs on threadpool

Per INV-2, all IsProcessing mutations must run on the UI thread. Two models (opus-4.6, gemini) flag this as an INV-2 violation. One model (sonnet) disagrees, arguing this is functionally identical to InvokeOnUI's null-context fallback behavior.

Additionally, the _syncContext.Post path dispatches CompleteResponse without a try-catch; all other dispatch sites use InvokeOnUI() or Invoke() which wrap the callback. An exception from CompleteResponse would propagate unhandled into the SynchronizationContext.

Suggestion: Use InvokeOnUI(() => CompleteResponse(cur, endGen)) to consolidate the dispatch logic and get the try-catch wrapping for free.
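
A sketch of the pattern the suggestion implies, assuming `InvokeOnUI` follows the null-context fallback described above (the helper name comes from the review; this body is an illustration, not the real implementation). The point is that both dispatch paths go through one guarded callback:

```csharp
using System;
using System.Threading;

void InvokeOnUI(SynchronizationContext? syncContext, Action action)
{
    void Guarded()
    {
        try { action(); }
        catch (Exception ex) { Console.Error.WriteLine($"UI callback failed: {ex.Message}"); }
    }

    if (syncContext != null)
        syncContext.Post(_ => Guarded(), null); // posted callback is still wrapped
    else
        Guarded(); // no captured context: run inline, still inside the try-catch
}

bool completed = false;
InvokeOnUI(null, () => completed = true);             // null-context fallback path
InvokeOnUI(null, () => throw new Exception("boom"));  // caught and logged, not rethrown
Console.WriteLine(completed);
```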


🟡 MODERATE (disputed 2/5) — Interlocked.Exchange(ref cur.ActiveToolCallCount, 0) races with concurrent ToolExecutionStartEvent

The polling loop unconditionally zeroes the count after ~3s:

```csharp
Interlocked.Exchange(ref cur.ActiveToolCallCount, 0); // Force-reset
_syncContext.Post(_ => CompleteResponse(cur, endGen), null);
```

Two models (opus-4.6, gemini) flag that this races with a concurrent ToolExecutionStartEvent increment for a second parallel task call, potentially leaving ActiveToolCallCount at 0 while a second tool is in-flight.

One model (sonnet) disagrees: "The SDK invokes at most one is_override callback per turn, so OnToolDispatchStart/End fire exactly once before the 3-second poll runs." The comment in the code itself acknowledges this: "The SDK fires ToolExecutionStartEvent for ALL parallel tool calls but may only invoke our callback for ONE of them."

Given the competing analysis, this warrants confirmation of the SDK behavior but may not be a bug in practice.
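
If the race turns out to be real, one defensive option is `Interlocked.CompareExchange`: only reset the counter if it still holds the value the poller observed, so a concurrent increment wins. A self-contained sketch of the difference (variable names mirror the review; this is an illustration, not the PR's code):

```csharp
using System;
using System.Threading;

int activeToolCallCount = 1;

// Poller observes the count, then a second parallel tool starts:
int observed = Volatile.Read(ref activeToolCallCount);
Interlocked.Increment(ref activeToolCallCount); // concurrent ToolExecutionStartEvent

// Interlocked.Exchange(ref activeToolCallCount, 0) would zero the counter
// here and orphan the in-flight second tool. CompareExchange only resets if
// the value is still what the poller saw, so the concurrent increment wins:
int prior = Interlocked.CompareExchange(ref activeToolCallCount, 0, observed);
Console.WriteLine($"prior={prior} count={Volatile.Read(ref activeToolCallCount)}");
```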


Test Coverage Assessment

The PR includes 452-line WorkerDelegationToolTests.cs and 14 new regression tests in MultiAgentRegressionTests.cs. Strong coverage overall. Missing test cases:

  1. MCP server propagation in revival/forceCreate paths: Verify that EnsureOrchestratorToolsAsync (forceCreate=true) and ExecuteWorkerAsync revival restore MCP servers in the new session config.
  2. _syncContext == null dispatch path: Test that CompleteResponse is invoked correctly when _syncContext is null in the OnToolDispatchEnd fallback.

Below consensus (single-model findings, for author awareness)

  • (sonnet) CopilotService.Events.cs:528 — CancelTurnEndFallback fires unconditionally before the ActiveToolCallCount guard. When the guard triggers (tools in-flight), both the SessionIdleEvent's CompleteResponse path AND the TurnEnd-fallback timer are gone — only the 3s OnToolDispatchEnd poll remains. Consider only cancelling CancelTurnEndFallback after the guard passes.
  • (opus-1) CopilotService.Organization.cs:1182 — continuation retry calls retryCtx2.Reset(prompt, workerNames, ct) with the full workerNames list instead of remaining workers; round-robin restarts at worker-0 regardless of dispatch history.
  • (opus-2) CopilotService.cs:2682 — non-volatile state.ActiveToolCallCount direct read in the reconnect path; all other reads use Volatile.Read.
  • (opus-2) CopilotService.cs:2683-2698 — orchestrator reconnect-skip path sets IsProcessing=false without calling FlushCurrentResponse(state) first (per INV-1).
  • (codex) WorkerDelegationContext.cs — _pendingTasks keyed only by workerName; dispatching the same worker twice in one iteration overwrites the first task handle, orphaning it.

Recommended Action

⚠️ Request changes

The MCP/skills regression (5/5 consensus) is the primary blocker: worker revival loses all external tool access, making the revival feature non-functional for any task requiring tools. The fix is mechanical — copy the MCP/skills restoration pattern from the existing reconnect handler into the five new SessionConfig construction sites.

Phase 1 (Critical/P0):
- Fix reconnect-skip SendingFlag deadlock (FlushCurrentResponse, ClearPermissionDenials, InvokeOnUI)
- Split watchdog tiers: active-tool=600s, used-tools-idle=180s, default=120s
- Reset HasUsedToolsThisTurn before reconnect retry

Phase 2 (Fresh Session Config):
- Extract BuildFreshSessionConfig helper with MCP/skills/system-message
- Applied to all 5 fresh SessionConfig sites + reconnect handler

Phase 3 (Moderate):
- Fix OnToolDispatchEnd threading (InvokeOnUI)
- Counter-keyed AddPendingTask prevents task overwrite
- Reset clears stale pending tasks
- Synthesis NRE guard when _client is null
- Wire CTS to ResponseCompletion TCS for real timeout

Phase 4 (Dispatch Coverage):
- Continuation prompt includes full planningPrompt + dispatch status
- Removed discouraging dispatch language
- Added duplicate-task prevention instruction

Bug fixes from multi-model review:
- Bug 1: Clean delegation dicts in reflect finally + CloseSessionCoreAsync
- Bug 2: Cancel watchdog/fallback timers before session replacement in EnsureOrchestratorToolsAsync

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Owner Author

Superseded by #339. The tool-intercept approach fought the SDK event model — reverting to enhanced text-parsing dispatch with JSON mode, retry loop, and ported reliability fixes.

@PureWeen PureWeen closed this Mar 10, 2026
PureWeen added a commit that referenced this pull request Mar 10, 2026
)

## Summary

Replaces the tool-intercept approach (PR #318) with enhanced
text-parsing dispatch. The tool-intercept approach fought the SDK's
event model — 22/33 commits were workarounds for the impedance mismatch,
and it *caused* the 'single worker dispatched' and 'identical tasks'
production bugs.

This PR keeps the text-parsing path that was working reliably, and
enhances it:

### Text-Parsing Enhancements
- **JSON mode parsing**: Orchestrator can return `[{worker, task}]` JSON
arrays, parsed with System.Text.Json. Falls back to `@worker:...@EnD`
regex on parse failure.
- **Exact match only** for worker name resolution — removed
bidirectional `Contains` fallback that caused misroutes when names are
substrings of each other.
- **Backtick/quote stripping** from worker names for robustness.
- **Differentiated task instruction**: 'Each worker MUST receive a
DIFFERENT sub-task'
- **Retry loop** (up to 3 iterations) with conversation history when not
all workers dispatched. Model remembers what it already assigned — no
fresh sessions, no amnesia.
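
A minimal sketch of the JSON-first / regex-fallback parser described above. The `[{worker, task}]` array shape and `@worker:...@EnD` terminator come from this description; the method name `ParseAssignments` and the exact regex are assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

List<(string Worker, string Task)> ParseAssignments(string text)
{
    var result = new List<(string, string)>();
    try
    {
        // JSON mode: [{"worker": "...", "task": "..."}]
        using var doc = JsonDocument.Parse(text);
        foreach (var el in doc.RootElement.EnumerateArray())
            result.Add((el.GetProperty("worker").GetString()!,
                        el.GetProperty("task").GetString()!));
    }
    catch (JsonException)
    {
        // Fall back to the @worker:...@EnD text protocol on parse failure.
        foreach (Match m in Regex.Matches(text, @"@(?<w>[\w-]+):\s*(?<t>.*?)@EnD",
                                          RegexOptions.Singleline))
            result.Add((m.Groups["w"].Value, m.Groups["t"].Value.Trim()));
    }
    return result;
}

var fromJson = ParseAssignments("[{\"worker\":\"alpha\",\"task\":\"write tests\"}]");
var fromText = ParseAssignments("@beta: fix the bug @EnD");
Console.WriteLine($"json={fromJson.Count} text={fromText.Count}");
```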

### Reliability Fixes (ported from tool-dispatch branch)
- `_clientReconnectLock`: SemaphoreSlim thundering-herd fix for
concurrent workers hitting connection errors
- **Watchdog tier split**: active-tool=600s, used-tools-idle=180s,
default=120s (cuts zombie detection from 10min to 3min)
- **BuildFreshSessionConfig** helper: MCP servers, skills, system
message in one place
- **CTS-to-TCS wiring**: 10-minute timeout in SendPromptAndWaitAsync
actually cancels the TCS
- **HasUsedToolsThisTurn reset** before reconnect retry
- **Worker revival**: detect empty response → fresh session → retry once
(~20 lines)
- **Volatile.Read** cleanup for ActiveToolCallCount
- **Corrupt/locked session** restore fallback

### Tests
- 6 new JSON parsing tests
- Updated fuzzy match tests for exact-match-only behavior
- Updated structural regression guards for BuildFreshSessionConfig

### Why Not Tool-Intercept?
Multi-model review (Opus 4.6, Sonnet 4, Gemini 3 Pro, GPT-5.2)
unanimously recommended against the tool-intercept approach. The core
issue: `is_override` on a custom task AIFunction returns a fake
placeholder while running real work in background Task.Run. The SDK
doesn't handle this — events don't fire correctly, sessions get stuck,
and fresh sessions lose conversation history.

Closes #229
Supersedes #318

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Owner Author

PR #318 Review: "feat: Intercept SDK task tool for multi-agent orchestration"

CI Status: ⚠️ No CI checks configured

5 models reviewed (claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex). 4 consensus findings.


🔴 CRITICAL — [[GROUP_REFLECT_COMPLETE]] checked on wrong string (5/5 models)

CopilotService.Organization.cs — SendViaOrchestratorReflectAsync

The reflect loop termination check is performed on the concatenated WORKER responses, not on the orchestrator's synthesizing response. [[GROUP_REFLECT_COMPLETE]] is the signal the orchestrator sends when it decides the multi-agent loop is done — but the code checks the string that workers returned, not the orchestrator's reply. As a result, the reflect loop never self-terminates. It will run until the max iteration cap regardless of whether the orchestrator signals completion. This causes unbounded cost/latency on every multi-agent reflect cycle.
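
The fix reduces to checking the marker on the orchestrator's reply only. A minimal sketch (the marker string is from the finding; the helper name and surrounding shape are assumptions):

```csharp
using System;

const string CompletionMarker = "[[GROUP_REFLECT_COMPLETE]]";

// Termination must be decided from the orchestrator's own reply; the
// concatenated worker output is deliberately not consulted.
bool ShouldTerminate(string orchestratorReply) =>
    orchestratorReply.Contains(CompletionMarker, StringComparison.Ordinal);

string orchestratorReply = "Iterate again: worker-2 has not finished.";
string finalReply = "All tasks verified. [[GROUP_REFLECT_COMPLETE]]";
Console.WriteLine($"{ShouldTerminate(orchestratorReply)} {ShouldTerminate(finalReply)}");
```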


🔴 CRITICAL — OnToolDispatchEnd poll fires on wrong session after reconnect (4/5 models)

CopilotService.Organization.cs — EnsureOrchestratorToolsAsync / OnToolDispatchEnd

The poll callback captures endGen from the session's ProcessingGeneration at dispatch time. After a reconnect, the new session object starts at generation 0. If endGen was also 0 (first prompt after init), the endGen == state.Info.ProcessingGeneration guard is satisfied on the wrong session. CompleteResponse fires on the new/wrong session, incorrectly ending a prompt that hasn't finished.
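
One way to make the guard robust is to capture the session object alongside its generation, so a stale poll can never match a replacement session that also starts at generation 0. A self-contained sketch of the two guard variants (names follow the finding; the session objects here are placeholders):

```csharp
using System;

var oldSession = new object();   // session the poll was armed against
var newSession = new object();   // replacement session after reconnect
int endGen = 0;                  // generation captured at dispatch time
int newGeneration = 0;           // fresh session also starts at 0

object capturedSession = oldSession;

// Generation-only guard: satisfied on the wrong session (the bug).
bool generationOnlyGuard = endGen == newGeneration;

// Identity + generation guard: the stale poll is discarded (the fix).
bool identityPlusGeneration = ReferenceEquals(capturedSession, newSession)
                              && endGen == newGeneration;
Console.WriteLine($"genOnly={generationOnlyGuard} withIdentity={identityPlusGeneration}");
```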


🟡 MODERATE — FullReset() creates zombie tasks (4/5 models)

WorkerDelegationTool.cs — FullReset()

FullReset() clears _pendingTasks without first calling ObserveAllPending(). Any in-flight worker Task objects are abandoned with no observer. When they eventually complete or fault, they inject their (now-stale) results into the next orchestration cycle's task dictionary via the shared completion callback. This causes stale results from a previous cycle to corrupt the next cycle.

Fix: Call ObserveAllPending() (or equivalent drain) before Reset().
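
A sketch of that drain-before-clear ordering, assuming `_pendingTasks` is a name-to-`Task<string>` map as described above (the observer body is an illustration; the real `ObserveAllPending` may route results elsewhere):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var pendingTasks = new Dictionary<string, Task<string>>();

void ObserveAllPending()
{
    // Attach an observer to every in-flight task so late completions (or
    // faults) are consumed here instead of leaking into the next cycle.
    foreach (var t in pendingTasks.Values)
        _ = t.ContinueWith(done => { _ = done.Exception; }, TaskScheduler.Default);
}

void FullReset()
{
    ObserveAllPending(); // drain before clearing, per the fix above
    pendingTasks.Clear();
}

pendingTasks["worker-1"] = Task.FromResult("stale result from a previous cycle");
FullReset();
Console.WriteLine(pendingTasks.Count); // next cycle starts with an empty map
```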


🟡 MODERATE — AddPendingTask duplicate key overwrites first task (3/5 models)

WorkerDelegationTool.cs — AddPendingTask()

Worker name is used as the dictionary key. In a reflect cycle where the same worker is dispatched more than once across iterations, the second AddPendingTask call for that worker silently overwrites the first task. The first worker's result is lost; the orchestrator never sees it. Combined with the FullReset() zombie issue, this creates unpredictable result sets in multi-iteration reflect cycles.
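
A counter-keyed variant avoids the overwrite: key by (worker, sequence) so re-dispatching the same worker produces a distinct entry. The key format and sequence counter below are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var pendingTasks = new Dictionary<string, Task<string>>();
int dispatchSeq = 0;

// Key by worker name plus a monotonically increasing sequence, so a second
// dispatch of the same worker never overwrites the first task handle.
string AddPendingTask(string workerName, Task<string> task)
{
    string key = $"{workerName}#{Interlocked.Increment(ref dispatchSeq)}";
    pendingTasks[key] = task;
    return key;
}

var k1 = AddPendingTask("alpha", Task.FromResult("first result"));
var k2 = AddPendingTask("alpha", Task.FromResult("second result"));
Console.WriteLine($"count={pendingTasks.Count} keys={k1},{k2}");
```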


Test Coverage

No tests were added for the new WorkerDelegationTool, WorkerDelegationContext, or the orchestration interception path. Given the severity of the generation-guard race and the FullReset zombie issue, these are critical paths that need unit test coverage. Suggested:

  • FullReset_DoesNotLeakPendingTaskResults
  • AddPendingTask_DuplicateWorkerName_ThrowsOrDeduplicates
  • OnToolDispatchEnd_DoesNotFireOnWrongGeneration

⚠️ Request changes — the [[GROUP_REFLECT_COMPLETE]] string check and OnToolDispatchEnd generation guard are both logic bugs that affect every multi-agent reflect cycle.

arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…#229) (PureWeen#339)

Development

Successfully merging this pull request may close these issues.

Rework multi-agent orchestrator: intercept SDK task tool to surface sub-agents as PolyPilot sessions
