Strengthen multi-agent skills, Fix flow, and stability tests by PureWeen · Pull Request #375 · PureWeen/PolyPilot

PureWeen · 2026-03-15T16:03:27Z

Multi-agent orchestration SKILL.md: Added PR 373 reconnect safety invariants (INV-O9 through INV-O13), session death/deadlock bug documentation, Fix with Copilot multi-agent awareness section, OrchestratorReflect monitoring guide, enhanced end-to-end checklist, and comprehensive test coverage gap analysis
Processing-state-safety SKILL.md: Added INV-14 through INV-17 covering IsOrphaned guards, TryUpdate concurrency, handler-before- publish ordering, and MCP server reload on reconnect
SessionSidebar.razor Fix flow: GetBugReportDebugInfo now includes multi-agent context (group name, mode, role, members, event diagnostics, PendingOrchestration state) via AppendMultiAgentDebugInfo. BuildCopilotPrompt adds multi-agent testing requirements when the session is in a multi-agent group
SessionStabilityTests.cs: 20 new tests covering IsOrphaned guards (5 tests), ForceCompleteProcessingAsync INV-1 compliance (3 tests), synthesis prompt mixed success/failure (2 tests), sibling re-resume safety (4 tests), MCP reload (2 tests), diagnostic log completeness, watchdog companion fields, and Fix prompt multi-agent enhancement

PureWeen · 2026-03-15T19:44:26Z

Multi-Model Consensus Review -- Round 2 (5-model × 5-agent)

CI Status: ⚠️ No CI configured

Prior Review Comments: None outstanding

🔴 CRITICAL -- 3 issues (2 new this round)

1. cancelIdx == -1 false-positive in test (SessionStabilityTests.cs:113-116)
The ForceCompleteProcessing_CancelsTimersBeforeUiThreadWork test compares cancelIdx < invokeIdx without first asserting cancelIdx >= 0. If CancelProcessingWatchdog is removed from the method, cancelIdx becomes -1, which is always < invokeIdx, so the test passes vacuously. Add a guard: Assert.True(cancelIdx >= 0, "CancelProcessingWatchdog must be present");

2. LongRunningWorker_IsNotKilledByStandardTimeout is a tautological no-op (LongRunningSessionSafetyTests.cs:260-281) (NEW)
This test sets IsProcessing = true via reflection then immediately asserts it's still true. The watchdog is never started (Demo mode skips StartProcessingWatchdog), so this test proves nothing about timeout behavior. It needs to either (a) exercise the actual watchdog with a real timer, or (b) be renamed to reflect what it actually tests.

3. Revival path violates INV-15 (CopilotService.Organization.cs:1524) (NEW)
_sessions[workerName] = freshState uses direct index assignment instead of TryUpdate(key, freshState, deadState). Per INV-15 (documented in .claude/skills/processing-state-safety/SKILL.md:189-204), this allows a stale Task.Run from a concurrent reconnect to overwrite newer state. The sibling reconnect path at CopilotService.cs:~2860 correctly uses TryUpdate -- this revival path should match.

🟡 MODERATE -- 5 issues

4. File.ReadAllLines blocks UI thread (SessionSidebar.razor:2436)
GetBugReportDebugInfo() calls File.ReadAllLines(diagPath) synchronously. The event-diagnostics.log can grow large. Consider File.ReadAllLinesAsync or moving to a background task.

5. Hardcoded UserProfile path bypasses BaseDir (SessionSidebar.razor:2431, 2455)
Both diagPath and pendingPath use Environment.GetFolderPath(UserProfile) instead of CopilotService.BaseDir. This breaks test isolation (SetBaseDirForTesting) and could fail on platforms where UserProfile differs from BaseDir.

6. Organization.Sessions/Groups accessed without lock (SessionSidebar.razor:2402-2418)
Three accesses to Organization.Sessions and Organization.Groups in AppendMultiAgentDebugInfo are unprotected. Thread-safe accessors exist: GetSessionMetadataSnapshots() and GetGroupSnapshots().

7. File-scoped test assertions (SessionStabilityTests.cs:182-212)
Several tests search the entire file source with Assert.Contains() instead of using ExtractMethod() to scope to specific methods. This can produce false positives if patterns appear in comments or unrelated methods.

8. AllSessionCreationPaths_RegisterEventHandler uses file-wide count (LongRunningSessionSafetyTests.cs:232-246) (NEW)
Counts global occurrences of handler registration patterns. A missing handler on one creation path can be masked by extra handlers elsewhere in the file.

Recommended Action: ⚠️ Request Changes

The production code changes (IsOrphaned, handler-before-send, IsMultiAgentSession copy) are sound and confirmed by 5/5 models. The issues are in test quality (3 CRITICAL) and one INV-15 violation in the revival path. Please address at minimum the 3 CRITICALs before merge.

- Re-arm IsProcessing on TurnStartEvent after premature session.idle (SDK sends idle then continues 15+ tool rounds — ghost events lost) - Fix INV-15: revival path uses TryUpdate instead of index assignment - Fix thread safety: Organization.Sessions/Groups use snapshot methods - Fix hardcoded UserProfile path → CopilotService.BaseDir - Fix vacuous/tautological test assertions (PR review CRITICALs #1, #2) - Fix test method scoping to use Task<string> SendPromptAsync( signature - Widen handler proximity check to 60 lines, skip CopilotService.cs (retry patterns share single handler after catch blocks) All 2616 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds 8 tests covering the PR #375 bug where steering a busy orchestrator canceled the in-flight orchestration TCS: - EnqueueMessage_QueuesDrainAfterCompletion: verify queue works - EnqueueMessage_MultipleMessages_QueuedInOrder: FIFO ordering - DashboardDispatch_OrchestratorCheckBeforeSteer: structural ordering - DashboardDispatch_OrchestratorUsesEnqueueNotSteer: correct dispatch - DashboardDispatch_OrchestratorQueueHasDiagnosticLog: QUEUED_ORCH_BUSY tag - DashboardDispatch_NonOrchestratorStillSteered: steer path preserved - GetOrchestratorGroupId_WorkerInActiveGroup_ReturnsNull: workers steerable - LongRunningOrchestrator_UserFollowup_MustQueue: 15min scenario Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Re-arm IsProcessing on TurnStartEvent after premature session.idle (SDK sends idle then continues 15+ tool rounds — ghost events lost) - Fix INV-15: revival path uses TryUpdate instead of index assignment - Fix thread safety: Organization.Sessions/Groups use snapshot methods - Fix hardcoded UserProfile path → CopilotService.BaseDir - Fix vacuous/tautological test assertions (PR review CRITICALs #1, #2) - Fix test method scoping to use Task<string> SendPromptAsync( signature - Widen handler proximity check to 60 lines, skip CopilotService.cs (retry patterns share single handler after catch blocks) All 2616 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds 8 tests covering the PR #375 bug where steering a busy orchestrator canceled the in-flight orchestration TCS: - EnqueueMessage_QueuesDrainAfterCompletion: verify queue works - EnqueueMessage_MultipleMessages_QueuedInOrder: FIFO ordering - DashboardDispatch_OrchestratorCheckBeforeSteer: structural ordering - DashboardDispatch_OrchestratorUsesEnqueueNotSteer: correct dispatch - DashboardDispatch_OrchestratorQueueHasDiagnosticLog: QUEUED_ORCH_BUSY tag - DashboardDispatch_NonOrchestratorStillSteered: steer path preserved - GetOrchestratorGroupId_WorkerInActiveGroup_ReturnsNull: workers steerable - LongRunningOrchestrator_UserFollowup_MustQueue: 15min scenario Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Re-arm IsProcessing on TurnStartEvent after premature session.idle (SDK sends idle then continues 15+ tool rounds — ghost events lost) - Fix INV-15: revival path uses TryUpdate instead of index assignment - Fix thread safety: Organization.Sessions/Groups use snapshot methods - Fix hardcoded UserProfile path → CopilotService.BaseDir - Fix vacuous/tautological test assertions (PR review CRITICALs #1, #2) - Fix test method scoping to use Task<string> SendPromptAsync( signature - Widen handler proximity check to 60 lines, skip CopilotService.cs (retry patterns share single handler after catch blocks) All 2616 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds 8 tests covering the PR #375 bug where steering a busy orchestrator canceled the in-flight orchestration TCS: - EnqueueMessage_QueuesDrainAfterCompletion: verify queue works - EnqueueMessage_MultipleMessages_QueuedInOrder: FIFO ordering - DashboardDispatch_OrchestratorCheckBeforeSteer: structural ordering - DashboardDispatch_OrchestratorUsesEnqueueNotSteer: correct dispatch - DashboardDispatch_OrchestratorQueueHasDiagnosticLog: QUEUED_ORCH_BUSY tag - DashboardDispatch_NonOrchestratorStillSteered: steer path preserved - GetOrchestratorGroupId_WorkerInActiveGroup_ReturnsNull: workers steerable - LongRunningOrchestrator_UserFollowup_MustQueue: 15min scenario Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Multi-agent orchestration SKILL.md: Added PR 373 reconnect safety invariants (INV-O9 through INV-O13), session death/deadlock bug documentation, Fix with Copilot multi-agent awareness section, OrchestratorReflect monitoring guide, enhanced end-to-end checklist, and comprehensive test coverage gap analysis - Processing-state-safety SKILL.md: Added INV-14 through INV-17 covering IsOrphaned guards, TryUpdate concurrency, handler-before- publish ordering, and MCP server reload on reconnect - SessionSidebar.razor Fix flow: GetBugReportDebugInfo now includes multi-agent context (group name, mode, role, members, event diagnostics, PendingOrchestration state) via AppendMultiAgentDebugInfo. BuildCopilotPrompt adds multi-agent testing requirements when the session is in a multi-agent group - SessionStabilityTests.cs: 20 new tests covering IsOrphaned guards (5 tests), ForceCompleteProcessingAsync INV-1 compliance (3 tests), synthesis prompt mixed success/failure (2 tests), sibling re-resume safety (4 tests), MCP reload (2 tests), diagnostic log completeness, watchdog companion fields, and Fix prompt multi-agent enhancement Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ety tests Root cause: ExecuteWorkerAsync revival path created a fresh CopilotSession without registering an event handler (.On(evt => HandleSessionEvent(...))). The CLI processed prompts and wrote 189+ events to events.jsonl, but HandleSessionEvent never fired — leaving the session permanently stuck. Fix (Organization.cs): - Set deadState.IsOrphaned = true before dispose (prevents stale callbacks) - Copy IsMultiAgentSession from dead state to fresh state - Register event handler on fresh session before storing in _sessions Safety tests (25 tests in LongRunningSessionSafetyTests.cs): - Timeout constants accommodate 20+ min workers and 30 min freshness - Revival path verified: event handler, IsOrphaned, IsMultiAgentSession - All session creation paths across 3 files register event handlers - Guard against mtime staleness detection reintroduction - Theory tests for various worker durations (3-45 min) and thinking pauses - Abort clears all processing state for long-running sessions Skill update (SKILL.md): - Long-Running Session Safety section with cardinal rule and safe/unsafe table - INV-O14: Never add mtime staleness detection to watchdog - Bug pattern: Dead event stream after worker revival Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Re-arm IsProcessing on TurnStartEvent after premature session.idle (SDK sends idle then continues 15+ tool rounds — ghost events lost) - Fix INV-15: revival path uses TryUpdate instead of index assignment - Fix thread safety: Organization.Sessions/Groups use snapshot methods - Fix hardcoded UserProfile path → CopilotService.BaseDir - Fix vacuous/tautological test assertions (PR review CRITICALs #1, #2) - Fix test method scoping to use Task<string> SendPromptAsync( signature - Widen handler proximity check to 60 lines, skip CopilotService.cs (retry patterns share single handler after catch blocks) All 2616 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When a message is sent to an orchestrator that's already processing, the steer path was canceling the in-flight orchestration TCS, aborting worker dispatch (TaskCanceledException). Now orchestrator sessions queue the message instead, which gets drained after the current orchestration completes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds 8 tests covering the PR #375 bug where steering a busy orchestrator canceled the in-flight orchestration TCS: - EnqueueMessage_QueuesDrainAfterCompletion: verify queue works - EnqueueMessage_MultipleMessages_QueuedInOrder: FIFO ordering - DashboardDispatch_OrchestratorCheckBeforeSteer: structural ordering - DashboardDispatch_OrchestratorUsesEnqueueNotSteer: correct dispatch - DashboardDispatch_OrchestratorQueueHasDiagnosticLog: QUEUED_ORCH_BUSY tag - DashboardDispatch_NonOrchestratorStillSteered: steer path preserved - GetOrchestratorGroupId_WorkerInActiveGroup_ReturnsNull: workers steerable - LongRunningOrchestrator_UserFollowup_MustQueue: 15min scenario Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…r feedback Fixes 3 moderate issues from PR Review Squad consensus: 1. SessionSidebar crash.log uses CopilotService.BaseDir instead of hardcoded UserProfile path (consistent with other file paths) 2. Replace File.ReadAllLines with ReadLastLines helper that streams without loading entire file — fixes both crash.log (10 lines) and event-diagnostics.log (500 lines tail) paths 3. Add user-visible system message when orchestrator queues a message ('📋 Orchestrator is busy...') so users know their message was received Also updates skill documentation: - multi-agent-orchestration SKILL.md: steering-orchestrator conflict bug, premature session.idle truncation bug - processing-state-safety SKILL.md: path #10 TurnStart re-arm Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

SDK bug #299: session.idle fires prematurely mid-turn, causing orchestrator to collect truncated worker content. Fix adds post-collection recovery: - WasPrematurelyIdled volatile flag set in EVT-REARM, cleared in SendPromptAsync - RecoverFromPrematureIdleIfNeededAsync: 5s detection poll → OnSessionComplete subscription → 120s wait → History re-collect → LoadHistoryFromDisk fallback - Only applies to IsMultiAgentSession workers (zero overhead for single sessions) - Fix mutation-before-commit: SessionId assigned after TryUpdate succeeds - Fix silent null guard: Assert.NotNull in LongRunningSessionSafetyTests - Fix File.ReadAllText in debug panel: use streaming FileStream (review finding) - 11 new structural regression tests for premature idle recovery Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ection, fix exception type and dispatchTime filter - Replace volatile bool WasPrematurelyIdled + busy-poll loop with ManualResetEventSlim PrematureIdleSignal: exits immediately when EVT-REARM fires (zero latency for normal completions); times out after 5s only when no premature idle occurs - Fix catch(InvalidOperationException) on List<T>.ToArray() — should be catch(Exception) since List<T>.ToArray() does not throw InvalidOperationException on concurrent modification - Add && m.Timestamp >= dispatchTime filter to disk fallback, matching in-memory History path - Rename LoadHistoryFromDisk to LoadHistoryFromDiskAsync with FileShare.ReadWrite and async ReadLineAsync; keep sync wrapper for backward compat - Update structural tests to reflect ManualResetEventSlim API Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The 5s WasPrematurelyIdled polling window was fundamentally insufficient — EVT-REARM can take 30-60s to fire after premature session.idle. Worker-3 in live testing was truncated (363 chars) because recovery never triggered. Changes: - Add IsEventsFileActive() helper — checks events.jsonl mtime freshness - Detection now uses two parallel signals: WasPrematurelyIdled flag OR events.jsonl freshness (<15s age = worker still active) - Recovery loops through repeated premature idle cycles (observed: 4x in a row for worker-3) instead of waiting for a single OnSessionComplete - After each completion round, checks events.jsonl staleness to decide if worker is truly done vs hitting premature idle again - Add PrematureIdleEventsFileFreshnessSeconds constant (15s) - 4 new structural regression tests for freshness check and loop pattern - Update search windows from 5000 to 8000 chars for expanded method body All 2649 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Sessions are now loaded as lightweight placeholders at startup (history from events.jsonl only, no SDK connection). The actual SDK ResumeSessionAsync call happens lazily when the user sends a message to a session (EnsureSessionConnectedAsync). Previously, RestorePreviousSessionsAsync called SDK ResumeSessionAsync for each of 41 sessions sequentially. Each call connects to the persistent Copilot server. With many sessions, this blocked the UI thread for minutes, showing only the BlazorWebView background (#0F0F22) which appeared as a 'blue screen'. Changes: - RestorePreviousSessionsAsync now creates placeholder SessionState objects with Session=null and history loaded from disk - Background restore wrapper (RestoreSessionsInBackgroundAsync) runs post-restore tasks off the UI thread - EnsureSessionConnectedAsync lazily connects to SDK on first message - SendPromptAsync checks for null Session and lazy-resumes before send - IsCodespaceSession helper distinguishes codespace vs lazy placeholders Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

RestoreSessionsInBackgroundAsync was started as fire-and-forget on the UI thread. Its async continuations captured the UIKitSynchronizationContext, so they ran on the UI thread. Inside the restore loop, LoadHistoryFromDisk calls .GetAwaiter().GetResult() which blocks the UI thread. The async file I/O inside LoadHistoryFromDiskAsync then needs to post its continuation to the blocked UI thread → classic SyncContext deadlock. Fix: Wrap the fire-and-forget in Task.Run() so the entire restore runs on the ThreadPool where there is no SyncContext to capture. Also add ConfigureAwait(false) to async I/O calls as belt-and-suspenders. Verified: 41 sessions restore in ~9s on ThreadPool thread, UI renders immediately, all 2649 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1. Add missing 'if (!completed)' guard in premature idle recovery loop (Organization.cs:1776) — without it, the recovery always broke on first iteration making the multi-round logic dead code. (Consensus: Opus + Sonnet + GPT-5.1) 2. Marshal ReconcileOrganization to UI thread via InvokeOnUI in RestoreSessionsInBackgroundAsync — Organization.Sessions is a plain List<T> that's not thread-safe, and restore now runs on ThreadPool. (Consensus: Opus + Sonnet + Gemini) 3. Use SnapshotSessionMetas() in IsCodespaceSession and EnsureSessionConnectedAsync instead of direct Organization.Sessions access — these can run from background threads. (Consensus: Sonnet + Gemini) 4. Move lazy resume (EnsureSessionConnectedAsync) inside SendingFlag atomic guard to prevent double-resume race on rapid sends. (Opus) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-03-16T05:14:13Z

PR #375 Review — Round 4 (post-fix `04d23a79`)

Tests: ✅ 2667/2667 passing (68 new tests since Round 1)

Previous Findings Status

Finding	Status
N1 🟡 mutation-before-commit (Org.cs:1517)	✅ FIXED
N2 🟡 silent null guard (LongRunningSessionSafetyTests.cs)	✅ FIXED
N3 🟡 5s poll tax in `RecoverFromPrematureIdleIfNeeded`	✅ FIXED — fast-path short-circuit added: checks `PrematureIdleSignal.IsSet` then `IsEventsFileActive()` before entering the 500ms polling loop. Most workers now exit instantly.
N4 🟡 disk fallback missing `dispatchTime` filter	⚠️ STILL PRESENT
Minor 🟢 synchronous `ReadToEnd()` in SessionSidebar.razor:2459	⚠️ STILL PRESENT (low risk, diagnostic path only)

Remaining Issue

N4 🟡 MODERATE: Disk fallback LoadHistoryFromDiskAsync lacks dispatchTime filter (2 sites)

CopilotService.Organization.cs:1627-1629 — dead-event-stream fallback picks LastOrDefault(m.Role == "assistant") without checking m.Timestamp >= dispatchTime, could return stale content from a prior orchestration round
CopilotService.Organization.cs:1832-1835 — recovery-path disk fallback has the same issue
The in-memory paths at lines 1581 and 1596 correctly filter by dispatchTime — the disk paths should match

Fix: Add && m.Timestamp >= dispatchTime (or equivalent) to both LastOrDefault queries, consistent with the in-memory fallback paths.

Verdict: ⚠️ Request Changes

One moderate finding remains (N4). The 5s poll tax (N3) is fully addressed by the fast-path pattern. Recommend adding the dispatchTime filter to the disk fallback paths for consistency with the in-memory paths, then this is ready to merge.

Add m.Timestamp >= dispatchTime to both LoadHistoryFromDiskAsync fallback queries, matching the in-memory paths. Without this filter, stale assistant messages from prior orchestration rounds could be picked up as worker results. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Both rename tests sent commands immediately after ConnectClientAsync without waiting for the initial session list to arrive. Under load, the rename message could race with the WebSocket handshake/initial state push, causing the server-side rename to fail silently. Fix: wait for client.Sessions to contain the pre-existing session before sending rename commands, matching the pattern used by other bridge integration tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-03-16T13:42:35Z

PR #375 Review — Round 5 (5-model consensus)

Tests: ✅ 2667 passed, 0 failed | CI: ⚠️ None configured

Previous Findings

#	Severity	Status	Finding
N4	🔴→✅	FIXED	Disk fallback `dispatchTime` filter — commit `3454da35` adds `m.Timestamp >= dispatchTime` to both paths (lines 1629, 1835)
F2	🟡	STILL PRESENT	Recovery timeout OCE discards `bestResponse` (5/5 models)
F3	🟡	STILL PRESENT	`catch (InvalidOperationException)` wrong for `List<T>.ToArray()` (3/5 models — 2 say it is correct)
F4	🟢	STILL PRESENT	`ManualResetEventSlim` not disposed (5/5 models)
F5	🟢	STILL PRESENT	Sync `ReadToEnd()` in diagnostic path (5/5 models)

New Findings

#	Severity	Location	Finding	Models
N5	🔴 CRITICAL	`Organization.cs:1482`	`DateTime.UtcNow` vs `DateTime.Now` mismatch — `dispatchTime = DateTime.UtcNow` but `ChatMessage.Timestamp` uses `DateTime.Now` (local). On UTC−N systems, all valid post-dispatch messages are filtered OUT (empty results). The resume path (line 2079) correctly converts via `ToLocalTime()`. Fix: `var dispatchTime = DateTime.Now;`	2/5
N6	🟡 MODERATE	`Organization.cs:~2113`	Resume disk fallback missing `dispatchTimeLocal` filter — `ResumeOrchestrationIfPendingAsync` disk path queries without timestamp filter, while the in-memory path at line 2095 correctly uses `dispatchTimeLocal`. Same class of bug as N4 but in the restart-resume path.	2/5

Detail on F2 (still blocking)

recoveryCts = CreateLinkedTokenSource(cancellationToken) with CancelAfter(120s). When inner timeout fires during Task.Delay(2000, recoveryCts.Token) (line 1802), OCE is thrown. The catch at line 1860 has when (cancellationToken.IsCancellationRequested) — this is FALSE (outer token still valid). OCE propagates to ExecuteWorkerAsync generic catch → WorkerResult(success=false), discarding accumulated bestResponse.

Fix: Wrap the Task.Delay in try { } catch (OCE) when (!cancellationToken.IsCancellationRequested) { break; } to cleanly exit the loop and return bestResponse.

Detail on N5 (new critical)

// Line 1482 — WRONG: UTC
var dispatchTime = DateTime.UtcNow;

// ChatMessage.cs lines 20,68-95 — LOCAL
new ChatMessage() : this("assistant", "", DateTime.Now) { }

Every m.Timestamp >= dispatchTime comparison in the worker dispatch path is broken for non-UTC timezones. The resume path already has the fix pattern at line 2079-2080.

Verdict: ⚠️ Request Changes

Blocking:

N5: Change dispatchTime to DateTime.Now (one-line fix at line 1482)
F2: Catch inner OCE to preserve bestResponse

Non-blocking (track for follow-up):

N6: Add dispatchTimeLocal filter to resume disk fallback
F3: Widen catch to Exception for defensive safety
F4: Dispose PrematureIdleSignal on session removal
F5: Pre-existing sync I/O in diagnostic block

…ilter N5 (CRITICAL): dispatchTime used DateTime.UtcNow but ChatMessage.Timestamp uses DateTime.Now. On non-UTC systems, all valid post-dispatch messages were filtered out. Changed to DateTime.Now to match ChatMessage convention. F2 (MODERATE): Recovery timeout OCE from recoveryCts propagated past the catch filter (which only catches outer cancellation), discarding accumulated bestResponse. Now catches inner OCE and breaks cleanly from the loop. N6 (MODERATE): Resume disk fallback in ResumeOrchestrationIfPendingAsync was missing the dispatchTimeLocal filter, same class of bug as N4. Added the filter to match the in-memory path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-03-16T13:59:16Z

PR #375 Review — Round 6 (5-model consensus)

Tests: ⚠️ 2666/2667 passed — 1 flaky failure (PurgeOldCaptures_KeepsOnlyMostRecent) passes in isolation; pre-existing timing issue, not in PR files.
CI: ⚠️ None configured.

Round 5 Findings Status

#	Severity	Status
N5	🔴→✅	FIXED — `dispatchTime = DateTime.Now` (line 1482) matches `ChatMessage.Timestamp`
F2	🟡→✅	FIXED — `try/catch (OCE) when (!ct.IsCancellationRequested) { break; }` correctly breaks to `bestResponse` return
N6	🟡→✅	FIXED — Resume disk fallback at line 2115 now has `&& m.Timestamp >= dispatchTimeLocal`
F4	🟢	STILL PRESENT — `ManualResetEventSlim PrematureIdleSignal` never disposed (minor, non-blocking)
F5	🟢	STILL PRESENT — Sync `ReadToEnd()` at `SessionSidebar.razor:2459` (minor, non-blocking)

New Findings (Round 6)

#	Severity	Location	Finding	Models
N7	🟡 MODERATE	`CopilotService.Persistence.cs:350`	Lazy-resume fallback uses `_client` instead of `GetClientForGroup(groupId)`	2/5
N8	🟢 MINOR	`Organization.cs:1861`	User-abort OCE catch returns `initialResponse` not `bestResponse` — `bestResponse` is out of scope in the outer catch (declared inside the `try` block at line 1756)	2/5

Detail on N7 (new, moderate)

EnsureSessionConnectedAsync (line 312, introduced in commit 7dd372e1) correctly uses GetClientForGroup(groupId).ResumeSessionAsync(...) at line 341 for the happy path. But the fallback when resume fails (corrupt/session file errors) at line 350 uses _client.CreateSessionAsync(...) — the local client — instead of GetClientForGroup(groupId). For codespace-backed multi-agent groups, the fresh session will be created on the wrong server.

Fix: await GetClientForGroup(groupId).CreateSessionAsync(...)

Detail on N8 (minor)

// Line 1756 (inside try block) — bestResponse not in scope at catch below
string? bestResponse = initialResponse;

// Line 1861 — user abort
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
    return initialResponse;  // ← should be bestResponse ?? initialResponse
}

Fix: hoist string? bestResponse = initialResponse; above the try block.

Verdict: ⚠️ Request Changes (one targeted fix)

Blocking: N7 — one-line fix: change _client.CreateSessionAsync → GetClientForGroup(groupId).CreateSessionAsync at CopilotService.Persistence.cs:350.

Non-blocking (track for follow-up): N8 (minor scope issue), F4 (MRE disposal), F5 (sync read in diagnostic).

Co-reviewed by: Copilot 223556219+Copilot@users.noreply.github.com

…CE scope N7 (MODERATE): Lazy-resume fallback used _client.CreateSessionAsync instead of GetClientForGroup(groupId), which would route to the wrong server for codespace-backed sessions. Changed to GetClientForGroup(groupId) to match the resume path 9 lines above. N8 (LOW): Outer OCE catch on user abort returned initialResponse while bestResponse (potentially longer) was trapped inside the try scope. Hoisted bestResponse above the try block so the abort path preserves any content collected during recovery rounds. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-03-16T14:15:08Z

PR #375 — Round 7: Architectural & SDK Workaround Assessment

Tests: ✅ 2667/2667 pass | Round 6 findings: N7 ✅ FIXED, N8 ✅ FIXED

SDK Workaround Classification

#	Mechanism	Class	SDK Gap	Risk
1	Premature Idle Detection (`RecoverFromPrematureIdleIfNeededAsync`)	🟡 WORKAROUND	SDK fires `session.idle` before agent truly completes (observed: 4x premature idles in a row)	HIGH — polling `events.jsonl` freshness is fragile; if SDK changes file format or write timing, recovery silently breaks
2	Dead Event Stream (revival via `CreateSessionAsync` after `DisposeAsync`)	🔴 HACK	SDK stops firing `HandleSessionEvent` callbacks after session disposal+recreation on same session ID	HIGH — `LoadHistoryFromDiskAsync` scrapes SDK internal `events.jsonl`; any format change breaks it silently
3	Multi-round Recovery Loop (`while` + `OnSessionComplete` + `Task.Delay(2000)` + `IsEventsFileActive`)	🟡 WORKAROUND	No reliable "agent turn truly complete" signal from SDK	MEDIUM — correct pattern given the constraint, but 2s poll intervals add latency; event-driven would be better
4	Lazy Session Resume (`EnsureSessionConnectedAsync`)	🟢 PROPER	N/A — legitimate optimization	LOW — clean deferred-connection pattern, no SDK dependency
5	`DateTime.Now` convention	🟡 WORKAROUND	SDK `ChatMessage` uses `DateTime.Now` (local) throughout; PolyPilot must match	LOW — fragile if SDK switches to UTC, but low blast radius
6	Processing Watchdog (30s/120s/600s timeouts)	🟡 WORKAROUND	SDK does not guarantee `session.idle` delivery ("zero-idle sessions" are a known bug)	HIGH — watchdog is the last line of defense; if timeout values are wrong, sessions hang forever or get killed prematurely
7	`events.jsonl` file scraping (`LoadHistoryFromDiskAsync`, `IsEventsFileActive`)	🔴 HACK	SDK provides no API for "what happened in this session since timestamp X"	HIGH — reads SDK-internal file format; could break on any SDK update without warning

Summary: 1 proper, 4 workarounds, 2 hacks

The multi-agent orchestration layer is functionally robust — it has survived 7 rounds of multi-model review and handles real failure modes (premature idle, dead event streams, app restarts mid-turn). But it sits on top of two fragile foundations:

events.jsonl scraping (mechanisms 2 & 7) — This is the biggest risk. PolyPilot reads the SDK internal event log as a recovery mechanism. This file format is not part of any public API contract. A single SDK update could change the JSON structure, timestamp format, or file location and silently break content recovery.
Premature idle detection (mechanisms 1 & 3) — The SDK fires session.idle before the agent is truly done, forcing PolyPilot to build its own "is the agent really done?" logic via file freshness + signal + multi-round polling. This is the root cause of most complexity in this PR.

Recommended Upstream SDK Issues (Priority Order)

🔴 P0: Reliable turn-complete signal — SDK should guarantee that session.idle only fires when the agent turn is truly complete (no more tool calls, no more delta events). This would eliminate mechanisms 1, 3, and 6.
🔴 P0: Session history API — SDK should expose GetHistorySince(DateTime) or equivalent, eliminating the need to scrape events.jsonl. This would eliminate mechanisms 2 and 7.
🟡 P1: Event callback survival across dispose/recreate — If a session is disposed and recreated (revival), the SDK should re-register event callbacks. This would eliminate the dead event stream hack.
🟡 P1: UTC timestamps — SDK should use DateTime.UtcNow consistently in ChatMessage and all internal timestamps. Minor but would prevent timezone-related filter bugs.
🟢 P2: session.idle delivery guarantee — Ensure every turn emits exactly one session.idle. The "zero-idle sessions" bug forces the watchdog to exist as a safety net.

Is Multi-Agent Bullet-Proof?

For production use: Yes, with caveats.

✅ The recovery mechanisms work — premature idle, dead streams, app restarts, and user aborts are all handled with multi-tier fallbacks.
✅ 2667 tests pass including multi-agent regression tests.
✅ 7 rounds of 5-model consensus review have hardened the code significantly.

⚠️ Systemic risk: Two mechanisms (events.jsonl scraping and premature idle detection) are tightly coupled to SDK implementation details. An SDK update could silently degrade multi-agent reliability without any test failures (because the tests mock the SDK layer). Recommend:

Pin the SDK version and test against each update before upgrading
Add integration tests that exercise the real SDK (not mocks) for the critical paths
File the P0 upstream issues to get proper API support

Verdict: ✅ Approve

All blocking findings from Rounds 1-6 are fixed. The code is as robust as it can be given current SDK limitations. The two hacks should be tracked as tech debt with upstream issue references.

PureWeen · 2026-03-16T14:17:45Z

PR #375 Round 7 — Architecture & SDK Workaround Assessment

Tests: ✅ 2667 passed, 0 failed | Round 6 fixes: ✅ N7 (wrong client) and ✅ N8 (OCE scope) both verified fixed.

SDK Workaround Audit

1. Premature Idle Detection (`RecoverFromPrematureIdleIfNeededAsync`) — 🟡 WORKAROUND

The SDK fires session.idle before the agent is truly done, then continues. PolyPilot detects this via PrematureIdleSignal (MRE set by TurnStartEvent arriving after idle) + events.jsonl freshness polling. The while loop handles 4+ consecutive premature idles.

Risk: Medium. If SDK fixes premature idle, this becomes harmless dead code. If SDK changes the TurnStart-after-idle pattern, EVT-REARM signal stops firing.
Upstream fix needed: SDK should guarantee session.idle fires only once all content is committed.

2. Dead Event Stream Recovery (tier-3 fallback: `LoadHistoryFromDiskAsync`) — 🟡 WORKAROUND

After DisposeAsync() + CreateSessionAsync() (session revival), the SDK stops routing events to the original HandleSessionEvent callback. PolyPilot falls back to scraping events.jsonl.

Risk: High. events.jsonl is an internal SDK file. Format, location, or schema changes silently break all recovery.
Upstream fix needed: SDK should replay buffered events on re-registration, or expose GetHistory() API.

3. Multi-round Recovery Loop (while + `OnSessionComplete` + `IsEventsFileActive`) — 🟡 WORKAROUND

Structurally sound pattern. Complexity arises from SDK not providing a reliable "all events delivered" signal. OnSessionComplete is PolyPilot-internal so risk is low on that side; the disk fallback carries the high-risk SDK dependency.

Upstream fix needed: SDK session.idle should be final and reliable; or provide WaitForCompletion().

4. Lazy Session Resume (`EnsureSessionConnectedAsync`) — 🟢 PROPER

Deferring ResumeSessionAsync until first use is a standard lazy-initialization pattern. The N7 fix (using GetClientForGroup(groupId)) makes it codespace-correct. No upstream issue needed.

5. `DateTime.Now` vs `DateTime.UtcNow` — 🔴 DESIGN SMELL (internal)

ChatMessage uses DateTime.Now throughout. The N5 fix (dispatchTime = DateTime.Now) papers over the symptom. Root fix: standardize all ChatMessage timestamps on UTC. This is PolyPilot-internal tech debt, not an SDK issue.

6. Processing Watchdog (30s/120s/600s + events.jsonl freshness) — 🟡 WORKAROUND

Backstop against SDK failing to deliver session.idle. The 3-tier logic (tool active vs idle vs resumed) is well-reasoned. The Case B freshness check is clever but fragile.

Risk: Medium. Relies on SDK always writing AssistantTurnEndEvent before any stuck state.
Upstream fix needed: SDK should guarantee session.idle delivery; or provide heartbeat API.

7. `events.jsonl` Scraping (`LoadHistoryFromDiskAsync`, `IsEventsFileActive`) — 🔴 FRAGILE

Five separate callers read the SDK's internal events.jsonl file (content recovery x3, watchdog liveness, status display). This file is not part of the public SDK API.

Risk: Critical. A CLI update could rename the file, change format, or stop writing it. All five callers silently degrade to empty results with no alarm.
Upstream fix needed (highest priority): SDK needs GetSessionHistory() and IsSessionActive APIs. These two additions eliminate all events.jsonl scraping.

Summary

Classification	Count	Mechanisms
🟢 Proper	1	Lazy session resume
🟡 Workaround	4	Premature idle detection, dead event stream, recovery loop, watchdog
🔴 Fragile	2	`events.jsonl` scraping, `DateTime.Now` inconsistency

Prioritized Upstream Issues to File

[P0] session.idle reliability — Root cause of premature idle recovery, watchdog, and TurnEnd fallback. Without this, all those workarounds are permanently necessary.
[P1] GetSessionHistory() API — Eliminates fragile events.jsonl scraping across 5 callers.
[P1] Event replay after session revival — After DisposeAsync() + CreateSessionAsync(), buffered events should be replayable. Eliminates dead event stream recovery.
[P2] IsSessionActive / heartbeat API — Replaces events.jsonl mtime polling in watchdog Case A.
[P3] Standardize UTC timestamps — Internal PolyPilot tech debt, not strictly an SDK issue.

Is Multi-Agent Bullet-Proof?

Honest answer: Production-ready but not bullet-proof.

The architecture is defensive with multiple recovery layers for each known failure. What makes it fragile is a single point of dependency on events.jsonl — if that file moves or changes format in a CLI update, the fallback chain silently returns empty worker results. There is no canary test or alarm for this regression path.

Recommended hardening before calling it bullet-proof:

File P0+P1 upstream SDK issues and get commitments
Add a canary test that validates LoadHistoryFromDiskAsync against a real events.jsonl fixture
Add observability: when events.jsonl fallback fires, write a warning to crash.log/telemetry

PR verdict: ✅ Approve — All round findings fixed (N5, F2, N6, N7, N8). Remaining items (F4 MRE disposal, F5 sync ReadToEnd) are non-blocking. The workarounds are as good as they can be absent SDK changes.

…terrupted sessions Fix 1: ManualResetEventSlim resource leak - Add DisposePrematureIdleSignal helper called in all session removal/replacement paths: CreateSessionAsync error paths, SyncRemoteSessions, codespace reconnect, connection-error reconnect, worker revival, and provider cleanup. - Guard PrematureIdleSignal reads with ObjectDisposedException catch. Fix 2: Eager resume for mid-turn sessions - Sessions with LastPrompt set (interrupted mid-turn) are now eagerly resumed via EnsureSessionConnectedAsync after all placeholders load, in a background Task.Run. - Add per-session SemaphoreSlim in _sessionConnectLocks to prevent concurrent resume races between eager restore and first user SendPromptAsync. - 2 new structural tests verify the eager resume wiring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-03-16T14:51:14Z

PR #375 Round 8 Re-Review

Tests: ⚠️ 2669 run, 1 failure (TurnEndFallbackTests.FallbackTimer_NotCancelled_FiresAfterDelay) — passes in isolation, pre-existing flaky, not introduced by this commit.

CI: ⚠️ pre-existing (not PR-specific)

Previous Findings Status

#	Severity	Status
F4	🟢 MINOR	✅ FIXED — `DisposePrematureIdleSignal` helper added at ~12 session removal/replacement sites; `ObjectDisposedException` guards added around `PrematureIdleSignal` reads
F5	🟢 MINOR	Not addressed — sync `ReadToEnd()` in `SessionSidebar.razor:2459` untouched (pre-existing, non-blocking)

New Findings (5-model consensus)

🟢 MINOR — _sessionConnectLocks entries never cleaned up (5/5 models)
CopilotService.cs:61 + Persistence.cs:314

_sessionConnectLocks.GetOrAdd(sessionName, _ => new SemaphoreSlim(1,1)) is called on every EnsureSessionConnectedAsync invocation but nothing ever removes entries. RemoveSessionAsync at line 3700 already has the correct pattern for _modelSwitchLocks:

if (_modelSwitchLocks.TryRemove(name, out var sem)) sem.Dispose();

No equivalent exists for _sessionConnectLocks. Practical impact is low (one ~96-byte SemaphoreSlim per session name ever used), but it's inconsistent with the established pattern.

Fix: Add if (_sessionConnectLocks.TryRemove(name, out var cl)) cl.Dispose(); to RemoveSessionAsync (line ~3749), and add _sessionConnectLocks.Clear(); alongside _sessions.Clear() in ReconnectAsync (line 883) and DisposeAsync (line 3804).

🟢 MINOR — ReconnectAsync and DisposeAsync clear loops missing DisposePrematureIdleSignal (2/5 models)
CopilotService.cs:876-883 and 3796-3804

Both foreach (var state in _sessions.Values) loops cancel watchdogs and dispose SDK sessions but don't call DisposePrematureIdleSignal(state). MREs are orphaned to the GC finalizer. All individually-removed paths are correct; only the bulk-clear paths missed.

Fix: Add DisposePrematureIdleSignal(state); inside both loops, before _sessions.Clear().

Confirmed Non-Issues

finally { connectLock.Release() } safety: WaitAsync is before the try block at Persistence.cs:315 — if the token is cancelled before acquiring, Release() is never called. Correct pattern. ✅
MRE use-after-dispose race: IsPrematureIdleSignalSet() and WaitForPrematureIdleSignal() local functions catch ObjectDisposedException. ✅
Stale lazyState in eager resume: Serialized by the per-session semaphore; state.Session != null guard prevents double-resume. ✅

Verdict

✅ Approve — All round findings except F5 (pre-existing, non-blocking) are resolved. The two new findings are minor hygiene gaps with negligible runtime impact. The _sessionConnectLocks fix is a one-liner that mirrors the already-correct _modelSwitchLocks pattern.

Co-reviewed by: Copilot 223556219+Copilot@users.noreply.github.com

…and history recovery (#391) ## Summary Multiple bug fixes discovered after PR #375 merge, addressing worker failures, session persistence, server health detection, and conversation history loss. ## Changes ### 1. Never push to main rule - Added as first Git Workflow rule in `.github/copilot-instructions.md` ### 2. Permission recovery killing multi-agent workers - `TryRecoverPermissionAsync` calls `TrySetCanceled()` on `ResponseCompletion` TCS, propagating as `TaskCanceledException` to orchestrator workers - **Fix**: Retry loop in `SendPromptAndWaitAsync` detects permission-recovery cancellation and re-awaits new state's TCS (up to 3 retries) ### 3. Session ID not persisted after reconnect - When SDK returns different session ID on resume, `state.Info.SessionId` was updated in memory but `FlushSaveActiveSessionsToDisk()` never called - **Fix**: Added flush after every SessionId update in 4 reconnect sites ### 4. Server health notice for posix_spawn failures - Bundled CLI native modules can be deleted by unknown processes, causing `posix_spawn ENOENT` - **Fix**: `ServerHealthNotice` banner on Dashboard with Restart Server button and full server restart cycle ### 5. Session history loss from dead event streams - After server-side idle cleanup + re-resume, SDK event file writer breaks — events flow in-memory but never persist to events.jsonl - **Fix**: `LoadBestHistoryAsync()` compares latest user message timestamps from events.jsonl and chat_history.db, picks whichever is more recent ### 6. PR review fixes - **CRITICAL**: `RestartServerAsync` wrapped in `_clientReconnectLock` (race condition fix) - **HIGH**: `DisposePrematureIdleSignal` added in restart disposal loop (MRE leak) - **HIGH**: History recency threshold reduced from 1 minute to 5 seconds - **MINOR**: Dashboard restores `ServerHealthNotice` on restart failure ## Related Issues - #392 — posix_spawn upstream bug - #395 — Spinner gap during premature idle recovery ## Testing All 2669 tests pass. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- multi-agent-orchestration SKILL.md: 4→5 phase lifecycle, new IDLE-DEFER section, fix INV-O3 ordering, new INV-O15, mark premature idle bug as FIXED - processing-state-safety SKILL.md: 10→16 paths table with tags, new INV-18 for BackgroundTasks, IDLE-DEFER in stuck session table, note EVT-REARM is now secondary defense, add PRs #373/#375/#399 to regression history - copilot-instructions.md: update SDK Event Flow step 9 for BackgroundTasks check, add [IDLE-DEFER] diagnostic tag, fix stale path count (8→15+), add BackgroundTasksIdleTests to test list Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…n#375) - Multi-agent orchestration SKILL.md: Added PR 373 reconnect safety invariants (INV-O9 through INV-O13), session death/deadlock bug documentation, Fix with Copilot multi-agent awareness section, OrchestratorReflect monitoring guide, enhanced end-to-end checklist, and comprehensive test coverage gap analysis - Processing-state-safety SKILL.md: Added INV-14 through INV-17 covering IsOrphaned guards, TryUpdate concurrency, handler-before- publish ordering, and MCP server reload on reconnect - SessionSidebar.razor Fix flow: GetBugReportDebugInfo now includes multi-agent context (group name, mode, role, members, event diagnostics, PendingOrchestration state) via AppendMultiAgentDebugInfo. BuildCopilotPrompt adds multi-agent testing requirements when the session is in a multi-agent group - SessionStabilityTests.cs: 20 new tests covering IsOrphaned guards (5 tests), ForceCompleteProcessingAsync INV-1 compliance (3 tests), synthesis prompt mixed success/failure (2 tests), sibling re-resume safety (4 tests), MCP reload (2 tests), diagnostic log completeness, watchdog companion fields, and Fix prompt multi-agent enhancement --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…and history recovery (PureWeen#391) ## Summary Multiple bug fixes discovered after PR PureWeen#375 merge, addressing worker failures, session persistence, server health detection, and conversation history loss. ## Changes ### 1. Never push to main rule - Added as first Git Workflow rule in `.github/copilot-instructions.md` ### 2. Permission recovery killing multi-agent workers - `TryRecoverPermissionAsync` calls `TrySetCanceled()` on `ResponseCompletion` TCS, propagating as `TaskCanceledException` to orchestrator workers - **Fix**: Retry loop in `SendPromptAndWaitAsync` detects permission-recovery cancellation and re-awaits new state's TCS (up to 3 retries) ### 3. Session ID not persisted after reconnect - When SDK returns different session ID on resume, `state.Info.SessionId` was updated in memory but `FlushSaveActiveSessionsToDisk()` never called - **Fix**: Added flush after every SessionId update in 4 reconnect sites ### 4. Server health notice for posix_spawn failures - Bundled CLI native modules can be deleted by unknown processes, causing `posix_spawn ENOENT` - **Fix**: `ServerHealthNotice` banner on Dashboard with Restart Server button and full server restart cycle ### 5. Session history loss from dead event streams - After server-side idle cleanup + re-resume, SDK event file writer breaks — events flow in-memory but never persist to events.jsonl - **Fix**: `LoadBestHistoryAsync()` compares latest user message timestamps from events.jsonl and chat_history.db, picks whichever is more recent ### 6. PR review fixes - **CRITICAL**: `RestartServerAsync` wrapped in `_clientReconnectLock` (race condition fix) - **HIGH**: `DisposePrematureIdleSignal` added in restart disposal loop (MRE leak) - **HIGH**: History recency threshold reduced from 1 minute to 5 seconds - **MINOR**: Dashboard restores `ServerHealthNotice` on restart failure ## Related Issues - PureWeen#392 — posix_spawn upstream bug - PureWeen#395 — Spinner gap during premature idle recovery ## Testing All 2669 tests pass. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen force-pushed the multi-agent-skill-hardening branch from e42a357 to 8aae337 Compare March 15, 2026 18:48

PureWeen force-pushed the multi-agent-skill-hardening branch from 39499b9 to 5293996 Compare March 15, 2026 23:12

PureWeen force-pushed the multi-agent-skill-hardening branch from 5293996 to cb8caaf Compare March 15, 2026 23:42

PureWeen and others added 11 commits March 15, 2026 23:40

PureWeen force-pushed the multi-agent-skill-hardening branch from 61b2d9c to 52e1a3d Compare March 16, 2026 04:42

PureWeen and others added 2 commits March 16, 2026 08:31

PureWeen merged commit 8e11ee1 into main Mar 16, 2026

PureWeen deleted the multi-agent-skill-hardening branch March 16, 2026 14:51

PureWeen mentioned this pull request Mar 16, 2026

Fix multi-agent worker failures, session persistence, server health, and history recovery #391

Merged

github-actions Bot mentioned this pull request Apr 23, 2026

fix: standardize timestamps to DateTimeOffset UTC (fixes #386) #751

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strengthen multi-agent skills, Fix flow, and stability tests#375

Strengthen multi-agent skills, Fix flow, and stability tests#375
PureWeen merged 17 commits intomainfrom
multi-agent-skill-hardening

PureWeen commented Mar 15, 2026

Uh oh!

PureWeen commented Mar 15, 2026

Uh oh!

PureWeen commented Mar 16, 2026

Uh oh!

PureWeen commented Mar 16, 2026

Uh oh!

PureWeen commented Mar 16, 2026

Uh oh!

PureWeen commented Mar 16, 2026

Uh oh!

PureWeen commented Mar 16, 2026

Uh oh!

PureWeen commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PureWeen commented Mar 15, 2026

Uh oh!

PureWeen commented Mar 15, 2026

Multi-Model Consensus Review -- Round 2 (5-model × 5-agent)

CI Status: ⚠️ No CI configured

Prior Review Comments: None outstanding

🔴 CRITICAL -- 3 issues (2 new this round)

🟡 MODERATE -- 5 issues

Recommended Action: ⚠️ Request Changes

Uh oh!

PureWeen commented Mar 16, 2026

PR #375 Review — Round 4 (post-fix 04d23a79)

Previous Findings Status

Remaining Issue

Verdict: ⚠️ Request Changes

Uh oh!

PureWeen commented Mar 16, 2026

PR #375 Review — Round 5 (5-model consensus)

Previous Findings

New Findings

Detail on F2 (still blocking)

Detail on N5 (new critical)

Verdict: ⚠️ Request Changes

Uh oh!

PureWeen commented Mar 16, 2026

PR #375 Review — Round 6 (5-model consensus)

Round 5 Findings Status

New Findings (Round 6)

Detail on N7 (new, moderate)

Detail on N8 (minor)

Verdict: ⚠️ Request Changes (one targeted fix)

Uh oh!

PureWeen commented Mar 16, 2026

PR #375 — Round 7: Architectural & SDK Workaround Assessment

SDK Workaround Classification

Summary: 1 proper, 4 workarounds, 2 hacks

Recommended Upstream SDK Issues (Priority Order)

Is Multi-Agent Bullet-Proof?

Verdict: ✅ Approve

Uh oh!

PureWeen commented Mar 16, 2026

PR #375 Round 7 — Architecture & SDK Workaround Assessment

SDK Workaround Audit

1. Premature Idle Detection (RecoverFromPrematureIdleIfNeededAsync) — 🟡 WORKAROUND

2. Dead Event Stream Recovery (tier-3 fallback: LoadHistoryFromDiskAsync) — 🟡 WORKAROUND

3. Multi-round Recovery Loop (while + OnSessionComplete + IsEventsFileActive) — 🟡 WORKAROUND

4. Lazy Session Resume (EnsureSessionConnectedAsync) — 🟢 PROPER

5. DateTime.Now vs DateTime.UtcNow — 🔴 DESIGN SMELL (internal)

6. Processing Watchdog (30s/120s/600s + events.jsonl freshness) — 🟡 WORKAROUND

7. events.jsonl Scraping (LoadHistoryFromDiskAsync, IsEventsFileActive) — 🔴 FRAGILE

Summary

Prioritized Upstream Issues to File

Is Multi-Agent Bullet-Proof?

Uh oh!

PureWeen commented Mar 16, 2026

PR #375 Round 8 Re-Review

Previous Findings Status

New Findings (5-model consensus)

Confirmed Non-Issues

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

PR #375 Review — Round 4 (post-fix `04d23a79`)

1. Premature Idle Detection (`RecoverFromPrematureIdleIfNeededAsync`) — 🟡 WORKAROUND

2. Dead Event Stream Recovery (tier-3 fallback: `LoadHistoryFromDiskAsync`) — 🟡 WORKAROUND

3. Multi-round Recovery Loop (while + `OnSessionComplete` + `IsEventsFileActive`) — 🟡 WORKAROUND

4. Lazy Session Resume (`EnsureSessionConnectedAsync`) — 🟢 PROPER

5. `DateTime.Now` vs `DateTime.UtcNow` — 🔴 DESIGN SMELL (internal)

7. `events.jsonl` Scraping (`LoadHistoryFromDiskAsync`, `IsEventsFileActive`) — 🔴 FRAGILE