fix: make 'Service not initialized' retryable in worker dispatch #422
When a concurrent worker's connection error sets IsInitialized=false, subsequent
workers in the same orchestration run immediately throw InvalidOperationException
('Service not initialized'). That exception was not treated as retryable, so the run
produced all-workers-failed results at 0.0s elapsed (the 'PR Review Squad all failed' pattern).
Changes:
- Add IsInitializationError() helper to CopilotService.Utilities.cs: matches
InvalidOperationException with 'not initialized' in the message
- Extend the ExecuteWorkerAsync retry gate to include IsInitializationError alongside
IsConnectionError (line ~2250 in Organization.cs)
- Inside the retry catch, attempt lazy InitializeAsync() before the 2s delay so the
next attempt finds the client ready
- Add 10 new tests in InitializationErrorDetectionTests covering: true/false detection,
wrong exception type, case-insensitivity, and structural verification that the retry
gate includes both checks
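A minimal sketch of the detection rule from the first bullet, with checks that mirror the test categories in the last bullet (true/false detection, case-insensitivity, wrong exception type). The class name `InitErrorDetection` is hypothetical; the predicate body follows the helper quoted in the review below.

```csharp
using System;

// Hypothetical container class; the predicate matches the helper quoted in the review.
public static class InitErrorDetection
{
    // True only for InvalidOperationException (or a subclass) whose message contains
    // "not initialized", compared case-insensitively.
    public static bool IsInitializationError(Exception ex) =>
        ex is InvalidOperationException &&
        ex.Message.Contains("not initialized", StringComparison.OrdinalIgnoreCase);
}
```

Because `is` also accepts subclasses, an `ObjectDisposedException` whose message contains the phrase would match as well; the PR does not say whether that is intended.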
All 2911 tests pass. Build clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 PR Review Squad: Round 2 (Full Review)
CI Status
Build & Tests
PR Summary
When a concurrent worker's connection error sets
Multi-Agent Orchestration Skill Review
Checked against invariants: The skill documents that worker failures are collected (not fatal to orchestration), and the retry fix is additive and compatible with all documented invariants (INV-O1 through INV-O13). Specifically:
Skill documentation gap (not a blocker): The skill's
Processing State Safety Review
Checked against invariants: The lazy
One pre-existing observation:
Code Analysis
internal static bool IsInitializationError(Exception ex) =>
ex is InvalidOperationException && ex.Message.Contains("not initialized", StringComparison.OrdinalIgnoreCase);
Retry gate:
catch (Exception ex) when (attempt < maxRetries && (IsConnectionError(ex) || IsInitializationError(ex)))
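To make the gate's behavior concrete, here is a minimal sketch of a retry loop built around that filter, with a lazy re-initialization before the delay. `RetryGateSketch`, `ExecuteWithRetryAsync`, and its parameters are hypothetical; the `IsConnectionError` stand-in is an assumption based only on the PR's remark that `IOException`/`SocketException` count as connection errors. Re-initializing only on initialization errors is also an assumption, since the PR just says the catch attempts `InitializeAsync()`.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class RetryGateSketch
{
    // Assumption: the real IsConnectionError is not shown; the PR only says
    // IOException/SocketException qualify, so this stand-in checks those two types.
    public static bool IsConnectionError(Exception ex) => ex is IOException || ex is SocketException;

    // Same shape as the helper quoted in the review.
    public static bool IsInitializationError(Exception ex) =>
        ex is InvalidOperationException &&
        ex.Message.Contains("not initialized", StringComparison.OrdinalIgnoreCase);

    // Hypothetical helper: runs `work` up to maxRetries times. Connection and
    // initialization errors are retried; any other exception, or a retryable one on
    // the last attempt, propagates (the real code's final catch would turn it into
    // a failed WorkerResult).
    public static async Task<T> ExecuteWithRetryAsync<T>(
        Func<Task<T>> work, Func<Task> initializeAsync, int maxRetries, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await work();
            }
            catch (Exception ex) when (attempt < maxRetries && (IsConnectionError(ex) || IsInitializationError(ex)))
            {
                if (IsInitializationError(ex))
                {
                    try
                    {
                        await initializeAsync(); // lazy re-initialization before the delay
                    }
                    catch (Exception)
                    {
                        // Best effort: if this fails, the next attempt surfaces the error.
                    }
                }
                await Task.Delay(delay); // the PR uses a 2s delay here
            }
        }
    }
}
```

With a work delegate that fails until initialization runs, the helper re-initializes once and returns on the second attempt; with `maxRetries` set to 1, the initialization error propagates unchanged.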
Potential concern: If
Test Coverage Assessment
8 new unit tests in
Missing: no behavioral integration test that simulates the cascade failure scenario (one worker sets
Verdict: ✅ Approve
Fix is correct, minimal, and well-tested. The
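The review notes that no behavioral test simulates the cascade. One possible shape for such a test, sketched under assumptions: `FakeClient`, `CascadeScenario`, and the `retryInitErrors` switch are hypothetical stand-ins rather than types from the codebase, `IsConnectionError` is simplified to an `IOException` check, and the fake starts in the state another worker's connection error would leave behind. Running the same wave with and without the initialization check shows what the extended gate changes.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical fake: one client whose IsInitialized flag every worker shares.
public sealed class FakeClient
{
    private int _initialized = 1;

    public bool IsInitialized => Volatile.Read(ref _initialized) == 1;

    // Simulates what another worker's connection error does to the shared client.
    public void MarkDisconnected() => Volatile.Write(ref _initialized, 0);

    public Task InitializeAsync()
    {
        Volatile.Write(ref _initialized, 1);
        return Task.CompletedTask;
    }

    public Task<string> SendAsync(string worker)
    {
        if (!IsInitialized)
            throw new InvalidOperationException("Service not initialized");
        return Task.FromResult($"{worker}: ok");
    }
}

public static class CascadeScenario
{
    static bool IsConnectionError(Exception ex) => ex is IOException; // simplified stand-in

    static bool IsInitializationError(Exception ex) =>
        ex is InvalidOperationException &&
        ex.Message.Contains("not initialized", StringComparison.OrdinalIgnoreCase);

    // One worker: true on success, false when the final catch reports a failure.
    static async Task<bool> RunWorkerAsync(FakeClient client, string name, bool retryInitErrors, int maxRetries)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await client.SendAsync(name);
                return true;
            }
            catch (Exception ex) when (attempt < maxRetries &&
                (IsConnectionError(ex) || (retryInitErrors && IsInitializationError(ex))))
            {
                if (!client.IsInitialized)
                    await client.InitializeAsync(); // lazy re-initialization before retrying
                await Task.Delay(TimeSpan.Zero);    // stands in for the real 2s back-off
            }
            catch (Exception)
            {
                return false;
            }
        }
    }

    // Starts a wave of workers after the shared client has already been reset.
    public static async Task<int> CountFailuresAsync(int workers, bool retryInitErrors, int maxRetries = 3)
    {
        var client = new FakeClient();
        client.MarkDisconnected();
        bool[] results = await Task.WhenAll(
            Enumerable.Range(0, workers).Select(i => RunWorkerAsync(client, $"worker-{i}", retryInitErrors, maxRetries)));
        return results.Count(ok => !ok);
    }
}
```

With the pre-fix gate every worker in the wave fails on its first attempt; with the initialization check, the first worker to retry re-initializes the shared client and each worker succeeds on its first or second attempt.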
…eWeen#422)

## Problem

When a concurrent worker's connection error sets `IsInitialized=false`, all subsequent workers in the same orchestration run immediately throw `InvalidOperationException("Service not initialized")`. This was **not treated as retryable** in `ExecuteWorkerAsync`, causing the entire worker wave to fail at 0.0s elapsed: the "all workers failed with 'Service not initialized'" pattern seen during PR PureWeen#421 review.

## Root Cause

`ExecuteWorkerAsync`'s retry gate only catches `IsConnectionError(ex)`:

```csharp
catch (Exception ex) when (attempt < maxRetries && IsConnectionError(ex))
```

`InvalidOperationException` is not an `IOException`/`SocketException`, so it falls straight through to the final catch and returns a failed `WorkerResult`.

## Fix

1. **`CopilotService.Utilities.cs`**: Add `IsInitializationError()`, which matches `InvalidOperationException` with "not initialized" in the message.
2. **`CopilotService.Organization.cs`** (~line 2250): Extend the retry gate:

   ```csharp
   catch (Exception ex) when (attempt < maxRetries && (IsConnectionError(ex) || IsInitializationError(ex)))
   ```

   Inside the catch, attempt a lazy `InitializeAsync()` before the 2s delay so the next attempt finds the client ready.

## Tests

10 new tests in `InitializationErrorDetectionTests` covering:

- True/false detection for `InvalidOperationException` variants
- Case-insensitivity
- Wrong exception type returns false
- Structural verification that the retry gate includes both checks

**All 2911 tests pass. Build clean.**

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
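The structural-verification tests are not shown in the PR. A hypothetical sketch of how such a check might work: scan source text for a filtered `catch (Exception ex) when (...)` clause and confirm that its filter names both predicates. `RetryGateStructure` and `GateChecksBoth` are illustrative names, and the regex assumes the whole filter fits on one line; a real test would read the text of `CopilotService.Organization.cs` instead of using inline strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RetryGateStructure
{
    // Captures the filter expression of each single-line `catch (Exception x) when (...)`.
    // The greedy [^\r\n]* backtracks to the last ')' on the line, so nested parentheses
    // inside the filter are kept.
    private static readonly Regex FilteredCatch =
        new Regex(@"catch\s*\(Exception\s+\w+\)\s*when\s*\((?<filter>[^\r\n]*)\)");

    // True if at least one filtered catch mentions both predicates.
    public static bool GateChecksBoth(string source)
    {
        foreach (Match m in FilteredCatch.Matches(source))
        {
            string filter = m.Groups["filter"].Value;
            if (filter.Contains("IsConnectionError(") && filter.Contains("IsInitializationError("))
                return true;
        }
        return false;
    }
}
```

A text-level check like this catches an accidental revert of the gate, but it does not exercise runtime behavior, which is why the review still asks for a behavioral test.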