feat: perpetual session analyzer for automated reliability monitoring#621
feat: perpetual session analyzer for automated reliability monitoring#621
Conversation
Adds a background service that maintains a dedicated copilot CLI session to perpetually analyze running PolyPilot sessions for reliability issues. Key components: - SessionAnalyzerService: manages a hidden 'PolyPilot Monitor' session - Periodically collects diagnostics (event-diagnostics.log, crash.log, active session states, server health) - Sends analysis prompts to the monitor session in autopilot mode - The session can identify stuck sessions, watchdog kills, error patterns, premature completions, dead connections, and resource leaks - When code bugs are found, the session creates branches and opens PRs Settings: - EnableSessionAnalyzer (default: false) - opt-in toggle - SessionAnalyzerIntervalMinutes (default: 10) - analysis frequency Tests: 8 new tests covering diagnostics collection, prompt building, constants, lifecycle, and dispose safety. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Expert Code Review — PR #621feat: perpetual session analyzer for automated reliability monitoring CI Status:
|
| # | Original Finding | Verdict | Notes |
|---|---|---|---|
| 1 | 🔴 Prompt injection into autopilot session | PR-creation instruction removed; prompt now says "Do NOT autonomously create branches or PRs — report only". But agentMode: "autopilot" still grants full tool execution with auto-approve. See remaining issue below. |
|
| 2 | 🔴 Stop/Dispose broken — cannot restart | Local cleanup improved (await loop, null fields, IAsyncDisposable). But session never removed from CopilotService._sessions — restart permanently broken. See remaining issue below. | |
| 3 | 🟡 Invalid model slug claude-sonnet-4-5 |
✅ FIXED | Now "claude-sonnet-4.5" |
| 4 | �� Unvalidated interval | Lower bound clamped to 1 minute. No upper bound — see minor note below. | |
| 5 | 🟡 Missing [Collection("BaseDir")] |
✅ FIXED | Attribute added, finally blocks restore, TestSetup.Initialize() registers |
| 6 | 🟡 Test coverage gaps | 14 tests now (up from 8): TailFile, DisposeAsync, interval clamping, report-only prompt, null guard. But Start→Stop→Start lifecycle still untested — exactly the path where the critical session-leak bug lives. | |
| 7 | 🟢 Torn reads in CollectSessionStates | ✅ FIXED | .ToList() snapshot added |
| 8 | 🟢 Thread safety of AnalysisCount/LastAnalysisAt | ✅ FIXED | Interlocked operations throughout |
Remaining Issues
🔴 CRITICAL: StopAsync/Stop never remove session from CopilotService — restart permanently broken
File: SessionAnalyzerService.cs — StopAsync(), Stop()
Flagged by: 3/3 reviewers
Neither StopAsync(), Stop(), nor Dispose/DisposeAsync call _copilotService.CloseSessionAsync(_analyzerSessionName) before nulling the field. The "PolyPilot Monitor" session remains in CopilotService._sessions forever.
The failure sequence:
- User enables analyzer →
StartAsync→CreateSessionAsync("PolyPilot Monitor")succeeds - User disables analyzer →
StopAsync→ nulls_analyzerSessionName, but never removes "PolyPilot Monitor" from_sessions - User re-enables analyzer →
StartAsync→CreateSessionAsync("PolyPilot Monitor")→ throwsInvalidOperationException("Session 'PolyPilot Monitor' already exists.") - The catch block logs the error and returns — analyzer is permanently dead until app restart
Evidence: CopilotService.cs line ~2712: if (_sessions.ContainsKey(name)) throw new InvalidOperationException(...). CloseSessionAsync (line ~5025) is the only path that calls _sessions.TryRemove.
Suggested fix: StopAsync must call await _copilotService.CloseSessionAsync(_analyzerSessionName) before nulling the field. Stop (sync) can fire-and-forget or skip session cleanup (with a warning log).
🟡 MODERATE: agentMode: "autopilot" still grants full tool execution despite text guardrail
File: SessionAnalyzerService.cs — RunSingleAnalysisAsync()
Flagged by: 3/3 reviewers
The prompt text fix ("Do NOT autonomously create branches or PRs — report only") is a soft guardrail. However, agentMode: "autopilot" combined with OnPermissionRequest = AutoApprovePermissions means the analyzer session has auto-approved access to file writes, git operations, and shell execution. The LLM is not technically bound by its system prompt — it can still execute tools if injected log content redirects it.
The PR-creation instruction was removed (good), which significantly lowers the risk. But unsanitized log content is still interpolated into the prompt, and the session mode still allows autonomous actions.
Suggested fix: Either (a) use a non-tool-executing mode for analyzer runs (e.g., remove agentMode: "autopilot" and use the default interactive mode), or (b) configure the analyzer session with restricted tool permissions (no git, no file write).
🟢 MINOR: No upper bound on intervalMinutes
File: SessionAnalyzerService.cs — StartAsync() / ConnectionSettings.cs
Flagged by: 2/3 reviewers
Lower bound is clamped to 1. But TimeSpan.FromMinutes(Int32.MaxValue) overflows Task.Delay's internal millisecond cap (~35,791 minutes max), causing ArgumentOutOfRangeException that silently kills the loop. Extremely unlikely in practice, but trivially fixable.
Suggested fix: Add Math.Min(1440, intervalMinutes) (cap at 24 hours) or similar reasonable upper bound.
🟢 MINOR: Start→Stop→Start lifecycle untested
File: SessionAnalyzerTests.cs
Flagged by: 2/3 reviewers
The critical session-leak bug (Finding #2 above) lives in the Stop→Start path, which has no test coverage. Adding a test that mocks CreateSessionAsync to track registrations would catch this.
Discarded (single reviewer, not confirmed)
- Concurrent
StartAsync/StopAsyncrace on_cts(1/3 — unlikely since UI calls run on MAUI main thread) _disposednot thread-safe for concurrent Dispose + DisposeAsync (1/3 — DI container calls dispose once on shutdown)
�� Summary
| # | Severity | Finding | Status |
|---|---|---|---|
| 1 | 🟡→🟡 | Prompt injection (autopilot mode remains) | |
| 2 | 🔴→🔴 | Session never removed → restart broken | |
| 3 | 🟡 | Invalid model slug | ✅ FIXED |
| 4 | 🟡→🟢 | Interval validation | |
| 5 | 🟡 | Missing [Collection("BaseDir")] |
✅ FIXED |
| 6 | 🟡→🟢 | Test coverage gaps | |
| 7 | 🟢 | Torn reads | ✅ FIXED |
| 8 | 🟢 | Thread safety | ✅ FIXED |
4/8 findings fully resolved. 4/8 partially addressed.
⚠️ Recommendation: Request Changes
Good progress — 4 of 8 findings are fully fixed, and the remaining 4 are meaningfully improved. The fix commit demonstrates strong responsiveness to feedback.
One issue still blocks merge:
- 🔴 The session-leak/restart-broken bug —
StopAsyncmust callCloseSessionAsyncbefore nulling the session name. Without this, toggling the analyzer on→off→on permanently kills the feature. This is a one-line fix.
Two issues to address (can be done in this PR or a fast follow-up):
- 🟡 Autopilot mode — Consider downgrading from
agentMode: "autopilot"since the analyzer is report-only. This eliminates the injection surface entirely. - 🟢 Upper bound on interval + lifecycle test — Quick wins that prevent edge-case crashes and catch future regressions.
There was a problem hiding this comment.
Expert Code Review Summary — PR #621
Methodology: 3 independent reviewers with adversarial consensus. Findings require 2/3+ agreement; contested 1/3 findings went through follow-up evaluation by the other reviewers.
Findings by Severity
🔴 CRITICAL (3)
| # | Finding | Agreement | File |
|---|---|---|---|
| 1 | Test isolation: missing [Collection("BaseDir")] + no _polypilotDir restore after tests |
3/3 | SessionAnalyzerTests.cs:5 |
| 2 | _analyzerSessionName assigned before CreateSessionAsync succeeds — stale state enables SendPromptAsync to nonexistent session |
3/3 after follow-up | SessionAnalyzerService.cs:52 |
| 3 | Autopilot mode with auto-PR instructions is a prompt injection risk — log content is untrusted input fed into an auto-approved agent | 3/3 after follow-up | SessionAnalyzerService.cs:100 |
🟡 MODERATE (6)
| # | Finding | Agreement | File |
|---|---|---|---|
| 4 | Stop/Dispose never closes analyzer session (resource leak) or awaits _analysisLoop (task leak) |
3/3 | SessionAnalyzerService.cs:77 |
| 5 | Feature completely inert — StartAsync never called, settings never read |
3/3 | MauiProgram.cs:123 |
| 6 | Model slug "claude-sonnet-4-5" likely typo for "claude-sonnet-4.5" |
2/3 after follow-up | SessionAnalyzerService.cs:57 |
| 7 | SendPromptAsync blocks analysis loop with no timeout (autopilot can run 30+ min) |
2/3 after follow-up | SessionAnalyzerService.cs:100 |
| 8 | No validation on SessionAnalyzerIntervalMinutes — 0 creates tight loop |
2/3 after follow-up | ConnectionSettings.cs:144 |
| 9 | TailFile reads entire unbounded log file into memory |
2/3 | SessionAnalyzerService.cs:263 |
Other
| # | Finding | Agreement | File |
|---|---|---|---|
| 10 | TestHelpers at namespace level creates collision risk with existing TestStubs.cs pattern |
2/3 | SessionAnalyzerTests.cs:176 |
Discarded (single reviewer only)
CollectDiagnostics()result never empty — dead null check (Reviewer 2 only)LogAnalyzerwrites to same file it reads — feedback loop (Reviewer 2 only)AnalysisCount/LastAnalysisAtnot thread-safe (Reviewer 2 only)- Redundant
AnalyzerSessionNamefilter inCollectSessionStates(Reviewer 1 only)
Assessment
CI Status: Build succeeded. Tests pass.
Test Coverage: 8 new tests cover diagnostics collection, prompt building, constants, lifecycle, and dispose safety. However, tests have isolation issues (missing [Collection("BaseDir")], no dir restore) that need fixing. No tests cover the StartAsync/Stop lifecycle, the analysis loop, or error recovery paths.
Architecture: The concept of a perpetual monitoring session is sound, but the implementation has significant lifecycle management gaps (session leaks, task leaks, no activation path) and a security concern with unguarded autopilot mode consuming untrusted log data. The critical findings (stale state, prompt injection, test isolation) should be addressed before merge.
Generated by Expert Code Review (auto) for issue #621
|
|
||
| namespace PolyPilot.Tests; | ||
|
|
||
| public class SessionAnalyzerTests |
There was a problem hiding this comment.
🔴 CRITICAL — Test isolation: missing [Collection("BaseDir")] and no _polypilotDir restore (3/3 reviewers)
Three tests call SetBaseDirForTesting(tempDir) then delete tempDir, but never restore _polypilotDir to TestSetup.TestBaseDir. After these tests run, _polypilotDir points to a deleted directory, corrupting subsequent tests. Also, the class lacks [Collection("BaseDir")], enabling xUnit parallel execution races.
Fix:
- Add
[Collection("BaseDir")]to the class - Add
SessionAnalyzerService.SetBaseDirForTesting(TestSetup.TestBaseDir);in eachfinally - Register in
TestSetup.Initialize()
| var token = _cts.Token; | ||
|
|
||
| // Create the analyzer session | ||
| _analyzerSessionName = AnalyzerSessionName; |
There was a problem hiding this comment.
🔴 CRITICAL — _analyzerSessionName set before session creation succeeds (3/3 reviewers after follow-up)
_analyzerSessionName is assigned here before CreateSessionAsync at line 55. If creation fails, the catch block returns without clearing it. A subsequent RunSingleAnalysisAsync() passes the null check (line 87) and calls SendPromptAsync targeting a session that was never created.
Additionally, CreateSessionAsync sets _activeSessionName ??= name inside CopilotService, potentially making the hidden analyzer the UI-active session on fresh startup (the IsHidden flag is set after creation).
Fix: Move assignment to after successful creation:
var session = await _copilotService.CreateSessionAsync(...);
session.IsHidden = true;
_analyzerSessionName = AnalyzerSessionName; // assign AFTER success| { | ||
| var session = await _copilotService.CreateSessionAsync( | ||
| _analyzerSessionName, | ||
| model: "claude-sonnet-4-5", |
There was a problem hiding this comment.
🟡 MODERATE — Likely invalid model slug (2/3 reviewers after follow-up)
"claude-sonnet-4-5" (hyphen) ≠ "claude-sonnet-4.5" (dot). ModelHelper.FallbackModels uses the dot variant. The hyphen passes regex validation but resolves to a nonexistent model at runtime.
Fix: model: "claude-sonnet-4.5"
| /// </summary> | ||
| public void Stop() | ||
| { | ||
| _cts?.Cancel(); |
There was a problem hiding this comment.
🟡 MODERATE — Stop/Dispose lifecycle: session not closed + task not awaited (3/3 reviewers)
Two problems:
- Session leak:
Stop()never callsCloseSessionAsync. The SDK session persists in_sessionsandactive-sessions.jsonacross restarts. - Task leak:
_analysisLoopis never awaited. A newStartAsynccould begin before the previous loop drains.
Fix: Implement IAsyncDisposable, await _analysisLoop, call CloseSessionAsync, clear _analyzerSessionName.
| _analyzerSessionName, | ||
| prompt, | ||
| cancellationToken: cancellationToken, | ||
| agentMode: "autopilot"); |
There was a problem hiding this comment.
🔴 CRITICAL — Autopilot mode with autonomous code-writing is a prompt injection risk (3/3 reviewers after follow-up)
The prompt (line 246) instructs "create a branch, write the fix, run tests, and open a PR" and is sent with agentMode: "autopilot" which auto-approves all tool calls. Diagnostic data includes raw event-diagnostics.log and crash.log content that could contain attacker-controlled strings. A crafted log entry could redirect the agent to make unintended repo modifications.
Also 🟡 MODERATE — Blocking loop (2/3 after follow-up): SendPromptAsync awaits ResponseCompletion.Task (resolved on SessionIdleEvent). In autopilot, creating PRs can take 30+ minutes with no timeout wrapper, blocking the entire analysis loop.
Fix (security): Run in plan mode, remove auto-PR instructions, or require user approval.
Fix (blocking): Wrap with Task.WhenAny(analysisTask, Task.Delay(timeout)).
| builder.Services.AddSingleton<EfficiencyAnalysisService>(); | ||
| builder.Services.AddSingleton<PrLinkService>(); | ||
| builder.Services.AddSingleton<ScheduledTaskService>(); | ||
| builder.Services.AddSingleton<SessionAnalyzerService>(); |
There was a problem hiding this comment.
🟡 MODERATE — Feature is completely inert: StartAsync never called (3/3 reviewers)
StartAsync() is never called from any initialization path, settings handler, or UI code. Neither EnableSessionAnalyzer nor SessionAnalyzerIntervalMinutes are read anywhere. The entire feature is dead code as shipped.
Fix: Wire up startup logic gated by EnableSessionAnalyzer (e.g., in CopilotService.InitializeAsync), or document that activation will be added in a follow-up PR.
| /// <summary> | ||
| /// How often (in minutes) the session analyzer runs its diagnostic analysis. | ||
| /// </summary> | ||
| public int SessionAnalyzerIntervalMinutes { get; set; } = 10; |
There was a problem hiding this comment.
🟡 MODERATE — No validation on interval minutes (2/3 reviewers after follow-up)
Setting to 0 creates a tight loop burning API quota. Negative values cause Task.Delay to throw, which is caught and swallowed — same tight-loop effect.
Fix: intervalMinutes = Math.Max(1, intervalMinutes); in RunAnalysisLoopAsync.
| { | ||
| try | ||
| { | ||
| using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite); |
There was a problem hiding this comment.
🟡 MODERATE — TailFile reads entire file into memory (2/3 reviewers)
Reads all lines of event-diagnostics.log (which grows unboundedly with no rotation) into a List(string), then takes the last N. Can cause significant GC pressure/OOM after days of operation.
Fix: Seek backward from EOF counting newlines, or cap the read to the last ~1MB.
| /// <summary> | ||
| /// Shared test helpers for creating CopilotService instances for unit tests. | ||
| /// </summary> | ||
| internal static class TestHelpers |
There was a problem hiding this comment.
🟡 MODERATE — TestHelpers at namespace level creates collision risk (2/3 reviewers)
internal static class TestHelpers is visible to all test files in PolyPilot.Tests. If any other test file defines a TestHelpers class, the project won't compile. Existing pattern uses TestStubs.cs for shared infrastructure.
Fix: Move CreateTestCopilotService() into TestStubs.cs, or make it a private helper method inside the test class.
CRITICAL fixes: - #1 Test isolation: add [Collection("BaseDir")], restore dir in finally blocks, register in TestSetup.Initialize() - #2 Stale state: only set _analyzerSessionName AFTER CreateSessionAsync succeeds, clear on failure - #3 Prompt injection: remove autonomous PR-creation instructions, change to report-only mode ("Do NOT autonomously create branches or PRs") MODERATE fixes: - #4 Lifecycle: implement IAsyncDisposable, add StopAsync() that awaits _analysisLoop with 5s timeout, nulls _analyzerSessionName - #5 (Feature activation deferred to UI integration PR) - #6 Model slug: fix "claude-sonnet-4-5" → "claude-sonnet-4.5" - #7 Timeout: wrap SendPromptAsync in 10-minute linked CancellationToken - #8 Interval validation: clamp to Math.Max(1, value) in settings setter and in StartAsync - #9 TailFile: use Queue<string> ring buffer instead of List, cap file read to MaxLogFileSizeBytes (10 MB) OTHER fixes: - #10 Remove TestHelpers class, use private CreateService() method - Thread safety: use Interlocked for AnalysisCount and LastAnalysisAt - Torn reads: snapshot GetAllSessions() with .ToList() - 14 tests (up from 8): new tests for TailFile, interval clamping, DisposeAsync, RunSingleAnalysis null guard, report-only prompt Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CRITICAL fix: - StopAsync/Stop now call CloseSessionAsync to remove the analyzer session from CopilotService._sessions before nulling the local reference. Without this, Stop→Start permanently killed the analyzer because CreateSessionAsync rejected the duplicate name. - Stop (sync) uses fire-and-forget Task.Run for best-effort cleanup. MODERATE fix: - Remove agentMode: "autopilot" from SendPromptAsync call. The analyzer is report-only — it should not have auto-approved tool execution. This eliminates the prompt injection surface entirely. MINOR fixes: - Add MaxAnalysisIntervalMinutes (1440 = 24h) upper bound constant. - Use Math.Clamp in StartAsync and ConnectionSettings setter to enforce both lower (1 min) and upper (24h) bounds, preventing Task.Delay overflow on extreme values. - Add 3 new tests: upper bound clamping, no-autopilot-mode assertion, and MaxAnalysisIntervalMinutes constant verification. - Total: 16 tests (up from 14), all pass. Full suite: 3500 pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The analyzer runs in autopilot so it can autonomously fix code bugs. All fixes accumulate on a single branch (fix/session-analyzer-findings) and a single PR — new findings push additional commits to the same PR rather than spawning multiple PRs. The prompt instructs the AI to: 1. Check if branch fix/session-analyzer-findings already exists 2. If yes, check it out and add the new fix on top 3. If no, create it from main 4. Check if a PR already exists for that branch 5. If yes, just push — the PR picks up new commits automatically 6. If no, open one Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add StartSessionAnalyzerIfEnabled() / StopSessionAnalyzerAsync() helpers to CopilotService - Call StartSessionAnalyzerIfEnabled() at end of InitializeAsync and ReconnectAsync — auto-starts if EnableSessionAnalyzer is true - Call StopSessionAnalyzerAsync() in ReconnectAsync teardown so the analyzer is cleanly stopped before reconnect - Add Settings UI toggle (desktop-only): checkbox to enable/disable, number input for interval (1-1440 minutes) - Toggle calls Start/Stop immediately — no reconnect needed - Interval changes saved quietly (take effect on next analysis cycle) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Closing — this functionality can be achieved with scheduled tasks instead. |
Summary
Adds a
SessionAnalyzerService— a background service that maintains a dedicated copilot CLI session to perpetually monitor all running PolyPilot sessions for reliability issues and auto-create PRs with fixes.Architecture
What the Analyzer Monitors
[WATCHDOG]in event-diagnostics.log[ERROR],[RECONNECT][IDLE-FALLBACK](previous)/(resumed)What It Does When Issues Are Found
Settings
EnableSessionAnalyzer(default:false) — opt-in toggleSessionAnalyzerIntervalMinutes(default:10) — analysis frequencyFiles Changed
PolyPilot/Services/SessionAnalyzerService.cs— Core service (302 lines)PolyPilot/Models/ConnectionSettings.cs— Settings propertiesPolyPilot/MauiProgram.cs— DI registrationPolyPilot.Tests/SessionAnalyzerTests.cs— 8 testsPolyPilot.Tests/PolyPilot.Tests.csproj— Compile includeTesting
All 8 new tests pass, existing test suite unaffected.