Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion .claude/skills/processing-state-safety/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,14 @@ Every code path that sets `IsProcessing = false` MUST also:
| 7 | SendAsync initial failure | CopilotService.cs | UI | Prompt send failed |
| 8 | Bridge OnTurnEnd | Bridge.cs | Background → InvokeOnUI | Remote mode turn complete |

## Content Persistence Safety

### Turn-End Flush
`FlushCurrentResponse` is called on `AssistantTurnEndEvent` to persist accumulated response text at each sub-turn boundary. Without this, response content between `assistant.turn_end` and `session.idle` is lost if the app restarts (the ReviewPRs bug — response content was lost on app restart).

### Dedup Guard on Resume
`FlushCurrentResponse` includes a dedup check: if the last non-tool assistant message in History has identical content, it skips the add and just clears `CurrentResponse`. This prevents duplicates when SDK replays events after session resume.

## 8 Invariants

### INV-1: Complete state cleanup
Expand Down Expand Up @@ -73,14 +81,18 @@ Clearing guarded on `!hasActiveTool && !HasUsedToolsThisTurn`.
`HandleComplete` is already on UI thread. `InvokeAsync` defers execution
causing stale renders.

## Top 3 Recurring Mistakes
## Top 4 Recurring Mistakes

1. **Incomplete cleanup** — modifying one IsProcessing path without
updating ALL fields that must be cleared simultaneously.
2. **ActiveToolCallCount as sole tool signal** — gets reset/skipped
in several paths; always check `HasUsedToolsThisTurn` too.
3. **Background thread mutations** — mutating IsProcessing or related
state on SDK event threads instead of marshaling to UI thread.
4. **Missing content flush on turn boundaries** — `FlushCurrentResponse`
must be called at every point where accumulated text could be lost
(turn_end, tool_start, abort, error, watchdog). The turn_end call
was missing until PR #224, causing response loss on app restart.

## Regression History

Expand Down
14 changes: 10 additions & 4 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ When a prompt is sent, the SDK emits events processed by `HandleSessionEvent` in
5. `ToolExecutionStartEvent` → tool activity starts, sets `ProcessingPhase=3`, increments `ToolCallCount` on complete
6. `ToolExecutionCompleteEvent` → tool done, increments `ToolCallCount`
7. `AssistantIntentEvent` → intent/plan updates
8. `AssistantTurnEndEvent` → end of a sub-turn, tool loop continues
8. `AssistantTurnEndEvent` → end of a sub-turn, tool loop continues. `FlushCurrentResponse` persists accumulated text before the next sub-turn.
9. `SessionIdleEvent` → turn complete, response finalized

### Processing Status Indicator
Expand All @@ -173,19 +173,25 @@ All three are reset in `SendPromptAsync` (new turn) and cleared in `CompleteResp
The UI shows: "Sending…" → "Server connected…" → "Thinking…" → "Working · Xm Xs · N tool calls…".

### Abort Behavior
`AbortSessionAsync` must clear ALL processing state — see `.claude/skills/processing-state-safety/SKILL.md` for the full cleanup checklist and the 7 paths that clear `IsProcessing`.
`AbortSessionAsync` must clear ALL processing state — see `.claude/skills/processing-state-safety/SKILL.md` for the full cleanup checklist and the 8 paths that clear `IsProcessing`.

### ⚠️ IsProcessing Cleanup Invariant
**CRITICAL**: Every code path that sets `IsProcessing = false` must clear 9 companion fields and call `FlushCurrentResponse`. This is the most recurring bug category (7 PRs, 16 fix/regression cycles). **Read `.claude/skills/processing-state-safety/SKILL.md` before modifying ANY processing path.** There are 8 such paths across CopilotService.cs, Events.cs, and Bridge.cs.
**CRITICAL**: Every code path that sets `IsProcessing = false` must clear 9 companion fields and call `FlushCurrentResponse`. This is the most recurring bug category (7 PRs of fix/regression cycles). **Read `.claude/skills/processing-state-safety/SKILL.md` before modifying ANY processing path.** There are 8 such paths across CopilotService.cs, Events.cs, and Bridge.cs.

### Content Persistence
`FlushCurrentResponse` is also called on `AssistantTurnEndEvent` to persist accumulated response text at each sub-turn boundary. This prevents content loss if the app restarts between `turn_end` and `session.idle` (e.g., "zero-idle sessions" where the SDK never emits `session.idle`). The flush includes a dedup guard to prevent duplicate messages from event replay on resume.

### Processing Watchdog
The processing watchdog (`RunProcessingWatchdogAsync` in `CopilotService.Events.cs`) detects stuck sessions by checking how long since the last SDK event. It checks every 15 seconds and has two timeout tiers:
The processing watchdog (`RunProcessingWatchdogAsync` in `CopilotService.Events.cs`) detects stuck sessions by checking how long since the last SDK event. It checks every 15 seconds and has three timeout tiers:
- **30 seconds** (resume quiescence) — for resumed sessions with zero SDK events since restart. Assumes the turn already finished before the restart. **Bypassed** when the events file shows recent activity (< 120s old) — in that case, the session was genuinely active and gets the longer timeout.
- **120 seconds** (inactivity timeout) — for sessions with no tool activity
- **600 seconds** (tool execution timeout) — used when ANY of these are true:
- A tool call is actively running (`ActiveToolCallCount > 0`)
- The session was resumed mid-turn after app restart (`IsResumed`)
- Tools have been used this turn (`HasUsedToolsThisTurn`) — even between tool rounds when the model is thinking

Note: Some sessions never receive `session.idle` events (SDK/CLI bug). In these "zero-idle" cases, `IsProcessing` is only cleared by the watchdog or user abort. The turn_end flush (see Content Persistence above) ensures response content is not lost.

When the watchdog fires, it marshals state mutations to the UI thread via `InvokeOnUI()` and adds a system warning message.

### Diagnostic Log Tags
Expand Down
237 changes: 237 additions & 0 deletions PolyPilot.Tests/ProcessingWatchdogTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1341,4 +1341,241 @@ public void AllThreeTimeoutTiers_AreDistinct()
Assert.True(CopilotService.WatchdogInactivityTimeoutSeconds
< CopilotService.WatchdogToolExecutionTimeoutSeconds);
}

// --- GetEventsFileRestoreHints tests ---

[Fact]
public void RestoreHints_MissingFile_ReturnsFalse()
{
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
Directory.CreateDirectory(basePath);
try
{
var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("nonexistent-session", basePath);
Assert.False(isRecentlyActive);
Assert.False(hadToolActivity);
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_FreshFile_AssistantEvent_ReturnsRecentlyActiveOnly()
{
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
// Write a fresh events.jsonl with a non-tool active event
File.WriteAllText(Path.Combine(sessionDir, "events.jsonl"),
"""{"type":"assistant.message_delta","data":{}}""");

var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);
Assert.True(isRecentlyActive, "File was just written — should be recently active");
Assert.False(hadToolActivity, "Last event is not a tool event");
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_FreshFile_ToolEvent_ReturnsBothTrue()
{
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
// Write a fresh events.jsonl with a tool execution event as the last line
File.WriteAllText(Path.Combine(sessionDir, "events.jsonl"),
"""{"type":"assistant.turn_start","data":{}}""" + "\n" +
"""{"type":"tool.execution_start","data":{"name":"bash"}}""");

var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);
Assert.True(isRecentlyActive, "File was just written — should be recently active");
Assert.True(hadToolActivity, "Last event is tool.execution_start");
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_FreshFile_ToolProgressEvent_ReturnsBothTrue()
{
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
File.WriteAllText(Path.Combine(sessionDir, "events.jsonl"),
"""{"type":"tool.execution_progress","data":{}}""");

var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);
Assert.True(isRecentlyActive);
Assert.True(hadToolActivity, "Last event is tool.execution_progress");
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_StaleFile_ReturnsNotRecentlyActive()
{
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
var eventsFile = Path.Combine(sessionDir, "events.jsonl");
File.WriteAllText(eventsFile,
"""{"type":"tool.execution_start","data":{"name":"bash"}}""");
// Make file older than inactivity timeout
File.SetLastWriteTimeUtc(eventsFile,
DateTime.UtcNow.AddSeconds(-(CopilotService.WatchdogInactivityTimeoutSeconds + 10)));

var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);
Assert.False(isRecentlyActive, "File is stale — should not be recently active");
Assert.False(hadToolActivity, "Stale files should not report tool activity");
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_EmptyFile_ReturnsRecentlyActiveWithNoToolActivity()
{
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
File.WriteAllText(Path.Combine(sessionDir, "events.jsonl"), "");
var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);
Assert.True(isRecentlyActive, "Fresh empty file is still recently active");
Assert.False(hadToolActivity, "Empty file has no tool events");
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_FreshToolActivity_BypassesQuiescenceTimeout()
{
// Integration-style test: When restore hints indicate recent tool activity,
// the effective watchdog timeout should NOT be the 30s quiescence timeout.
// Simulates the scenario from the bug: session is genuinely active on the server
// but SDK hasn't reconnected yet.
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
File.WriteAllText(Path.Combine(sessionDir, "events.jsonl"),
"""{"type":"tool.execution_start","data":{"name":"bash"}}""");

var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);

// Simulate what the restore code does with these hints
bool hasReceivedEvents = isRecentlyActive; // Pre-seeded from hints
bool hasUsedTools = hadToolActivity; // Pre-seeded from hints

var effectiveTimeout = ComputeEffectiveTimeout(
hasActiveTool: false,
isResumed: true,
hasReceivedEvents: hasReceivedEvents,
hasUsedTools: hasUsedTools);

// Must NOT be the 30s quiescence — should be 600s tool timeout
Assert.NotEqual(CopilotService.WatchdogResumeQuiescenceTimeoutSeconds, effectiveTimeout);
Assert.Equal(CopilotService.WatchdogToolExecutionTimeoutSeconds, effectiveTimeout);
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_FreshNonToolActivity_BypassesQuiescenceTimeout()
{
// When restore hints indicate recent non-tool activity, the timeout should
// transition through the IsResumed clearing logic to 120s inactivity.
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
File.WriteAllText(Path.Combine(sessionDir, "events.jsonl"),
"""{"type":"assistant.message_delta","data":{}}""");

var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);

bool hasReceivedEvents = isRecentlyActive;
bool hasUsedTools = hadToolActivity;

var effectiveTimeout = ComputeEffectiveTimeout(
hasActiveTool: false,
isResumed: true,
hasReceivedEvents: hasReceivedEvents,
hasUsedTools: hasUsedTools);

// Must NOT be the 30s quiescence — should be 600s (resumed + events = tool timeout)
Assert.NotEqual(CopilotService.WatchdogResumeQuiescenceTimeoutSeconds, effectiveTimeout);
Assert.Equal(CopilotService.WatchdogToolExecutionTimeoutSeconds, effectiveTimeout);
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_StaleFile_StillUsesQuiescenceTimeout()
{
// When the file is stale, the quiescence timeout should still apply —
// the turn probably finished long ago.
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
var eventsFile = Path.Combine(sessionDir, "events.jsonl");
File.WriteAllText(eventsFile,
"""{"type":"tool.execution_start","data":{"name":"bash"}}""");
File.SetLastWriteTimeUtc(eventsFile,
DateTime.UtcNow.AddSeconds(-(CopilotService.WatchdogInactivityTimeoutSeconds + 10)));

var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);

// Stale: no pre-seeding → quiescence still applies
bool hasReceivedEvents = isRecentlyActive; // false
bool hasUsedTools = hadToolActivity; // false

var effectiveTimeout = ComputeEffectiveTimeout(
hasActiveTool: false,
isResumed: true,
hasReceivedEvents: hasReceivedEvents,
hasUsedTools: hasUsedTools);

Assert.Equal(CopilotService.WatchdogResumeQuiescenceTimeoutSeconds, effectiveTimeout);
}
finally { Directory.Delete(basePath, true); }
}

[Fact]
public void RestoreHints_MalformedJson_PreservesFileAgeSignal()
{
var service = CreateService();
var basePath = Path.Combine(Path.GetTempPath(), $"restore-hints-{Guid.NewGuid()}");
var sessionDir = Path.Combine(basePath, "test-session");
Directory.CreateDirectory(sessionDir);
try
{
File.WriteAllText(Path.Combine(sessionDir, "events.jsonl"), "{{ bad json {{");
var (isRecentlyActive, hadToolActivity) = service.GetEventsFileRestoreHints("test-session", basePath);
// File was just written (age < 120s) so isRecentlyActive is true even though JSON is malformed.
// This ensures the quiescence bypass still works for recently-active sessions with corrupt events.
Assert.True(isRecentlyActive, "Recently-written file should preserve isRecentlyActive despite malformed JSON");
Assert.False(hadToolActivity, "Cannot detect tool activity from bad JSON");
}
finally { Directory.Delete(basePath, true); }
}
}
Loading