46 changes: 45 additions & 1 deletion .github/copilot-instructions.md
@@ -137,6 +137,40 @@ When a prompt is sent, the SDK emits events processed by `HandleSessionEvent` in
5. `AssistantIntentEvent` → intent/plan updates
6. `SessionIdleEvent` → turn complete, response finalized

### Processing Watchdog
The processing watchdog (`RunProcessingWatchdogAsync` in `CopilotService.Events.cs`) detects stuck sessions by measuring the time elapsed since the last SDK event. It checks every 15 seconds and has two timeout tiers:
- **120 seconds** (inactivity timeout) — for sessions with no tool activity
- **600 seconds** (tool execution timeout) — used when ANY of these are true:
  - A tool call is actively running (`ActiveToolCallCount > 0`)
  - The session was resumed mid-turn after app restart (`IsResumed`)
  - Tools have been used this turn (`HasUsedToolsThisTurn`) — even between tool rounds when the model is thinking

The 10-second resume timeout was removed — the watchdog handles all stuck-session detection.

When the watchdog fires, it marshals state mutations to the UI thread via `InvokeOnUI()` and adds a system warning message. All code paths that set `IsProcessing = false` must go through the UI thread.
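
The tier selection described above can be sketched as follows. This is a minimal illustration, not the code in `CopilotService.Events.cs`: the `WatchdogSketch` class, its method names, and the `>=` comparison are assumptions. Only the 15/120/600-second values and the three conditions come from the text.

```csharp
using System;

// Hypothetical sketch of the watchdog's timeout selection; not the real implementation.
public static class WatchdogSketch
{
    public const int CheckIntervalSeconds = 15;          // how often the watchdog polls
    public const int InactivityTimeoutSeconds = 120;     // no tool activity
    public const int ToolExecutionTimeoutSeconds = 600;  // any tool-related condition holds

    // Picks the longer timeout when ANY of the three conditions is true.
    public static int SelectTimeoutSeconds(int activeToolCallCount, bool isResumed, bool hasUsedToolsThisTurn)
    {
        bool useToolTimeout = activeToolCallCount > 0 || isResumed || hasUsedToolsThisTurn;
        return useToolTimeout ? ToolExecutionTimeoutSeconds : InactivityTimeoutSeconds;
    }

    // A session counts as stuck once the time since the last SDK event reaches the
    // selected timeout (assumption: the real code may use a strict comparison).
    public static bool IsStuck(DateTime lastEventUtc, DateTime nowUtc, int timeoutSeconds)
        => (nowUtc - lastEventUtc).TotalSeconds >= timeoutSeconds;
}
```

For example, a session that last saw an event 130 seconds ago is reported as stuck under the inactivity tier but not if a tool has already run during the turn.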

### Diagnostic Log Tags
The event diagnostics log (`~/.polypilot/event-diagnostics.log`) uses these tags:
- `[SEND]` — prompt sent, IsProcessing set to true
- `[EVT]` — SDK event received (only SessionIdleEvent, AssistantTurnEndEvent, SessionErrorEvent)
- `[IDLE]` — SessionIdleEvent dispatched to CompleteResponse
- `[COMPLETE]` — CompleteResponse executed or skipped
- `[RECONNECT]` — session replaced after disconnect
- `[ERROR]` — SessionErrorEvent or SendAsync/reconnect failure cleared IsProcessing
- `[ABORT]` — user-initiated abort cleared IsProcessing
- `[BRIDGE-COMPLETE]` — bridge OnTurnEnd cleared IsProcessing
- `[INTERRUPTED]` — app restart detected interrupted turn (watchdog timeout after resume)

Every code path that sets `IsProcessing = false` MUST have a diagnostic log entry. This is critical for debugging stuck-session issues.
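
One way to make that rule hard to violate is to route every reset through a single helper that writes the log entry first. The sketch below only illustrates the idea: `ProcessingFlagSketch`, its members, and the log line format are hypothetical and do not appear in the codebase described here.

```csharp
using System;

// Hypothetical helper: the flag can only be cleared together with a tagged log entry.
public sealed class ProcessingFlagSketch
{
    private readonly Action<string> _log;

    // Private setter, so callers cannot clear the flag without going through Clear().
    public bool IsProcessing { get; private set; }

    public ProcessingFlagSketch(Action<string> log) => _log = log;

    // Mirrors the [SEND] tag: prompt sent, IsProcessing set to true.
    public void Begin(string sessionName)
    {
        IsProcessing = true;
        _log($"[SEND] '{sessionName}' IsProcessing=true");
    }

    // tag is one of the documented tags, e.g. "COMPLETE", "ERROR", "ABORT".
    public void Clear(string tag, string sessionName, string reason)
    {
        _log($"[{tag}] '{sessionName}' IsProcessing=false ({reason})");
        IsProcessing = false;
    }
}
```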

### Thread Safety: IsProcessing Mutations
All mutations to `state.Info.IsProcessing` must be marshaled to the UI thread. SDK events arrive on background threads. Use `InvokeOnUI()` (not bare `Invoke()`) to combine state mutation + notification in a single callback. Key patterns:
- **CompleteResponse**: Already runs on UI thread (dispatched via `Invoke()`)
- **Watchdog callback**: Uses `InvokeOnUI()` with generation guard
- **SessionErrorEvent**: Uses `InvokeOnUI()` to combine OnError + IsProcessing + OnStateChanged
- **Resume fallback**: Removed (watchdog handles it)
- **SendAsync error paths**: Run on UI thread inline (in SendPromptAsync's catch blocks)
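
The text does not show what the generation guard looks like, so the following is only a plausible sketch. The idea is that a watchdog callback queued for an old turn must not clear the state of a newer one. `GenerationGuardSketch` and its members are hypothetical; the `Action<Action>` delegate stands in for `InvokeOnUI()`, and `BeginTurn` is assumed to be called on the UI thread.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a generation guard around a UI-thread state mutation.
public sealed class GenerationGuardSketch
{
    private long _generation;
    private readonly Action<Action> _invokeOnUI; // stand-in for InvokeOnUI()

    public bool IsProcessing { get; private set; }
    public int StateChangedCount { get; private set; }

    public GenerationGuardSketch(Action<Action> invokeOnUI) => _invokeOnUI = invokeOnUI;

    // Starts a new turn and returns the generation the watchdog should capture.
    public long BeginTurn()
    {
        IsProcessing = true;
        return Interlocked.Increment(ref _generation);
    }

    // Called from a background thread with the generation captured at turn start.
    public void OnWatchdogTimeout(long capturedGeneration)
    {
        _invokeOnUI(() =>
        {
            // A newer turn started after this callback was scheduled: leave it alone.
            if (Interlocked.Read(ref _generation) != capturedGeneration) return;

            // State mutation and notification happen together in one UI callback.
            IsProcessing = false;
            StateChangedCount++;
        });
    }
}
```

Passing `a => a()` as the dispatcher runs the callback inline, which is enough to exercise the guard in a unit test without a real UI thread.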

### Model Selection
The model is set at **session creation time** via `SessionConfig.Model`. The SDK does **not** support changing models per-message or mid-session — `MessageOptions` has no `Model` property.

@@ -155,7 +189,7 @@ When a user changes the model via the UI dropdown:
Avoid `@bind:event="oninput"` — causes round-trip lag per keystroke. Use plain HTML inputs with JS event listeners and read values via `JS.InvokeAsync<string>("eval", "document.getElementById('id')?.value")` on submit.

### Session Persistence
- Active sessions: `~/.polypilot/active-sessions.json` (includes `LastPrompt` — last user message if session was processing during save)
- Session state: `~/.copilot/session-state/<guid>/events.jsonl` (SDK-managed, stays in ~/.copilot)
- UI state: `~/.polypilot/ui-state.json`
- Settings: `~/.polypilot/settings.json`
@@ -206,6 +240,16 @@ Test files in `PolyPilot.Tests/`:
- `PlatformHelperTests.cs` — Platform detection
- `ToolResultFormattingTests.cs` — Tool output formatting
- `UiStatePersistenceTests.cs` — UI state save/load
- `ProcessingWatchdogTests.cs` — Watchdog constants, timeout selection, HasUsedToolsThisTurn, IsResumed
- `CliPathResolutionTests.cs` — CLI path resolution
- `InitializationModeTests.cs` — Mode initialization
- `PersistentModeTests.cs` — Persistent mode behavior
- `ReflectionCycleTests.cs` — Reflection cycle logic
- `SessionDisposalResilienceTests.cs` — Session disposal
- `RenderThrottleTests.cs` — Render throttling
- `DevTunnelServiceTests.cs` — DevTunnel service
- `WsBridgeServerAuthTests.cs` — Bridge auth
- `ModelSelectionTests.cs` — Model selection

UI scenario definitions live in `PolyPilot.Tests/Scenarios/mode-switch-scenarios.json` — executable via MauiDevFlow CDP commands against a running app.

69 changes: 69 additions & 0 deletions PolyPilot.Tests/ChatMessageTests.cs
@@ -140,6 +140,75 @@ public void DefaultProperties_AreCorrect()
        Assert.Null(msg.ReasoningId);
        Assert.Null(msg.ToolCallId);
    }

    [Fact]
    public void Model_DefaultsToNull()
    {
        var msg = ChatMessage.AssistantMessage("test");
        Assert.Null(msg.Model);
    }

    [Fact]
    public void Model_CanBeSetViaInitializer()
    {
        var msg = new ChatMessage("assistant", "test", DateTime.Now) { Model = "gpt-4.1" };
        Assert.Equal("gpt-4.1", msg.Model);
    }

    [Fact]
    public void Model_PreservedOnAssistantMessages()
    {
        var msg = new ChatMessage("assistant", "response", DateTime.Now) { Model = "claude-sonnet-4.5" };
        Assert.True(msg.IsAssistant);
        Assert.Equal("claude-sonnet-4.5", msg.Model);
    }

    [Fact]
    public void Model_NullForUserMessages()
    {
        var msg = ChatMessage.UserMessage("hello");
        Assert.Null(msg.Model);
    }

    // --- Interrupted turn system messages ---

    [Fact]
    public void InterruptedTurn_SystemMessage_ContainsWarning()
    {
        var interruptMsg = "⚠️ Your previous request was interrupted by an app restart. You may need to resend your last message.";
        var msg = ChatMessage.SystemMessage(interruptMsg);

        Assert.Equal("system", msg.Role);
        Assert.Equal(ChatMessageType.System, msg.MessageType);
        Assert.Contains("interrupted by an app restart", msg.Content);
        Assert.Contains("resend your last message", msg.Content);
        Assert.True(msg.IsComplete);
    }

    [Fact]
    public void InterruptedTurn_SystemMessage_IncludesLastPrompt()
    {
        var lastPrompt = "fix the authentication bug in UserController.cs";
        var truncated = lastPrompt.Length > 80 ? lastPrompt[..80] + "…" : lastPrompt;
        var interruptMsg = $"⚠️ Your previous request was interrupted by an app restart. You may need to resend your last message.\n📝 Last message: \"{truncated}\"";
        var msg = ChatMessage.SystemMessage(interruptMsg);

        Assert.Contains("Last message:", msg.Content);
        Assert.Contains("fix the authentication bug", msg.Content);
    }

    [Fact]
    public void InterruptedTurn_SystemMessage_TruncatesLongPrompt()
    {
        var longPrompt = new string('x', 200);
        var truncated = longPrompt[..80] + "…";
        var interruptMsg = $"⚠️ Your previous request was interrupted by an app restart. You may need to resend your last message.\n📝 Last message: \"{truncated}\"";
        var msg = ChatMessage.SystemMessage(interruptMsg);

        Assert.Contains("…", msg.Content);
        // The truncated version should be 80 chars + ellipsis, not the full 200
        Assert.DoesNotContain(longPrompt, msg.Content);
    }
}

public class ToolActivityTests
227 changes: 227 additions & 0 deletions PolyPilot.Tests/ProcessingWatchdogTests.cs
@@ -810,4 +810,231 @@ public async Task MultipleAbortResendCycles_MaintainCleanState()
        Assert.True(session.History.Count >= 5,
            $"Expected at least 5 history entries from 5 send cycles, got {session.History.Count}");
    }

    // ===========================================================================
    // Watchdog timeout selection logic
    // Tests the 3-way condition: hasActiveTool || IsResumed || HasUsedToolsThisTurn
    // SessionState is private, so we replicate the decision logic inline using
    // local variables that mirror the watchdog algorithm in CopilotService.Events.cs.
    // ===========================================================================

    [Fact]
    public void HasUsedToolsThisTurn_DefaultsFalse()
    {
        // Mirrors SessionState.HasUsedToolsThisTurn default (bool default = false)
        bool hasUsedToolsThisTurn = default;
        Assert.False(hasUsedToolsThisTurn);
    }

    [Fact]
    public void HasUsedToolsThisTurn_CanBeSet()
    {
        // Mirrors setting HasUsedToolsThisTurn = true on ToolExecutionStartEvent
        bool hasUsedToolsThisTurn = false;
        hasUsedToolsThisTurn = true;
        Assert.True(hasUsedToolsThisTurn);
    }

    [Fact]
    public void HasUsedToolsThisTurn_ResetByCompleteResponse()
    {
        // Mirrors CompleteResponse resetting HasUsedToolsThisTurn = false
        bool hasUsedToolsThisTurn = true;
        // CompleteResponse resets the field
        hasUsedToolsThisTurn = false;
        Assert.False(hasUsedToolsThisTurn);
    }

    [Fact]
    public void WatchdogTimeoutSelection_NoTools_UsesInactivityTimeout()
    {
        // When no tool activity and not resumed → use shorter inactivity timeout
        int activeToolCallCount = 0;
        bool isResumed = false;
        bool hasUsedToolsThisTurn = false;

        var hasActiveTool = Interlocked.CompareExchange(ref activeToolCallCount, 0, 0) > 0;
        var useToolTimeout = hasActiveTool || isResumed || hasUsedToolsThisTurn;
        var effectiveTimeout = useToolTimeout
            ? CopilotService.WatchdogToolExecutionTimeoutSeconds
            : CopilotService.WatchdogInactivityTimeoutSeconds;

        Assert.Equal(CopilotService.WatchdogInactivityTimeoutSeconds, effectiveTimeout);
        Assert.Equal(120, effectiveTimeout);
    }

    [Fact]
    public void WatchdogTimeoutSelection_ActiveTool_UsesToolTimeout()
    {
        // When ActiveToolCallCount > 0 → use longer tool execution timeout
        int activeToolCallCount = 1;
        bool isResumed = false;
        bool hasUsedToolsThisTurn = false;

        var hasActiveTool = Interlocked.CompareExchange(ref activeToolCallCount, 0, 0) > 0;
        var useToolTimeout = hasActiveTool || isResumed || hasUsedToolsThisTurn;
        var effectiveTimeout = useToolTimeout
            ? CopilotService.WatchdogToolExecutionTimeoutSeconds
            : CopilotService.WatchdogInactivityTimeoutSeconds;

        Assert.Equal(CopilotService.WatchdogToolExecutionTimeoutSeconds, effectiveTimeout);
        Assert.Equal(600, effectiveTimeout);
    }

    [Fact]
    public void WatchdogTimeoutSelection_ResumedSession_UsesToolTimeout()
    {
        // When session is resumed (IsResumed=true) → use longer tool timeout
        // because resumed sessions may have in-flight tool calls from before restart
        int activeToolCallCount = 0;
        bool isResumed = true;
        bool hasUsedToolsThisTurn = false;

        var hasActiveTool = Interlocked.CompareExchange(ref activeToolCallCount, 0, 0) > 0;
        var useToolTimeout = hasActiveTool || isResumed || hasUsedToolsThisTurn;
        var effectiveTimeout = useToolTimeout
            ? CopilotService.WatchdogToolExecutionTimeoutSeconds
            : CopilotService.WatchdogInactivityTimeoutSeconds;

        Assert.Equal(CopilotService.WatchdogToolExecutionTimeoutSeconds, effectiveTimeout);
        Assert.Equal(600, effectiveTimeout);
    }

    [Fact]
    public void WatchdogTimeoutSelection_HasUsedTools_UsesToolTimeout()
    {
        // When tools have been used this turn (HasUsedToolsThisTurn=true) → use longer
        // tool timeout even between tool rounds when the model is thinking
        int activeToolCallCount = 0;
        bool isResumed = false;
        bool hasUsedToolsThisTurn = true;

        var hasActiveTool = Interlocked.CompareExchange(ref activeToolCallCount, 0, 0) > 0;
        var useToolTimeout = hasActiveTool || isResumed || hasUsedToolsThisTurn;
        var effectiveTimeout = useToolTimeout
            ? CopilotService.WatchdogToolExecutionTimeoutSeconds
            : CopilotService.WatchdogInactivityTimeoutSeconds;

        Assert.Equal(CopilotService.WatchdogToolExecutionTimeoutSeconds, effectiveTimeout);
        Assert.Equal(600, effectiveTimeout);
    }

    [Fact]
    public void HasUsedToolsThisTurn_ResetOnNewSend()
    {
        // SendPromptAsync resets HasUsedToolsThisTurn alongside ActiveToolCallCount
        // to prevent stale tool-usage from a previous turn inflating the timeout
        bool hasUsedToolsThisTurn = true;
        // SendPromptAsync resets it
        hasUsedToolsThisTurn = false;
        int activeToolCallCount = 0;
        bool isResumed = false;

        var hasActiveTool = Interlocked.CompareExchange(ref activeToolCallCount, 0, 0) > 0;
        var useToolTimeout = hasActiveTool || isResumed || hasUsedToolsThisTurn;
        var effectiveTimeout = useToolTimeout
            ? CopilotService.WatchdogToolExecutionTimeoutSeconds
            : CopilotService.WatchdogInactivityTimeoutSeconds;

        Assert.Equal(120, effectiveTimeout);
    }

    [Fact]
    public void IsResumed_ClearedAfterFirstTurn()
    {
        // IsResumed is only set when session was mid-turn at restart,
        // and should be cleared after the first successful CompleteResponse
        var info = new AgentSessionInfo { Name = "test", Model = "test", IsResumed = true };
        Assert.True(info.IsResumed);

        // CompleteResponse clears it
        info.IsResumed = false;
        Assert.False(info.IsResumed);

        // Subsequent turns use inactivity timeout (120s), not tool timeout (600s)
        int activeToolCallCount = 0;
        bool hasUsedToolsThisTurn = false;

        var hasActiveTool = Interlocked.CompareExchange(ref activeToolCallCount, 0, 0) > 0;
        var useToolTimeout = hasActiveTool || info.IsResumed || hasUsedToolsThisTurn;
        var effectiveTimeout = useToolTimeout
            ? CopilotService.WatchdogToolExecutionTimeoutSeconds
            : CopilotService.WatchdogInactivityTimeoutSeconds;

        Assert.Equal(120, effectiveTimeout);
    }

    [Fact]
    public void IsResumed_OnlySetWhenStillProcessing()
    {
        // IsResumed should only be true when session was mid-turn at restart
        // Idle-resumed sessions should NOT get the 600s timeout
        var idleResumed = new AgentSessionInfo { Name = "idle", Model = "test", IsResumed = false };
        var midTurnResumed = new AgentSessionInfo { Name = "mid", Model = "test", IsResumed = true };

        Assert.False(idleResumed.IsResumed);
        Assert.True(midTurnResumed.IsResumed);
    }

    [Fact]
    public void IsResumed_ClearedOnAbort()
    {
        // Abort must clear IsResumed so subsequent turns use 120s timeout
        var info = new AgentSessionInfo { Name = "t", Model = "m", IsResumed = true };
        Assert.True(info.IsResumed);

        // Simulate abort path
        info.IsProcessing = false;
        info.IsResumed = false;

        Assert.False(info.IsResumed);
    }

    [Fact]
    public void IsResumed_ClearedOnError()
    {
        // SessionErrorEvent must clear IsResumed
        var info = new AgentSessionInfo { Name = "t", Model = "m", IsResumed = true };

        // Simulate error path
        info.IsProcessing = false;
        info.IsResumed = false;

        Assert.False(info.IsResumed);
    }

    [Fact]
    public void IsResumed_ClearedOnWatchdogTimeout()
    {
        // Watchdog timeout must clear IsResumed so next turns don't get 600s
        var info = new AgentSessionInfo { Name = "t", Model = "m", IsResumed = true };

        // Simulate watchdog timeout path
        info.IsProcessing = false;
        info.IsResumed = false;

        // Verify next turn would use 120s
        int activeToolCallCount = 0;
        bool hasUsedToolsThisTurn = false;
        var hasActiveTool = Interlocked.CompareExchange(ref activeToolCallCount, 0, 0) > 0;
        var useToolTimeout = hasActiveTool || info.IsResumed || hasUsedToolsThisTurn;
        var effectiveTimeout = useToolTimeout
            ? CopilotService.WatchdogToolExecutionTimeoutSeconds
            : CopilotService.WatchdogInactivityTimeoutSeconds;

        Assert.Equal(120, effectiveTimeout);
    }

    [Fact]
    public void HasUsedToolsThisTurn_VolatileConsistency()
    {
        // Verify that Volatile.Write/Read round-trips correctly
        // (mirrors the cross-thread pattern: SDK thread writes, watchdog timer reads)
        bool field = false;
        Volatile.Write(ref field, true);
        Assert.True(Volatile.Read(ref field));

        Volatile.Write(ref field, false);
        Assert.False(Volatile.Read(ref field));
    }
}