fix: TurnEnd fallback checks events.jsonl freshness before completing by PureWeen · Pull Request #619 · PureWeen/PolyPilot

PureWeen · 2026-04-18T22:33:46Z

Problem

The ReliabilityTest session stopped sending messages and appeared stuck. Root cause: the TurnEnd→Idle fallback timer prematurely completed the session while a read_bash tool with delay: 120 was still executing.

Timeline:

assistant.turn_end → starts fallback timer (4s + 30s tool extension = 34s)
assistant.turn_start → cancels the fallback ✅
assistant.message + tool.execution_start (read_bash delay:120) — but ToolExecutionStartEvent was NOT delivered via the live SDK event stream (only written to events.jsonl)
The next turn_end (from the same sub-turn sequence) starts a NEW fallback
Fallback checks ActiveToolCallCount → 0 (because TurnStartEvent reset it at step 2, and step 3's increment never arrived)
Fallback fires CompleteResponse → IsProcessing = false
All subsequent TurnStartEvents are rejected as [EVT-REARM-SKIP] ("arrived after explicit completion")
Session permanently stuck — CLI keeps working but UI shows no activity

Root Cause

The live SDK event stream and events.jsonl are separate channels. ToolExecutionStartEvent may appear in events.jsonl (written by the CLI) but not be delivered to HandleSessionEvent (live stream). The TurnEnd fallback only checks ActiveToolCallCount (set by the live stream handler), not whether the CLI is still actively writing.

Fix

After the extended 30s wait in the TurnEnd fallback, check events.jsonl last-write time before completing. If the CLI wrote to it recently (within the 30s window), the session is still active — skip completion. This is the same freshness pattern already used by the watchdog (useUsedToolsTimeout upgrade at line 2675-2698).

Testing

All 3484 tests pass, 0 failures
201 TurnEndFallback + ProcessingWatchdog tests pass

The TurnEnd→Idle fallback timer could prematurely complete a session when ToolExecutionStartEvent was not delivered via the live SDK event stream (only written to events.jsonl). This caused the session to appear idle while the CLI was still actively executing a tool. Scenario: read_bash with delay:120 takes 120 seconds. The TurnEnd fallback waits 34s (4s + 30s tool extension), checks ActiveToolCallCount=0 (because TurnStartEvent reset it and the ToolExecutionStart event wasn't delivered live), and fires CompleteResponse. All subsequent TurnStart events are rejected as 'stale replay' (EVT-REARM-SKIP), leaving the session permanently stuck. Fix: After the extended 30s wait, check events.jsonl last-write time. If the CLI wrote to it recently (within the 30s window), the session is still active — skip completion. This is the same freshness pattern already used by the watchdog (line 2675-2698). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-18T22:41:45Z

Deep Review: PR 619 — TurnEnd fallback checks events.jsonl freshness

Verdict: ✅ This fix is correct, surgical, and well-reasoned.

Root Cause Analysis (confirmed)

The live SDK event stream and events.jsonl are separate channels. ToolExecutionStartEvent can appear in events.jsonl (written by the CLI) but NOT be delivered to HandleSessionEvent (live stream). This causes the TurnEnd fallback to fire with ActiveToolCallCount=0 when a tool IS actually running — a real gap I've observed in diagnostic logs during monitoring.

The timeline in the PR description accurately describes the failure mode:

TurnEnd → fallback timer starts
TurnStart → cancels fallback ✅
tool.execution_start written to events.jsonl but not delivered via live stream → ActiveToolCallCount stays at 0
Next TurnEnd → new fallback starts, sees ActiveToolCallCount=0, fires CompleteResponse
Session permanently stuck — CLI keeps working but UI shows no activity

Code Analysis

Placement is correct: The guard is inserted after the extended 30s wait (TurnEndIdleToolFallbackAdditionalMs) and after the ActiveToolCallCount > 0 re-check — making it the last line of defense before CompleteResponse.

Pattern is established: The same events.jsonl freshness check is already used by the watchdog (useUsedToolsTimeout upgrade at line 2675-2698). This is consistent.

Threshold is right: TurnEndIdleToolFallbackAdditionalMs / 1000.0 = 30s. After waiting 34s total (4s + 30s), if the file was written within the last 30s, the CLI is genuinely still active.

One-shot deferral is acceptable: The return exits the Task.Run callback without rescheduling. This is fine because:

If the CLI just wrote to events.jsonl, more events will arrive (triggering new TurnEnd→TurnStart cycles or SessionIdle)
The watchdog (15s check interval, 600s tool timeout) provides the ultimate safety net per INV-10/INV-11

Edge cases handled correctly:

IsDemoMode/IsRemoteMode guard — events.jsonl doesn't exist in those modes ✅
try/catch swallows all filesystem errors (fail-safe → proceed with completion) ✅
Null/empty session ID check ✅
Thread safety: runs on Task.Run background thread, file I/O only (no UI mutations) ✅

Consistency with Processing State Safety Invariants

INV-10 (TurnEnd fallback must not be permanently suppressed): This guard only PREVENTS one fallback firing — it does not permanently disable the mechanism. Future TurnEnd events restart the fallback timer. ✅
INV-11 (Watchdog must distinguish active tools from lost events): The freshness check adds another signal (disk writes) to distinguish active tools from lost events, complementing the in-memory ActiveToolCallCount. ✅
INV-12 (ProcessingGeneration guard): Not needed here — this code doesn't set IsProcessing=false, it prevents it from being set. ✅

Optional Improvement (non-blocking)

Could reschedule the fallback timer instead of just return-ing when the file is fresh — this would provide faster recovery than waiting for the watchdog. But the watchdog handles the long-term case adequately, so this is truly optional.

Test Coverage

3,484 tests passing with 0 failures, including 201 TurnEndFallback + ProcessingWatchdog tests. The fix is additive-only (no existing behavior changed), so regression risk is minimal.

PureWeen · 2026-04-18T22:42:52Z

🔍 Multi-Model Code Review — PR #619 (v5 Final Review)

Reviewed by: 3 independent reviewers with adversarial consensus
Diff reviewed: 6 commits, 4 files, +327 lines
CI Status: ⚠️ No CI configured to re-run on push (gh-aw only triggers on opened/ready_for_review)

All Findings — Final Status

#	Finding	Source	Status
1	Missing `lastWrite > startedAt` anchor	Multi-model v1 + gh-aw	✅ FIXED (commit 2)
2	No retry after skip → permanent stuck	Multi-model v1 + gh-aw	✅ FIXED (commit 2)
3	Negative fileAge from clock skew	Multi-model v1	✅ FIXED (commit 2)
4	Constant dual-purpose	Multi-model v1 + gh-aw	✅ FIXED (commit 2)
5	Bare catch swallows all exceptions	Multi-model v1	✅ FIXED (commits 2+5)
6	Case B pre-check missing server-liveness	Multi-model v3	✅ FIXED (commit 5)
7	Split block comment in Case B	Multi-model v3	✅ FIXED (commit 5)
8	Skill doc overstates Case A parity	Multi-model v3	✅ FIXED (commit 5)
9	No test coverage	gh-aw v1	✅ FIXED (commit 6) — 9 structural tests

9 of 9 findings resolved across 5 review rounds (multi-model + gh-aw).

Test Coverage (commit 6: `52db5e0f`)

9 new tests in TurnEndFallbackTests.cs:

Test	What it verifies
`TurnEndFallbackFreshnessSeconds_IsReasonable`	Constant in [5, 60] range
`TurnEndFallbackRecheckMs_IsReasonable`	Constant ≤ watchdog interval
`TurnEndFallbackFreshnessSeconds_IsSeparateFromWatchdog`	Not same as CaseBFreshnessSeconds
`FreshnessGuard_HasTurnStartAnchor`	`freshnessCheckStart` captured before delay
`FreshnessGuard_HasRecheckLoop`	Recheck + defer to watchdog
`FreshnessGuard_HasClockSkewProtection`	`Math.Max(0.0, ...)` on both age calculations
`FreshnessGuard_ChecksLastEventType`	`GetLastEventType` + `tool.execution_start`
`SkipsDemoAndRemoteMode`	`IsDemoMode`/`IsRemoteMode` bypass
`WatchdogCaseB_PreCheck_HasServerLivenessGuard`	`IsServerRunning` in pre-check

3493 tests pass, 0 failures.

Recommended Action

✅ Approve

All 9 findings from both multi-model and gh-aw reviews are resolved. Production code is correct, well-documented (skill INV-10/INV-11, copilot-instructions), and covered by 9 structural invariant tests. Ready to merge.

- Add lastWrite > freshnessCheckStart anchor so prior-turn writes don't suppress completion (mirrors watchdog line ~2711) - Add 15s retry recheck after freshness skip so sessions don't get permanently stuck with no recovery path - Guard against negative fileAge from clock skew with Math.Max(0.0,...) - Add dedicated TurnEndFallbackFreshnessSeconds (30s) and TurnEndFallbackRecheckMs (15s) constants instead of reusing delay - Log exception message in catch block instead of silently swallowing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions

Multi-Model Review — PR #619

PR: fix: TurnEnd fallback checks events.jsonl freshness before completing
CI: Not available (token not configured in workflow)
Reviewers: 3 independent reviewers

Findings (ranked by severity)

#	Severity	Consensus	Finding
1	🟡 MODERATE	3/3	One-shot `return` with no re-arm — when the events.jsonl freshness guard fires (including false positives where a tool completed during the 30s wait), `CompleteResponse` is never called and no new fallback is armed. The processing watchdog is the only safety net, but its pre-timeout upgrade (line 2701) keeps extending the effective timeout while events.jsonl is <300s old. Net result: session stuck ~5 minutes before the watchdog rescues it, vs 34s without this PR.
2	�� MODERATE	3/3	Missing `lastWrite > startedAt.Value` anchor — the watchdog's equivalent check (line 2713) uses a two-part temporal guard documented at lines 2839-2843 ("we need BOTH checks"). The new code omits the turn-scoping anchor, so writes from a previous turn or background-task activity within the 30s window can trigger the guard incorrectly. This diverges from the established pattern hardened over 13 PRs.
3	🟢 MINOR	3/3	Freshness threshold coupled to delay constant — `fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0` reuses the 30s delay as the freshness window. These are semantically different ("how long to wait" vs "how fresh must the file be"). The watchdog uses dedicated constants. A tighter threshold (~10s) would reduce false positives without a re-check loop.
4	🟢 MINOR	2/3	No test coverage — no behavioral or structural test verifies the guard fires/skips correctly, or that Demo/Remote modes bypass it. The codebase convention requires behavioral tests for state machine changes.

Discarded Findings (1/3 only)

CaptureZeroIdleDiagnostics not called when guard fires — valid observation (diagnostic data lost for watchdog-rescued sessions), but low impact since the watchdog has its own logging. Single-reviewer finding, discarded per adversarial consensus rules.

Assessment

The intent is sound — checking events.jsonl freshness before completing catches the real scenario where ToolExecutionStartEvent is only written to disk, not delivered via the live stream. However, the implementation has two interacting issues: (1) the 30s threshold is as wide as the wait window, making false positives likely for any session where a tool completed during the wait, and (2) the bare return means false positives cause a ~5-minute regression instead of graceful degradation.

Recommended fixes (in priority order):

Add lastWrite > startedAt.Value anchor to match the watchdog pattern
Either tighten the threshold to ~10s, or add a re-check loop (15s delay + recheck) before the final return
Extract a named constant for the freshness threshold

Requesting changes for the two MODERATE findings.

Generated by Expert Code Review (auto) for issue #619

github-actions · 2026-04-18T22:52:37Z

+                                                var fileAge = (DateTime.UtcNow - File.GetLastWriteTimeUtc(ep)).TotalSeconds;
+                                                if (fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0)


🟡 MODERATE — Missing lastWrite > startedAt.Value anchor (3/3 reviewers)

The watchdog's equivalent check (line 2713) uses a two-part temporal guard:

if (lastWrite > startedAt.Value && fileAge < WatchdogCaseBFreshnessSeconds)

This is explicitly documented in the watchdog comments (lines 2839–2843):

"We need BOTH checks because: 'after turn start' alone stays true forever once any event is written; 'recent' alone could match stale files from a previous turn"

The new code uses only the second half (fileAge < threshold), omitting the lastWrite > startedAt.Value anchor. This means a write from a previous turn or background-task activity that happened to fall within the 30s window can trigger the guard and suppress completion for a session that is actually done.

Scenario: An orchestrator's background shell from a prior prompt keeps writing to events.jsonl. The current turn's TurnEnd fires, tools were used, the 34s fallback checks freshness: fileAge = 5s < 30s → guard fires → return. Session is stuck until the watchdog rescues it (~5 min later).

Suggested fix — match the established watchdog pattern:

var startedAt = state.Info.ProcessingStartedAt; var lastWrite = File.GetLastWriteTimeUtc(ep); var fileAge = (DateTime.UtcNow - lastWrite).TotalSeconds; if (startedAt.HasValue && lastWrite > startedAt.Value && fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0)

github-actions · 2026-04-18T22:52:37Z

+                                                    Debug($"[IDLE-FALLBACK] '{sessionName}' skipped — events.jsonl written {fileAge:F0}s ago, CLI still active");
+                                                    return;


🟡 MODERATE — One-shot return with no re-arm leaves session stuck ~5 minutes (3/3 reviewers)

When this guard fires, the Task.Run lambda exits. Unlike the active-tool check at line 872 (which expects AssistantTurnStartEvent to re-arm), this guard fires at the end of the extended wait where:

No new TurnStartEvent is coming (tool loop ended)

No SessionIdleEvent is coming (the SDK bug this fallback handles)

TurnEndIdleCts is consumed; no re-arm exists

The only remaining backstop is the processing watchdog. Traced timeline for a false-positive:

Time Event

T=0 TurnEnd fires, LastEventAtTicks set

T=34s Guard fires, fallback consumed (return)

T≤295s Watchdog: pre-timeout upgrade (line 2701) sees fileAge < 300s → upgrades timeout from 180s→600s

T≈300s fileAge crosses 300s → no upgrade. elapsed≈300≥180 → Case B: age≥300s → caseBEventsActive=false → completes

Impact: ~266 seconds of extra stuck time (300s vs 34s). Session shows "Thinking…" for ~5 minutes with no progress.

Suggested fix — add a re-check loop instead of a bare return:

// Instead of return, re-check after a shorter interval Debug($"[IDLE-FALLBACK] '{sessionName}' events.jsonl fresh ({fileAge:F0}s) — rechecking in 15s"); await Task.Delay(15_000, fallbackToken); if (fallbackToken.IsCancellationRequested) return; if (state.IsOrphaned) return; var newAge = (DateTime.UtcNow - File.GetLastWriteTimeUtc(ep)).TotalSeconds; if (newAge >= TurnEndIdleToolFallbackAdditionalMs / 1000.0) { // File went stale — proceed to completion below } else { Debug($"[IDLE-FALLBACK] '{sessionName}' still fresh ({newAge:F0}s) — deferring to watchdog"); return; }

Alternatively, tighten the threshold to ~10s so false positives are rare enough that the watchdog rescue delay is acceptable.

github-actions · 2026-04-18T22:52:37Z

+                                            if (File.Exists(ep))
+                                            {
+                                                var fileAge = (DateTime.UtcNow - File.GetLastWriteTimeUtc(ep)).TotalSeconds;
+                                                if (fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0)


🟢 MINOR — Consider adding lastWrite > startedAt guard for consistency with watchdog (1/2 reviewers flagged, 1/2 dismissed)

The watchdog freshness check at ~line 2713 uses a two-part guard:

if (lastWrite > startedAt.Value && fileAge < WatchdogCaseBFreshnessSeconds)

This code only checks recency (fileAge < 30s), not whether the write happened after the current turn started.

Scenario: Rapid back-to-back prompts where the previous turn's tool wrote to events.jsonl just before the current turn started — the stale mtime could fall within the 30s window and cause a false skip. However, toolsUsedThisTurn=true is required to reach this code and the 30s window is tight, so the scenario is unlikely in practice.

Disputed: Reviewer 1 analyzed this same inconsistency and explicitly dismissed it: "the 30s freshness window is tight enough that a stale write from a previous turn is effectively impossible."

Suggested improvement (non-blocking):

var lastWrite = File.GetLastWriteTimeUtc(ep); var fileAge = (DateTime.UtcNow - lastWrite).TotalSeconds; var turnStart = state.Info.ProcessingStartedAt; if (fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0 && (!turnStart.HasValue || lastWrite > turnStart.Value))

When ToolExecutionStartEvent isn't delivered via the live SDK stream, ActiveToolCallCount stays 0 even though a tool IS running. For long- running tools (e.g., read_bash delay:600), the freshness check alone doesn't help because events.jsonl was written before the wait started. Fix: check GetLastEventType() in both the TurnEnd fallback and the watchdog Case B. If the last event in events.jsonl is 'tool.execution_start' (no matching tool.execution_complete), a tool is actively running — defer completion regardless of ActiveToolCallCount or file freshness. TurnEnd fallback: defers to watchdog when tool detected on disk. Watchdog Case B: resets inactivity timer and continues (same as Case A server-alive behavior), allowing the tool to run indefinitely until tool.execution_complete appears. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update INV-10 and INV-11 in processing-state-safety skill to document: - The live-event-stream gap where ToolExecutionStartEvent isn't delivered - TurnEnd fallback freshness + GetLastEventType checks (PR #619) - Watchdog Case B pre-check for in-flight tools on disk - New constants: TurnEndFallbackFreshnessSeconds, TurnEndFallbackRecheckMs Update copilot-instructions.md Processing Watchdog section with the live-event-stream gap compensation mechanisms and new diagnostic tags ([IDLE-FALLBACK], [TOOL-HEALTH]). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add _serverManager.IsServerRunning guard to watchdog Case B pre-check so CLI crashes are detected in ~30s instead of 60 min - Replace bare catch {} with logged catch (Exception ex) for diagnostics - Fix split block comment in Case B (readability) - Update INV-11 skill doc to accurately describe pre-check behavior Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

9 new tests covering the events.jsonl freshness guard and watchdog Case B pre-check: - Constant reasonableness: TurnEndFallbackFreshnessSeconds, RecheckMs - Constant separation from watchdog CaseBFreshnessSeconds - Turn-start anchor (freshnessCheckStart captured before delay) - Recheck loop (TurnEndFallbackRecheckMs + defer to watchdog) - Clock skew protection (Math.Max on both fileAge calculations) - GetLastEventType check for tool.execution_start - Demo/Remote mode bypass - Server liveness guard on watchdog Case B pre-check Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-19T12:10:18Z

/review

github-actions · 2026-04-19T12:10:39Z

✅ Expert Code Review completed successfully!

Superseded: all 4 findings were fixed in commits 2-6 (97afa52..52db5e0). Dismissed to unblock merge.

- Change submit-pull-request-review to COMMENT-only (was COMMENT + REQUEST_CHANGES). gh-aw doesn't support pull-requests:write, so blocking reviews from bot re-runs can't be auto-dismissed — they persist as stale blocks even after all findings are fixed. - Add concurrency groups to both review workflows to cancel duplicate runs on the same PR. - Update prompt instructions to always use COMMENT event. Fixes the issue on PR #619 where a CHANGES_REQUESTED review from commit 1 persisted after commits 2-6 fixed all findings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-19T14:19:30Z

Consolidated Deep Review — PR #619 (3 parallel reviewers)

Verdict: ✅ Approve with non-blocking suggestions

Code Review (✅ Approve)

Both hunks are correct and well-reasoned:

Hunk A — events.jsonl freshness guard:

Root cause is valid: ToolExecutionStartEvent may appear in events.jsonl but not the live SDK stream
Placement is correct: after 4s+30s extended wait, before CompleteResponse
Threshold (30s) matches the extended delay window — appropriate
One-shot return is safe: watchdog (600s) + future TurnEnd re-arms + SessionIdle all provide recovery
Thread safety: runs in Task.Run, only reads immutable state + filesystem stat — no mutations
IsDemoMode/IsRemoteMode guard is correct
try/catch is correct fail-open (filesystem errors → proceed with completion)

Hunk B — ProcessingGeneration increment in CompleteResponse:

Defense-in-depth: bumping generation on completion prevents stale InvokeOnUI callbacks from resurrecting cleared state
Overflow guard (!= long.MaxValue) is correct — preserves the orphan sentinel (INV-14)

All 18 invariants checked — no violations:

Invariant	Status
INV-1 (complete cleanup)	✅ No new IsProcessing=false path
INV-2 (UI thread)	✅ return exits Task.Run without touching IsProcessing
INV-3 (generation guard)	✅ Strengthened by Hunk B
INV-10 (fallback not suppressed)	✅ Returns for this iteration only — watchdog + future TurnEnd still recover
INV-11/11b (watchdog distinction)	✅ No watchdog changes, additive only
INV-12 (gen capture in bg→UI)	✅ No new InvokeOnUI calls
INV-14 (IsOrphaned)	✅ Orphan check at L873 precedes new code
INV-18 (IDLE-DEFER)	✅ Independent path, no interaction

False negative analysis: If CLI writes heartbeats keeping fileAge < 30s but is truly stuck, the fallback won't fire — but the watchdog's 600s Case C timeout catches this. Acceptable.

Test Review (⚠️ Gap — non-blocking)

TurnEndFallbackTests.cs was not modified by this PR. The existing 12 tests are structural CTS-pattern tests. Zero tests cover the new events.jsonl freshness guard:

Missing coverage	Description
Happy path	Fresh events.jsonl → fallback skips completion
Sad path	Stale events.jsonl → fallback proceeds
File missing	`File.Exists()` returns false → proceeds
Demo/Remote	Guard skipped entirely
Filesystem error	catch block → proceeds
Threshold boundary	fileAge exactly equals 30s

Suggestion: Extract the file-age check into a testable static method and add behavioral tests, or at minimum add a structural guard test verifying the events.jsonl check string exists in the fallback path.

Documentation Review (⚠️ Gap — non-blocking)

No documentation files were modified. Suggested updates:

INV-10 in SKILL.md — add the events.jsonl freshness check as the third guard in the TurnEnd fallback chain: ActiveToolCallCount > 0 → extended delay → events.jsonl freshness → complete
Diagnostic tags — document the new [IDLE-FALLBACK] skipped — events.jsonl written Xs ago message
Regression history — add fix: TurnEnd fallback checks events.jsonl freshness before completing #619 to the timeline
CompleteResponse generation increment — document in the cleanup checklist

Summary

The core fix is correct, surgical, and doesn't introduce any new state fields or IsProcessing paths. Both hunks strengthen existing safety mechanisms. The test and documentation gaps are real but non-blocking given the fix's additive-only nature and the watchdog safety net.

…ssion history 7 new behavioral tests for GetLastEventType (the static method used by the TurnEnd fallback's last-event-type check): - tool.execution_start as last event → detected - session.idle as terminal event → detected - file doesn't exist → returns null - empty file → returns null - invalid JSON → returns null (fail-open) - trailing blank lines → ignored - FileShare.ReadWrite → concurrent read works Update regression history: add #612 and #619. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…o skills (#635) ## Problem The gh-aw review workflow posts `CHANGES_REQUESTED` reviews that persist even after all findings are fixed in subsequent commits. Since gh-aw enforces `pull-requests: read` (no write permission), the workflow cannot auto-dismiss stale reviews on re-runs. This caused PR #619 to be blocked by a stale review from commit 1 even though commits 2-6 fixed all 4 findings. ## Fix 1. **COMMENT-only reviews** — Changed `submit-pull-request-review` from `allowed-events: [COMMENT, REQUEST_CHANGES]` to `allowed-events: [COMMENT]`. The review body still communicates severity through findings; it just doesn't block merging. 2. **Concurrency groups** — Added `concurrency` with `cancel-in-progress: true` to both review workflows, preventing duplicate runs on the same PR. 3. **Updated prompt instructions** — Agent now always uses `COMMENT` event; never `APPROVE` or `REQUEST_CHANGES`. ## Why not auto-dismiss? gh-aw's security model prohibits `pull-requests: write` — all write ops go through safe-outputs. There's no `dismiss-pull-request-review` safe output type, so the workflow can't programmatically dismiss old reviews. ## Also done Dismissed the stale `CHANGES_REQUESTED` review on PR #619 manually. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…n-drift workflow (#653) ## Summary Comprehensive gh-aw infrastructure for PolyPilot — skills, security enforcement, drift detection, and workflow hardening. Synced with [dotnet/maui#35027](dotnet/maui#35027) and expanded with PolyPilot-specific security tooling. ## What's Included ### gh-aw-guide skill (`.github/skills/gh-aw-guide/`) **SKILL.md** — Quick-start reference: - 21-row anti-patterns table (manual reimplementations → built-in alternatives) - LabelOps (`label_command:` vs `names:` filtering) - Concurrency race condition warnings for `slash_command` - "Approve and run" gate risk documentation - 4 security-critical patterns with code examples - `checkout: false`, `rate-limit:`, `web-fetch` tool docs **references/architecture.md** — Deep reference: - Execution model with credential availability matrix - Authorization model (`on.roles:` defaults, `roles: all` danger) - Dangerous Triggers Checklist (what gh-aw#23769 fixed and what it didn't) - Security boundaries (9 defense layers table) - Integrity filtering hierarchy - Fork PR handling (5-trigger behavior matrix) - Safe outputs quick reference (30+ types) - Troubleshooting table (12 entries) **scripts/Check-WorkflowSecurity.ps1** — Enforcement scanner: - 8 rules: `pull_request_target` without integrity, `roles: all` on PR workflows, `workflow_run` without branches, `slash_command` + `cancel-in-progress: true`, missing `allowed-events` on reviews, missing `protected-files`, workspace script execution after checkout - Tested against our own workflows — found and fixed 2 issues ### instruction-drift skill (`.github/skills/instruction-drift/`) **SKILL.md** — Drift detection for instruction files: - Coverage gaps tracking, index crawling, issue discovery - P0-P3 classification (factually wrong → nice-to-have) **scripts/Check-Staleness.ps1** (646 lines): - Content hashing (SHA256) to detect doc page changes - Index crawling (`Get-IndexPageLinks`) to discover untracked pages - Issue state comparison (actual vs expected from manifest) - Release tracking (`Get-GitHubLatestRelease`) - Recently closed issue discovery (90-day window) **scripts/Scan-GhAwUpdates.ps1** — Upstream knowledge extraction: - Mines `github/gh-aw` commits for high-signal changes - Watermark-based (only processes new commits per run) - Categorizes: safe-output, trigger, compiler, security, engine, breaking - Samples shared/ workflow configs for real-world patterns ### Workflow hardening - **COMMENT-only reviews** — `allowed-events: [COMMENT]` on all review workflows (prevents stale `CHANGES_REQUESTED` reviews that can't be auto-dismissed — [upstream gap documented](github/gh-aw#25439)) - **Concurrency groups** — Cross-workflow collision (intentional, documented) - **dep-update.md** — Added `protected-files: fallback-to-issue` - **Slim instructions** — 26KB → 34 lines referencing the skill ### Sync manifest `.github/instructions/gh-aw-workflows.sync.yaml` tracks 8 doc pages, 5 issues, and gh-aw releases for drift detection. ## Motivation - Stale `CHANGES_REQUESTED` review blocked [PR #619](#619) — all 4 findings were fixed but the bot review persisted - gh-aw docs are 20+ pages and change frequently — skills + drift detection keep knowledge current - Security scanner catches dangerous patterns before they ship - Synced with dotnet/maui#35027 for cross-repo consistency --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Removed based on analysis of 3 actual review runs (PRs #619, #639, #635): - Removed 3x redundant 'Read copilot-instructions.md' — the Copilot engine auto-loads it as system context, and sub-agents found deep domain bugs without being told to read it - Removed 'You are a thorough PR reviewer for PolyPilot' — generic preamble that adds tokens without changing behavior - Removed MCP tool usage hint ('not gh CLI — credentials are scrubbed') — the agent figures this out from tool availability - Removed duplicate path validation instruction - Removed verbose explanation of why COMMENT not REQUEST_CHANGES — one line is sufficient - Simplified sub-agent prompt from repo-specific to generic 72 lines → 60 lines, same review quality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…abs learnings (#656) Based on analysis of 3 actual review runs (PRs #619, #639, #635), several prompt instructions are redundant and add tokens without improving output quality. **What was removed:** - 3x "Read copilot-instructions.md" — engine auto-loads it; sub-agents found deep domain bugs without it - Generic preamble ("You are a thorough PR reviewer for PolyPilot") - MCP tool usage hint (agent discovers available tools) - Duplicate path validation instruction - Verbose REQUEST_CHANGES explanation (one line is sufficient) - Repo-specific sub-agent prompt → generic ("expert code reviewer" not "expert PolyPilot code reviewer") **What was kept:** - Security warning (treat PR content as untrusted) — changes behavior - No test messages rule — prevents permanent damage - Full adversarial consensus methodology - Path/line validation rules - COMMENT-only policy 72 lines → 60 lines. Same review quality — the evidence from real runs shows the agent finds domain issues from reading code, not from hint bullets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot previously requested changes Apr 18, 2026

View reviewed changes

PureWeen and others added 4 commits April 18, 2026 18:00

PureWeen mentioned this pull request Apr 19, 2026

fix: prevent stale CHANGES_REQUESTED reviews from blocking PRs #635

Merged

PureWeen merged commit afd886b into main Apr 19, 2026

PureWeen deleted the fix/watchdog-tool-timeout-for-long-delays branch April 19, 2026 14:25

PureWeen mentioned this pull request Apr 19, 2026

Update gh-aw instructions and add check-staleness skill dotnet/maui#35027

Open

This was referenced Apr 20, 2026

feat: Register SDK hooks for tool monitoring and error handling #493

Open

feat: gh-aw skills, security scanner, and instruction drift tooling #653

Merged

This was referenced Apr 21, 2026

submit-pull-request-review: add supersede-older-reviews option to dismiss stale bot reviews on re-review github/gh-aw#27655

Closed

refactor: slim expert-reviewer prompt — remove redundant instructions #656

Merged

Copilot AI mentioned this pull request Apr 21, 2026

Add supersede-older-reviews for PR reviews and shift guidance to COMMENT-first defaults github/gh-aw#27662

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: TurnEnd fallback checks events.jsonl freshness before completing#619

fix: TurnEnd fallback checks events.jsonl freshness before completing#619
PureWeen merged 7 commits intomainfrom
fix/watchdog-tool-timeout-for-long-delays

PureWeen commented Apr 18, 2026

Uh oh!

PureWeen commented Apr 18, 2026

Uh oh!

PureWeen commented Apr 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Apr 18, 2026

Uh oh!

github-actions Bot Apr 18, 2026

Uh oh!

github-actions Bot Apr 18, 2026

Uh oh!

PureWeen commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026 •

edited

Loading

Uh oh!

PureWeen commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		var fileAge = (DateTime.UtcNow - File.GetLastWriteTimeUtc(ep)).TotalSeconds;
		if (fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0)

		Debug($"[IDLE-FALLBACK] '{sessionName}' skipped — events.jsonl written {fileAge:F0}s ago, CLI still active");
		return;

Time	Event
T=0	TurnEnd fires, `LastEventAtTicks` set
T=34s	Guard fires, fallback consumed (`return`)
T≤295s	Watchdog: pre-timeout upgrade (line 2701) sees `fileAge < 300s` → upgrades timeout from 180s→600s
T≈300s	`fileAge` crosses 300s → no upgrade. elapsed≈300≥180 → Case B: `age≥300s` → `caseBEventsActive=false` → completes

Conversation

PureWeen commented Apr 18, 2026

Problem

Root Cause

Fix

Testing

Uh oh!

PureWeen commented Apr 18, 2026

Deep Review: PR 619 — TurnEnd fallback checks events.jsonl freshness

Root Cause Analysis (confirmed)

Code Analysis

Consistency with Processing State Safety Invariants

Optional Improvement (non-blocking)

Test Coverage

Uh oh!

PureWeen commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Multi-Model Code Review — PR #619 (v5 Final Review)

All Findings — Final Status

Test Coverage (commit 6: 52db5e0f)

Recommended Action

✅ Approve

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Multi-Model Review — PR #619

Findings (ranked by severity)

Discarded Findings (1/3 only)

Assessment

Uh oh!

github-actions Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

PureWeen commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PureWeen commented Apr 19, 2026

Consolidated Deep Review — PR #619 (3 parallel reviewers)

Verdict: ✅ Approve with non-blocking suggestions

Code Review (✅ Approve)

Test Review (⚠️ Gap — non-blocking)

Documentation Review (⚠️ Gap — non-blocking)

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

PureWeen commented Apr 18, 2026 •

edited

Loading

Test Coverage (commit 6: `52db5e0f`)

github-actions Bot commented Apr 19, 2026 •

edited

Loading