fix: TurnEnd fallback checks events.jsonl freshness before completing#619
fix: TurnEnd fallback checks events.jsonl freshness before completing#619
Conversation
The TurnEnd→Idle fallback timer could prematurely complete a session when ToolExecutionStartEvent was not delivered via the live SDK event stream (only written to events.jsonl). This caused the session to appear idle while the CLI was still actively executing a tool. Scenario: read_bash with delay:120 takes 120 seconds. The TurnEnd fallback waits 34s (4s + 30s tool extension), checks ActiveToolCallCount=0 (because TurnStartEvent reset it and the ToolExecutionStart event wasn't delivered live), and fires CompleteResponse. All subsequent TurnStart events are rejected as 'stale replay' (EVT-REARM-SKIP), leaving the session permanently stuck. Fix: After the extended 30s wait, check events.jsonl last-write time. If the CLI wrote to it recently (within the 30s window), the session is still active — skip completion. This is the same freshness pattern already used by the watchdog (line 2675-2698). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Deep Review: PR 619 — TurnEnd fallback checks events.jsonl freshnessVerdict: ✅ This fix is correct, surgical, and well-reasoned. Root Cause Analysis (confirmed)The live SDK event stream and The timeline in the PR description accurately describes the failure mode:
Code AnalysisPlacement is correct: The guard is inserted after the extended 30s wait ( Pattern is established: The same events.jsonl freshness check is already used by the watchdog ( Threshold is right: One-shot deferral is acceptable: The
Edge cases handled correctly:
Consistency with Processing State Safety Invariants
Optional Improvement (non-blocking)Could reschedule the fallback timer instead of just Test Coverage3,484 tests passing with 0 failures, including 201 TurnEndFallback + ProcessingWatchdog tests. The fix is additive-only (no existing behavior changed), so regression risk is minimal. |
🔍 Multi-Model Code Review — PR #619 (v5 Final Review)Reviewed by: 3 independent reviewers with adversarial consensus All Findings — Final Status
9 of 9 findings resolved across 5 review rounds (multi-model + gh-aw). Test Coverage (commit 6:
|
| Test | What it verifies |
|---|---|
TurnEndFallbackFreshnessSeconds_IsReasonable |
Constant in [5, 60] range |
TurnEndFallbackRecheckMs_IsReasonable |
Constant ≤ watchdog interval |
TurnEndFallbackFreshnessSeconds_IsSeparateFromWatchdog |
Not same as CaseBFreshnessSeconds |
FreshnessGuard_HasTurnStartAnchor |
freshnessCheckStart captured before delay |
FreshnessGuard_HasRecheckLoop |
Recheck + defer to watchdog |
FreshnessGuard_HasClockSkewProtection |
Math.Max(0.0, ...) on both age calculations |
FreshnessGuard_ChecksLastEventType |
GetLastEventType + tool.execution_start |
SkipsDemoAndRemoteMode |
IsDemoMode/IsRemoteMode bypass |
WatchdogCaseB_PreCheck_HasServerLivenessGuard |
IsServerRunning in pre-check |
3493 tests pass, 0 failures.
Recommended Action
✅ Approve
All 9 findings from both multi-model and gh-aw reviews are resolved. Production code is correct, well-documented (skill INV-10/INV-11, copilot-instructions), and covered by 9 structural invariant tests. Ready to merge.
- Add lastWrite > freshnessCheckStart anchor so prior-turn writes don't suppress completion (mirrors watchdog line ~2711) - Add 15s retry recheck after freshness skip so sessions don't get permanently stuck with no recovery path - Guard against negative fileAge from clock skew with Math.Max(0.0,...) - Add dedicated TurnEndFallbackFreshnessSeconds (30s) and TurnEndFallbackRecheckMs (15s) constants instead of reusing delay - Log exception message in catch block instead of silently swallowing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Multi-Model Review — PR #619
PR: fix: TurnEnd fallback checks events.jsonl freshness before completing
CI: Not available (token not configured in workflow)
Reviewers: 3 independent reviewers
Findings (ranked by severity)
| # | Severity | Consensus | Finding |
|---|---|---|---|
| 1 | 🟡 MODERATE | 3/3 | One-shot return with no re-arm — when the events.jsonl freshness guard fires (including false positives where a tool completed during the 30s wait), CompleteResponse is never called and no new fallback is armed. The processing watchdog is the only safety net, but its pre-timeout upgrade (line 2701) keeps extending the effective timeout while events.jsonl is <300s old. Net result: session stuck ~5 minutes before the watchdog rescues it, vs 34s without this PR. |
| 2 | �� MODERATE | 3/3 | Missing lastWrite > startedAt.Value anchor — the watchdog's equivalent check (line 2713) uses a two-part temporal guard documented at lines 2839-2843 ("we need BOTH checks"). The new code omits the turn-scoping anchor, so writes from a previous turn or background-task activity within the 30s window can trigger the guard incorrectly. This diverges from the established pattern hardened over 13 PRs. |
| 3 | 🟢 MINOR | 3/3 | Freshness threshold coupled to delay constant — fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0 reuses the 30s delay as the freshness window. These are semantically different ("how long to wait" vs "how fresh must the file be"). The watchdog uses dedicated constants. A tighter threshold (~10s) would reduce false positives without a re-check loop. |
| 4 | 🟢 MINOR | 2/3 | No test coverage — no behavioral or structural test verifies the guard fires/skips correctly, or that Demo/Remote modes bypass it. The codebase convention requires behavioral tests for state machine changes. |
Discarded Findings (1/3 only)
CaptureZeroIdleDiagnosticsnot called when guard fires — valid observation (diagnostic data lost for watchdog-rescued sessions), but low impact since the watchdog has its own logging. Single-reviewer finding, discarded per adversarial consensus rules.
Assessment
The intent is sound — checking events.jsonl freshness before completing catches the real scenario where ToolExecutionStartEvent is only written to disk, not delivered via the live stream. However, the implementation has two interacting issues: (1) the 30s threshold is as wide as the wait window, making false positives likely for any session where a tool completed during the wait, and (2) the bare return means false positives cause a ~5-minute regression instead of graceful degradation.
Recommended fixes (in priority order):
- Add
lastWrite > startedAt.Valueanchor to match the watchdog pattern - Either tighten the threshold to ~10s, or add a re-check loop (15s delay + recheck) before the final
return - Extract a named constant for the freshness threshold
Requesting changes for the two MODERATE findings.
Generated by Expert Code Review (auto) for issue #619
| var fileAge = (DateTime.UtcNow - File.GetLastWriteTimeUtc(ep)).TotalSeconds; | ||
| if (fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0) |
There was a problem hiding this comment.
🟡 MODERATE — Missing lastWrite > startedAt.Value anchor (3/3 reviewers)
The watchdog's equivalent check (line 2713) uses a two-part temporal guard:
if (lastWrite > startedAt.Value && fileAge < WatchdogCaseBFreshnessSeconds)This is explicitly documented in the watchdog comments (lines 2839–2843):
"We need BOTH checks because: 'after turn start' alone stays true forever once any event is written; 'recent' alone could match stale files from a previous turn"
The new code uses only the second half (fileAge < threshold), omitting the lastWrite > startedAt.Value anchor. This means a write from a previous turn or background-task activity that happened to fall within the 30s window can trigger the guard and suppress completion for a session that is actually done.
Scenario: An orchestrator's background shell from a prior prompt keeps writing to events.jsonl. The current turn's TurnEnd fires, tools were used, the 34s fallback checks freshness: fileAge = 5s < 30s → guard fires → return. Session is stuck until the watchdog rescues it (~5 min later).
Suggested fix — match the established watchdog pattern:
var startedAt = state.Info.ProcessingStartedAt;
var lastWrite = File.GetLastWriteTimeUtc(ep);
var fileAge = (DateTime.UtcNow - lastWrite).TotalSeconds;
if (startedAt.HasValue && lastWrite > startedAt.Value
&& fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0)| Debug($"[IDLE-FALLBACK] '{sessionName}' skipped — events.jsonl written {fileAge:F0}s ago, CLI still active"); | ||
| return; |
There was a problem hiding this comment.
🟡 MODERATE — One-shot return with no re-arm leaves session stuck ~5 minutes (3/3 reviewers)
When this guard fires, the Task.Run lambda exits. Unlike the active-tool check at line 872 (which expects AssistantTurnStartEvent to re-arm), this guard fires at the end of the extended wait where:
- No new
TurnStartEventis coming (tool loop ended) - No
SessionIdleEventis coming (the SDK bug this fallback handles) TurnEndIdleCtsis consumed; no re-arm exists
The only remaining backstop is the processing watchdog. Traced timeline for a false-positive:
| Time | Event |
|---|---|
| T=0 | TurnEnd fires, LastEventAtTicks set |
| T=34s | Guard fires, fallback consumed (return) |
| T≤295s | Watchdog: pre-timeout upgrade (line 2701) sees fileAge < 300s → upgrades timeout from 180s→600s |
| T≈300s | fileAge crosses 300s → no upgrade. elapsed≈300≥180 → Case B: age≥300s → caseBEventsActive=false → completes |
Impact: ~266 seconds of extra stuck time (300s vs 34s). Session shows "Thinking…" for ~5 minutes with no progress.
Suggested fix — add a re-check loop instead of a bare return:
// Instead of return, re-check after a shorter interval
Debug($"[IDLE-FALLBACK] '{sessionName}' events.jsonl fresh ({fileAge:F0}s) — rechecking in 15s");
await Task.Delay(15_000, fallbackToken);
if (fallbackToken.IsCancellationRequested) return;
if (state.IsOrphaned) return;
var newAge = (DateTime.UtcNow - File.GetLastWriteTimeUtc(ep)).TotalSeconds;
if (newAge >= TurnEndIdleToolFallbackAdditionalMs / 1000.0)
{
// File went stale — proceed to completion below
}
else
{
Debug($"[IDLE-FALLBACK] '{sessionName}' still fresh ({newAge:F0}s) — deferring to watchdog");
return;
}Alternatively, tighten the threshold to ~10s so false positives are rare enough that the watchdog rescue delay is acceptable.
| if (File.Exists(ep)) | ||
| { | ||
| var fileAge = (DateTime.UtcNow - File.GetLastWriteTimeUtc(ep)).TotalSeconds; | ||
| if (fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0) |
There was a problem hiding this comment.
🟢 MINOR — Consider adding lastWrite > startedAt guard for consistency with watchdog (1/2 reviewers flagged, 1/2 dismissed)
The watchdog freshness check at ~line 2713 uses a two-part guard:
if (lastWrite > startedAt.Value && fileAge < WatchdogCaseBFreshnessSeconds)This code only checks recency (fileAge < 30s), not whether the write happened after the current turn started.
Scenario: Rapid back-to-back prompts where the previous turn's tool wrote to events.jsonl just before the current turn started — the stale mtime could fall within the 30s window and cause a false skip. However, toolsUsedThisTurn=true is required to reach this code and the 30s window is tight, so the scenario is unlikely in practice.
Disputed: Reviewer 1 analyzed this same inconsistency and explicitly dismissed it: "the 30s freshness window is tight enough that a stale write from a previous turn is effectively impossible."
Suggested improvement (non-blocking):
var lastWrite = File.GetLastWriteTimeUtc(ep);
var fileAge = (DateTime.UtcNow - lastWrite).TotalSeconds;
var turnStart = state.Info.ProcessingStartedAt;
if (fileAge < TurnEndIdleToolFallbackAdditionalMs / 1000.0
&& (!turnStart.HasValue || lastWrite > turnStart.Value))When ToolExecutionStartEvent isn't delivered via the live SDK stream, ActiveToolCallCount stays 0 even though a tool IS running. For long- running tools (e.g., read_bash delay:600), the freshness check alone doesn't help because events.jsonl was written before the wait started. Fix: check GetLastEventType() in both the TurnEnd fallback and the watchdog Case B. If the last event in events.jsonl is 'tool.execution_start' (no matching tool.execution_complete), a tool is actively running — defer completion regardless of ActiveToolCallCount or file freshness. TurnEnd fallback: defers to watchdog when tool detected on disk. Watchdog Case B: resets inactivity timer and continues (same as Case A server-alive behavior), allowing the tool to run indefinitely until tool.execution_complete appears. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update INV-10 and INV-11 in processing-state-safety skill to document: - The live-event-stream gap where ToolExecutionStartEvent isn't delivered - TurnEnd fallback freshness + GetLastEventType checks (PR #619) - Watchdog Case B pre-check for in-flight tools on disk - New constants: TurnEndFallbackFreshnessSeconds, TurnEndFallbackRecheckMs Update copilot-instructions.md Processing Watchdog section with the live-event-stream gap compensation mechanisms and new diagnostic tags ([IDLE-FALLBACK], [TOOL-HEALTH]). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add _serverManager.IsServerRunning guard to watchdog Case B pre-check
so CLI crashes are detected in ~30s instead of 60 min
- Replace bare catch {} with logged catch (Exception ex) for diagnostics
- Fix split block comment in Case B (readability)
- Update INV-11 skill doc to accurately describe pre-check behavior
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
9 new tests covering the events.jsonl freshness guard and watchdog Case B pre-check: - Constant reasonableness: TurnEndFallbackFreshnessSeconds, RecheckMs - Constant separation from watchdog CaseBFreshnessSeconds - Turn-start anchor (freshnessCheckStart captured before delay) - Recheck loop (TurnEndFallbackRecheckMs + defer to watchdog) - Clock skew protection (Math.Max on both fileAge calculations) - GetLastEventType check for tool.execution_start - Demo/Remote mode bypass - Server liveness guard on watchdog Case B pre-check Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/review |
|
✅ Expert Code Review completed successfully! |
- Change submit-pull-request-review to COMMENT-only (was COMMENT + REQUEST_CHANGES). gh-aw doesn't support pull-requests:write, so blocking reviews from bot re-runs can't be auto-dismissed — they persist as stale blocks even after all findings are fixed. - Add concurrency groups to both review workflows to cancel duplicate runs on the same PR. - Update prompt instructions to always use COMMENT event. Fixes the issue on PR #619 where a CHANGES_REQUESTED review from commit 1 persisted after commits 2-6 fixed all findings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Consolidated Deep Review — PR #619 (3 parallel reviewers)Verdict: ✅ Approve with non-blocking suggestionsCode Review (✅ Approve)Both hunks are correct and well-reasoned: Hunk A — events.jsonl freshness guard:
Hunk B — ProcessingGeneration increment in CompleteResponse:
All 18 invariants checked — no violations:
False negative analysis: If CLI writes heartbeats keeping fileAge < 30s but is truly stuck, the fallback won't fire — but the watchdog's 600s Case C timeout catches this. Acceptable. Test Review (
|
| Missing coverage | Description |
|---|---|
| Happy path | Fresh events.jsonl → fallback skips completion |
| Sad path | Stale events.jsonl → fallback proceeds |
| File missing | File.Exists() returns false → proceeds |
| Demo/Remote | Guard skipped entirely |
| Filesystem error | catch block → proceeds |
| Threshold boundary | fileAge exactly equals 30s |
Suggestion: Extract the file-age check into a testable static method and add behavioral tests, or at minimum add a structural guard test verifying the events.jsonl check string exists in the fallback path.
Documentation Review (⚠️ Gap — non-blocking)
No documentation files were modified. Suggested updates:
- INV-10 in SKILL.md — add the events.jsonl freshness check as the third guard in the TurnEnd fallback chain:
ActiveToolCallCount > 0→ extended delay → events.jsonl freshness → complete - Diagnostic tags — document the new
[IDLE-FALLBACK] skipped — events.jsonl written Xs agomessage - Regression history — add fix: TurnEnd fallback checks events.jsonl freshness before completing #619 to the timeline
- CompleteResponse generation increment — document in the cleanup checklist
Summary
The core fix is correct, surgical, and doesn't introduce any new state fields or IsProcessing paths. Both hunks strengthen existing safety mechanisms. The test and documentation gaps are real but non-blocking given the fix's additive-only nature and the watchdog safety net.
…ssion history 7 new behavioral tests for GetLastEventType (the static method used by the TurnEnd fallback's last-event-type check): - tool.execution_start as last event → detected - session.idle as terminal event → detected - file doesn't exist → returns null - empty file → returns null - invalid JSON → returns null (fail-open) - trailing blank lines → ignored - FileShare.ReadWrite → concurrent read works Update regression history: add #612 and #619. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…o skills (#635) ## Problem The gh-aw review workflow posts `CHANGES_REQUESTED` reviews that persist even after all findings are fixed in subsequent commits. Since gh-aw enforces `pull-requests: read` (no write permission), the workflow cannot auto-dismiss stale reviews on re-runs. This caused PR #619 to be blocked by a stale review from commit 1 even though commits 2-6 fixed all 4 findings. ## Fix 1. **COMMENT-only reviews** — Changed `submit-pull-request-review` from `allowed-events: [COMMENT, REQUEST_CHANGES]` to `allowed-events: [COMMENT]`. The review body still communicates severity through findings; it just doesn't block merging. 2. **Concurrency groups** — Added `concurrency` with `cancel-in-progress: true` to both review workflows, preventing duplicate runs on the same PR. 3. **Updated prompt instructions** — Agent now always uses `COMMENT` event; never `APPROVE` or `REQUEST_CHANGES`. ## Why not auto-dismiss? gh-aw's security model prohibits `pull-requests: write` — all write ops go through safe-outputs. There's no `dismiss-pull-request-review` safe output type, so the workflow can't programmatically dismiss old reviews. ## Also done Dismissed the stale `CHANGES_REQUESTED` review on PR #619 manually. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n-drift workflow (#653) ## Summary Comprehensive gh-aw infrastructure for PolyPilot — skills, security enforcement, drift detection, and workflow hardening. Synced with [dotnet/maui#35027](dotnet/maui#35027) and expanded with PolyPilot-specific security tooling. ## What's Included ### gh-aw-guide skill (`.github/skills/gh-aw-guide/`) **SKILL.md** — Quick-start reference: - 21-row anti-patterns table (manual reimplementations → built-in alternatives) - LabelOps (`label_command:` vs `names:` filtering) - Concurrency race condition warnings for `slash_command` - "Approve and run" gate risk documentation - 4 security-critical patterns with code examples - `checkout: false`, `rate-limit:`, `web-fetch` tool docs **references/architecture.md** — Deep reference: - Execution model with credential availability matrix - Authorization model (`on.roles:` defaults, `roles: all` danger) - Dangerous Triggers Checklist (what gh-aw#23769 fixed and what it didn't) - Security boundaries (9 defense layers table) - Integrity filtering hierarchy - Fork PR handling (5-trigger behavior matrix) - Safe outputs quick reference (30+ types) - Troubleshooting table (12 entries) **scripts/Check-WorkflowSecurity.ps1** — Enforcement scanner: - 8 rules: `pull_request_target` without integrity, `roles: all` on PR workflows, `workflow_run` without branches, `slash_command` + `cancel-in-progress: true`, missing `allowed-events` on reviews, missing `protected-files`, workspace script execution after checkout - Tested against our own workflows — found and fixed 2 issues ### instruction-drift skill (`.github/skills/instruction-drift/`) **SKILL.md** — Drift detection for instruction files: - Coverage gaps tracking, index crawling, issue discovery - P0-P3 classification (factually wrong → nice-to-have) **scripts/Check-Staleness.ps1** (646 lines): - Content hashing (SHA256) to detect doc page changes - Index crawling (`Get-IndexPageLinks`) to discover untracked pages - Issue state comparison (actual vs expected from manifest) - Release tracking (`Get-GitHubLatestRelease`) - Recently closed issue discovery (90-day window) **scripts/Scan-GhAwUpdates.ps1** — Upstream knowledge extraction: - Mines `github/gh-aw` commits for high-signal changes - Watermark-based (only processes new commits per run) - Categorizes: safe-output, trigger, compiler, security, engine, breaking - Samples shared/ workflow configs for real-world patterns ### Workflow hardening - **COMMENT-only reviews** — `allowed-events: [COMMENT]` on all review workflows (prevents stale `CHANGES_REQUESTED` reviews that can't be auto-dismissed — [upstream gap documented](github/gh-aw#25439)) - **Concurrency groups** — Cross-workflow collision (intentional, documented) - **dep-update.md** — Added `protected-files: fallback-to-issue` - **Slim instructions** — 26KB → 34 lines referencing the skill ### Sync manifest `.github/instructions/gh-aw-workflows.sync.yaml` tracks 8 doc pages, 5 issues, and gh-aw releases for drift detection. ## Motivation - Stale `CHANGES_REQUESTED` review blocked [PR #619](#619) — all 4 findings were fixed but the bot review persisted - gh-aw docs are 20+ pages and change frequently — skills + drift detection keep knowledge current - Security scanner catches dangerous patterns before they ship - Synced with dotnet/maui#35027 for cross-repo consistency --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Removed based on analysis of 3 actual review runs (PRs #619, #639, #635): - Removed 3x redundant 'Read copilot-instructions.md' — the Copilot engine auto-loads it as system context, and sub-agents found deep domain bugs without being told to read it - Removed 'You are a thorough PR reviewer for PolyPilot' — generic preamble that adds tokens without changing behavior - Removed MCP tool usage hint ('not gh CLI — credentials are scrubbed') — the agent figures this out from tool availability - Removed duplicate path validation instruction - Removed verbose explanation of why COMMENT not REQUEST_CHANGES — one line is sufficient - Simplified sub-agent prompt from repo-specific to generic 72 lines → 60 lines, same review quality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…abs learnings (#656) Based on analysis of 3 actual review runs (PRs #619, #639, #635), several prompt instructions are redundant and add tokens without improving output quality. **What was removed:** - 3x "Read copilot-instructions.md" — engine auto-loads it; sub-agents found deep domain bugs without it - Generic preamble ("You are a thorough PR reviewer for PolyPilot") - MCP tool usage hint (agent discovers available tools) - Duplicate path validation instruction - Verbose REQUEST_CHANGES explanation (one line is sufficient) - Repo-specific sub-agent prompt → generic ("expert code reviewer" not "expert PolyPilot code reviewer") **What was kept:** - Security warning (treat PR content as untrusted) — changes behavior - No test messages rule — prevents permanent damage - Full adversarial consensus methodology - Path/line validation rules - COMMENT-only policy 72 lines → 60 lines. Same review quality — the evidence from real runs shows the agent finds domain issues from reading code, not from hint bullets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
The ReliabilityTest session stopped sending messages and appeared stuck. Root cause: the TurnEnd→Idle fallback timer prematurely completed the session while a
read_bashtool withdelay: 120was still executing.Timeline:
assistant.turn_end→ starts fallback timer (4s + 30s tool extension = 34s)assistant.turn_start→ cancels the fallback ✅assistant.message+tool.execution_start(read_bash delay:120) — butToolExecutionStartEventwas NOT delivered via the live SDK event stream (only written to events.jsonl)turn_end(from the same sub-turn sequence) starts a NEW fallbackActiveToolCallCount→ 0 (becauseTurnStartEventreset it at step 2, and step 3's increment never arrived)CompleteResponse→IsProcessing = falseTurnStartEvents are rejected as[EVT-REARM-SKIP]("arrived after explicit completion")Root Cause
The live SDK event stream and
events.jsonlare separate channels.ToolExecutionStartEventmay appear inevents.jsonl(written by the CLI) but not be delivered toHandleSessionEvent(live stream). The TurnEnd fallback only checksActiveToolCallCount(set by the live stream handler), not whether the CLI is still actively writing.Fix
After the extended 30s wait in the TurnEnd fallback, check
events.jsonllast-write time before completing. If the CLI wrote to it recently (within the 30s window), the session is still active — skip completion. This is the same freshness pattern already used by the watchdog (useUsedToolsTimeoutupgrade at line 2675-2698).Testing