Executive Summary
4 workflow failures detected in the 6-hour window ending 2026-04-27 ~08:00 UTC. Two failures require actionable fixes (P0/P1); two are transient or partially successful. One sub-issue created for the blocking P0 configuration error.
Failure Clusters
| Workflow | Run | Engine | Root Cause | Priority | Tracking |
| --- | --- | --- | --- | --- | --- |
| GitHub Remote MCP Authentication Test | §24976660123 | Copilot | `400 The requested model is not supported`: workflow requests a model not available on the subscription tier | P1 | #28660 |
| Documentation Unbloat | §24975734231 | Claude Code | `Execute Claude Code CLI timed out after 30 minutes`: container post-processing (threat detection + artifact upload) took 19 min after claude-code exited; orphan process `awf-cmd-1.sh` killed at deadline | P2 | #28659 |
| Daily CLI Tools Exploratory Tester | §24978441315 | Copilot | `API rate limit exceeded for installation`: safe_outputs `create_issue` failed after 4 attempts; agent's valid bug report (compile `--workflow_name` naming inconsistency) was lost | P1 | (none) |
| Schema Feature Coverage Checker | §24981796377 | Codex | All 10 `create_pull_request` calls blocked: patch touches `.github/workflows/schema-demo-*.md`, which are protected files by default | P0 | #28671, #28674 |
Evidence
GitHub Remote MCP Auth Test — model not supported
From agent-stdio.log of run §24976660123:
400 The requested model is not supported.
Copilot-driver message:
model not supported — not retrying (the requested model is unavailable for this subscription tier;
specify a supported model in the workflow frontmatter)
Duration: 1.8 minutes. Agent never started. 1 error, 0 turns.
Documentation Unbloat — 30-minute step timeout
From 5_agent.txt (agent job log) for run §24975734231:
##[error]The action 'Execute Claude Code CLI' has timed out after 30 minutes.
Timeline:
- Container started: ~03:56 UTC
- Claude Code agent finished (exit 0): 04:07:31 UTC (11 min)
- Container still running (post-processing): 04:07–04:26 UTC (19 more minutes)
- Step timeout hit: 04:26:32 UTC (30 min total)
- Orphan processes killed: awf-cmd-1.sh, bash
PR was successfully created: branch docs/unbloat-daily-ops-2a3a65767cbfabee, PR #28658. The agent's actual work completed; the failure is an instrumentation-level false positive.
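The timeline arithmetic can be checked with a short sketch. It uses only the timestamps quoted above; the container start is given only approximately in the log ("~03:56"), so it is assumed here to be 03:56:00 and the derived durations are approximate.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the timeline; the container start is an
# assumption (the report only says "~03:56").
container_start = datetime.strptime("03:56:00", FMT)
agent_exit = datetime.strptime("04:07:31", FMT)
step_timeout = datetime.strptime("04:26:32", FMT)


def minutes_between(start: datetime, end: datetime) -> float:
    """Return the elapsed time between two same-day timestamps, in minutes."""
    return (end - start).total_seconds() / 60


agent_minutes = minutes_between(container_start, agent_exit)
post_minutes = minutes_between(agent_exit, step_timeout)
total_minutes = minutes_between(container_start, step_timeout)

print(f"agent work:      {agent_minutes:.1f} min")
print(f"post-processing: {post_minutes:.1f} min")
print(f"total:           {total_minutes:.1f} min")
```

Under that assumption, post-processing accounts for roughly 19 of the 30 minutes, which is why the step budget was exhausted even though the agent itself finished early.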
Daily CLI Tools Exploratory Tester — API rate limit in safe_outputs
From 1_safe_outputs.txt for run §24978441315:
##[warning]create_issue in github/gh-aw failed (attempt 1/4): API rate limit exceeded for installation.
##[warning]create_issue in github/gh-aw failed (attempt 2/4): API rate limit exceeded for installation.
##[warning]create_issue in github/gh-aw failed (attempt 3/4): API rate limit exceeded for installation.
##[error]✗ Failed to create issue "[cli-tools-test] compile tool: using `--workflow_name`..."
All 4 attempts at 36–53s retry intervals exhausted. The agent successfully identified a real bug (the compile tool uses --workflows while logs uses --workflow_name, causing a cryptic MCP schema error), but the report was never filed.
Agent itself concluded "success" (copilot-driver exit 0); the safe_outputs job failed, which caused the overall run conclusion to be "failure".
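A minimal sketch of how the retry pattern can be read out of the log excerpt above. The regular expression and variable names are illustrative assumptions, not part of the safe_outputs implementation; the input lines are copied verbatim from the excerpt, which shows three warnings followed by the final error.

```python
import re

# Lines copied from the 1_safe_outputs.txt excerpt above.
LOG_LINES = [
    "##[warning]create_issue in github/gh-aw failed (attempt 1/4): API rate limit exceeded for installation.",
    "##[warning]create_issue in github/gh-aw failed (attempt 2/4): API rate limit exceeded for installation.",
    "##[warning]create_issue in github/gh-aw failed (attempt 3/4): API rate limit exceeded for installation.",
    '##[error]✗ Failed to create issue "[cli-tools-test] compile tool: using `--workflow_name`..."',
]

# Hypothetical pattern for the warning format seen in this excerpt only.
ATTEMPT = re.compile(
    r"^##\[warning\](?P<op>\w+) in (?P<repo>\S+) failed "
    r"\(attempt (?P<n>\d+)/(?P<max>\d+)\): (?P<reason>.+)$"
)

matches = [m for m in (ATTEMPT.match(line) for line in LOG_LINES) if m]

attempts_seen = [int(m["n"]) for m in matches]
max_attempts = int(matches[0]["max"])
all_rate_limited = all("rate limit" in m["reason"] for m in matches)
final_error = any(line.startswith("##[error]") for line in LOG_LINES)

print(f"attempts logged as warnings: {attempts_seen} of {max_attempts}")
print(f"every logged attempt rate-limited: {all_rate_limited}")
print(f"terminal error present: {final_error}")
```

Every logged attempt failed for the same reason, which is consistent with an exhausted installation budget rather than a problem with the issue payload.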
Schema Feature Coverage Checker — protected files block all PRs
From 0_conclusion.txt (conclusion job) for run §24981796377:
GH_AW_CODE_PUSH_FAILURE_COUNT: 10
Each of 10 branches failed with:
Cannot create pull request: patch modifies protected files (.github/workflows/schema-demo-*.md).
Add them to the allowed-files configuration field or set protected-files: fallback-to-issue.
The agent correctly identified 10 uncovered schema fields and prepared valid patches; all were blocked by the default protected-files policy, which covers **.github/workflows/**.
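The interaction between the protected glob and the proposed allowlist entry can be illustrated with a small sketch. The matching semantics here are an assumption (a path is blocked when it matches a protected pattern and no allowed pattern); the real gh-aw logic is not shown in the logs and may differ. The file names are hypothetical, and Python's `fnmatchcase` is used as a stand-in glob matcher.

```python
from fnmatch import fnmatchcase

# Glob the report says the default protected-files policy covers.
PROTECTED = [".github/workflows/**"]
# Pattern the error message suggests adding to allowed-files.
ALLOWED = [".github/workflows/schema-demo-*.md"]


def is_blocked(path, protected, allowed):
    """Assumed semantics: blocked if protected and not explicitly allowed."""
    is_protected = any(fnmatchcase(path, pattern) for pattern in protected)
    is_allowed = any(fnmatchcase(path, pattern) for pattern in allowed)
    return is_protected and not is_allowed


demo_file = ".github/workflows/schema-demo-example.md"  # hypothetical name
other_file = ".github/workflows/ci.md"  # hypothetical name

print(is_blocked(demo_file, PROTECTED, []))        # default policy only
print(is_blocked(demo_file, PROTECTED, ALLOWED))   # with the allowlist entry
print(is_blocked(other_file, PROTECTED, ALLOWED))  # unrelated workflow file
```

If the real policy behaves like this sketch, the narrow allowlist entry would unblock only the schema-demo files and leave every other workflow file protected.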
Existing Issue Correlation
- Daily CLI Tools Exploratory Tester (installation rate limit): … [aw-failures] issue. Transient; if it recurs, file an infrastructure issue for the installation rate limit budget.
Proposed Fix Roadmap
| Priority | Item | Effort |
| --- | --- | --- |
| P0 | Schema Feature Coverage Checker: add `.github/workflows/schema-demo-*.md` to `allowed-files` (sub-issue #28674) | Low |
| P1 | GitHub Remote MCP Auth Test: update frontmatter to supported Copilot model (see #28660) | Low |
| P1 | Daily CLI Tools: investigate installation API rate limit budget; safe_outputs create_issue rate-limited at 05:46–05:49 UTC | Medium |
| P2 | Documentation Unbloat: review 30-min step timeout for 58-turn runs; either raise timeout or optimize post-processing | Medium |
Sub-Issues Created
References:
6h Window Update — 2026-04-27 ~07:13–13:13 UTC
Overview
37 runs in window (16 Claude, 17 Copilot, 4 Codex) · 0 classified failures · 3 runs with error_count > 0 · all individual failures auto-tracked via [aw] issues · overall health improved vs. prior window.
Failure Clusters
| Pattern | Affected workflows | Runs | Tracking |
| --- | --- | --- | --- |
| `node: command not found` (Copilot engine) | Daily News, Daily Issues Report Generator | §24986870660, §24990655972 | Sub-issue #aw_node1 |
| `codex: command not found` (Codex engine) | Daily Fact About gh-aw | §24992928191 | #28703 |
| No safe outputs emitted | Package Specification Enforcer | §24991256961 | #28692 |
| Missing `mcp__playwright__browser_run_code` | Multi-Device Docs Tester | §24994599602 | #28717 |
| Missing `agentic-workflows` MCP status tool | Daily Rendering Scripts Verifier | §24992350068 | (not tracked) |
| Broken links in Copilot PRs (starlight-links-validator) | Visual Regression Checker | §24993828520, §24995461013 | #28677 |
Previously Tracked Items — Status
| Item | Status |
| --- | --- |
| P0 Schema Feature Coverage Checker (#28674) | Open; config fix not yet merged |
| P1 GitHub Remote MCP Auth Test (#28660) | Open; no retry observed in window |
| P1 Daily CLI Tools rate limit | No recurrence in current window |
| P2 Documentation Unbloat (#28659) | No recurrence in current window |
Observability
Firewall block rate: 15% (139/916 requests blocked), a significant improvement over the 48% noted in §24978441315. Dominant blocked domain: (unknown) (119 requests). proxy.golang.org was blocked in Refactoring Cadence (10 requests) despite being in the global allowlist; the per-workflow firewall config is likely more restrictive. The run completed successfully.
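The block-rate figures can be reproduced with a few lines of arithmetic. This is a sketch using only the counts quoted above; the "other" bucket is derived by subtraction and is not itemized in the source.

```python
# Counts copied from the Observability paragraph.
total_requests = 916
blocked_requests = 139
blocked_unknown = 119   # dominant blocked domain: (unknown)
blocked_goproxy = 10    # proxy.golang.org in Refactoring Cadence

block_rate = blocked_requests / total_requests
unknown_share = blocked_unknown / blocked_requests
# Derived remainder; the report does not break these requests down.
blocked_other = blocked_requests - blocked_unknown - blocked_goproxy

print(f"block rate:            {block_rate:.1%}")
print(f"share from (unknown):  {unknown_share:.1%}")
print(f"other blocked requests: {blocked_other}")
```

Most blocked requests fall in the (unknown) bucket, so resolving what that bucket contains would explain most of the remaining block rate.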
Sub-Issues Added
- #aw_node1 — Engine binary missing at runner startup (node/codex not found) — P1
References:
- §24992928191 — Daily Fact About gh-aw (codex not found)
- §24991256961 — Package Specification Enforcer (no safe output)
- §24992350068 — Daily Rendering Scripts Verifier (missing aw-mcp status tool)
Generated by [aw] Failure Investigator (6h) · ● 631.2K
6h Window Update — 2026-04-27 ~19:30 – 2026-04-28 ~01:30 UTC
Overview
2 auto-generated failure issues in the window. 1 is a true failure (recurrence of existing P1); 1 is a likely false positive (agent completed, workflow still concluded failure). No P0 failures; no new blocking issues.
Failure Clusters
| Workflow | Run | Engine | Conclusion | Root Cause | Priority | Tracking |
| --- | --- | --- | --- | --- | --- | --- |
| Go Logger Enhancement | §25020571393 | Claude Code | failure | Engine killed at 21:46:43 UTC mid-API-call; `mcpscripts.make` called twice immediately before kill (21:46:19, 21:46:37) | P1 | Recurrence of #28653 |
| Agentic Workflow Audit Agent | §25019817167 | Claude Code | failure | Agent completed (`terminal_reason: completed`, 53 turns, $1.91, created discussion #28804) but workflow concluded `failure`; auto-issue #28806 created; likely false positive | P1 | New: sub-issue #aw_audit1 |
Evidence
Go Logger Enhancement — engine killed mid-session (§25020571393)
From agent-stdio.log:
2026-04-27T21:46:19.739Z mcpscripts.make called (call 1)
2026-04-27T21:46:37.706Z mcpscripts.make called (call 2)
2026-04-27T21:46:43.546Z [DEBUG] autocompact: tokens=58283 threshold=167000
2026-04-27T21:46:43.548Z [DEBUG] [API REQUEST] /v1/messages source=sdk
Log ends abruptly: no API response, no terminal_reason. The token count (58K) is well below the 167K compaction threshold, which rules out context pressure. mcpscripts.make was invoked (so MCP was alive), then the engine was killed about 6 s later during the next API call. The pattern matches #28653.
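The gaps and the token headroom cited above can be recomputed from the four log lines. This is a sketch using the quoted timestamps and values only; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Timestamps copied from the agent-stdio.log excerpt.
make_call_1 = datetime.strptime("2026-04-27T21:46:19.739Z", FMT)
make_call_2 = datetime.strptime("2026-04-27T21:46:37.706Z", FMT)
last_api_request = datetime.strptime("2026-04-27T21:46:43.548Z", FMT)

# Values from the autocompact debug line.
tokens = 58283
compaction_threshold = 167000

gap_between_calls = (make_call_2 - make_call_1).total_seconds()
gap_to_last_request = (last_api_request - make_call_2).total_seconds()
context_usage = tokens / compaction_threshold

print(f"mcpscripts.make call 1 -> call 2: {gap_between_calls:.3f} s")
print(f"call 2 -> final API request:      {gap_to_last_request:.3f} s")
print(f"context usage vs threshold:       {context_usage:.1%}")
```

The final API request starts just under 6 seconds after the second mcpscripts.make call, and context usage sits at roughly a third of the compaction threshold.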
Agentic Workflow Audit Agent — agent success, workflow failure (§25019817167)
From agent-stdio.log:
2026-04-27T21:44:49.870Z create_discussion completed successfully in 76ms
2026-04-27T21:45:00.080Z {"type":"result","subtype":"success","terminal_reason":"completed","num_turns":53,"total_cost_usd":1.91}
Agent completed 53 turns, created discussion #28804 (audits category), exited cleanly. Auto-issue #28806 was created at 21:47 UTC (2 min later) with "Engine Failure: The claude engine terminated unexpectedly." The harness auto-issue body contains the complete terminal_reason: completed JSON, yet still fired the failure signal. This indicates the workflow's GitHub Actions conclusion was set to failure by a step other than the agent job — possibly the conclusion/safe-outputs job or a post-processing step.
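A minimal sketch of the false-positive check implied above: parse the agent's final result line and compare it with the workflow-level conclusion. The helper names are hypothetical and the classification rule is an assumption for illustration; the harness's actual logic is not shown in the logs.

```python
import json

# Result line copied from the agent-stdio.log excerpt.
RESULT_LINE = (
    '2026-04-27T21:45:00.080Z {"type":"result","subtype":"success",'
    '"terminal_reason":"completed","num_turns":53,"total_cost_usd":1.91}'
)


def parse_result(line):
    """Split a 'timestamp JSON' log line into its timestamp and payload."""
    timestamp, _, payload = line.partition(" ")
    return timestamp, json.loads(payload)


def agent_completed(record):
    """Assumed rule: a successful result record with terminal_reason completed."""
    return (
        record.get("type") == "result"
        and record.get("subtype") == "success"
        and record.get("terminal_reason") == "completed"
    )


timestamp, record = parse_result(RESULT_LINE)
workflow_conclusion = "failure"  # from the Failure Clusters table

likely_false_positive = agent_completed(record) and workflow_conclusion == "failure"
print(f"agent completed: {agent_completed(record)}")
print(f"likely false positive: {likely_false_positive}")
```

A check of this kind, applied before the auto-issue is filed, would separate agent-level engine failures from failures raised by a later job or step.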
Previously Tracked Items — Status
| Item | Status |
| --- | --- |
| P1 Go Logger MCP timeout (#28653) | Recurrence confirmed; run §25020571393 shows identical failure signature |
| P0 Schema Feature Coverage protected-files (#28674) | Open; config fix not yet merged |
| P1 Copilot/Codex binary missing (#28726) | Active; Issue Monster triaged (22:54 UTC), firewall tracking linked |
| Design Decision Gate safeoutputs drop (#28740) | Outside window; auto-issue expires 2026-04-28 02:29 UTC |
Sub-Issues Added
References:
Generated by [aw] Failure Investigator (6h) · ● 347.8K