[aw-failures] [aw] Failure Investigator (6h) - Issue Group #27128

# [aw] Failure Investigator (6h)

Parent issue for grouping related issues from [[aw] Failure Investigator (6h)](#).



Sub-issues are automatically linked below (max 64 per parent).


> Workflow: [[aw] Failure Investigator (6h)]()
> - [x] expires on Apr 27, 2026, 1:18 AM UTC

---

---

### 6h Cycle Update — 2026-04-19 07:09 UTC ([§24623487281](https://github.com/github/gh-aw/actions/runs/24623487281))

#### Executive Summary

47 runs in the last 6h (36 copilot, 11 claude). **5 failures** detected; 2 are new untracked bugs.

#### Failure Cluster Table

| Run | Workflow | Engine | Failure Mode | Status |
|-----|----------|--------|-------------|--------|
| [§24623096622](https://github.com/github/gh-aw/actions/runs/24623096622) | Issue Monster | copilot | `assign_to_agent` uses `issue-number` instead of `issue_number` (3 errors) | **New → sub-issue created** |
| [§24622541959](https://github.com/github/gh-aw/actions/runs/24622541959) | Artifacts Summary | copilot | Engine terminated unexpectedly (Copilot stuck in Read loop) | Tracked in #27155 |
| [§24622002167](https://github.com/github/gh-aw/actions/runs/24622002167) | Contribution Check | copilot | `add_labels` missing `item_number` (no issue/PR number) | **New → sub-issue created** |
| [§24620886472](https://github.com/github/gh-aw/actions/runs/24620886472) | GitHub Remote MCP Auth Test | copilot | Transient Copilot API server errors (5 retries) | Transient — not tracked |
| [§24618004586](https://github.com/github/gh-aw/actions/runs/24618004586) | Contribution Check | copilot | `Too many noop items. Maximum allowed: 1.` | **Covered in Contribution Check sub-issue** |

#### Existing Issue Correlation

- **Codex 401** (#27127): Codex engine workflows (ai-moderator, daily-observability-report) did not run in this 6h window — issue remains open and unresolved.
- **Artifacts Summary** (#27155): Already tracked, expires today.
- **Lock files out of sync** (#27140): Ongoing, unrelated to agent failures.

#### Sub-Issues Created This Cycle

Two new sub-issues created and linked to this parent:
1. Issue Monster: `assign_to_agent` field naming bug (`issue-number` vs `issue_number`)
2. Contribution Check: recurring safe-output validation failures (noop limit + `add_labels` missing number)
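
The validation failures listed above can be illustrated with a minimal sketch. It assumes safe-output items arrive as plain dictionaries carrying a `type` key; that shape, along with the helpers `normalize_keys` and `validate_items`, is hypothetical and not taken from gh-aw. The only rules encoded are the ones quoted in the failure table: `issue_number` spelled with an underscore, `add_labels` requiring `item_number`, and at most one `noop` item.

```python
from typing import Any

MAX_NOOP_ITEMS = 1  # limit quoted in the failure message in the table above


def normalize_keys(item: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of item with hyphens in keys replaced by underscores."""
    return {key.replace("-", "_"): value for key, value in item.items()}


def validate_items(items: list[dict[str, Any]]) -> list[str]:
    """Collect human-readable errors for a batch of safe-output items."""
    errors: list[str] = []
    noop_count = 0
    for index, raw in enumerate(items):
        item = normalize_keys(raw)  # tolerate `issue-number` style keys
        kind = item.get("type")     # hypothetical discriminator field
        if kind == "noop":
            noop_count += 1
        elif kind == "assign_to_agent" and "issue_number" not in item:
            errors.append(f"item {index}: assign_to_agent missing issue_number")
        elif kind == "add_labels" and "item_number" not in item:
            errors.append(f"item {index}: add_labels missing item_number")
    if noop_count > MAX_NOOP_ITEMS:
        errors.append(f"Too many noop items. Maximum allowed: {MAX_NOOP_ITEMS}.")
    return errors
```

Normalizing keys before validation is one possible mitigation; rejecting the hyphenated form with a clearer error message would be an equally reasonable choice.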

**Note:** The `agenticworkflows` CLI bridge `count` integer parameter bug (#27149) was independently discovered by `daily-cli-tools-tester` — this is a known issue.

> Generated by [[aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/24623487281/agentic_workflow) · ● 719.8K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)



---

---

### 6h Cycle Update — 2026-04-19 13:09 UTC ([§24629827117](https://github.com/github/gh-aw/actions/runs/24629827117))

#### Executive Summary

33 runs in the last 6h (26 success, 5 failure, 2 in-progress). **5 failures** detected across 3 engines. 4 already have existing tracking; 1 new pattern identified (false-failure in Multi-Device Docs Tester).

#### Failure Cluster Table

| Run | Workflow | Engine | Duration | Failure Mode | Tracked |
|-----|----------|--------|----------|-------------|---------|
| [§24626840251](https://github.com/github/gh-aw/actions/runs/24626840251) | Daily Issues Report Generator | copilot | 5m | `node: command not found` | #27165 |
| [§24627195348](https://github.com/github/gh-aw/actions/runs/24627195348) | Daily Community Attribution Updater | copilot | 32.9m | Permission denied running Python script | #27173 |
| [§24628420682](https://github.com/github/gh-aw/actions/runs/24628420682) | Duplicate Code Detector | codex | 2.3m | 401 Unauthorized — OPENAI_API_KEY missing/expired | #27177 / #27127 |
| [§24628589681](https://github.com/github/gh-aw/actions/runs/24628589681) | Multi-Device Docs Tester | claude | 6.9m | Agent succeeded (all 10 devices OK), `safe_outputs` failed — `upload_artifact` error + 60% firewall block rate | **New → #aw_mdtfail** |
| [§24629165601](https://github.com/github/gh-aw/actions/runs/24629165601) | GitHub MCP Remote Server Tools Report Generator | claude | 9.2m | Protected files + discussion creation failed | #27185 |

#### Existing Issue Correlation

- **Codex 401** (#27127): Still unresolved P0 — another Duplicate Code Detector failure confirms OPENAI_API_KEY is still missing
- **Copilot `node: command not found`** (#27165): Recurring Copilot runner environment issue — node binary absent from execution context
- **Copilot permission denied** (#27173): Recurring Copilot environment issue — Python script cannot execute
- **GitHub MCP Remote Tools** (#27185): Protected files issue — configure `protected-files: fallback-to-issue` or add `.github/aw/github-mcp-server.md` to `allowed-files`

#### Sub-Issues Created This Cycle

1. **#aw_mdtfail** — Multi-Device Docs Tester: `upload_artifact` fails in `safe_outputs` job despite agent reporting all 10 device tests passed. 60% firewall block rate from Chrome/Playwright Google domain calls (SafeBrowsing, telemetry).

#### Key Observations

- 15% failure rate (5/33) consistent with previous cycles
- Codex 401 (#27127) remains **unresolved P0** — every Codex workflow continues to fail
- Multi-Device Docs Tester is a **false-failure**: agent completes successfully but workflow reports failure due to `upload_artifact` mismatch
- Copilot environment crashes (`node: command not found`, permission denied) affect 2 workflows; possible regression in Copilot CLI v1.0.21 or runner setup
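
The failure rate quoted above is the failure count divided by the total runs in the window. The short sketch below recomputes it from the counts given in the two executive summaries so far; `failure_rate` is an illustrative helper, not part of the investigator workflow.

```python
def failure_rate(failures: int, total_runs: int) -> float:
    """Fraction of runs in a window that failed."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return failures / total_runs


# (failures, total runs) taken from the executive summaries of each cycle
cycles = {
    "2026-04-19 07:09 UTC": (5, 47),
    "2026-04-19 13:09 UTC": (5, 33),
}

for label, (failures, total) in cycles.items():
    print(f"{label}: {failure_rate(failures, total):.1%}")
# 2026-04-19 07:09 UTC: 10.6%
# 2026-04-19 13:09 UTC: 15.2%
```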

> [!NOTE]
> <details>
> <summary>🔒 Integrity filter blocked 1 item</summary>
>
> The following item was blocked because it doesn't meet the GitHub integrity level.
>
> - [#19099](https://github.com/github/gh-aw/issues/19099) `search_issues`: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
>
> To allow these resources, lower `min-integrity` in your GitHub frontmatter:
>
> ```yaml
> tools:
>   github:
>     min-integrity: approved  # merged | approved | unapproved | none
> ```
>
> </details>


> Generated by [[aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/24629827117/agentic_workflow) · ● 269.2K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)



---

---

### 6h Cycle Update — 2026-04-19 19:09 UTC ([§24636791490](https://github.com/github/gh-aw/actions/runs/24636791490))

#### Executive Summary

15 runs in the last 6h (10 copilot, 5 claude). **0 failures** — all completed runs concluded `success`. No new untracked failure patterns detected. Previous cycles' failures (at 07:09 and 13:09 UTC) are fully tracked.

#### Run Summary

| Run | Workflow | Engine | Duration | Turns | Conclusion |
|-----|----------|--------|----------|-------|------------|
| [§24633926754](https://github.com/github/gh-aw/actions/runs/24633926754) | Smoke CI | copilot | 1.8m | 0 | ✅ success |
| [§24634412711](https://github.com/github/gh-aw/actions/runs/24634412711) | Contribution Check | copilot | 5.5m | 19 | ✅ success |
| [§24635766604](https://github.com/github/gh-aw/actions/runs/24635766604) | Q | copilot | 12.7m | 61 | ✅ success |
| [§24635898491](https://github.com/github/gh-aw/actions/runs/24635898491) | Design Decision Gate | claude | 3.9m | 7 | ✅ success |
| [§24635898531](https://github.com/github/gh-aw/actions/runs/24635898531) | Test Quality Sentinel | copilot | 6.0m | 19 | ✅ success |
| [§24635906916](https://github.com/github/gh-aw/actions/runs/24635906916) | Smoke CI | copilot | 2.1m | 0 | ✅ success |
| [§24635987425](https://github.com/github/gh-aw/actions/runs/24635987425) | PR Triage Agent | copilot | 4.2m | 7 | ✅ success |
| [§24635994760](https://github.com/github/gh-aw/actions/runs/24635994760) | Test Quality Sentinel | copilot | 3.2m | 5 | ✅ success |
| [§24635994763](https://github.com/github/gh-aw/actions/runs/24635994763) | Design Decision Gate | claude | 3.8m | 9 | ✅ success |
| [§24635997175](https://github.com/github/gh-aw/actions/runs/24635997175) | Smoke CI | copilot | 2.1m | 0 | ✅ success |
| [§24636300105](https://github.com/github/gh-aw/actions/runs/24636300105) | CI Failure Doctor | claude | 4.6m | 10 | ✅ success |
| [§24636318510](https://github.com/github/gh-aw/actions/runs/24636318510) | Test Quality Sentinel | copilot | 4.5m | 7 | ✅ success |
| [§24636318512](https://github.com/github/gh-aw/actions/runs/24636318512) | Design Decision Gate | claude | 4.0m | 11 | ✅ success |
| [§24636386852](https://github.com/github/gh-aw/actions/runs/24636386852) | Auto-Triage Issues | copilot | 3.7m | 9 | ✅ success |

#### Quality Signals (no failures, but worth monitoring)

<details>
<summary>View Efficiency Flags</summary>

**Q workflow** ([§24635766604](https://github.com/github/gh-aw/actions/runs/24635766604)) — `resource_heavy_for_domain` (HIGH), `poor_agentic_control` (MEDIUM):
- 61 turns, 4.2M tokens, 12.7m for an issue_response task
- 1 firewall-blocked request to `invalid.example.invalid:443` (likely URL from issue content — blocked correctly)
- ~95% of turns were data-gathering; agentic_fraction=0.05

**Design Decision Gate** ([§24636318512](https://github.com/github/gh-aw/actions/runs/24636318512)) — `resource_heavy_for_domain` (HIGH):
- `github.pull_request_read` called 11 times across 11 turns (same PR re-fetched every turn)
- Integrity filter blocked `pull_request_read` 3 times; agent retried each block rather than branching

**Contribution Check** ([§24634412711](https://github.com/github/gh-aw/actions/runs/24634412711)) — `resource_heavy_for_domain` (HIGH):
- 19 inference-only turns, 0 external tool calls — agent re-reasoned over static context repeatedly

</details>
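
The repeated-fetch pattern flagged for Design Decision Gate could be detected with a check along these lines. This is a sketch only: it assumes a run's tool calls are available as `(tool, arguments)` pairs, and both `repeated_calls` and the threshold of 3 are hypothetical rather than the logic behind the `resource_heavy_for_domain` flag.

```python
from collections import Counter
from typing import Iterable


def repeated_calls(
    calls: Iterable[tuple[str, str]], threshold: int = 3
) -> dict[tuple[str, str], int]:
    """Return (tool, arguments) pairs that occur at least `threshold` times."""
    counts = Counter(calls)
    return {call: n for call, n in counts.items() if n >= threshold}


# Example: the same PR read on each of 11 turns, plus one unrelated call.
# The argument strings are placeholders, not values from the run.
calls = [("github.pull_request_read", "pr=example")] * 11
calls.append(("github.list_issues", "state=open"))
print(repeated_calls(calls))
# {('github.pull_request_read', 'pr=example'): 11}
```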

#### Existing Issue Correlation

- **Codex 401** (#27127): No Codex runs in this 6h window — issue remains open and unresolved P0.
- **Other tracked failures** (#27165, #27173, #27177, #27185): No recurrence in this window.

#### Sub-Issues Created This Cycle

None. All runs succeeded; efficiency flags are recurring patterns not yet meeting threshold for new sub-issues.

> ● 9.0M total tokens across window

> Generated by [[aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/24636791490/agentic_workflow) · ● 341.1K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)



---

---

### 6h Cycle Update — 2026-04-20 01:12 UTC ([§24643864137](https://github.com/github/gh-aw/actions/runs/24643864137))

#### Executive Summary

29 runs in the last 6h (19 copilot, 9 claude, 1 codex). **2 real failures** detected at 23:44 UTC; root cause is GitHub App installation rate limit exhaustion from concurrent scheduling. 1 Smoke CI cancel was benign (push-superseded).

#### Failure Cluster Table

| Run | Workflow | Engine | Failure Mode | Status |
|-----|----------|--------|-------------|--------|
| [§24642041999](https://github.com/github/gh-aw/actions/runs/24642041999) | Daily Observability Report for AWF Firewall and MCP Gateway | codex | `API rate limit exceeded for installation` at guard policy init | **New → sub-issue created** |
| [§24642045134](https://github.com/github/gh-aw/actions/runs/24642045134) | Daily Safe Output Tool Optimizer | claude | exit code 1, 0 tokens — likely same rate limit event (same SHA, same time) | **Covered in same sub-issue** |
| [§24642364577](https://github.com/github/gh-aw/actions/runs/24642364577) | Smoke CI | copilot | Cancelled — push superseded by newer push; next run succeeded | Transient — not tracked |

#### Root Cause Detail

Both failures fired at 23:44 UTC on 2026-04-19 on the same SHA (`5285e620`). The Daily Observability Report emitted an explicit rate limit error: `Failed to determine automatic guard policy: API rate limit exceeded for installation` (Request ID: `BC40:E2DAD:30788C:C36940:69E5694A`). The Daily Safe Output Tool Optimizer failed with just `exit code 1` and 0 agent turns — audit confirmed no behavioral regression vs. prior success, pointing to the same pre-agent infrastructure failure.

#### Existing Issue Correlation

- **Codex 401** (#27127): Still open P0. No new Codex 401 runs in this window; rate limit issue is a separate root cause from the 401 auth failure tracked there.
- **All other open issues**: No recurrence in this window.

#### Sub-Issues Created This Cycle

1. **#aw_ratelim1** — GitHub App installation rate limit exhaustion from co-scheduled workflows — proposes staggering cron schedules and adding retry with backoff at guard policy layer.
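
The retry-with-backoff proposal can be sketched as follows. This is a minimal illustration, not the guard policy implementation: `RateLimitError`, `retry_with_backoff`, and all default values are assumptions introduced here, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports a rate limit."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on RateLimitError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return operation()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps co-scheduled workflows from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The jitter matters for the scenario described here: workflows that start at the same minute would otherwise also retry at the same instants and hit the limit again together.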

> Generated by [[aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/24643864137/agentic_workflow) · ● 328.8K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)



---

---

### 6h Cycle Update — 2026-04-20 07:25 UTC ([§24653877408](https://github.com/github/gh-aw/actions/runs/24653877408))

#### Executive Summary

36 runs in the last 6h (17 copilot, 5 claude, 7 codex). **7 failures** detected — all Codex engine `401 Unauthorized` errors from `api.openai.com`. No new failure patterns; all failures are covered by existing [#27127](https://github.com/github/gh-aw/issues/27127) (Codex 401 P0) and [#27233](https://github.com/github/gh-aw/issues/27233) (AI Moderator). 3 cancelled runs were benign (PR-guard skips).

#### Failure Cluster Table

| Run | Workflow | Engine | Duration | Failure Mode | Tracked |
|-----|----------|--------|----------|-------------|---------|
| [§24652594821](https://github.com/github/gh-aw/actions/runs/24652594821) | AI Moderator | codex | 9.4m | 401 Unauthorized — OPENAI_API_KEY | #27127 / #27233 |
| [§24649110190](https://github.com/github/gh-aw/actions/runs/24649110190) | AI Moderator | codex | 16.4m | 401 Unauthorized — OPENAI_API_KEY | #27127 / #27233 |
| [§24649107899](https://github.com/github/gh-aw/actions/runs/24649107899) | AI Moderator | codex | 9.7m | 401 Unauthorized — OPENAI_API_KEY | #27127 / #27233 |
| [§24649050664](https://github.com/github/gh-aw/actions/runs/24649050664) | AI Moderator | codex | 9.5m | 401 Unauthorized — OPENAI_API_KEY | #27127 / #27233 |
| [§24645450988](https://github.com/github/gh-aw/actions/runs/24645450988) | AI Moderator | codex | 8.7m | 401 Unauthorized — OPENAI_API_KEY | #27127 / #27233 |
| [§24645419031](https://github.com/github/gh-aw/actions/runs/24645419031) | AI Moderator | codex | 9.1m | 401 Unauthorized — OPENAI_API_KEY | #27127 / #27233 |
| [§24653363185](https://github.com/github/gh-aw/actions/runs/24653363185) | Schema Feature Coverage Checker | codex | 2.1m | 401 Unauthorized — OPENAI_API_KEY | #27127 / #27286 |

#### Root Cause Confirmation

All 7 failures share the identical error signature:
```
unexpected status 401 Unauthorized: Missing bearer or basic authentication in header
url: (api.openai.com/redacted)
```
Codex v0.121.0 exhausts 5 reconnect retries before terminating. `api.openai.com:443` is allowed by firewall (13 requests allowed per run); `chatgpt.com:443` is blocked (1 blocked per run — cosmetic, not causal to the 401). Both workflows fail at the `agent` job after `pre_activation` and `activation` succeed.
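
Grouping failures by a shared error signature, as done in this cycle, can be sketched with simple substring matching. The signature strings below are copied from error messages quoted in this issue; the cluster labels and the helpers `classify` and `cluster` are hypothetical and do not describe how the investigator actually groups runs.

```python
from collections import defaultdict

# (cluster label, substring to look for); labels are illustrative only
SIGNATURES = [
    ("codex_auth_401", "401 Unauthorized"),
    ("installation_rate_limit", "API rate limit exceeded for installation"),
    ("node_missing", "node: command not found"),
]


def classify(message: str) -> str:
    """Return the first matching cluster label, or 'unclassified'."""
    for label, needle in SIGNATURES:
        if needle in message:
            return label
    return "unclassified"


def cluster(failures: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (run_id, error message) pairs by cluster label."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for run_id, message in failures:
        clusters[classify(message)].append(run_id)
    return dict(clusters)


failures = [
    ("24652594821", "unexpected status 401 Unauthorized: Missing bearer or basic authentication in header"),
    ("24653363185", "unexpected status 401 Unauthorized: Missing bearer or basic authentication in header"),
    ("24642041999", "Failed to determine automatic guard policy: API rate limit exceeded for installation"),
]
print(cluster(failures))
```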

#### Cancelled Runs (Benign)

3 runs were cancelled at 04:16:16 UTC: Security Review Agent (#16903), PR Nitpick Reviewer (#62911), and Q (#76223). All completed in 1–2s, consistent with PR-guard or push-superseded skips. Not tracked as failures.

#### Existing Issue Correlation

- **[#27127](https://github.com/github/gh-aw/issues/27127) Codex 401 (P0)**: Still unresolved — 7 more failures in this window. OPENAI_API_KEY remains missing/expired. No intervention observed yet.
- **[#27233](https://github.com/github/gh-aw/issues/27233) AI Moderator**: Still open; 6 new failures confirm ongoing impact.
- **[#27286](https://github.com/github/gh-aw/issues/27286) Schema Feature Coverage Checker**: Auto-generated today — same root cause as #27127; no separate sub-issue needed.
- **[#27251](https://github.com/github/gh-aw/issues/27251) GitHub App rate limit**: No recurrence — co-scheduled run staggering appears to have held.

#### Sub-Issues Created This Cycle

None. All 7 failures trace to the same unresolved P0: Codex `OPENAI_API_KEY` credential missing/invalid (#27127). Cumulative impact: **AI Moderator has now failed at least 13 times** across the last 3 investigator cycles with no resolution.

**References:**
- [§24653877408](https://github.com/github/gh-aw/actions/runs/24653877408)
- [§24652594821](https://github.com/github/gh-aw/actions/runs/24652594821)
- [§24653363185](https://github.com/github/gh-aw/actions/runs/24653363185)

> Generated by [[aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/24653877408/agentic_workflow) · ● 354.6K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
