Overview
Today's health check reveals a significant regression: the scheduled workflow success rate dropped from 93% (Apr 27) to 57% (17 of 30 runs succeeded, 13 failed). The dominant new failure is a systemic `THREAT_DETECTION_RESULT` parse error spreading to multiple previously healthy workflows. All 204 workflows compile successfully.
Summary
Total workflows: 204 (all compiling ✅)
Scheduled runs today: 30 total — 17 succeeded, 13 failed
Critical Issues 🚨
🔴 P1 (Escalated) — THREAT_DETECTION_RESULT Parse Failure — Now Systemic
Affected workflows (newly failing today):
- Dead Code Removal Agent — `detection` job: `No THREAT_DETECTION_RESULT found in detection log`
- Daily Testify Uber Super Expert — same error
- Update Astro — same error
Pattern: The detection model ran (101 lines of output logged) but did not emit the expected `THREAT_DETECTION_RESULT:{...}` JSON token. This was a P2 "watch" item on Apr 27 (1-2 workflows) and is now hitting ≥3 workflows on Apr 28.
Tracking: #28866 ([aw] Detection Runs) — comments posted there automatically
Impact: Workflows that use `continue-on-error: true` (the default) will still emit warnings but won't fail safe outputs. However, the `detection` job is blocking the run conclusion, causing full workflow failure.
Recommended action: Investigate whether the detection model (gpt-4o or equivalent) has been updated or is experiencing degraded instruction-following. Consider retry logic or a fallback for detection parse failures; a sketch follows below.
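A hedged sketch of that fallback idea (the step name, the `detection.log` path, and the fail-closed behavior are assumptions, not gh-aw's actual compiled output):

```yaml
# Hypothetical hardening for the detection step.
- name: Parse threat detection result
  id: detection-parse
  continue-on-error: true   # surface a warning instead of failing the whole run
  run: |
    # The log should contain one line of the form:
    #   THREAT_DETECTION_RESULT:{...json...}
    sed -n 's/^THREAT_DETECTION_RESULT://p' detection.log | head -n 1 > result.json
    if [ -s result.json ]; then
      echo "found=true" >> "$GITHUB_OUTPUT"
    else
      # Fail closed: treat a missing token as an unresolved detection
      # rather than an implicit approval.
      echo "::warning::No THREAT_DETECTION_RESULT found in detection log"
      echo "found=false" >> "$GITHUB_OUTPUT"
      exit 1
    fi
```

With the step marked `continue-on-error: true`, downstream jobs can branch on `steps.detection-parse.outputs.found` instead of inheriting a hard failure.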
🟠 P0 (Ongoing) — Daily Fact About gh-aw — Codex Engine Failure
Continues to fail daily. Auto-issues are created by the failure-investigator. Root cause: codex engine crash or binary issue.
Agent Job Failures ⚠️
The following workflows failed in the `agent` job today (exact errors were not captured in the tail logs; the agents appear to have crashed or exited early):
| Workflow | Likely Cause |
| --- | --- |
| Sub-Issue Closer | Agent crash (no OTEL = different runner) |
| Daily Team Evolution Insights | Agent crash |
| Daily AstroStyleLite Markdown Spellcheck | Agent crash (no OTEL) |
| Daily Rendering Scripts Verifier | Agent crash (Docker/Playwright env) |
| Developer Documentation Consolidator | Agent crash (Docker env) |
| Semantic Function Refactoring | Agent crash |
| Daily Documentation Updater | Safe outputs job failure |
Several of these may share a common root cause (model unavailability, a runner image issue, or an infrastructure outage around 11:00-12:00 UTC); a triage query to confirm the time correlation is sketched below.
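One way to check the time correlation is to list failed runs inside the suspect window. This is a sketch only: the `--created` range filter follows GitHub's search syntax, and the flags should be verified against your installed gh version:

```yaml
# Hypothetical triage step (could equally be run from a local shell).
- name: List failures around the suspected outage window
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    gh run list \
      --status failure \
      --created "2025-04-28T11:00:00+00:00..2025-04-28T12:00:00+00:00" \
      --limit 50
```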
CI Integration Tests ⚠️
The scheduled CI run (25050573450) had 4 integration test jobs fail. This may indicate test infrastructure issues or recent code changes breaking integration tests.
Healthy Workflows ✅
17 scheduled workflows ran successfully today, including:
- Various daily code quality workflows
- Campaign management workflows
- Documentation workflows not affected by detection failures
Open P1+ Issues (Carry-Forward)
Actions Taken This Run
Updated `workflow-health-latest.md` and `shared-alerts.md` in repo memory
References:
Note
🔒 Integrity filter blocked 1 item
The following item was blocked because it doesn't meet the required GitHub integrity level.
`search_issues`: has lower integrity than the agent requires. The agent cannot read data with integrity below "approved". To allow these resources, lower `min-integrity` in your GitHub frontmatter:
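A minimal sketch of what that might look like (the nesting under a `github:` tool section and the value shown are assumptions; check the gh-aw documentation for the actual schema and accepted levels):

```yaml
# Hypothetical frontmatter snippet; key placement and value are illustrative.
tools:
  github:
    min-integrity: low   # the notice above says the agent currently requires "approved"
```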