Workflow Health Dashboard — 2026-04-28 ⚠️ Regression: 57% success rate #28939

@github-actions

Overview

Today's health check reveals a significant regression: scheduled workflow success rate dropped from 93% (Apr 27) to 57% (17/30 runs succeeded, 13 failed). The dominant new failure is a systemic THREAT_DETECTION_RESULT parse error spreading to multiple previously-healthy workflows. All 204 workflows compile successfully.

Summary

  • Total workflows: 204 (all compiling ✅)
  • Scheduled runs today: 30 total — 17 success, 13 failure
  • Success rate: 57% (↓ from 93% yesterday)
  • Score: 57/100 (↓ from 74/100)
  • Run: §25052372422
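For reference, the headline percentage follows directly from the run counts above. The sketch below assumes the rate is rounded to the nearest whole percent; `success_rate` is a hypothetical helper, not part of the dashboard tooling. It does not model the 57/100 score, whose formula is not stated in this report.

```python
def success_rate(succeeded: int, total: int) -> int:
    """Return the success rate as a whole-number percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * succeeded / total)


# 17 of 30 scheduled runs succeeded today.
print(success_rate(17, 30))  # 57
```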

Critical Issues 🚨

🔴 P1 (Escalated) — THREAT_DETECTION_RESULT Parse Failure — Now Systemic

Affected workflows (newly failing today):

  • Dead Code Removal Agent — detection job: No THREAT_DETECTION_RESULT found in detection log
  • Daily Testify Uber Super Expert — same
  • Update Astro — same

Pattern: Detection model ran (101 lines of output logged) but did not emit the expected THREAT_DETECTION_RESULT:{...} JSON token. This was a P2 "watch" item on Apr 27 (1-2 workflows), now hitting ≥3 on Apr 28.
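A minimal sketch of the kind of check that fails here, assuming the token appears in the log as the literal marker `THREAT_DETECTION_RESULT:` followed by a JSON object. The function name and the sample field are hypothetical; the real detection job's parser and result schema are not shown in this report.

```python
import json
import re
from typing import Optional

MARKER = "THREAT_DETECTION_RESULT:"


def extract_detection_result(log_text: str) -> Optional[dict]:
    """Return the last parseable JSON object that follows MARKER, or None."""
    decoder = json.JSONDecoder()
    result = None
    for match in re.finditer(re.escape(MARKER), log_text):
        # raw_decode does not skip leading whitespace, so strip it first.
        rest = log_text[match.end():].lstrip()
        try:
            obj, _ = decoder.raw_decode(rest)
        except json.JSONDecodeError:
            continue  # marker present but not followed by valid JSON
        if isinstance(obj, dict):
            result = obj
    return result


# "example_field" is a made-up key used only for illustration.
sample = 'model output...\nTHREAT_DETECTION_RESULT:{"example_field": false}\n'
print(extract_detection_result(sample))
print(extract_detection_result("101 lines of output, no token"))
```

A `None` return corresponds to the "No THREAT_DETECTION_RESULT found in detection log" condition reported above.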

Tracking: #28866 ([aw] Detection Runs) — comments posted there automatically

Impact: With continue-on-error: true (the default), a detection parse failure should only emit a warning and should not fail safe outputs. In the affected runs, however, the detection job blocks the run conclusion, so the whole workflow fails.

Recommended action: Investigate whether the detection model (gpt-4o or equivalent) has been updated or is experiencing degraded instruction-following. Consider retry logic or fallback for detection parse failures.
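One possible shape for the retry-and-fallback idea, as a sketch only. `detect_with_retry`, `run_detection`, and `parse` are hypothetical names; the callables stand in for whatever invokes the detection model and parses its log. Whether a final parse failure should block or pass the run is a policy decision this sketch leaves to the caller.

```python
from typing import Callable, Optional


def detect_with_retry(
    run_detection: Callable[[], str],
    parse: Callable[[str], Optional[dict]],
    max_attempts: int = 3,
) -> dict:
    """Re-run detection until its output parses, else return a fallback record."""
    for attempt in range(1, max_attempts + 1):
        result = parse(run_detection())
        if result is not None:
            return {"status": "parsed", "attempts": attempt, "result": result}
    # Fallback: report the parse failure explicitly instead of raising,
    # so the caller can decide how to conclude the run.
    return {"status": "parse_failed", "attempts": max_attempts, "result": None}


# Usage with stand-in callables: the first output is unparseable, the second is fine.
outputs = iter(["garbage", "ok"])
outcome = detect_with_retry(
    run_detection=lambda: next(outputs),
    parse=lambda text: {"verdict": "clean"} if text == "ok" else None,
)
print(outcome["status"], outcome["attempts"])  # parsed 2
```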


🟠 P0 (Ongoing) — Daily Fact About gh-aw — Codex Engine Failure

This workflow continues to fail daily, and failure-investigator keeps creating auto-issues for it. Suspected root cause: a codex engine crash or a binary issue.


Agent Job Failures ⚠️

The following workflows failed in the agent job today (exact errors not captured in tail logs — agents appear to have crashed or exited early):

| Workflow | Likely Cause |
| --- | --- |
| Sub-Issue Closer | Agent crash (no OTEL = different runner) |
| Daily Team Evolution Insights | Agent crash |
| Daily AstroStyleLite Markdown Spellcheck | Agent crash (no OTEL) |
| Daily Rendering Scripts Verifier | Agent crash (Docker/Playwright env) |
| Developer Documentation Consolidator | Agent crash (Docker env) |
| Semantic Function Refactoring | Agent crash |
| Daily Documentation Updater | Safe outputs job failure |

Several of these may share a common root cause (model unavailability, runner image issue, or infrastructure outage around 11:00-12:00 UTC).


CI Integration Tests ⚠️

The scheduled CI run (25050573450) had 4 integration test jobs fail. This may indicate test infrastructure issues or recent code changes breaking integration tests.


Healthy Workflows ✅

17 scheduled workflows ran successfully today, including:

  • Various daily code quality workflows
  • Campaign management workflows
  • Documentation workflows not affected by detection failures

Open P1+ Issues (Carry-Forward)

| Issue | Status | Age |
| --- | --- | --- |
| #28659 Documentation Unbloat claude auth | OPEN | Day 2+ |
| #27965 GitHub Remote MCP Auth Test model error | OPEN | Day 7+ |
| #23153 MCP gateway session drops | OPEN | ~30 days |
| #27888 awf-api-proxy sidecar unhealthy | OPEN | ~10 days |
| #27251 GitHub App rate limit exhaustion | OPEN | ~18 days |
| #27512 CODEX_HOME collision | OPEN | ~14 days |
| #28866 Detection Runs (parse failures) | OPEN | Ongoing |

Actions Taken This Run

  • Updated workflow-health-latest.md and shared-alerts.md in repo memory
  • No new issues created (existing tracking issues cover identified failures)

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it doesn't meet the GitHub integrity level.

  • #19099 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Workflow Health Manager - Meta-Orchestrator · ● 2M

  • expires on Apr 29, 2026, 12:26 PM UTC
