Closed as not planned
Overview
| Metric | Value |
|---|---|
| Total executable workflows | 166 (stable) |
| Compiled with lock files | 166/166 (100% ✅) |
| Outdated lock files | 0 ✅ (13 flagged files show 0-second timestamp diffs; these are checkout artifacts, the same false positive as Mar 8) |
| Healthy | ~153 (92%) |
| Critical/Failing (P1) | 6 workflows |
| Overall health score | 72/100 (↓4 from 76 — P2 failure spike today) |
⚠️ DEGRADED: lockdown failures (week 5+) and OpenAI restriction (day 12). Previous dashboard #20036 expired at 07:29Z today.
Critical Issues 🚨
P1: Lockdown Token Missing (4 workflows, ongoing week 5+)
All 4 workflows require GH_AW_GITHUB_TOKEN, which is not provisioned. Both proposed fixes are closed (#17414 and #17807, both CLOSED "not_planned"), so there is no current fix path; manual admin intervention is required.
- Issue Monster: run #2609 failed (2026-03-09T07:29Z); ~48 failures/day; all tracking issues auto-expired
- PR Triage Agent: run #184 failed (2026-03-09T06:28Z); no active tracking issue
- Daily Issues Report: run #127 failed (2026-03-09T01:59Z); 127+ consecutive failures, no tracking issue
- Org Health Report: last run #27 failed (2026-03-02); has never had a tracking issue; no recent runs
P1: AI Moderator — Day 12 OpenAI Restriction
- Status: Still failing on `issues` events; run #9855 failed (2026-03-09T06:47Z)
- Partial recovery: Succeeds on `pull_request` and some `issue_comment` events
- Error: OpenAI cybersecurity restriction on `gpt-5.3-codex` model
- Tracking: [aw] AI Moderator failed (pre-agent) #20113 OPEN (auto-generated Mar 8); previous [aw] AI Moderator failed (pre-agent) #19551 CLOSED "not_planned" by dsyme (Mar 8)
P1: Smoke Codex — Day 12 OpenAI Restriction
- Status: Still failing; run #2185 failed (2026-03-09T05:14Z)
- Tracking: [aw] Smoke Codex failed (pre-agent) #19514 OPEN (expires Mar 11, 2026 — 37 comments)
New Failures (P2) 📋
8 new auto-generated failure issues (Mar 8–9)
| Issue | Workflow | Pattern |
|---|---|---|
| #20158 | Agent Container Smoke Test | No Safe Outputs Generated |
| #20156 | Duplicate Code Detector | Standard failure |
| #20154 | Multi-Device Docs Tester | No Safe Outputs Generated |
| #20153 | GPL Dependency Cleaner (gpclean) | Standard failure |
| #20152 | Agent Persona Explorer | Pre-agent failure |
| #20142 | Smoke Update Cross-Repo PR | Pre-agent failure |
| #20102 | Security Alert Burndown | Pre-agent failure |
| #20046 | Daily Code Metrics | Repo-memory push failure |
Also: #20037 (Workflow Health Manager itself), where the repo-memory push failed on the previous run.
Notable patterns:
- 2 workflows showing "No Safe Outputs Generated" — possible safe-output infrastructure issue
- 2 workflows with repo-memory push failures — memory size limit enforcement issue
- Several pre-agent failures — investigate if these are recurring or one-off
Issue Tracking Summary
| Workflow | Status | Tracking Issue |
|---|---|---|
| Issue Monster | ❌ Failing | Auto-generates its own issues |
| PR Triage Agent | ❌ Failing | None (expired) |
| Daily Issues Report | ❌ Failing | None (expired) |
| Org Health Report | ❌ Failing | None (never had one) |
| AI Moderator | ❌ Partial | #20113 ✅ OPEN |
| Smoke Codex | ❌ Failing | #19514 ✅ OPEN (exp Mar 11) |
Compilation Health ✅
All 166 workflows have .lock.yml files. The 13 detected "outdated" lock files all show 0-second timestamp differences — confirmed filesystem checkout artifacts (same false-positive pattern as seen Mar 8).
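To illustrate why a 0-second difference should not count as outdated, here is a minimal sketch of a staleness check that flags a lock file only when its source is newer by more than a tolerance. The helper name `is_lock_outdated`, the example file names, and the one-second tolerance are assumptions for illustration; this is not the checker used by this run.

```python
import os
import tempfile


def is_lock_outdated(source_path, lock_path, tolerance_s=1.0):
    """Return True only if the source is newer than the lock file by more
    than tolerance_s seconds (hypothetical helper, not the real checker)."""
    source_mtime = os.path.getmtime(source_path)
    lock_mtime = os.path.getmtime(lock_path)
    return (source_mtime - lock_mtime) > tolerance_s


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names; only the .lock.yml suffix comes from the report.
        source = os.path.join(tmp, "example.md")
        lock = os.path.join(tmp, "example.lock.yml")
        for path in (source, lock):
            with open(path, "w") as handle:
                handle.write("placeholder\n")

        # A fresh checkout writes both files at effectively the same time.
        os.utime(source, (1_000_000, 1_000_000))
        os.utime(lock, (1_000_000, 1_000_000))
        same_time = is_lock_outdated(source, lock)

        # A source edited one minute after compilation is genuinely stale.
        os.utime(source, (1_000_060, 1_000_060))
        edited_later = is_lock_outdated(source, lock)
        return same_time, edited_later
```

With a tolerance like this, files that a checkout stamps with identical timestamps are never reported, while a source edited after compilation still is.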
Healthy Workflows ✅
Key healthy workflows
- Smoke Copilot: ✅ run #2288 success (2026-03-09T05:14Z on PR branch; consistent)
- Smoke Claude: mostly passing (run #2203 succeeded Mar 8 on main schedule; run #2208 failed Mar 9 00:53, likely transient)
- Metrics Collector: ✅ run #81 success (2026-03-08T18:14Z); operational since recovery
Systemic Issues
Lockdown Token (GH_AW_GITHUB_TOKEN) — Week 5+
- Pattern: Chronic failure, all fix paths declined
- Recommendation: Accept as known failure or escalate to admin
OpenAI Cybersecurity Restriction — Day 12
- Affected: AI Moderator, Smoke Codex
- Fix path: Switch to different engine — tracked in [aw] Smoke Codex failed (pre-agent) #19514 for Smoke Codex
Repo-Memory Push Failures — NEW
- Affected: Workflow Health Manager, Daily Code Metrics (at minimum)
- Pattern: `push_repo_memory` validation tool appears to include `.git` directory objects in size calculation, making the configured 10KB limit unachievable
- Impact: Memory not persisted between runs; reduces coordination between meta-orchestrators
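The suspected size-calculation problem can be sketched as follows. This is an illustration under stated assumptions, not the `push_repo_memory` implementation: the helper `directory_size`, the file names, and the byte counts are all hypothetical. It shows how counting `.git` contents can push a small memory payload over a 10KB limit, and how pruning that directory during the walk avoids it.

```python
import os
import tempfile


def directory_size(root, exclude_dirs=(".git",)):
    """Sum file sizes under root, skipping directories named in exclude_dirs."""
    total = 0
    for current, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            path = os.path.join(current, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def demo():
    with tempfile.TemporaryDirectory() as root:
        # 2,000 bytes of actual memory content (hypothetical payload).
        with open(os.path.join(root, "notes.md"), "wb") as handle:
            handle.write(b"x" * 2_000)
        # 50,000 bytes standing in for git object storage.
        objects = os.path.join(root, ".git", "objects")
        os.makedirs(objects)
        with open(os.path.join(objects, "pack"), "wb") as handle:
            handle.write(b"x" * 50_000)

        limit = 10 * 1024  # the 10KB limit mentioned in the report
        with_git = directory_size(root, exclude_dirs=())
        without_git = directory_size(root)
        return with_git, without_git, with_git <= limit, without_git <= limit
```

In this toy layout the same directory fails the limit when `.git` is counted and passes when it is excluded, which matches the behaviour described in the pattern above.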
Health Trends
| Date | Score | Key Change |
|---|---|---|
| 2026-03-01 | 73/100 | Metrics Collector regression |
| 2026-03-03 | 76/100 | Metrics Collector recovered |
| 2026-03-07 | 74/100 | False positive: 12 "outdated" locks |
| 2026-03-08 | 76/100 | Corrected false positive; all locks current |
| 2026-03-09 | 72/100 | P2 spike: 8 new failure issues |
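For readers who want to check the trend arithmetic, the short sketch below recomputes the change between consecutive entries, using the scores copied from the table. The function name `score_deltas` is hypothetical, and the scoring formula itself is not given in this report, so only the differences are computed.

```python
# Scores copied from the Health Trends table.
scores = [
    ("2026-03-01", 73),
    ("2026-03-03", 76),
    ("2026-03-07", 74),
    ("2026-03-08", 76),
    ("2026-03-09", 72),
]


def score_deltas(rows):
    """Return (date, score, change vs. previous entry); the first change is None."""
    result = []
    previous = None
    for date, score in rows:
        change = None if previous is None else score - previous
        result.append((date, score, change))
        previous = score
    return result


for date, score, change in score_deltas(scores):
    label = "n/a" if change is None else f"{change:+d}"
    print(f"{date}: {score}/100 ({label})")
```

The final entry yields a change of -4, consistent with the "↓4 from 76" noted in the overview.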
Recommendations
High Priority
- Lockdown workflows ([P1] Lockdown mode failing: GH_AW_GITHUB_TOKEN not configured — 5 workflows affected #17414, [q] fix(workflows): remove explicit lockdown:true to stop recurring failures #17807 closed) — requires admin escalation; 4 workflows failing indefinitely
- OpenAI restriction — Smoke Codex/AI Moderator Day 12; escalate model switch in [aw] Smoke Codex failed (pre-agent) #19514
- Repo-memory push validation bug: `push_repo_memory` counts `.git` objects, making limit unachievable; needs fix
Medium Priority
- Investigate "No Safe Outputs Generated" pattern (Agent Container, Multi-Device Docs)
- Monitor new P2 failures — if recurring, create dedicated tracking issues
Actions Taken This Run
- ✅ Verified 166/166 workflows compiled
- ✅ Confirmed 0 real outdated lock files (13 false positives)
- ✅ Confirmed P1 status: all 6 workflows still failing
- ✅ Identified 8 new P2 failures (auto-tracked by issue-monster)
- ✅ Identified repo-memory push validation issue
- ✅ Created this dashboard (replacing Workflow Health Dashboard - 2026-03-08 #20036 which expired 07:29Z)
References:
- §22842967314 — This run
- §22816368749 — Previous run (dashboard Workflow Health Dashboard - 2026-03-08 #20036)
- Previous dashboard: Workflow Health Dashboard - 2026-03-08 #20036 (expired 2026-03-09T07:29Z)
- expires on Mar 10, 2026, 7:33 AM UTC
Generated by Workflow Health Manager - Meta-Orchestrator · ◷
- expires on Mar 10, 2026, 7:44 AM UTC