Workflow Health Dashboard - 2026-03-18 #21537

@github-actions

Overview

Workflow health assessment for 2026-03-18. 174 workflows monitored, 7 stale lock files (down from 16 last run). Score: 62/100 (↓6 from 68).

P0 failures persist (GH_AW_GITHUB_TOKEN still missing). New P1 escalation: Daily Workflow Updater failing for 9 consecutive days.

Critical Issues 🚨

P0: Issue Monster / PR Triage Agent / Issue Triage Agent

  • Status: 100% failure rate (all recent runs failing)
  • Error: GH_AW_GITHUB_TOKEN secret is missing, so the pre_activation step cannot generate the GitHub App token used for skip-if checks
  • Duration: Ongoing since March 15
  • Impact: Issue management, PR triage, and issue triage workflows completely non-functional
  • Action Required: Configure GH_AW_GITHUB_TOKEN repository secret (GitHub App token)

Escalated Issues ⬆️

P1: Daily Workflow Updater — 9 consecutive failures (NEW ESCALATION)
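
A consecutive-failure count like the one above can be derived from a chronological run history. The following is a minimal sketch, not the dashboard's actual logic: the function names, the outcome strings, and the escalation threshold of 7 are all hypothetical, since this report does not state the rule that triggers a P1.

```python
from typing import Sequence

# Hypothetical threshold: the report does not say how many failures trigger a P1.
P1_THRESHOLD = 7


def trailing_failures(outcomes: Sequence[str]) -> int:
    """Count consecutive 'failure' entries at the end of a chronological history."""
    count = 0
    for outcome in reversed(outcomes):
        if outcome != "failure":
            break
        count += 1
    return count


def should_escalate(outcomes: Sequence[str], threshold: int = P1_THRESHOLD) -> bool:
    """Return True once the current failure streak reaches the threshold."""
    return trailing_failures(outcomes) >= threshold


# Illustrative history: one success followed by nine failed daily runs.
history = ["success"] + ["failure"] * 9
print(trailing_failures(history))  # 9
print(should_escalate(history))    # True
```

Counting from the end of the history means a single success resets the streak, which matches how a recovery such as Bot Detection's would clear an escalation.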

Recoveries ✅

Bot Detection — RECOVERED (was P1)

  • 2 consecutive successes today (runs at 00:24 and 06:24 UTC)
  • Healthy again after a cluster of failures on Mar 15-17
  • Status: Downgraded from P1 to Healthy

Warnings ⚠️

P2: Smoke Gemini — Intermittent failures (50% rate)

  • Alternating success/failure pattern: success Mar 14-15, failure Mar 16, success Mar 17T00:51, failure Mar 18T00:54
  • May indicate intermittent Gemini API availability issues
  • Monitoring recommended
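
The failure rate for a flaky workflow depends on the window it is measured over. The sketch below assumes one run per listed day and treats "success Mar 14-15" as two separate runs; both are assumptions, as is the helper name failure_rate. Under that reading the last four runs give 50%, while all five give 40%.

```python
from typing import Optional, Sequence


def failure_rate(outcomes: Sequence[str], window: Optional[int] = None) -> float:
    """Fraction of 'failure' entries among the last `window` runs (all runs if None)."""
    recent = list(outcomes) if window is None else list(outcomes)[-window:]
    if not recent:
        return 0.0
    return sum(o == "failure" for o in recent) / len(recent)


# Assumed reading of the pattern above: Mar 14, 15, 16, 17, 18.
smoke_gemini = ["success", "success", "failure", "success", "failure"]

print(failure_rate(smoke_gemini, window=4))  # 0.5
print(failure_rate(smoke_gemini))            # 0.4
```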

P2: Stale Lock Files (7 files)

  • daily-architecture-diagram.md, daily-compiler-quality.md, daily-mcp-concurrency-analysis.md, daily-secrets-analysis.md, github-mcp-structural-analysis.md, repo-audit-analyzer.md, smoke-call-workflow.md
  • Action: Run make recompile to rebuild

Healthy Workflows ✅

Core infrastructure healthy:

  • Smoke Copilot ✅ | Smoke Claude ✅ | Smoke Codex ✅
  • Auto-Triage Issues ✅ | Contribution Check ✅ | Metrics Collector ✅
  • AI Moderator ✅

Systemic Patterns

Systemic GitHub Actions disruption (Mar 17 15:00–22:00 UTC):

  • Most workflows show failures in this window, then recovery after 22:54 UTC
  • Auto-triage, Smoke Copilot, Contribution Check, WHM itself all affected
  • Not a workflow bug — infrastructure disruption
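
One way to separate infrastructure noise from real workflow bugs is to check whether each failure falls inside the known disruption window. This is a sketch under stated assumptions: the window bounds come from the heading above, while the function names and the sample timestamps are hypothetical and not taken from actual run logs.

```python
from datetime import datetime, timezone

# Disruption window from this report: Mar 17 15:00-22:00 UTC.
WINDOW_START = datetime(2026, 3, 17, 15, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2026, 3, 17, 22, 0, tzinfo=timezone.utc)


def parse_utc(text: str) -> datetime:
    """Parse a naive ISO 8601 timestamp and mark it as UTC."""
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


def in_disruption_window(ts: datetime) -> bool:
    """True if the timestamp lies within the disruption window (inclusive)."""
    return WINDOW_START <= ts <= WINDOW_END


# Hypothetical failure timestamps, for illustration only.
failures = ["2026-03-17T16:30", "2026-03-18T00:54"]
for stamp in failures:
    label = "infrastructure" if in_disruption_window(parse_utc(stamp)) else "investigate"
    print(stamp, label)
```

Failures labeled "investigate" fall outside the window and are the ones worth triaging as possible workflow bugs.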

Metrics Summary

Category | Count | %
Healthy (≥80) | ~165 | ~95%
Warning (60-79) | ~3 | ~2%
Critical (<60) | ~3 | ~2%
Stale locks | 7 | 4%
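
The category bands and percentages in this summary can be reproduced with a few lines of arithmetic. The sketch below assumes each workflow has an integer score from 0 to 100 and that percentages are taken against the 174 monitored workflows; the function names are hypothetical.

```python
TOTAL_WORKFLOWS = 174  # monitored workflows, from the overview


def classify(score: int) -> str:
    """Map a 0-100 health score to the bands used in the metrics summary."""
    if score >= 80:
        return "Healthy"
    if score >= 60:
        return "Warning"
    return "Critical"


def share(count: int, total: int = TOTAL_WORKFLOWS) -> int:
    """Percentage of the total, rounded to the nearest whole number."""
    return round(count / total * 100)


print(classify(85), classify(62), classify(40))  # Healthy Warning Critical
print(share(165), share(3), share(7))            # 95 2 4
```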

Actions Taken This Run

  • Created P1 issue for Daily Workflow Updater (9 days failing)
  • Bot Detection downgraded from P1 → Healthy
  • Updated shared memory with current state
  • Stale lock count: 7 (↓ from 16)

Run: §23233873324
Timestamp: 2026-03-18T07:32Z
Next check: 2026-03-19 ~07:30Z
