Workflow Health Dashboard - 2026-03-19 #21757

@github-actions

Overview

Workflow health assessment for 2026-03-19. 175 workflows monitored (+1 from last run), 15 stale lock files (↑8 from 7 last run). Score: 56/100 (↓6 from 62).

P0 failures persist for 4+ days (GH_AW_GITHUB_TOKEN still missing). P1 escalation: Smoke Gemini moved from intermittent (P2) to consistent failures (P1, 4 consecutive). Daily Workflow Updater now at 10+ consecutive failures.

Critical Issues 🚨

P0: Issue Monster / PR Triage Agent / Issue Triage Agent (Day 4)

  • Status: 100% failure rate — all recent runs failing
  • Error: GH_AW_GITHUB_TOKEN secret missing — pre_activation step cannot generate GitHub App token
  • Duration: Ongoing since March 15 (4+ days)
  • Issue Monster today: 5 failures in 6h (runs: 07:19, 06:46, 06:17, 05:49, 05:27 UTC)
  • PR Triage today: 5 consecutive failures
  • Issue Triage: last run (Mar 18 14:34) failed
  • Impact: Issue creation, PR triage, and issue triage workflows completely non-functional
  • Action Required: Configure GH_AW_GITHUB_TOKEN repository secret (GitHub App installation token)
  • Priority: P0

Escalated Issues ⬆️

P1: Smoke Gemini — Escalated from P2 (4 consecutive failures)

  • Status: ESCALATED — no longer alternating, now consistent failure
  • Last success: Mar 17 00:51 UTC (run#453)
  • Failures since: Mar 17 12:36, Mar 18 00:54, Mar 18 12:36, Mar 19 00:55 UTC
  • Pattern: 4 consecutive schedule failures. Previously alternating success/failure, now degrading.
  • Impact: Gemini smoke tests not validating Gemini engine integration
  • Action: Investigate Gemini API availability / model endpoint changes
  • Priority: P1
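The escalation from intermittent (P2) to consistent failure (P1) can be sketched as a rule over the trailing failure streak. This is a minimal illustration, not the Workflow Health Manager's actual logic: the function names and the threshold of 4 are assumptions, chosen to match the 4 consecutive failures reported above.

```python
from typing import Sequence


def trailing_failures(outcomes: Sequence[str]) -> int:
    """Count consecutive failures at the end of a chronological run history."""
    count = 0
    for outcome in reversed(outcomes):
        if outcome != "failure":
            break
        count += 1
    return count


def classify(outcomes: Sequence[str], escalate_after: int = 4) -> str:
    """Hypothetical rule: P1 once the trailing streak reaches the threshold,
    P2 if failures exist but the streak is shorter, otherwise healthy."""
    if trailing_failures(outcomes) >= escalate_after:
        return "P1"
    if "failure" in outcomes:
        return "P2"
    return "healthy"


# Smoke Gemini, oldest first: success on Mar 17 00:51, then four failures.
smoke_gemini = ["success", "failure", "failure", "failure", "failure"]
print(classify(smoke_gemini))  # P1

# The earlier alternating pattern stays below the threshold.
print(classify(["failure", "success", "failure", "success", "failure"]))  # P2
```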

P1: Daily Workflow Updater — 10+ consecutive failures

  • Status: Still failing every day. Now at run#131; the last success was run#109 (March 8).
  • Duration: 11+ days of failures
  • Impact: GitHub Actions version updates not being applied automatically
  • Action: See previously created investigation issue
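As a quick check of the duration figure, the gap between the last success and this report can be computed from the dates and run numbers given above. The variable names are illustrative only.

```python
from datetime import date

last_success = date(2026, 3, 8)   # run#109, the last successful run
report_day = date(2026, 3, 19)    # date of this dashboard, now at run#131

days_since_success = (report_day - last_success).days
run_numbers_elapsed = 131 - 109

print(days_since_success)   # 11
print(run_numbers_elapsed)  # 22
```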

Warnings ⚠️

P2: Stale Lock Files INCREASED (7 → 15)

  • Previous stale files from last run appear FIXED ✅ (daily-architecture-diagram etc. no longer stale)
  • New stale files (15): agent-performance-analyzer, blog-auditor, brave, ci-doctor, contribution-check, daily-semgrep-scan, dependabot-go-checker, duplicate-code-detector, functional-pragmatist, instructions-janitor, repo-audit-analyzer, smoke-copilot-arm, smoke-project, technical-doc-writer, tidy
  • Action: Run make recompile to rebuild all stale lock files
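One way to spot candidates before running make recompile is to compare each .md workflow source with its compiled lock file. The sketch below rests on assumptions this report does not state: that lock files sit next to their sources with a `.lock.yml` suffix, and that modification time is a usable staleness signal. Modification times are not preserved by a fresh git checkout, so a real check would more likely compare content hashes or commit history.

```python
from pathlib import Path


def find_stale(workflow_dir: Path) -> list[str]:
    """Return workflow names whose lock file is missing or older than its source.

    Assumes a hypothetical layout of <name>.md next to <name>.lock.yml.
    """
    stale = []
    for source in sorted(workflow_dir.glob("*.md")):
        lock = source.with_suffix(".lock.yml")
        if not lock.exists() or lock.stat().st_mtime < source.stat().st_mtime:
            stale.append(source.stem)
    return stale
```

Called on a directory of workflows, `find_stale` returns a sorted list of names comparable to the list of 15 above.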

Recoveries ✅

Bot Detection — FULLY HEALTHY (confirmed)

  • 5/5 consecutive successes (Mar 18 through today)
  • Timestamps: 06:22 UTC today ✅, 00:24 ✅, 18:21 ✅, 12:17 ✅, 06:24 ✅
  • Status: Permanently moved to Healthy

Healthy Workflows ✅

Core infrastructure operating normally:

  • Smoke Copilot ✅ | Smoke Claude ✅ | Smoke Codex ✅
  • Auto-Triage Issues ✅ | Contribution Check ✅
  • Metrics Collector ✅ (last success: Mar 18T18:29 UTC)
  • AI Moderator ✅ | Bot Detection ✅

Metrics Summary

| Category | Count | Change |
|---|---|---|
| P0 Critical | 3 | → same |
| P1 High | 2 | ↑1 (Smoke Gemini escalated) |
| P2 Warning | 1 (stale locks) | 8 more stale files |
| Stale lock files | 15 | ↑8 from 7 |
| Healthy | ~155 | → stable |
| Score | 56/100 | ↓6 from 62 |
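The changes in the summary can be cross-checked against the previous run with a simple difference. The dictionary keys are illustrative; the previous values for P1 and monitored workflows are inferred from the stated changes (↑1 and +1) rather than read from dashboard #21537.

```python
previous = {"score": 62, "stale_lock_files": 7, "p1_high": 1, "workflows_monitored": 174}
current = {"score": 56, "stale_lock_files": 15, "p1_high": 2, "workflows_monitored": 175}

# Positive values are increases since the last run, negative values are drops.
deltas = {key: current[key] - previous[key] for key in current}
print(deltas)
# {'score': -6, 'stale_lock_files': 8, 'p1_high': 1, 'workflows_monitored': 1}
```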

Systemic Patterns

  • GH_AW_GITHUB_TOKEN remains the #1 systemic issue: it blocks 3 critical workflows
  • Stale lock file spike: the rise from 7 to 15 suggests a batch of .md workflow updates was committed without running make recompile. The previous 7 were fixed, but 15 new ones appeared.
  • Smoke Gemini degradation: Transition from intermittent to consistent failure warrants investigation of Gemini API/model configuration

Recommendations

High Priority

  1. Configure GH_AW_GITHUB_TOKEN secret (P0) — restores 3 critical workflows instantly
  2. Investigate Smoke Gemini (P1) — check Gemini API key, model endpoint, or configuration changes since Mar 17
  3. Run make recompile (P2) — fixes 15 stale lock files immediately

Medium Priority

  1. Investigate Daily Workflow Updater root cause (11+ days failing)
  2. Add Contribution Check turn guard (max 20 turns) per Agent Performance recommendations

Actions Taken This Run

  • Created this dashboard issue for 2026-03-19
  • Escalated Smoke Gemini from P2 → P1 in shared alerts
  • Updated shared memory with current health state

Run: §23284419210
Timestamp: 2026-03-19T07:30Z
Next check: 2026-03-20 ~07:30Z
Previous dashboard: #21537

Generated by Workflow Health Manager - Meta-Orchestrator

  • expires on Mar 20, 2026, 7:35 AM UTC