Workflow Health Dashboard - 2026-03-20 #21926

@github-actions

Overview

Workflow health assessment for 2026-03-20. 175 workflows monitored. Score: 66/100 (↑10 from 56 yesterday).

Major recoveries: Issue Monster fully resolved ✅, PR Triage recovering ✅, Daily Workflow Updater recovered ✅. The GH_AW_GITHUB_TOKEN issue is largely resolved.

Ongoing: Issue Triage Agent (10+ consecutive failures, unrelated to GH_AW_GITHUB_TOKEN) and Smoke Gemini (6+ consecutive failures). 14 lock files are stale (down from 15).

Critical Issues 🚨

P0 → RESOLVED: Issue Monster / PR Triage Agent (GH_AW_GITHUB_TOKEN)

P0: Issue Triage Agent - Persistent Failure (Day 14+)

  • Status: 10/10 scheduled runs failed. Last run: #135 (Mar 19 14:25 UTC). No run yet today.
  • Duration: Failing since at least March 6 (run #88), which predates the GH_AW_GITHUB_TOKEN issue.
  • Pattern: Scheduled runs fail consistently; the root cause differs from Issue Monster's.
  • Last success: Unknown. The workflow either broke earlier and never recovered, or has been failing since initial deployment.
  • Impact: Automatic issue triage is non-functional.
  • Priority: P0 (escalated from P1 after 14+ days of continuous failure with a separate root cause)
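The streak figures above ("10/10 schedule runs failed", "Day 14+") amount to counting consecutive failures among scheduled runs only. A minimal Python sketch of that count, assuming runs are supplied newest first with a trigger event and a conclusion; the `Run` type and `consecutive_failures` helper are hypothetical and not part of the Workflow Health Manager:

```python
from dataclasses import dataclass


@dataclass
class Run:
    event: str       # trigger, e.g. "schedule" or "workflow_dispatch"
    conclusion: str  # e.g. "success", "failure", "cancelled", or "" while running


def consecutive_failures(runs, event="schedule"):
    """Count the current failure streak for one trigger type.

    Hypothetical helper: `runs` must be ordered newest first. Runs from other
    triggers are skipped; any non-"failure" conclusion ends the streak.
    """
    streak = 0
    for run in runs:
        if run.event != event:
            continue  # manual reruns do not break or extend a schedule streak
        if run.conclusion != "failure":
            break  # success, cancellation, or an in-progress run ends the streak
        streak += 1
    return streak
```

One possible input source is `gh run list --workflow <file> --json event,conclusion`, which lists recent runs newest first; how the health manager actually collects runs is not stated in this report. Filtering by event matters here because the failures are described as schedule-only.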

P1: Smoke Gemini — Consistent Failure (Day 5)

Recoveries Since Yesterday ✅

Issue Monster — FULLY RECOVERED (was P0 Day 5)

Daily Workflow Updater — RECOVERED (was P1 Day 11)

PR Triage Agent — RECOVERING (was P0 Day 4)

Warnings ⚠️

P2: Stale Lock Files (14 files — down from 15)


These .md workflow files were modified after their corresponding .lock.yml files were last generated:

| Workflow | Status |
|---|---|
| blog-auditor | Stale |
| breaking-change-checker | Stale (NEW) |
| copilot-cli-deep-research | Stale (NEW) |
| daily-multi-device-docs-tester | Stale (NEW) |
| daily-regulatory | Stale (NEW) |
| dependabot-go-checker | Stale |
| discussion-task-miner | Stale (NEW) |
| example-workflow-analyzer | Stale (NEW) |
| jsweep | Stale (NEW) |
| prompt-clustering-analysis | Stale (NEW) |
| release | Stale (NEW) |
| security-alert-burndown.campaign.g | Stale (NEW) |
| update-astro | Stale (NEW) |
| workflow-skill-extractor | Stale (NEW) |

Fix: Run make recompile in the repository root.
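The staleness rule above compares each workflow's .md source with its compiled .lock.yml. A minimal Python sketch of that check, assuming "modified after" means a later filesystem modification time and that all workflows sit in one directory such as `.github/workflows` (both are assumptions; the health manager may use git history instead). `find_stale_lock_files` is a hypothetical helper:

```python
from pathlib import Path


def find_stale_lock_files(workflows_dir):
    """Return names of workflows whose .md source is newer than its .lock.yml.

    Assumption: "newer" means a later filesystem modification time (st_mtime).
    """
    stale = []
    for md in sorted(Path(workflows_dir).glob("*.md")):
        # Path.stem drops only the final ".md", so "x.campaign.g.md" -> "x.campaign.g".
        lock = md.with_name(md.stem + ".lock.yml")
        # A workflow with no compiled lock file is missing, not stale; skip it.
        if lock.exists() and md.stat().st_mtime > lock.stat().st_mtime:
            stale.append(md.stem)
    return stale


if __name__ == "__main__":
    # Hypothetical location of the workflow sources.
    for name in find_stale_lock_files(".github/workflows"):
        print(name, "Stale")
```

A caveat with this design: a fresh clone gives every file roughly the same checkout time, so mtimes can mislead in CI. Comparing last-commit times (for example from `git log -1 --format=%ct -- <file>`) is a sturdier basis for the same comparison.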

Healthy Workflows ✅

Core infrastructure is operating normally (spot-checked):

  • Issue Monster ✅ (5/5 success today — fully recovered)
  • PR Triage Agent ✅ (latest run success — recovering)
  • Daily Workflow Updater ✅ (recovered after 11-day failure streak)
  • Bot Detection ✅ (consistent streak maintained)
  • Safe Output Health Monitor ✅ (3/3 recent runs success: Mar 18–20)
  • Smoke Copilot / Claude / Codex ✅ (assumed healthy based on prior data)
  • Metrics Collector ✅ (healthy)

Metrics Summary

| Category | Count | Change |
|---|---|---|
| P0 Critical | 1 | ↓2 (Issue Monster + PR Triage resolved) |
| P1 High | 1 | → unchanged (Smoke Gemini) |
| P2 Warning | 1 (stale locks) | ↓1 (14 vs 15) |
| Stale lock files | 14 | ↓1 from 15 |
| Score | 66/100 | ↑10 from 56 |

Systemic Patterns

  • GH_AW_GITHUB_TOKEN resolution: Major improvement. Last run, 3 P0 workflows were affected; now only 1 still fails, and for a different root cause (Issue Triage Agent, failing independently since March 6).
  • Issue Triage Agent isolation: Its failures predate the GH_AW_GITHUB_TOKEN crisis, suggesting a structural or configuration problem in the workflow itself.
  • Smoke Gemini degradation: 6 days of scheduled-run failures. The single success on Mar 17 at 00:51 may indicate flakiness or API throttling.
  • Stale lock file churn: 9 new stale files appeared after a batch of .md workflow edits was committed without running make recompile.

Recommendations

High Priority

  1. Investigate Issue Triage Agent root cause (P0): 14+ days of consistent scheduled-run failures, separate from GH_AW_GITHUB_TOKEN. Check activation logic, trigger events, and agent configuration.
  2. Investigate Smoke Gemini failures (P1): Check Gemini API key expiry, the model endpoint URL, and quota limits. The last success was Mar 17 at 00:51; consider adding retry diagnostics.
  3. Run make recompile (P2): Fixes the 14 stale lock files immediately.

Medium Priority

  1. Monitor PR Triage Agent and Daily Workflow Updater for continued stability
  2. Weekly Issue Summary: Check its most recent scheduled runs to verify it also recovered after the GH_AW_GITHUB_TOKEN fix.

Actions Taken This Run

  • Created this dashboard issue for 2026-03-20
  • Escalated Issue Triage Agent from P1 → P0 (independent 14-day failure, separate root cause)
  • Moved Issue Monster to Resolved ✅
  • Updated shared memory with current health state

Run: §23333198222
Timestamp: 2026-03-20T07:28Z
Next check: 2026-03-21 ~07:30Z
Previous dashboard: #21757

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

  • issue:#0 (search_issues: Resource 'issue:#0' has lower integrity than agent requires. Agent would need to drop integrity tags [unapproved:all approved:all] to trust this resource.)
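The filter message reads like a label-subset check: the agent requires a set of integrity tags, and a resource missing any of them is dropped, with the missing tags being the ones the agent "would need to drop". A minimal Python sketch of that reading; the subset rule, `missing_integrity_tags`, and `filter_resources` are assumptions for illustration, not the actual gh-aw integrity implementation:

```python
def missing_integrity_tags(required_tags, resource_tags):
    """Return the required tags the resource lacks, sorted for stable output.

    Hypothetical model: a resource is trusted only if it carries every
    integrity tag the agent requires; an empty result means it passes.
    """
    return sorted(set(required_tags) - set(resource_tags))


def filter_resources(required_tags, resources):
    """Split (name, tags) pairs into kept names and filtered (name, missing) pairs."""
    kept, filtered = [], []
    for name, tags in resources:
        missing = missing_integrity_tags(required_tags, tags)
        if missing:
            filtered.append((name, missing))
        else:
            kept.append(name)
    return kept, filtered
```

Under this model, `filter_resources(["unapproved:all", "approved:all"], [("issue:#0", [])])` returns `([], [("issue:#0", ["approved:all", "unapproved:all"])])`, which mirrors the single filtered item reported above.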

Generated by Workflow Health Manager - Meta-Orchestrator · expires on Mar 21, 2026, 7:34 AM UTC

Labels: cookie (Issue Monster Loves Cookies!)
