Overview
Workflow health assessment for 2026-03-19. 175 workflows monitored (+1 from last run), 15 stale lock files (↑8 from 7 last run). Score: 56/100 (↓6 from 62).
P0 failures persist for 4+ days (GH_AW_GITHUB_TOKEN still missing). P1 escalation: Smoke Gemini moved from intermittent (P2) to consistent failures (P1, 4 consecutive). Daily Workflow Updater now at 10+ consecutive failures.
Critical Issues 🚨
P0: Issue Monster / PR Triage Agent / Issue Triage Agent (Day 4)
- Status: 100% failure rate — all recent runs failing
- Error: `GH_AW_GITHUB_TOKEN` secret missing; the `pre_activation` step cannot generate a GitHub App token
- Duration: Ongoing since March 15 (4+ days)
- Issue Monster today: 5 failures in 6h (runs: 07:19, 06:46, 06:17, 05:49, 05:27 UTC)
- PR Triage today: 5 consecutive failures
- Issue Triage: Last run Mar 18T14:34 — failure
- Impact: Issue creation, PR triage, and issue triage workflows completely non-functional
- Action Required: Configure the `GH_AW_GITHUB_TOKEN` repository secret (GitHub App installation token)
- Priority: P0
Escalated Issues ⬆️
P1: Smoke Gemini — Escalated from P2 (4 consecutive failures)
- Status: ESCALATED — no longer alternating, now consistent failure
- Last success: Mar 17 00:51 UTC (run#453)
- Failures since: Mar 17 12:36, Mar 18 00:54, Mar 18 12:36, Mar 19 00:55 UTC
- Pattern: 4 consecutive schedule failures. Previously alternating success/failure, now degrading.
- Impact: Gemini smoke tests not validating Gemini engine integration
- Action: Investigate Gemini API availability / model endpoint changes
- Priority: P1
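The escalation above rests on counting consecutive failures at the end of the run history. A minimal sketch of that check follows, assuming a hypothetical rule in which 4 or more trailing failures means P1 and any other failure in the window means P2. The function names and the threshold are illustrative and are not taken from the Workflow Health Manager itself.

```python
from typing import Sequence


def trailing_failures(outcomes: Sequence[str]) -> int:
    """Count consecutive 'failure' results at the end of a chronological run list."""
    count = 0
    for outcome in reversed(outcomes):
        if outcome != "failure":
            break
        count += 1
    return count


def classify(outcomes: Sequence[str], p1_threshold: int = 4) -> str:
    """Hypothetical priority rule (an assumption, not the dashboard's actual logic)."""
    if trailing_failures(outcomes) >= p1_threshold:
        return "P1"  # consistent failure
    if "failure" in outcomes:
        return "P2"  # intermittent failure
    return "healthy"


# Smoke Gemini, oldest to newest: last success on Mar 17 00:51 UTC, then four failures.
smoke_gemini = ["success", "failure", "failure", "failure", "failure"]
print(trailing_failures(smoke_gemini), classify(smoke_gemini))  # prints: 4 P1
```

Under the same assumed rule, an alternating history such as `["failure", "success", "failure", "success"]` stays at P2, which matches how Smoke Gemini was tracked before this run.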
P1: Daily Workflow Updater — 10+ consecutive failures
- Status: Still failing every day. Now run#131, last success was run#109 (March 8).
- Duration: 11+ days of failures
- Impact: GitHub Actions version updates not being applied automatically
- Action: See previously created investigation issue
Warnings ⚠️
P2: Stale Lock Files INCREASED (7 → 15)
- Previous stale files from last run appear FIXED ✅ (daily-architecture-diagram etc. no longer stale)
- New stale files (15): `agent-performance-analyzer`, `blog-auditor`, `brave`, `ci-doctor`, `contribution-check`, `daily-semgrep-scan`, `dependabot-go-checker`, `duplicate-code-detector`, `functional-pragmatist`, `instructions-janitor`, `repo-audit-analyzer`, `smoke-copilot-arm`, `smoke-project`, `technical-doc-writer`, `tidy`
- Action: Run `make recompile` to rebuild all stale lock files
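To see which lock files are stale before recompiling, a rough local check can compare each workflow source with its compiled lock file. The sketch below assumes that sources are `*.md` files with a sibling `<name>.lock.yml` in `.github/workflows`, and that "stale" means the lock file is missing or older than its source by modification time. Both are assumptions: the real tooling may use a different layout or compare content hashes, and modification times are not meaningful on a fresh clone.

```python
from pathlib import Path
from typing import List


def find_stale_locks(workflow_dir: Path) -> List[str]:
    """Return workflow names whose lock file is missing or older than the .md source.

    Staleness by mtime is an assumption made for this sketch only.
    """
    stale = []
    for source in sorted(workflow_dir.glob("*.md")):
        lock = source.with_name(source.stem + ".lock.yml")
        if not lock.exists() or lock.stat().st_mtime < source.stat().st_mtime:
            stale.append(source.stem)
    return stale


if __name__ == "__main__":
    # The directory is an assumed location; adjust it to the repository layout.
    for name in find_stale_locks(Path(".github/workflows")):
        print(name)
```

Any name this prints is a candidate for `make recompile`. An empty result only means the modification times look consistent, not that the lock files are guaranteed current.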
Recoveries ✅
Bot Detection — FULLY HEALTHY (confirmed)
- 5/5 consecutive successes (Mar 18 through today)
- Timestamps: 06:22 UTC today ✅, 00:24 ✅, 18:21 ✅, 12:17 ✅, 06:24 ✅
- Status: Permanently moved to Healthy
Healthy Workflows ✅
Core infrastructure operating normally:
- Smoke Copilot ✅ | Smoke Claude ✅ | Smoke Codex ✅
- Auto-Triage Issues ✅ | Contribution Check ✅
- Metrics Collector ✅ (last success: Mar 18T18:29 UTC)
- AI Moderator ✅ | Bot Detection ✅
Metrics Summary
| Category | Count | Change |
|---|---|---|
| P0 Critical | 3 | → same |
| P1 High | 2 | ↑1 (Smoke Gemini escalated) |
| P2 Warning | 1 (stale locks) | ↑8 more stale files |
| Stale lock files | 15 | ↑8 from 7 |
| Healthy | ~155 | → stable |
| Score | 56/100 | ↓6 from 62 |
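The Change column is plain arithmetic over the previous run's numbers. As a quick cross-check, the sketch below recomputes two of the entries. The helper `change` is a hypothetical name introduced here, not part of the dashboard generator.

```python
def change(previous: int, current: int) -> str:
    """Format the difference between two runs in the table's arrow notation."""
    delta = current - previous
    if delta == 0:
        return "→ same"
    arrow = "↑" if delta > 0 else "↓"
    return f"{arrow}{abs(delta)} from {previous}"


print(change(7, 15))   # stale lock files: ↑8 from 7
print(change(62, 56))  # score: ↓6 from 62
```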
Systemic Patterns
- `GH_AW_GITHUB_TOKEN` remains the #1 systemic issue: it blocks 3 critical workflows
- Stale lock file spike: the jump from 7 → 15 suggests a batch of .md workflow updates was committed without running `make recompile`. The previous 7 were fixed, but 15 new ones appeared.
- Smoke Gemini degradation: the transition from intermittent to consistent failure warrants investigation of the Gemini API/model configuration
Recommendations
High Priority
- Configure the `GH_AW_GITHUB_TOKEN` secret (P0): restores 3 critical workflows instantly
- Investigate Smoke Gemini (P1): check the Gemini API key, model endpoint, or configuration changes since Mar 17
- Run `make recompile` (P2): fixes 15 stale lock files immediately
Medium Priority
- Investigate Daily Workflow Updater root cause (10+ days failing)
- Add Contribution Check turn guard (max 20 turns) per Agent Performance recommendations
Actions Taken This Run
- Created this dashboard issue for 2026-03-19
- Escalated Smoke Gemini from P2 → P1 in shared alerts
- Updated shared memory with current health state
Run: §23284419210
Timestamp: 2026-03-19T07:30Z
Next check: 2026-03-20 ~07:30Z
Previous dashboard: #21537
Generated by Workflow Health Manager - Meta-Orchestrator
Expires on Mar 20, 2026, 7:35 AM UTC