Performance Summary
- Agents analyzed: 152 workflows (100% compiled ✅)
- Agentic runs this period: 25 (22 success, 3 failure)
- Run success rate: 88% (↑ +2% from last week)
- Overall agent quality score: 93/100 (→ stable)
- Overall effectiveness score: 88/100 (↓ -1, minor)
- Critical agent issues: 0 (17th consecutive zero-critical period!)
- Weekly token cost: ~$6.87 (~14% less than previous week)
- Safe items created: 14
Critical Findings
✅ No critical blocking issues this period.
Two notable failures (neither is an agent quality regression):
- Daily Copilot PR Merged Report
  - Root cause: `gh pr list` invoked with `merged:>=DATE` as a positional argument instead of `--search "merged:>=DATE"`
  - Impact: Daily PR merged report not published; safe_outputs job skipped
  - Action: Fix safe-inputs command in workflow prompt (30-min fix)
- Smoke macOS ARM64
  - Root cause: Missing `/tmp/gh-aw/aw-prompts/prompt.txt` (environment/infrastructure issue)
  - Impact: macOS ARM64 smoke tests not executing
  - Action: Investigate upstream trigger conditions
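The argument-parsing failure above comes down to where the `merged:>=DATE` qualifier is placed. As a minimal sketch (the helper name `build_merged_pr_command` and the `limit` default are hypothetical, not taken from the workflow), the corrected invocation can be assembled like this, with the qualifier passed through `--search` rather than as a bare positional argument:

```python
from datetime import date, timedelta


def build_merged_pr_command(since: date, limit: int = 100) -> list[str]:
    """Return an argv list for listing PRs merged on or after `since`.

    Hypothetical helper for illustration; the real workflow prompt may
    assemble its command differently.
    """
    # The search qualifier must be the value of --search. Passing it as a
    # positional argument is what the failing run did.
    query = f"merged:>={since.isoformat()}"
    return [
        "gh", "pr", "list",
        "--state", "merged",
        "--search", query,
        "--limit", str(limit),
    ]


if __name__ == "__main__":
    week_ago = date.today() - timedelta(days=7)
    print(" ".join(build_merged_pr_command(week_ago)))
```

Building the command as a list (rather than a single shell string) also avoids quoting problems with the `>=` characters when the list is handed to `subprocess.run`.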
✅ Previous Alert RESOLVED: Slide Deck Maintainer network config fixed and now running successfully.
View Agent Quality Rankings
Top Performing Agents
- Daily Safe Outputs Conformance Checker (Quality: 95/100)
  - 39 turns, 2.11M tokens, $2.09, zero errors
  - Consistent, precise analysis with actionable bug reports
  - Today's run: §22191779279
- Lockfile Statistics Analysis Agent (Quality: 92/100)
  - 34 turns, 1.84M tokens, $2.34, comprehensive analysis
  - Created an insightful statistics discussion; strong depth
  - Run: §22190979083
- Semantic Function Refactoring (Quality: 90/100)
  - 75 turns, 1.32M tokens, $1.66; created issue "[refactor] Semantic Function Clustering Analysis: Duplicates and Outliers" #16889
  - High turn count but thorough code analysis, using Serena 52×
  - Run: §22192254281
- Daily Team Evolution Insights (Quality: 90/100)
  - 9 turns: highly efficient, relevant insights
  - Run: §22189781206
- Smoke Codex (Quality: 90/100)
  - Two successful runs (17 turns + 7 turns): reliable and consistent
Agents with Notable Issues
| Agent | Issue | Severity |
|---|---|---|
| Daily Copilot PR Merged Report | `gh pr list` arg parsing failure | 🔴 High |
| Smoke macOS ARM64 | Missing prompt file (infra) | 🟡 Medium |
| Duplicate Code Detector | FORBIDDEN GraphQL error | 🟡 Medium |
View Cost & Efficiency Analysis
Token Cost Breakdown
| Agent | Tokens | Cost | Turns | Status |
|---|---|---|---|---|
| Lockfile Statistics Analysis | 1.84M | $2.34 | 34 | ✅ Success |
| Daily Safe Outputs Conformance | 2.11M | $2.09 | 39 | ✅ Success |
| Semantic Function Refactoring | 1.32M | $1.66 | 75 | ✅ Success |
| All Others (~21 runs) | ~60.6M | ~$0.78 | ~41 | Mixed |
| TOTAL | 65.9M | $6.87 | 189 | 88% success |
Key observation: Top 3 analytical agents consume 89% of weekly token budget ($6.09 of $6.87). All three are Claude-engine agents doing deep repository analysis.
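The 89% figure can be reproduced from the cost column of the table. The sketch below is illustrative only; the dictionary keys mirror the table rows and `top_n_cost_share` is a hypothetical helper name:

```python
# Per-agent cost in USD, copied from the Token Cost Breakdown table.
costs_usd = {
    "Lockfile Statistics Analysis": 2.34,
    "Daily Safe Outputs Conformance": 2.09,
    "Semantic Function Refactoring": 1.66,
    "All Others": 0.78,
}


def top_n_cost_share(costs: dict[str, float], n: int = 3) -> float:
    """Fraction of total cost attributable to the n most expensive entries."""
    total = sum(costs.values())
    top = sum(sorted(costs.values(), reverse=True)[:n])
    return top / total


if __name__ == "__main__":
    share = top_n_cost_share(costs_usd)
    print(f"Top 3 share of weekly cost: {share:.0%}")
```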
Efficiency Trends
- Cost per run: $0.275 (↓ from $0.571 last week, improved efficiency)
- Turns per run: 7.6 avg (reasonable)
- Blocked network requests: high for all Claude agents (64–96 blocked per run); firewall working as intended
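The per-run figures follow directly from the period totals reported earlier (25 runs, 22 successes, $6.87, 189 turns). A small sketch of that arithmetic, with `PeriodTotals` being a hypothetical container introduced only for this example:

```python
from dataclasses import dataclass


@dataclass
class PeriodTotals:
    """Weekly totals as reported in this issue (illustrative container)."""
    runs: int
    successes: int
    cost_usd: float
    turns: int

    def success_rate(self) -> float:
        return self.successes / self.runs

    def cost_per_run(self) -> float:
        return self.cost_usd / self.runs

    def turns_per_run(self) -> float:
        return self.turns / self.runs


if __name__ == "__main__":
    week = PeriodTotals(runs=25, successes=22, cost_usd=6.87, turns=189)
    print(f"success rate:  {week.success_rate():.0%}")
    print(f"cost per run:  ${week.cost_per_run():.3f}")
    print(f"turns per run: {week.turns_per_run():.1f}")
```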
View Behavioral Patterns & Coverage
Productive Patterns ✅
- Claude analytical agents: Deep repository analysis pattern (Serena + bash + safeoutputs) working well
- Scheduled smoke tests: Codex smoke tests running reliably on both Linux and macOS (when ARM64 env available)
- Daily reporters: Chronicle, Team Evolution, Copilot PR Report (mostly); good content cadence
- Conformance checker + Plan Command: High-quality issue creation with clear acceptance criteria
Patterns to Monitor ⚠️
- High turn counts: Semantic Function Refactoring at 75 turns; `serena_get_symbols_overview` called 52 times in one run, which may indicate over-exploration
- Blocked requests: 64–96 blocked requests per Claude run; these are likely internal DNS lookups, but worth monitoring for anomalies
- Safe-inputs gh CLI usage: Daily Copilot PR Merged Report hit a `gh pr list` argument parsing bug that was not caught before execution
Coverage Analysis
- Well covered: Code quality analysis, safe outputs conformance, team metrics, smoke testing, PR/issue management
- Currently impaired: macOS ARM64 smoke testing (infra issue), lockdown-auth workflows (missing token), PR merged report
- Ecosystem balance: 104 Copilot (68%), 37 Claude (24%), 11 Codex (7%); healthy diversity
View Infrastructure Health Context
Workflow Health Snapshot (from Workflow Health Manager, 2026-02-19)
- Overall health score: 88/100 (↓ -7 from 95)
- 152/152 compiled (100% ✅)
- 16 outdated lock files (MD newer than lock; needs `make recompile`)
- 3 scheduled workflow failures:
  - PR Triage Agent + Daily Issues Report Generator: `GH_AW_GITHUB_TOKEN` secret missing for lockdown mode
  - Duplicate Code Detector: FORBIDDEN error via GraphQL (`replaceActorsForAssignable`)
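The "MD newer than lock" condition in the snapshot above can be approximated with a modification-time comparison. This is a rough sketch, not the Workflow Health Manager's actual check: it assumes each `name.md` has a sibling `name.lock.yml` in the same directory, and the function name is hypothetical. File mtimes are also unreliable after a fresh `git clone`, so a real check may need commit timestamps or content hashes instead.

```python
from pathlib import Path


def find_stale_lock_files(workflows_dir: Path) -> list[Path]:
    """Return workflow .md files whose lock file is older than the source.

    Assumption: `foo.md` compiles to `foo.lock.yml` alongside it.
    Workflows with no lock file at all are not reported here.
    """
    stale = []
    for md in sorted(workflows_dir.glob("*.md")):
        lock = md.with_name(md.stem + ".lock.yml")
        if lock.exists() and md.stat().st_mtime > lock.stat().st_mtime:
            stale.append(md)
    return stale


if __name__ == "__main__":
    for path in find_stale_lock_files(Path(".github/workflows")):
        print(f"outdated lock file for: {path}")
```

Any path this reports would be a candidate for `make recompile`.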
Previous Recommendations Follow-up
| Recommendation | Status |
|---|---|
| Fix Slide Deck Maintainer network config | ✅ RESOLVED |
| Audit 9 uncompiled workflows | ✅ RESOLVED (100% compiled now) |
| Add token monitoring to high-cost agents | 🟡 Pending |
| Document AI Moderator race condition | 🟡 Pending |
Recommendations
🔴 High Priority
- Fix Daily Copilot PR Merged Report: use the `--search "merged:>=DATE"` flag in the `gh pr list` call (30-min fix)
- Set the `GH_AW_GITHUB_TOKEN` secret: fixes PR Triage Agent + Daily Issues Report Generator (15+ lockdown workflows at risk)
🟡 Medium Priority
- Recompile 16 outdated lock files: run `make recompile`
- Investigate Smoke macOS ARM64 prompt file missing issue (2 consecutive failures)
- Optimize Lockfile Statistics Agent: most expensive run at $2.34; review `pip install` on every run
- Fix Duplicate Code Detector FORBIDDEN GraphQL error for Copilot assignment
🟢 Low Priority
- Document expected turn count ranges for high-turn agents (Semantic Refactoring at 75 turns)
Trends
- Agent quality: 93/100 (→ stable, sustained excellence)
- Run success rate: 88% (↑ from 86%)
- Cost efficiency: improved ($6.87 vs ~$8.00 previous week)
- Zero-critical streak: 17 consecutive periods
References:
- §22192877278: This analysis run
- §22187864127: Daily Copilot PR Merged Report failure
- §22191779279: Daily Safe Outputs Conformance Checker
Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.
Generated by Agent Performance Analyzer - Meta-Orchestrator
- expires on Feb 26, 2026, 5:45 PM UTC