[agentic-optimization-kit] Weekly Agentic Optimization Report — 2026-04-21 to 2026-04-27 #28742
> Note: This discussion has been marked as outdated by Agentic Optimization Kit. A newer discussion is available at Discussion #28810.
📊 Executive Summary
📈 Visual Diagnostics
1. Token Usage by Workflow
Decision: Contribution Check leads at 5.4M tokens (6 runs), followed by Test Quality Sentinel (3.8M, 16 runs) and Daily Syntax Error Quality Check (3.4M, 1 run) — all flagged expensive-per-run (>100k avg); the distribution is broad across 61 workflows with no single dominant consumer.
2. Historical Token Trend
Decision: Token consumption is trending downward week-over-week (76.6M on Apr 22 → 67.7M on Apr 27), suggesting recent optimizations on Test Quality Sentinel, Smoke CI, and Architecture Guardian are taking effect.
3. Episode Risk–Cost Frontier
Decision: All 54 analyzed episodes scored risk=0 with zero estimated cost — the 30-day window shows a stable, low-risk portfolio with no Pareto-frontier outliers.
Why it matters: Zero risk scores confirm that recent investments in reliability and tool-scoping are holding; the cost axis is suppressed by Copilot billing, but the risk axis is the primary safety signal.
4. Workflow Stability Matrix
Decision: Instability is concentrated in the Smoke CI and Visual Regression Checker workflows, which show incident rates of 100% and 140% respectively (errors equal to or exceeding runs); all other workflows show clean stability profiles.
Why it matters: Smoke CI's error concentration (22 errors / 22 runs) is a known issue from trigger misconfiguration; Visual Regression Checker has infrastructure dependencies that cause intermittent failures. Neither indicates agentic control degradation.
5. Repository Portfolio Map
Decision: High-value/low-cost quadrant (KEEP) holds Test Quality Sentinel and Issue Monster; high-value/high-cost (OPTIMIZE) holds Contribution Check and daily analysis workflows; most single-run workflows cluster in SIMPLIFY/REVIEW.
Why it matters: The dominant tradeoff is between high-frequency low-cost workflows (high ROI) vs. single-run deep-analysis workflows (high cost, uncertain recurrence value) — the OPTIMIZE quadrant is the primary cost lever.
🚨 Escalation Targets
No workflows crossed escalation thresholds in the 14-day analysis window. Zero episodes were marked `escalation_eligible`. No repeated risky classifications, new MCP failures, or blocked-request increases were detected.
🎯 Optimization Target: Daily Syntax Error Quality Check
Why selected: Highest total tokens (3,399,393) among workflows not optimized in the last 14 days; not a self-targeting workflow.
Runs analyzed: 1 over 7 days | Avg tokens/run: 3,399,393 | Avg cost/run: $0.00 (Copilot) | Avg turns/run: 56
Key signals from assessment:
- `resource_heavy_for_domain` — high severity (56 turns for a 2-workflow test; expected ≤ 35)
- `partially_reducible` — `agentic_fraction=0.50`, meaning ~28 turns are pure data-gathering suitable for deterministic pre-steps

Recommended changes:
- Add a `steps:` pre-step running `find .github/workflows -name '*.md' ! -name 'daily-*' ! -name '*-test.md'` and writing the results to `/tmp/gh-aw/agent/candidates.txt`; the agent reads the file directly
- `max-turns: 35` cap to prevent turn runaway
- `timeout-minutes: 15` plus the prompt instruction "Complete in ≤ 35 turns"
- Move the `gh extension install/upgrade gh-aw` block to a workflow `steps:` pre-step so it doesn't count against agentic turns

💡 5 Actionable Prompts
🔧 Prompt 1 — Optimization (Daily Syntax Error Quality Check)
🛡️ Prompt 2 — Stability Fix (Smoke CI)
🔀 Prompt 3 — Consolidation (Daily Analysis Workflows)
✂️ Prompt 4 — Right-sizing (Package Specification Extractor)
🚀 Prompt 5 — Portfolio Maintenance (REVIEW-quadrant workflows)
Full Per-Workflow Baseline Breakdown (7 days)
Episode Detail (30-day)
All 54 analyzed episodes returned `episode_risk_score = 0` and `escalation_eligible = false`. This reflects the 30-day window being dominated by standalone episodes (1 run each) with clean execution profiles — no MCP failures, no blocked requests, no risky classifications, and no control degradation signals.

The 30-day dataset returned 54 episodes from 54 runs (all standalone). The 7-day baseline contains 119 runs, indicating episode stitching operates on a longer lookback than the download window captured. No multi-run episodes with compound risk signals were detected in the available data.
Portfolio Opportunities
Consolidation candidates (same domain + schedule + behavioral fingerprint):
Stale workflows (1 run in 7 days, no errors, low turn count — candidates for weekly cadence):
Overkill candidates (not selected as optimization target this week):
Optimization Analysis Detail: Daily Syntax Error Quality Check
Workflow: `daily-syntax-error-quality.md`
Run analyzed: 24991120216
Tokens: 3,399,393 | Turns: 56 | Duration: 9m1s | Action minutes: 10
Area 1: Tool Usage
Only 1 run available (below the ≥5-run threshold for tool-removal recommendations). Tools configured: `bash` (allowlisted commands: `find`, `compile`, `cat`, `head`, `cp`, `mkdir`), `mount-as-clis: true`, `safe-outputs: create-issue`. All configured tools are used by design. No tool removal is recommended at the current sample size.
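As a rough sketch, the tool surface described above might look like this in the workflow's frontmatter. The field layout is an assumption based on the report's own terms, not a verified gh-aw schema:

```yaml
# Illustrative frontmatter sketch of the configured tool surface.
# Field names follow the report; verify the exact layout against gh-aw docs.
tools:
  bash: ["find", "compile", "cat", "head", "cp", "mkdir"]  # allowlisted commands
mount-as-clis: true
safe-outputs:
  create-issue:   # the workflow's only write path
```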
Area 2: Token Efficiency
56 turns for a 2-workflow compiler test is 40-60% above expected for this task shape (estimate: 30-35 turns).
`agentic_fraction=0.50` means ~28 turns are read-gather operations. Cache hit rate: not available from a single run. The workflow prompt is ~400 lines, with ~150 lines devoted to scoring rubrics that are read but not iteratively consulted.

Key waste drivers:
Opportunity: Move Phase 1 to a deterministic pre-step (save ~8-10 turns).
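Concretely, the recommended changes could be sketched as a frontmatter edit along these lines. The `steps:` pre-step placement and field names follow the report's recommendations; the exact gh-aw schema is an assumption:

```yaml
# Illustrative frontmatter changes; verify field names against gh-aw docs.
max-turns: 35        # hard cap to prevent turn runaway
timeout-minutes: 15  # reduced from 20; safety net behind the turn cap
steps:
  # Deterministic pre-step: gather candidate workflows up front so the
  # agent reads one file instead of spending ~8-10 turns on discovery.
  - name: Collect candidate workflows
    run: |
      mkdir -p /tmp/gh-aw/agent
      find .github/workflows -name '*.md' ! -name 'daily-*' ! -name '*-test.md' \
        > /tmp/gh-aw/agent/candidates.txt
```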
Area 3: Reliability
0 errors, 0 warnings. The `resource_heavy_for_domain` assessment is HIGH severity — this is a turn-count signal, not a functional failure. The `partially_reducible` assessment is LOW severity. The workflow has not failed in the observed run.

Risk: Without a `max-turns` cap, turn runaway is possible if the agent encounters unexpected compiler output or retries. The current `timeout-minutes: 20` provides a safety net but allows up to ~80 turns at ~15 s/turn.

Area 4: Prompt Efficiency
The prompt has several over-specified sections:
Prompt reduction target: ~200 lines → saves approximately 50-80k tokens in prompt context per run.
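The turn figures quoted in Areas 2 and 3 can be checked with quick arithmetic (a standalone sketch; the ~15 s/turn rate is the report's own rough estimate):

```python
turns = 56                # observed turns in the analyzed run
agentic_fraction = 0.50   # share of turns doing genuine agentic work

# Turns spent on pure data-gathering, movable to a deterministic pre-step:
reducible_turns = round(turns * (1 - agentic_fraction))
print(reducible_turns)    # 28

# Without a max-turns cap, only the 20-minute timeout bounds the run:
seconds_per_turn = 15     # report's rough per-turn estimate
timeout_minutes = 20
turn_ceiling = timeout_minutes * 60 // seconds_per_turn
print(turn_ceiling)       # 80
```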
References: