feat: add agentic-optimization-kit workflow consolidating token audit, optimizer, and observability kit #28009

Merged
mnkiefer merged 2 commits into main from copilot/consolidate-optimization-workflows
Apr 23, 2026
Copilot AI commented Apr 23, 2026

Summary

Creates agentic-optimization-kit.md — a single weekly workflow that consolidates all analysis capabilities from three existing source workflows:

| Source workflow | Schedule | What it contributes |
|---|---|---|
| copilot-token-audit.md | Daily weekdays | Baseline per-workflow aggregates, rolling 90-day history, heavy-hitter flags |
| copilot-token-optimizer.md | Daily weekdays | Target selection, deep 4-area analysis, ranked recommendations, cooldown log |
| agentic-observability-kit.md | Weekly Monday | Episode/DAG lineage, risk/control signals, portfolio map, escalation gating |

Consolidation principles applied

  • Source-of-truth precedence: Episodes (observability) → Optimization actions → Baseline numbers
  • Unified duplicates: Single log download, single ranking pass, shared repo-memory branch (memory/token-audit)
  • Graphs first: All 5 charts embedded before any text walls
  • Details in <details>: Per-workflow breakdown, episode detail, portfolio opportunities, optimization evidence all collapsed by default
  • 5 actionable prompt artifacts always emitted: optimization, stability, consolidation, right-sizing, escalation — each a copyable fenced instruction block
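
The precedence rule in the first bullet can be illustrated with a minimal Python sketch. It assumes each source exposes its findings as a plain dictionary; the source labels (`episodes`, `optimization`, `baseline`), the `resolve` helper, and the sample fields are hypothetical names for illustration, not identifiers from the workflow itself.

```python
from __future__ import annotations

from typing import Any, Optional

# Precedence order from the PR description: episodes first, then optimization
# actions, then baseline numbers. The labels are hypothetical.
PRECEDENCE = ("episodes", "optimization", "baseline")


def resolve(field: str, sources: dict[str, dict[str, Any]]) -> Optional[tuple[str, Any]]:
    """Return (source_name, value) from the highest-precedence source holding the field."""
    for name in PRECEDENCE:
        value = sources.get(name, {}).get(field)
        if value is not None:
            return name, value
    return None


# Made-up sample data, only to show which source wins for each field.
sources = {
    "episodes": {"risk": "high"},
    "optimization": {"risk": "medium", "target": "workflow-a"},
    "baseline": {"risk": "low", "target": "workflow-b", "total_tokens": 120_000},
}

print(resolve("risk", sources))          # answered by episodes
print(resolve("target", sources))        # answered by optimization
print(resolve("total_tokens", sources))  # only baseline has it
```

A lower-precedence source is consulted only when every higher one is silent on the field, which is one plausible reading of "Episodes → Optimization actions → Baseline numbers".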

What's new beyond the source workflows

  1. 5 charts instead of 2+4: token bar + trend line (from audit) + episode frontier + stability matrix + portfolio map (from observability) — all in one discussion
  2. Actionable prompt mining: Five ready-to-paste prompts generated every run for continuous system improvement
  3. Documented scoring rationale: Threshold values (30%, 100k, ≥5 runs), formula weights (sum-to-1, min-max normalization), and overlap score component types are all explained inline
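
As a rough illustration of the scoring ideas in item 3, the sketch below applies the two numeric thresholds and a sum-to-1 weighted score over min-max normalized metrics. It is not the workflow's actual formula: the function names, the metric names, and the weights are hypothetical, and using "≥5 runs" as a gate on the per-run flag is an assumption, since the PR text does not say which flag that threshold belongs to. `is_noisy` is left out because its definition is not given.

```python
from __future__ import annotations

DOMINANT_SHARE = 0.30           # "30%" threshold from the PR text
EXPENSIVE_AVG_TOKENS = 100_000  # "100k" threshold from the PR text
MIN_RUNS = 5                    # "≥5 runs"; ASSUMED here to gate the per-run flag


def heavy_hitter_flags(total_tokens: float, runs: int, portfolio_tokens: float) -> dict[str, bool]:
    """Flag a workflow by its share of portfolio tokens and its average tokens per run."""
    share = total_tokens / portfolio_tokens if portfolio_tokens else 0.0
    avg = total_tokens / runs if runs else 0.0
    return {
        "is_dominant": share > DOMINANT_SHARE,
        "is_expensive_per_run": runs >= MIN_RUNS and avg > EXPENSIVE_AVG_TOKENS,
    }


def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(metrics: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Combine min-max normalized metrics with weights that must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(next(iter(metrics.values())))
    scores = [0.0] * n
    for name, weight in weights.items():
        for i, value in enumerate(min_max(metrics[name])):
            scores[i] += weight * value
    return scores


# Hypothetical metrics for three workflows.
metrics = {"tokens": [100.0, 300.0, 200.0], "failures": [0.0, 2.0, 4.0]}
weights = {"tokens": 0.6, "failures": 0.4}
print(weighted_scores(metrics, weights))
print(heavy_hitter_flags(600_000, 5, 1_000_000))
```

Normalizing each metric before weighting keeps a large-magnitude metric such as raw token counts from drowning out small-magnitude ones such as failure counts.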

Security review

New actions introduced via copilot-setup-steps.yml import (same as every other Copilot workflow in this repository):

  • docker/build-push-action — standard Docker action, already pinned to SHA digest across all Copilot lock files in this repo ✅
  • docker/setup-buildx-action — same, already approved and in use ✅

No new secrets introduced. Both actions are already reviewed and approved for all other Copilot workflows.

Pre-existing test failures

Two test failures in pkg/workflow (TestWasmGolden_CompileFixtures/basic-copilot and TestCopilotDetectionDefaultModel) exist on the base branch before this change — they are golden file diffs about COPILOT_MODEL/COPILOT_API_KEY env vars unrelated to workflow authoring.

…, optimizer, and observability kit

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e3b520b9-3b91-4e17-be93-7b8388e17a11

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
mnkiefer marked this pull request as ready for review April 23, 2026 07:43
Copilot AI review requested due to automatic review settings April 23, 2026 07:43
Copilot AI left a comment

Pull request overview

Adds a new weekly “Agentic Optimization Kit” workflow that consolidates token auditing, optimization targeting, and observability/episode analysis into a single discussion report with embedded charts and prompt artifacts.

Changes:

  • Introduces a consolidated weekly workflow (agentic-optimization-kit.md) that downloads recent Copilot run logs, maintains repo-memory history, and drives a multi-phase analysis/reporting prompt.
  • Generates the compiled workflow lock file (agentic-optimization-kit.lock.yml) with pinned actions/containers and the resolved job graph.

| File | Description |
|---|---|
| .github/workflows/agentic-optimization-kit.md | New consolidated weekly kit workflow definition + agent instructions (baseline → optimization target → observability → charts → discussion/issue). |
| .github/workflows/agentic-optimization-kit.lock.yml | Compiled lock file for the new workflow (generated by gh aw compile). |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

error_count: (.error_count // 0),
warning_count: (.warning_count // 0)
}
]
Comment thread .github/workflows/agentic-optimization-kit.md Outdated
@mnkiefer
Collaborator

@copilot check source of truth and correctness of consolidation and create a table to list the resulting report components, their source of truth and their significance for COGS reduction.

…tions minutes' typo

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1d70519c-b19b-45ed-b7ea-8e1d857bb1fc

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 23, 2026

> @copilot check source of truth and correctness of consolidation and create a table to list the resulting report components, their source of truth and their significance for COGS reduction.

Here's the consolidation correctness check and component table with COGS significance:

Source-of-truth verification — each capability is taken from its authoritative source as defined by the plan's precedence rules (Episodes > Optimization actions > Baseline numbers):

| Report Component | Source of Truth | Origin Workflow | COGS Reduction Significance |
|---|---|---|---|
| Per-workflow token/cost/turn/action-minute aggregates (7-day) | Audit (baseline numeric truth) | copilot-token-audit.md | High: identifies which workflows consume the most tokens/cost; directly pinpoints where to cut spend |
| Rolling 90-day trend (daily totals) | Audit (baseline numeric truth) | copilot-token-audit.md | Medium: reveals whether overall token spend is growing or shrinking; validates whether previous optimizations landed |
| Top-10 workflow ranking by total tokens | Audit (baseline numeric truth) | copilot-token-audit.md (pre-step jq, now with sort_by fix in 1c95112) | High: feeds optimizer target selection; ensures agent time goes to the highest-ROI candidate |
| Heavy-hitter flags (is_dominant >30%, is_expensive_per_run >100k avg, is_noisy) | Audit (baseline numeric truth) | copilot-token-audit.md | High: fast triage; dominant workflows can account for disproportionate COGS, catching them early prevents runaway spend |
| Optimization target selection (14-day cooldown, self-exclusion) | Optimizer (action truth) | copilot-token-optimizer.md | High: prevents churning the same workflow repeatedly; focuses per-run investment on the next highest-savings candidate |
| 4-area deep analysis matrix (tool usage, token efficiency, reliability, prompt efficiency) | Optimizer (action truth) | copilot-token-optimizer.md | High: root-cause attribution; maps spend to specific removable tools or redundant prompt sections |
| Ranked recommendations with estimated savings/run | Optimizer (action truth) | copilot-token-optimizer.md | High: directly drives token/cost reduction by giving concrete actionable edits with predicted ROI |
| Optimization history log + cooldown windows | Optimizer (action truth) | copilot-token-optimizer.md | Medium: prevents rework and gives a cumulative savings ledger; tracks COGS impact over time |
| Episode/DAG-level lineage (episodes[], edges[], confidence) | Observability Kit (episode/context truth) | agentic-observability-kit.md | Medium: reveals whether expensive runs are isolated or systemic across a DAG; prevents optimizing a leaf node when the root episode is the true cost driver |
| Regression/control-risk analysis (new MCP failures, blocked-request increases, write-posture drift) | Observability Kit (episode/context truth) | agentic-observability-kit.md | Medium: identifies reliability-driven token waste (failed retries, safety overruns); fixing control issues cuts cost and risk together |
| Domain-aware interpretation guardrails (triage/research/code_fix/etc.) | Observability Kit (episode/context truth) | agentic-observability-kit.md | Medium: prevents false optimization calls on legitimately expensive domains (e.g., research); avoids cuts that would degrade quality rather than reduce waste |
| Portfolio overlap/stale workflow analysis | Observability Kit (episode/context truth) | agentic-observability-kit.md | High: identifies redundant workflows that run in parallel and double-spend tokens on the same problem; consolidation directly cuts COGS |
| 5 decision charts (token bar, trend line, risk frontier, stability matrix, portfolio map) | Observability Kit (episode/context truth) | agentic-observability-kit.md | Low (reporting): no direct COGS impact; accelerates human decision-making so fixes happen sooner |
| Escalation gating + single-issue policy | Observability Kit (episode/context truth) | agentic-observability-kit.md | Low (governance): keeps signal-to-noise high; ensures owner attention goes to real regressions, not spurious alerts |
| 5 actionable prompt artifacts | New (kit-level synthesis) | All three sources combined | High: each prompt is a ready-to-apply COGS reduction action; the mining capability compounds value across subsequent agent runs |

Consolidation correctness: the precedence hierarchy (Episodes → Optimization actions → Baseline numbers) is respected throughout all six phases. No capability has been dropped from any source workflow. The only deduplicated pieces are the log-download step and the token-ranking logic, which previously ran independently in both the Audit and Optimizer workflows.
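
The target-selection row (14-day cooldown, self-exclusion) could work along the lines of the following sketch. It is an illustration under stated assumptions rather than the workflow's real logic: `select_target`, the shape of `ranking` and `history`, the kit's own name, and the choice to treat a workflow optimized exactly 14 days ago as eligible again are all hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Optional

COOLDOWN_DAYS = 14                      # cooldown window named in the table
SELF_NAME = "agentic-optimization-kit"  # ASSUMED name used for self-exclusion


def select_target(
    ranking: list[tuple[str, int]],
    history: dict[str, date],
    today: date,
) -> Optional[str]:
    """Pick the highest-token workflow that is not the kit itself and not cooling down.

    ranking: (workflow, total_tokens) pairs in any order.
    history: workflow -> date it was last optimized (hypothetical log format).
    """
    cutoff = today - timedelta(days=COOLDOWN_DAYS)
    for workflow, _tokens in sorted(ranking, key=lambda row: row[1], reverse=True):
        if workflow == SELF_NAME:
            continue  # self-exclusion: the kit never targets itself
        last = history.get(workflow)
        if last is not None and last > cutoff:
            continue  # optimized within the last 14 days, still cooling down
        return workflow
    return None


# Made-up ranking and history for illustration.
ranking = [("workflow-a", 900), ("agentic-optimization-kit", 800), ("workflow-b", 500)]
history = {"workflow-a": date(2026, 4, 20)}
print(select_target(ranking, history, date(2026, 4, 23)))
```

Sorting inside the function means the caller does not need to pre-sort the ranking, which sidesteps the kind of ordering bug the `sort_by` fix mentioned in the table addressed.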

Copilot AI requested a review from mnkiefer April 23, 2026 07:53
@mnkiefer
Collaborator

@copilot include a checkbox in the table you just created if it has a graph in the report.

Contributor Author

Copilot AI commented Apr 23, 2026

> @copilot include a checkbox in the table you just created if it has a graph in the report.

Updated table with a Graph column (☑ = has a dedicated chart in the report):

| Report Component | Source of Truth | Origin Workflow | Graph | COGS Reduction Significance |
|---|---|---|---|---|
| Per-workflow token/cost/turn/action-minute aggregates (7-day) | Audit (baseline numeric truth) | copilot-token-audit.md | ☑ Chart 1: Token Usage by Workflow | High: identifies which workflows consume the most tokens/cost; directly pinpoints where to cut spend |
| Rolling 90-day trend (daily totals) | Audit (baseline numeric truth) | copilot-token-audit.md | ☑ Chart 2: Historical Token Trend | Medium: reveals whether overall token spend is growing or shrinking; validates whether previous optimizations landed |
| Top-10 workflow ranking by total tokens | Audit (baseline numeric truth) | copilot-token-audit.md | ☑ Chart 1 (feeds ranking) | High: feeds optimizer target selection; ensures agent time goes to the highest-ROI candidate |
| Heavy-hitter flags (is_dominant >30%, is_expensive_per_run >100k avg, is_noisy) | Audit (baseline numeric truth) | copilot-token-audit.md | | High: fast triage; dominant workflows can account for disproportionate COGS, catching them early prevents runaway spend |
| Optimization target selection (14-day cooldown, self-exclusion) | Optimizer (action truth) | copilot-token-optimizer.md | | High: prevents churning the same workflow repeatedly; focuses per-run investment on the next highest-savings candidate |
| 4-area deep analysis matrix (tool usage, token efficiency, reliability, prompt efficiency) | Optimizer (action truth) | copilot-token-optimizer.md | | High: root-cause attribution; maps spend to specific removable tools or redundant prompt sections |
| Ranked recommendations with estimated savings/run | Optimizer (action truth) | copilot-token-optimizer.md | | High: directly drives token/cost reduction by giving concrete actionable edits with predicted ROI |
| Optimization history log + cooldown windows | Optimizer (action truth) | copilot-token-optimizer.md | | Medium: prevents rework and gives a cumulative savings ledger; tracks COGS impact over time |
| Episode/DAG-level lineage (episodes[], edges[], confidence) | Observability Kit (episode/context truth) | agentic-observability-kit.md | ☑ Chart 3: Episode Risk–Cost Frontier | Medium: reveals whether expensive runs are isolated or systemic across a DAG; prevents optimizing a leaf node when the root episode is the true cost driver |
| Regression/control-risk analysis (new MCP failures, blocked-request increases, write-posture drift) | Observability Kit (episode/context truth) | agentic-observability-kit.md | ☑ Chart 4: Stability Matrix | Medium: identifies reliability-driven token waste (failed retries, safety overruns); fixing control issues cuts cost and risk together |
| Domain-aware interpretation guardrails (triage/research/code_fix/etc.) | Observability Kit (episode/context truth) | agentic-observability-kit.md | | Medium: prevents false optimization calls on legitimately expensive domains (e.g., research); avoids cuts that would degrade quality rather than reduce waste |
| Portfolio overlap/stale workflow analysis | Observability Kit (episode/context truth) | agentic-observability-kit.md | ☑ Chart 5: Portfolio Map | High: identifies redundant workflows that run in parallel and double-spend tokens on the same problem; consolidation directly cuts COGS |
| 5 decision charts (token bar, trend line, risk frontier, stability matrix, portfolio map) | Observability Kit (episode/context truth) | agentic-observability-kit.md | ☑ (all 5 charts) | Low (reporting): no direct COGS impact; accelerates human decision-making so fixes happen sooner |
| Escalation gating + single-issue policy | Observability Kit (episode/context truth) | agentic-observability-kit.md | | Low (governance): keeps signal-to-noise high; ensures owner attention goes to real regressions, not spurious alerts |
| 5 actionable prompt artifacts | New (kit-level synthesis) | All three sources combined | | High: each prompt is a ready-to-apply COGS reduction action; the mining capability compounds value across subsequent agent runs |

7 of the 15 components have a corresponding graph. Five un-graphed audit and optimizer components (heavy-hitter flags, target selection, 4-area analysis, ranked recommendations, optimization history) are text/table only; all are rated High except the optimization history, which is rated Medium. They are candidates for a future chart phase if visual coverage of the optimizer output becomes a priority.

3 participants