1 change: 1 addition & 0 deletions docs/astro.config.mjs
@@ -284,6 +284,7 @@ export default defineConfig({
{ label: 'MultiRepoOps', link: '/patterns/multi-repo-ops/' },
{ label: 'Monitoring', link: '/patterns/monitoring/' },
{ label: 'Agentic Observability Kit', link: '/patterns/agentic-observability-kit/' },
{ label: 'Agentic Optimization Kit', link: '/patterns/agentic-optimization-kit/' },
{ label: 'Orchestration', link: '/patterns/orchestration/' },
{ label: 'ProjectOps', link: '/patterns/project-ops/' },
{ label: 'ResearchPlanAssignOps', link: '/patterns/research-plan-assign-ops/' },
134 changes: 134 additions & 0 deletions docs/src/content/docs/patterns/agentic-optimization-kit.md
@@ -0,0 +1,134 @@
---
title: Agentic Optimization Kit
description: Use the built-in Agentic Optimization Kit to consolidate token auditing, optimization targeting, and agentic observability into one weekly executive report with actionable prompt artifacts.
---

> [!WARNING]
> **Experimental:** The Agentic Optimization Kit is still experimental! Things may break, change, or be removed without deprecation at any time.

The Agentic Optimization Kit is a weekly unified analyst that consolidates token auditing, optimization targeting, and agentic observability into one executive report with actionable prompt artifacts. It runs on a weekly schedule, analyzes the last 7 days of Copilot workflow run data, selects a high-ROI optimization target, and produces five diagnostic charts alongside a structured discussion report.

This pattern is useful when a repository has enough agentic activity that maintainers need to track cost trends, identify expensive workflows, and surface concrete optimization opportunities — all in one place.

## What it does

The kit runs through three analysis phases each week:

1. **Baseline audit** — Aggregates token usage, cost, turns, action minutes, errors, and warnings across all Copilot workflow runs from the last 7 days. Flags dominant workflows (>30% of total tokens), expensive-per-run workflows (>100k avg tokens/run), and noisy workflows (>50% error/warning rate).

2. **Optimization target selection and analysis** — Selects the highest-token workflow not optimized in the last 14 days, audits its runs in depth, and produces ranked recommendations covering tool usage efficiency, token efficiency, reliability, and prompt structure.

3. **Episode and observability analysis** — Analyzes 30 days of episodic run data to compute workflow instability scores, value proxies, and portfolio positioning. Identifies stale workflows, overlap pairs, and consolidation candidates.

After analysis, it generates five diagnostic charts and publishes a discussion with the full executive report. If repeated escalation signals are detected, it opens one escalation issue.

## Visual diagnostics

The kit produces five charts at publication quality:

### 1. Token Usage by Workflow

A horizontal bar chart of the top 15 workflows by total tokens over the last 7 days. Bars are color-coded by risk flag: dominant (>30% token share), expensive-per-run (>100k avg), or noisy (>50% error rate).

This answers: which workflows are consuming the most tokens, and are those levels expected?

### 2. Historical Token Trend

A line chart of daily total tokens and total cost from the rolling 90-day summary. Captures week-over-week direction to surface whether spend is improving, flat, or growing.

### 3. Episode Risk–Cost Frontier

A scatter plot of execution episodes in cost-risk space. The x-axis is episode cost, the y-axis is a composite risk score (risky nodes, poor-control signals, MCP failures, blocked requests, escalation eligibility), and point size reflects run count. Pareto-frontier outliers are annotated.

This answers: which execution chains are both expensive and risky?

### 4. Workflow Stability Matrix

A heatmap of workflow instability signals. Rows are workflows sorted by instability score; columns are the six underlying signal rates (risky run rate, poor-control rate, resource-heavy rate, latest-success fallback rate, blocked-request rate, MCP failure rate).

This answers: which workflows are chronically unstable, and in which dimensions?

### 5. Repository Portfolio Map

A scatter plot positioning all workflows in value–cost space. The x-axis is recent cost, the y-axis is an evidence-based value proxy (successful usage, stability, repeat use, absence of overkill signals), and the quadrants map to actionable categories: `keep`, `optimize`, `simplify`, `review`.

This answers: which workflows justify their cost, and which warrant a maintainer decision?
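The quadrant assignment can be sketched as a simple threshold rule. This is an illustrative reconstruction, not the kit's actual implementation: the function name and the fixed cutoffs are assumptions (the kit presumably derives cutoffs from repository medians rather than constants).

```python
def portfolio_quadrant(value: float, cost: float,
                       value_cutoff: float = 0.5,
                       cost_cutoff: float = 1.0) -> str:
    """Map a workflow's (value, cost) position to an action category.

    The cutoffs are hypothetical; the real kit likely normalizes both
    axes before splitting into quadrants.
    """
    if value >= value_cutoff and cost >= cost_cutoff:
        return "optimize"   # valuable but expensive: reduce its cost
    if value >= value_cutoff:
        return "keep"       # valuable and cheap: leave alone
    if cost >= cost_cutoff:
        return "review"     # expensive and low value: maintainer decision
    return "simplify"       # cheap and low value: trim or merge
```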

## Report structure

The weekly discussion follows this structure:

1. **Executive Summary** — period, total runs/tokens/cost/action-minutes, heavy-hitters, optimization target selection.
2. **Visual Diagnostics** — all five charts, each with a one-sentence `Decision` and `Why it matters`.
3. **Escalation Targets** — workflows that crossed repeated-threshold conditions (omitted when none).
4. **Optimization Target** — the selected workflow, its run metrics, and a ranked recommendation table (up to five recommendations with estimated savings per run).
5. **Five Actionable Prompts** — ready-to-use prompt artifacts covering optimization, stability fix, consolidation, right-sizing, and escalation.
6. **Full Per-Workflow Breakdown** — complete baseline table in a collapsible section.
7. **Episode Detail** — top episodes by risk score in a collapsible section.
8. **Portfolio Opportunities** — stale workflows, overlap pairs, and overkill candidates in a collapsible section.
9. **Optimization Analysis Detail** — full four-area analysis for the selected target in a collapsible section.

## Escalation conditions

The kit creates one escalation issue when any of the following holds across the last 14 days:

- Any episode is marked `escalation_eligible`.
- Two or more runs for the same workflow are classified as `risky`.
- Two or more runs have `new_mcp_failure` or `blocked_requests_increase` in their regression reason codes.
- Two or more runs for the same workflow have a medium or high severity `resource_heavy_for_domain` or `poor_agentic_control` assessment.

If no threshold is crossed, no issue is created.
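The four conditions above can be sketched as one predicate over the 14-day run window. The record shape (`workflow`, `classification`, `reason_codes`, `assessment`, `severity` keys) is an assumption for illustration; the kit's real data model may differ.

```python
from collections import Counter

def should_escalate(runs: list[dict]) -> bool:
    """Return True when any escalation threshold holds over the window."""
    # Condition 1: any escalation-eligible episode.
    if any(r.get("escalation_eligible") for r in runs):
        return True
    # Condition 2: two or more risky runs for the same workflow.
    risky = Counter(r["workflow"] for r in runs
                    if r.get("classification") == "risky")
    if any(n >= 2 for n in risky.values()):
        return True
    # Condition 3: two or more runs with the named regression reason codes.
    regress = sum(1 for r in runs
                  if {"new_mcp_failure", "blocked_requests_increase"}
                  & set(r.get("reason_codes", [])))
    if regress >= 2:
        return True
    # Condition 4: repeated medium/high resource or control assessments.
    heavy = Counter(r["workflow"] for r in runs
                    if r.get("assessment") in ("resource_heavy_for_domain",
                                               "poor_agentic_control")
                    and r.get("severity") in ("medium", "high"))
    return any(n >= 2 for n in heavy.values())
```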

## Relationship to the Agentic Observability Kit

The Agentic Optimization Kit and the [Agentic Observability Kit](/gh-aw/patterns/agentic-observability-kit/) are complementary:

- The **Observability Kit** focuses on operational regression detection, episode-level risk, and portfolio cleanup at repository scope.
- The **Optimization Kit** adds a weekly cost-focused audit layer with optimization targeting, historical trend tracking, token efficiency analysis, and the five actionable prompts.

Repositories with active Copilot usage benefit from running both. The Observability Kit surfaces operational regressions; the Optimization Kit surfaces spend efficiency and actionable reduction opportunities.

## Metrics and scoring

The kit uses several derived metrics to make the charts decision-oriented.

`episode_risk_score`
A composite risk score for a single execution episode. Combines risky nodes, poor-control nodes, MCP failures, blocked requests, trend signals, and escalation eligibility. Escalation-eligible episodes are weighted at 2× because they already represent multiple threshold crossings.
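A minimal sketch of the composite, with the 2× escalation weighting from the definition above. The per-signal weights and field names here are assumptions chosen for illustration, not the kit's actual constants.

```python
def episode_risk_score(ep: dict) -> float:
    """Composite risk for one episode; individual weights are illustrative."""
    base = (2.0 * ep.get("risky_nodes", 0)
            + 1.5 * ep.get("poor_control_nodes", 0)
            + 1.0 * ep.get("mcp_failures", 0)
            + 1.0 * ep.get("blocked_requests", 0)
            + 0.5 * ep.get("negative_trend_signals", 0))
    # Escalation-eligible episodes double: they already crossed thresholds.
    return base * 2.0 if ep.get("escalation_eligible") else base
```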

`workflow_instability_score`
A workflow-level score derived from repeated risky runs, poor-control assessments, resource-heavy assessments, latest-success fallback usage, blocked requests, and MCP failures. Used to separate chronic instability from one-off incidents.
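One way to sketch the score is as an average over the six signal rates named in the stability-matrix chart. Equal weighting is an assumption; the kit may weight chronic signals differently.

```python
def workflow_instability_score(rates: dict) -> float:
    """Average the six per-workflow signal rates (equal weights assumed)."""
    keys = ("risky_run_rate", "poor_control_rate", "resource_heavy_rate",
            "latest_success_fallback_rate", "blocked_request_rate",
            "mcp_failure_rate")
    return sum(rates.get(k, 0.0) for k in keys) / len(keys)
```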

`workflow_value_proxy`
A repository-local proxy for workflow value combining recent successful usage (0.35), stability (0.25), repeat invocations (0.20), and the absence of overkill signals (0.20). Used to position workflows in the portfolio map quadrants.
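The stated weights translate directly into a weighted sum, assuming each input signal is already normalized to the 0–1 range. The function name and parameter names are illustrative.

```python
def workflow_value_proxy(successful_usage: float, stability: float,
                         repeat_use: float, no_overkill: float) -> float:
    """Blend four normalized (0-1) signals using the documented weights."""
    return (0.35 * successful_usage
            + 0.25 * stability
            + 0.20 * repeat_use
            + 0.20 * no_overkill)
```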

`workflow_overlap_score`
An approximate similarity score between two workflows. Blends task domain, schedule family, behavior cluster, naming similarity, and assessment similarity. Values ≥ 0.55 are flagged as strong consolidation candidates.
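A rough sketch of the blend, averaging the five listed components with equal weights; the equal weighting, the field names, and the use of a generic string-similarity ratio for names are all assumptions.

```python
import difflib

def workflow_overlap_score(a: dict, b: dict) -> float:
    """Approximate similarity of two workflows (equal weights assumed)."""
    signals = [
        1.0 if a["domain"] == b["domain"] else 0.0,
        1.0 if a["schedule_family"] == b["schedule_family"] else 0.0,
        1.0 if a["behavior_cluster"] == b["behavior_cluster"] else 0.0,
        difflib.SequenceMatcher(None, a["name"], b["name"]).ratio(),
        1.0 if a["assessment"] == b["assessment"] else 0.0,
    ]
    return sum(signals) / len(signals)

# Pairs scoring >= 0.55 would be flagged as consolidation candidates.
```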

## Flags and thresholds

| Flag | Condition | Implication |
|------|-----------|-------------|
| `is_dominant` | Workflow accounts for > 30% of total tokens | Single workflow concentrating systemic risk |
| `is_expensive_per_run` | Avg tokens/run > 100,000 | Prompt or tool configuration is the primary cost lever |
| `is_noisy` | Error + warning count > 50% of run count | Reliability waste consuming tokens without useful output |

Optimization cooldown prevents the same workflow from being selected again within 14 days.
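The three flag conditions from the table map to straightforward ratio checks over a workflow's weekly aggregate. The record shape here (`tokens`, `runs`, `errors`, `warnings` keys) is assumed for illustration.

```python
def baseline_flags(wf: dict, total_tokens: int) -> dict:
    """Evaluate the three baseline risk flags for one workflow's week."""
    runs = max(wf["runs"], 1)  # guard against division by zero
    return {
        "is_dominant": wf["tokens"] / total_tokens > 0.30,
        "is_expensive_per_run": wf["tokens"] / runs > 100_000,
        "is_noisy": (wf["errors"] + wf["warnings"]) / runs > 0.50,
    }
```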

## Cost caveats

The kit's cost signals are estimates, not billing records.

- `action_minutes` is derived from workflow duration rounded to billable minutes. Use it for relative comparison and trend detection, not as a GitHub invoice figure.
- `estimated_cost` reflects engine log fields. It is appropriate for portfolio prioritization, but should be treated as a run-level estimate rather than finance-grade accounting.
- Effective Tokens (when used instead of raw cost) normalize token classes and apply a per-model multiplier. They make cross-run and cross-model comparisons more useful, but they are not a billing unit.
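The Effective Tokens normalization described above can be sketched as a weighted sum over token classes times a per-model multiplier. The specific class weights and the multiplier values are assumptions, not the kit's actual constants; only the shape of the computation follows from the description.

```python
def effective_tokens(input_tokens: int, output_tokens: int,
                     cached_tokens: int = 0,
                     model_multiplier: float = 1.0,
                     output_weight: float = 4.0,
                     cache_weight: float = 0.1) -> float:
    """Normalize token classes into one comparable (non-billing) unit.

    The weights here are illustrative: output tokens typically cost more
    than input tokens, and cached tokens less, but exact ratios vary by model.
    """
    raw = (input_tokens
           + output_weight * output_tokens
           + cache_weight * cached_tokens)
    return raw * model_multiplier
```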

## Source workflow

The built-in workflow lives at [`/.github/workflows/agentic-optimization-kit.md`](https://github.com/github/gh-aw/blob/main/.github/workflows/agentic-optimization-kit.md).

## Related documentation

- [Agentic Observability Kit](/gh-aw/patterns/agentic-observability-kit/) — operational regression detection and portfolio cleanup
- [Cost Management](/gh-aw/reference/cost-management/) — Actions minutes, inference spend, and optimization levers
- [`gh aw logs`](/gh-aw/setup/cli/#logs) — cross-run log analysis used as the primary data source
- [`gh aw audit`](/gh-aw/setup/cli/#audit) — single-run detailed reports
5 changes: 5 additions & 0 deletions docs/src/content/docs/setup/cli.md
@@ -165,10 +165,15 @@ gh aw add githubnext/agentics/ci-doctor # Add single workflow
gh aw add githubnext/agentics/ci-doctor@v1.0.0 # Add specific version
gh aw add githubnext/agentics/ci-doctor --dir shared # Organize in subdirectory
gh aw add githubnext/agentics/ci-doctor --create-pull-request # Create PR instead of commit
gh aw add workflow1 workflow2 workflow3 # Add multiple workflows at once
gh aw add githubnext/agentics/ci-doctor --name my-doctor # Override filename for single workflow
```

**Options:** `--dir/-d`, `--create-pull-request`, `--no-gitattributes`, `--append`, `--disable-security-scanner`, `--engine/-e`, `--force/-f`, `--name/-n`, `--no-stop-after`, `--stop-after`

> [!NOTE]
> The `--name` flag cannot be used when adding multiple workflows at once. It only applies when adding a single workflow. Using `--name` with multiple workflow arguments returns an error.

#### `new`

Create a workflow template in `.github/workflows/`. Opens for editing automatically.