
[copilot-token-optimizer] Token Optimization: Auto-Triage Issues #26241

@github-actions


🔍 Optimization Target: Auto-Triage Issues

Selected because: Highest token consumer not recently optimized (top-5 by 7-day total, 8 runs/week)
Analysis period: 2026-04-07 → 2026-04-14
Runs analyzed: 8 runs (2 audited in full detail)

📊 Token Usage Profile

| Metric | Value |
| --- | --- |
| Total tokens (7d) | ~2.1M |
| Avg tokens/run | 268K–1.0M (high variance) |
| Avg turns/run | 6.9–22.2 (high variance) |
| Cache efficiency | ~47–49% |
| Input/output ratio | 99.5% input tokens |
| Model | claude-sonnet-4.6 |
| Actuation | read-only |
| Classification | resource_heavy_for_domain (heavy run) |

The workflow is scheduled every 6 hours; 8 runs landed in the analysis week. The token cost varies wildly from run to run, depending on how many unlabeled issues are present and which path the agent takes.


🔧 Recommendations

1. Replace list_issues with search_issues no:label as first step — Est. savings: ~200–400K tokens/run

Evidence from both audited runs: every run starts with:

```
list_issues(owner: "github", repo: "gh-aw", state: "OPEN", perPage: 50)
→ Output too large to read at once (120–125 KB)
```

This 120–125 KB payload is saved to a /tmp/... file that the agent cannot process with bash (python3 and node are not in the allowed shell tools). The agent then burns 4–5 turns failing to parse it before eventually calling search_issues with no:label — which is the correct approach from the start.

The fix is to instruct the agent to start with search_issues using the no:label qualifier instead of fetching all open issues. This eliminates the large payload from the context window and the downstream bash failures.

Action: Add to the "On Scheduled Runs" section:

```
Start with search_issues using query repo:github/gh-aw is:issue is:open no:label instead of list_issues. Only fetch individual issue details via issue_read for issues that need classification.
```
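For reference, the sketch below builds and prints the recommended qualifier string; the commented `gh` call is an untested illustration of how to exercise the same query ad hoc and assumes an authenticated GitHub CLI:

```shell
# The qualifier string recommended above for search_issues.
QUERY='repo:github/gh-aw is:issue is:open no:label'

# Illustrative ad-hoc equivalent via the GitHub search API (requires `gh auth login`):
#   gh api -X GET search/issues -f q="$QUERY" --jq '.items[].number'

echo "$QUERY"
```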

2. Add a pre-agent bash step to pre-fetch unlabeled issues — Est. savings: ~150–300K tokens/run

Evidence: The agentic_fraction = 0.50 assessment appears in both audited runs, meaning ~50% of turns are data-gathering. Pre-fetching unlabeled issue numbers in a deterministic bash step eliminates those turns from the agent's budget.

Action: Add a frontmatter steps block:

```yaml
steps:
  pre-agent:
    - name: Fetch unlabeled issues
      run: |
        gh api "repos/github/gh-aw/issues?state=open&labels=&per_page=30" \
          --jq '[.[] | select(.labels | length == 0) | {number: .number, title: .title, body: .body}]' \
          > /tmp/gh-aw/agent/unlabeled-issues.json
        echo "Unlabeled issues:" $(jq length /tmp/gh-aw/agent/unlabeled-issues.json)
```
Then update the prompt to read from `/tmp/gh-aw/agent/unlabeled-issues.json` instead of fetching via MCP.
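As a local sanity check for that jq filter, here is a sketch against synthetic data (issue numbers and titles are invented for illustration):

```shell
# Synthetic stand-in for the list_issues payload; only issue 102 is unlabeled.
cat > /tmp/sample-issues.json <<'EOF'
[
  {"number": 101, "title": "Bug A", "labels": [{"name": "bug"}]},
  {"number": 102, "title": "Feature B", "labels": []}
]
EOF

# Same select() as the pre-agent step: keep only issues with zero labels.
unlabeled=$(jq -c '[.[] | select(.labels | length == 0) | {number, title}]' /tmp/sample-issues.json)
echo "$unlabeled"   # [{"number":102,"title":"Feature B"}]
```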

3. Suppress search_repositories tool calls — Est. savings: ~50–80K tokens/run

Evidence from MCP gateway logs: both audited runs show 59–60 calls to search_repositories that all fail with:

```
calling "tools/call": unknown tool "search_repositories"
```

The issues toolset only exposes 5 tools: get_label, issue_read, list_issue_types, list_issues, search_issues. But Claude (claude-sonnet-4.6) repeatedly tries to call search_repositories throughout both runs — likely trying to look up repository metadata to verify labels exist.

Each failed call returns an error message that is added to the model's context window, contributing to input token bloat. At ~60 such calls per run and 8 runs in the analysis week, that is roughly 480 unnecessary API round-trips per week.

Action: Add an explicit instruction to the prompt:

```
Do NOT call search_repositories — it is not available in this workflow. Use list_issues or search_issues to find issues, and get_label to verify a label exists.
```

4. Downgrade to a smaller model — Est. savings: 40–60% effective token cost

Evidence: Both agentic assessments flagged model_downgrade_available (severity: low):

"This Triage run may not need a frontier model. A smaller model (e.g. gpt-4.1-mini, claude-haiku-4-5) could handle the task at lower cost."

The task is pure read-only label classification — no code generation, no complex reasoning. The output token count is consistently tiny (3.2K–3.9K out of 442K–875K total). A lighter model would handle keyword-based classification at a fraction of the inference cost.
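The input-heavy ratio is easy to verify from the audited figures; a quick check using the heavier run (3.2K output tokens of 875K total, both numbers from this report):

```shell
# Output-token share of the 875K-token audited run (values from the report above).
share=$(awk 'BEGIN { printf "%.1f", 3200 / 875000 * 100 }')
echo "output tokens: ${share}% of total"   # ~0.4%, consistent with the 99.5% input ratio
```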

Action: Add to the workflow frontmatter:

```yaml
engine:
  name: copilot
  model: gpt-4.1-mini
```
Test with a few runs before adopting permanently. Use claude-haiku-4-5 as an alternative.


Tool Usage Matrix

| Tool | Configured? | Run A (24399761724) | Run B (24385625798) | Recommendation |
| --- | --- | --- | --- | --- |
| github.list_issues | ✅ (issues toolset) | 1 call | 1 call | ⚠️ Replace with search_issues no:label as first step |
| github.issue_read | ✅ (issues toolset) | 15 calls | 13 calls | ✅ Keep |
| github.search_issues | ✅ (issues toolset) | 2 calls | 1 call | ✅ Keep (use first) |
| github.get_label | ✅ (issues toolset) | 0 calls | 0 calls | ⚠️ Consider removing if not needed |
| github.list_issue_types | ✅ (issues toolset) | 0 calls | 0 calls | ⚠️ Consider removing if not needed |
| github.search_repositories | ❌ NOT available | 60 calls (all fail) | 59 calls (all fail) | ❌ Add explicit "do not call" instruction |
| safeoutputs.add_labels | ✅ (max: 10) | 0 calls (noop run) | 1 call | ✅ Keep |
| safeoutputs.create_discussion | ✅ (max: 1) | 0 calls | 1 call | ✅ Keep |
| safeoutputs.noop | ✅ | 1 call | 0 calls | ✅ Keep |
| bash: jq | ✅ | 0 successful | 0 successful | ⚠️ jq fails silently; remove or fix piping |
| bash: grep/head/wc | ✅ | multiple | multiple | ✅ Keep (used as fallback) |

Note on bash tools: python3 and node commands fail with "Permission denied" in both runs despite being invoked via shell(). Only cat, grep, head, wc succeed reliably. The jq tool also fails when reading from /tmp/ files created by MCP tool output. Consider removing bash from the toolset entirely and relying on pre-agent bash steps for data processing.
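Given that only cat, grep, head, and wc succeed, the grep fallback the agent landed on looks roughly like this (the filename and payload are illustrative, not taken from the run logs):

```shell
# Illustrative stand-in for an MCP tool-output file saved under /tmp/.
printf '%s\n' '{"number": 101, "title": "Bug A"}' '{"number": 102, "title": "Feature B"}' > /tmp/issues.ndjson

# Crude issue count without jq, using only the reliably-available tools.
count=$(grep -c '"number"' /tmp/issues.ndjson)
echo "issues found: $count"   # issues found: 2
```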

Audited Runs Detail

Run §24399761724 — 2026-04-14T12:47 (schedule)

  • Turns: 23 | Tokens: 875K | Conclusion: success (noop — 0 unlabeled issues)
  • Cache efficiency: 48.7%
  • Assessments: resource_heavy_for_domain (high), poor_agentic_control (medium), partially_reducible (low), model_downgrade_available (low)
  • What happened: Called list_issues → 125 KB saved to /tmp → 10+ failed bash/python3/node attempts → grep fallback → search_issues no:label → 0 found → noop
  • Ghost calls: 60 × search_repositories all returned "unknown tool"

Run §24385625798 — 2026-04-14T07:02 (schedule)

  • Turns: 11 | Tokens: 445K | Conclusion: success (1 issue labeled)
  • Cache efficiency: 46.6%
  • Assessments: poor_agentic_control (medium), partially_reducible (low), model_downgrade_available (low)
  • What happened: Called list_issues → 120 KB → multiple bash failures → search_issues no:label → found [PR Triage Report] PR Triage Report - 2026-04-14 #26170 → add_labels → create_discussion
  • Ghost calls: 59 × search_repositories all returned "unknown tool"

7-Day Token Trend

| Snapshot Date | Runs | Total Tokens | Avg Tokens/Run | Avg Turns/Run |
| --- | --- | --- | --- | --- |
| 2026-04-04 | 8 | 2,996K | 375K | 8.4 |
| 2026-04-06 | 5 | 5,028K | 1,006K | 22.2 |
| 2026-04-13 | 6 | 3,544K | 591K | 14.3 |
| 2026-04-14 | 8 | 2,148K | 268K | 6.9 |

The spike on 2026-04-06 (avg 1M tokens, 22 turns/run) suggests the repository had more unlabeled issues that week, causing the agent to call issue_read many more times and use more turns fighting bash tool failures.
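The per-run averages in the trend table follow directly from the totals; for instance, the 2026-04-14 row:

```shell
# 2,148K tokens across 8 runs (2026-04-14 snapshot) → ~268K tokens/run.
total_k=2148; runs=8
avg=$(( total_k / runs ))
echo "avg ≈ ${avg}K tokens/run"   # avg ≈ 268K tokens/run
```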

⚠️ Caveats

  • Full analysis is based on 2 runs with detailed MCP gateway logs; the 7-day snapshot covers 8 runs total
  • The search_repositories ghost-tool behavior is consistent across both runs (59–60 calls each) — high confidence
  • The list_issues → bash-failure → search_issues pattern is identical in both runs — high confidence
  • Model downgrade recommendation should be validated with a test run before adopting


Generated by Copilot Token Usage Optimizer · ● 2.2M · ◷ expires on Apr 21, 2026, 3:25 PM UTC
