
[copilot-token-optimizer] Token Optimization: Auto-Triage Issues #26241

@github-actions


🔍 Optimization Target: Auto-Triage Issues

Selected because: Highest token consumer not recently optimized (top-5 by 7-day total, 8 runs/week)
Analysis period: 2026-04-07 → 2026-04-14
Runs analyzed: 8 runs (2 audited in full detail)

📊 Token Usage Profile

| Metric | Value |
| --- | --- |
| Total tokens (7d) | ~2.1M |
| Avg tokens/run | 268K–1.0M (high variance) |
| Avg turns/run | 6.9–22.2 (high variance) |
| Cache efficiency | ~47–49% |
| Input/output ratio | 99.5% input tokens |
| Model | claude-sonnet-4.6 |
| Actuation | read-only |
| Classification | resource_heavy_for_domain (heavy run) |

The workflow is scheduled every 6 hours; 8 runs landed in the analysis week. The token cost varies wildly from run to run, depending on how many unlabeled issues are present and which path the agent takes.


🔧 Recommendations

1. Replace list_issues with search_issues no:label as first step — Est. savings: ~200–400K tokens/run

Evidence from both audited runs: every run starts with:

```
list_issues(owner: "github", repo: "gh-aw", state: "OPEN", perPage: 50)
→ Output too large to read at once (120–125 KB)
```

This 120–125 KB payload is saved to a /tmp/... file that the agent cannot process with bash (python3 and node are not in the allowed shell tools). The agent then burns 4–5 turns failing to parse it before eventually calling search_issues with no:label — which is the correct approach from the start.

The fix is to instruct the agent to start with search_issues using the no:label qualifier instead of fetching all open issues. This eliminates the large payload from the context window and the downstream bash failures.

Action: Add to the "On Scheduled Runs" section:

```
Start with search_issues using query repo:github/gh-aw is:issue is:open no:label instead of list_issues. Only fetch individual issue details via issue_read for issues that need classification.
```
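For reference, the sketch below builds and prints the recommended qualifier string; the commented `gh` call is an untested illustration of how to exercise the same query ad hoc and assumes an authenticated GitHub CLI:

```shell
# The qualifier string recommended above for search_issues.
QUERY='repo:github/gh-aw is:issue is:open no:label'

# Illustrative ad-hoc equivalent via the GitHub search API (requires `gh auth login`):
#   gh api -X GET search/issues -f q="$QUERY" --jq '.items[].number'

echo "$QUERY"
```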

2. Add a pre-agent bash step to pre-fetch unlabeled issues — Est. savings: ~150–300K tokens/run

Evidence: The agentic_fraction = 0.50 assessment appears in both audited runs, meaning ~50% of turns are data-gathering. Pre-fetching unlabeled issue numbers in a deterministic bash step eliminates those turns from the agent's budget.

Action: Add a frontmatter steps block:

```yaml
steps:
  pre-agent:
    - name: Fetch unlabeled issues
      run: |
        gh api "repos/github/gh-aw/issues?state=open&labels=&per_page=30" \
          --jq '[.[] | select(.labels | length == 0) | {number: .number, title: .title, body: .body}]' \
          > /tmp/gh-aw/agent/unlabeled-issues.json
        echo "Unlabeled issues:" $(jq length /tmp/gh-aw/agent/unlabeled-issues.json)
```
Then update the prompt to read from `/tmp/gh-aw/agent/unlabeled-issues.json` instead of fetching via MCP.
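As a local sanity check for that jq filter, here is a sketch against synthetic data (issue numbers and titles are invented for illustration):

```shell
# Synthetic stand-in for the list_issues payload; only issue 102 is unlabeled.
cat > /tmp/sample-issues.json <<'EOF'
[
  {"number": 101, "title": "Bug A", "labels": [{"name": "bug"}]},
  {"number": 102, "title": "Feature B", "labels": []}
]
EOF

# Same select() as the pre-agent step: keep only issues with zero labels.
unlabeled=$(jq -c '[.[] | select(.labels | length == 0) | {number, title}]' /tmp/sample-issues.json)
echo "$unlabeled"   # [{"number":102,"title":"Feature B"}]
```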

3. Suppress search_repositories tool calls — Est. savings: ~50–80K tokens/run

Evidence from MCP gateway logs: both audited runs show 59–60 calls to search_repositories that all fail with:

```
calling "tools/call": unknown tool "search_repositories"
```

The issues toolset only exposes 5 tools: get_label, issue_read, list_issue_types, list_issues, search_issues. But Claude (claude-sonnet-4.6) repeatedly tries to call search_repositories throughout both runs — likely trying to look up repository metadata to verify labels exist.

Each failed call returns an error message that is added to the model's context window, contributing to input token bloat. At ~60 such calls per run and 8 runs in the analysis week, that is roughly 480 unnecessary API round-trips per week.

Action: Add an explicit instruction to the prompt:

```
Do NOT call search_repositories — it is not available in this workflow. Use list_issues or search_issues to find issues, and get_label to verify a label exists.
```

4. Downgrade to a smaller model — Est. savings: 40–60% effective token cost

Evidence: Both agentic assessments flagged model_downgrade_available (severity: low):

"This Triage run may not need a frontier model. A smaller model (e.g. gpt-4.1-mini, claude-haiku-4-5) could handle the task at lower cost."

The task is pure read-only label classification — no code generation, no complex reasoning. The output token count is consistently tiny (3.2K–3.9K out of 442K–875K total). A lighter model would handle keyword-based classification at a fraction of the inference cost.
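The input-heavy ratio is easy to verify from the audited figures; a quick check using the heavier run (3.2K output tokens of 875K total, both numbers from this report):

```shell
# Output-token share of the 875K-token audited run (values from the report above).
share=$(awk 'BEGIN { printf "%.1f", 3200 / 875000 * 100 }')
echo "output tokens: ${share}% of total"   # ~0.4%, consistent with the 99.5% input ratio
```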

Action: Add to the workflow frontmatter:

```yaml
engine:
  name: copilot
  model: gpt-4.1-mini
```
Test with a few runs before adopting permanently. Use claude-haiku-4-5 as an alternative.


Tool Usage Matrix

| Tool | Configured? | Run A (24399761724) | Run B (24385625798) | Recommendation |
| --- | --- | --- | --- | --- |
| github.list_issues | ✅ (issues toolset) | 1 call | 1 call | ⚠️ Replace with search_issues no:label as first step |
| github.issue_read | ✅ (issues toolset) | 15 calls | 13 calls | ✅ Keep |
| github.search_issues | ✅ (issues toolset) | 2 calls | 1 call | ✅ Keep (use first) |
| github.get_label | ✅ (issues toolset) | 0 calls | 0 calls | ⚠️ Consider removing if not needed |
| github.list_issue_types | ✅ (issues toolset) | 0 calls | 0 calls | ⚠️ Consider removing if not needed |
| github.search_repositories | ❌ NOT available | 60 calls (all fail) | 59 calls (all fail) | ❌ Add explicit "do not call" instruction |
| safeoutputs.add_labels | ✅ (max: 10) | 0 calls (noop run) | 1 call | ✅ Keep |
| safeoutputs.create_discussion | ✅ (max: 1) | 0 calls | 1 call | ✅ Keep |
| safeoutputs.noop | ✅ | 1 call | 0 calls | ✅ Keep |
| bash: jq | ✅ | 0 successful | 0 successful | ⚠️ jq fails silently; remove or fix piping |
| bash: grep/head/wc | ✅ | multiple | multiple | ✅ Keep (used as fallback) |

Note on bash tools: python3 and node commands fail with "Permission denied" in both runs despite being invoked via shell(). Only cat, grep, head, wc succeed reliably. The jq tool also fails when reading from /tmp/ files created by MCP tool output. Consider removing bash from the toolset entirely and relying on pre-agent bash steps for data processing.
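Given that only cat, grep, head, and wc succeed, the grep fallback the agent landed on looks roughly like this (the filename and payload are illustrative, not taken from the run logs):

```shell
# Illustrative stand-in for an MCP tool-output file saved under /tmp/.
printf '%s\n' '{"number": 101, "title": "Bug A"}' '{"number": 102, "title": "Feature B"}' > /tmp/issues.ndjson

# Crude issue count without jq, using only the reliably-available tools.
count=$(grep -c '"number"' /tmp/issues.ndjson)
echo "issues found: $count"   # issues found: 2
```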

Audited Runs Detail

Run §24399761724 — 2026-04-14T12:47 (schedule)

  • Turns: 23 | Tokens: 875K | Conclusion: success (noop — 0 unlabeled issues)
  • Cache efficiency: 48.7%
  • Assessments: resource_heavy_for_domain (high), poor_agentic_control (medium), partially_reducible (low), model_downgrade_available (low)
  • What happened: Called list_issues → 125 KB saved to /tmp → 10+ failed bash/python3/node attempts → grep fallback → search_issues no:label → 0 found → noop
  • Ghost calls: 60 × search_repositories all returned "unknown tool"

Run §24385625798 — 2026-04-14T07:02 (schedule)

  • Turns: 11 | Tokens: 445K | Conclusion: success (1 issue labeled)
  • Cache efficiency: 46.6%
  • Assessments: poor_agentic_control (medium), partially_reducible (low), model_downgrade_available (low)
  • What happened: Called list_issues → 120 KB → multiple bash failures → search_issues no:label → found [PR Triage Report] PR Triage Report - 2026-04-14 #26170 → add_labels → create_discussion
  • Ghost calls: 59 × search_repositories all returned "unknown tool"

7-Day Token Trend

| Snapshot Date | Runs | Total Tokens | Avg Tokens/Run | Avg Turns/Run |
| --- | --- | --- | --- | --- |
| 2026-04-04 | 8 | 2,996K | 375K | 8.4 |
| 2026-04-06 | 5 | 5,028K | 1,006K | 22.2 |
| 2026-04-13 | 6 | 3,544K | 591K | 14.3 |
| 2026-04-14 | 8 | 2,148K | 268K | 6.9 |

The spike on 2026-04-06 (avg 1M tokens, 22 turns/run) suggests the repository had more unlabeled issues that week, causing the agent to call issue_read many more times and use more turns fighting bash tool failures.
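The per-run averages in the trend table follow directly from the totals; for instance, the 2026-04-14 row:

```shell
# 2,148K tokens across 8 runs (2026-04-14 snapshot) → ~268K tokens/run.
total_k=2148; runs=8
avg=$(( total_k / runs ))
echo "avg ≈ ${avg}K tokens/run"   # avg ≈ 268K tokens/run
```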

⚠️ Caveats

  • Full analysis is based on 2 runs with detailed MCP gateway logs; the 7-day snapshot covers 8 runs total
  • The search_repositories ghost-tool behavior is consistent across both runs (59–60 calls each) — high confidence
  • The list_issues → bash-failure → search_issues pattern is identical in both runs — high confidence
  • Model downgrade recommendation should be validated with a test run before adopting


Generated by Copilot Token Usage Optimizer · ● 2.2M · ◷ expires on Apr 21, 2026, 3:25 PM UTC
