# Optimization Target: Auto-Triage Issues

**Selected because:** Highest token consumer not recently optimized (top-5 by 7-day total, 8 runs/week)
**Analysis period:** 2026-04-07 – 2026-04-14
**Runs analyzed:** 8 runs (2 audited in full detail)

## Token Usage Profile

| Metric | Value |
| --- | --- |
| Total tokens (7d) | ~2.1M |
| Avg tokens/run | 268K–1.0M (high variance) |
| Avg turns/run | 6.9–22.2 (high variance) |
| Cache efficiency | ~47–49% |
| Input/output ratio | 99.5% input tokens |
| Model | claude-sonnet-4.6 |
| Actuation | read-only |
| Classification | resource_heavy_for_domain (heavy run) |
The workflow runs every 6 hours. On a typical day it fires 8 times. The token cost varies wildly from run to run depending on how many issues are present and which path the agent takes.
## Recommendations
#### 1. Replace `list_issues` with `search_issues no:label` as first step – Est. savings: ~200–400K tokens/run

**Evidence from both audited runs:** Every run starts with:

```
list_issues(owner: "github", repo: "gh-aw", state: "OPEN", perPage: 50)
→ Output too large to read at once (120–125 KB)
```

This 120–125 KB payload is saved to a `/tmp/...` file that the agent cannot process with bash (`python3` and `node` are not in the allowed shell tools). The agent then burns 4–5 turns failing to parse it before eventually calling `search_issues` with `no:label`, which is the correct approach from the start.

The fix is to instruct the agent to start with `search_issues` using the `no:label` qualifier instead of fetching all open issues. This eliminates the large payload from the context window and the downstream bash failures.

**Action:** Add to the "On Scheduled Runs" section:

> Start with `search_issues` using query `repo:github/gh-aw is:issue is:open no:label` instead of `list_issues`. Only fetch individual issue details via `issue_read` for issues that need classification.
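As a quick local sanity check of the `no:label` semantics the new first step relies on, the filter can be exercised against a toy payload (a minimal sketch, assuming `jq` is installed; the sample payload below is invented for illustration, not taken from the audited runs):

```shell
# Filter a sample list_issues-style payload down to issues with zero labels,
# mirroring what `search_issues ... no:label` returns directly from the API.
payload='[{"number":1,"labels":[{"name":"bug"}]},{"number":2,"labels":[]}]'
echo "$payload" | jq -c '[.[] | select(.labels | length == 0) | .number]'
# prints [2]
```

The same `select(.labels | length == 0)` filter appears in the pre-fetch step of recommendation 2, so the two changes are consistent with each other.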
#### 2. Add a pre-agent bash step to pre-fetch unlabeled issues – Est. savings: ~150–300K tokens/run

**Evidence:** The `agentic_fraction = 0.50` assessment appears in both audited runs, meaning ~50% of turns are data-gathering. Pre-fetching unlabeled issue numbers in a deterministic bash step eliminates those turns from the agent's budget.

**Action:** Add a frontmatter `steps` block:

```yaml
steps:
  pre-agent:
    - name: Fetch unlabeled issues
      run: |
        gh api "repos/github/gh-aw/issues?state=open&labels=&per_page=30" \
          --jq '[.[] | select(.labels | length == 0) | {number: .number, title: .title, body: .body}]' \
          > /tmp/gh-aw/agent/unlabeled-issues.json
        echo "Unlabeled issues:" $(jq length /tmp/gh-aw/agent/unlabeled-issues.json)
```

Then update the prompt to read from `/tmp/gh-aw/agent/unlabeled-issues.json` instead of fetching via MCP.

#### 3. Suppress `search_repositories` tool calls – Est. savings: ~50–80K tokens/run

**Evidence from MCP gateway logs:** Both audited runs show **59–60 calls to `search_repositories`** that ALL fail with:

```
calling "tools/call": unknown tool "search_repositories"
```
The `issues` toolset only exposes 5 tools: `get_label`, `issue_read`, `list_issue_types`, `list_issues`, and `search_issues`. But Claude (claude-sonnet-4.6) repeatedly tries to call `search_repositories` throughout both runs, likely trying to look up repository metadata to verify labels exist.
Each failed call returns an error message that is added to the model's context window, contributing to input token bloat. With 60 calls per run and 8 runs/day, this is ~480 unnecessary API round-trips daily.
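The daily figure follows directly from the per-run counts (a back-of-the-envelope check, using the 60-call count from one audited run and the 8 runs/day stated above):

```shell
# 60 failed search_repositories calls per run x 8 scheduled runs per day
echo $(( 60 * 8 ))
# prints 480
```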
**Action:** Add an explicit instruction to the prompt:

> Do NOT call `search_repositories` – it is not available in this workflow. Use `list_issues` or `search_issues` to find issues, and `get_label` to verify a label exists.
#### 4. Downgrade to a smaller model – Est. savings: 40–60% effective token cost

**Evidence:** Both agentic assessments flagged `model_downgrade_available` (severity: low):

> "This Triage run may not need a frontier model. A smaller model (e.g. gpt-4.1-mini, claude-haiku-4-5) could handle the task at lower cost."

The task is pure read-only label classification: no code generation, no complex reasoning. The output token count is consistently tiny (3.2K–3.9K out of 442K–875K total). A lighter model would handle keyword-based classification at a fraction of the inference cost.
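The "consistently tiny" claim checks out arithmetically; a quick sketch using the per-run extremes reported above (pairing the smaller output with the smaller total is an assumption made here for illustration):

```shell
# Output tokens as a share of total tokens for the two reported extremes:
# both land well under 1% of the run's total token budget.
awk 'BEGIN { printf "%.2f%%\n", 100 * 3200 / 442000 }'   # prints 0.72%
awk 'BEGIN { printf "%.2f%%\n", 100 * 3900 / 875000 }'   # prints 0.45%
```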
**Action:** Add to the workflow frontmatter:

```yaml
engine:
  name: copilot
  model: gpt-4.1-mini
```

Test with a few runs before adopting permanently. Use `claude-haiku-4-5` as an alternative.
## Tool Usage Matrix

| Tool | Configured? | Run A (24399761724) | Run B (24385625798) | Recommendation |
| --- | --- | --- | --- | --- |
| github.list_issues | ✅ (issues toolset) | 1 call | 1 call | ⚠️ Replace with `search_issues no:label` as first step |
| github.issue_read | ✅ (issues toolset) | 15 calls | 13 calls | ✅ Keep |
| github.search_issues | ✅ (issues toolset) | 2 calls | 1 call | ✅ Keep (use first) |
| github.get_label | ✅ (issues toolset) | 0 calls | 0 calls | ⚠️ Consider removing if not needed |
| github.list_issue_types | ✅ (issues toolset) | 0 calls | 0 calls | ⚠️ Consider removing if not needed |
| github.search_repositories | ❌ NOT available | 60 calls (all fail) | 59 calls (all fail) | ❌ Add explicit "do not call" instruction |
| safeoutputs.add_labels | ✅ (max: 10) | 0 calls (noop run) | 1 call | ✅ Keep |
| safeoutputs.create_discussion | ✅ (max: 1) | 0 calls | 1 call | ✅ Keep |
| safeoutputs.noop | ✅ | 1 call | 0 calls | ✅ Keep |
| bash: jq | ✅ | 0 successful | 0 successful | ⚠️ `jq` fails silently; remove or fix piping |
| bash: grep/head/wc | ✅ | multiple | multiple | ✅ Keep (used as fallback) |

**Note on bash tools:** `python3` and `node` commands fail with "Permission denied" in both runs despite being invoked via `shell()`. Only `cat`, `grep`, `head`, and `wc` succeed reliably. The `jq *` tool also fails when reading from `/tmp/` files created by MCP tool output. Consider removing bash from the toolset entirely and relying on pre-agent bash steps for data processing.

**Ghost calls:** 59–60 × `search_repositories` per run, all returned "unknown tool"
## 7-Day Token Trend

| Snapshot Date | Runs | Total Tokens | Avg Tokens/Run | Avg Turns/Run |
| --- | --- | --- | --- | --- |
| 2026-04-04 | 8 | 2,996K | 375K | 8.4 |
| 2026-04-06 | 5 | 5,028K | 1,006K | 22.2 |
| 2026-04-13 | 6 | 3,544K | 591K | 14.3 |
| 2026-04-14 | 8 | 2,148K | 268K | 6.9 |
The spike on 2026-04-06 (avg 1M tokens, 22 turns/run) suggests the repository had more unlabeled issues that week, causing the agent to call `issue_read` many more times and use more turns fighting bash tool failures.
## ⚠️ Caveats

- Full analysis is based on 2 runs with detailed MCP gateway logs; the 7-day snapshot covers 8 runs total
- The `search_repositories` ghost-tool behavior is consistent across both runs (59–60 calls each): high confidence
- The `list_issues` → bash-failure → `search_issues` pattern is identical in both runs: high confidence
- The model downgrade recommendation should be validated with a test run before adopting
## Audited Runs Detail

#### Run #24399761724 – 2026-04-14T12:47 (schedule)

- **Assessments:** `resource_heavy_for_domain` (high), `poor_agentic_control` (medium), `partially_reducible` (low), `model_downgrade_available` (low)
- **Flow:** `list_issues` → 125 KB saved to /tmp → 10+ failed bash/python3/node attempts → grep fallback → `search_issues no:label` → 0 found → noop
- **Ghost calls:** `search_repositories` calls all returned "unknown tool"

#### Run #24385625798 – 2026-04-14T07:02 (schedule)

- **Assessments:** `poor_agentic_control` (medium), `partially_reducible` (low), `model_downgrade_available` (low)
- **Flow:** `list_issues` → 120 KB → multiple bash failures → `search_issues no:label` → found "[PR Triage Report] PR Triage Report - 2026-04-14" (#26170) → `add_labels` → `create_discussion`
- **Ghost calls:** `search_repositories` calls all returned "unknown tool"