# Target Workflow: smoke-claude.md

- Source report: #2053
- Estimated cost per run: $0.23
- Total tokens per run: ~197K raw (46.6K cache-write + 149.7K cache-read + 0.8K output)
- Cache read rate: 76% (149.7K / 197K)
- Cache write rate: 24% (46.6K / 197K)
- LLM turns: avg 6 / run (range: 5–7)
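As a quick sanity check, the header rates can be re-derived from the per-run token averages above (a sketch using the figures quoted in this report, not fresh instrumentation):

```python
# Per-run token averages quoted in this report
cache_write, cache_read, output, input_tok = 46_603, 149_687, 801, 6

total = cache_write + cache_read + output + input_tok
read_rate = cache_read / total     # fraction of tokens served from cache
write_rate = cache_write / total   # fraction written to cache (turn 1)

print(f"total={total:,} read={read_rate:.0%} write={write_rate:.0%}")
```

This reproduces the ~197K raw total and the 76% / 24% read/write split.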
## Current Configuration

| Setting | Value |
|---|---|
| Engine | claude (claude-sonnet-4-6) |
| Max turns | 12 |
| Tools loaded | GitHub MCP: 52 tools · Playwright MCP: 21 tools · Built-in: ~14 |
| Tools actually used | `list_pull_requests` (9/10 runs) · `browser_navigate` (10/10) · `Bash` (10/10) · `safeoutputs` (10/10) = ~5 unique tools |
| GitHub toolsets | `[repos, pull_requests]` — currently loads 52 tools despite restriction |
| Network groups | defaults, github, playwright |
| Pre-agent steps | ✅ Yes — `mkdir /tmp/gh-aw/mcp-logs/playwright` only |
| Prompt size | 5,321 chars (~1,600 tokens) |
## Tool Schema Token Budget (Turn 1 cache write = 46,600 tokens)

| Source | Est. Tools | Est. Tokens | Share |
|---|---|---|---|
| GitHub MCP (repos + pull_requests) | ~52 | ~26,000 | 56% |
| Playwright MCP | ~21 | ~10,500 | 23% |
| Built-in Claude Code tools | ~14 | ~2,800 | 6% |
| System prompt + workflow prompt | — | ~4,600 | 10% |
| Other (Claude infra) | — | ~2,700 | 6% |
| **Total** | ~87 | ~46,600 | 100% |
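The share column follows directly from the per-source token estimates; a small sketch with the estimates copied from the table (shares individually round as shown, summing to ~100%):

```python
# Turn-1 cache-write budget estimates from the table (tokens)
budget = {
    "GitHub MCP": 26_000,
    "Playwright MCP": 10_500,
    "Built-in tools": 2_800,
    "Prompts": 4_600,
    "Claude infra": 2_700,
}
total = sum(budget.values())                     # matches the ~46,600-token write
shares = {k: round(v / total * 100) for k, v in budget.items()}
print(total, shares)
```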
## Cost Breakdown

Cache writes dominate cost at 75% of spend:

| Token Type | Avg/Run | Sonnet Rate | Cost/Run | % |
|---|---|---|---|---|
| Cache write | 46,603 | $3.75/M | $0.1748 | 75% |
| Cache read | 149,687 | $0.30/M | $0.0449 | 19% |
| Output | 801 | $15.00/M | $0.0120 | 5% |
| Input | 6 | $3.00/M | <$0.0001 | 0% |
| **Total** | 197,097 | | $0.2317 | |
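The per-type costs are simply tokens × rate; a sketch reproducing the breakdown (token counts and rates as quoted in the table):

```python
# Per-run cost at the Sonnet rates quoted in the table ($/M tokens)
tokens = {"cache_write": 46_603, "cache_read": 149_687, "output": 801, "input": 6}
rates  = {"cache_write": 3.75,   "cache_read": 0.30,    "output": 15.00, "input": 3.00}

costs = {k: tokens[k] * rates[k] / 1e6 for k in tokens}
total = sum(costs.values())
print({k: round(v, 4) for k, v in costs.items()}, f"total=${total:.4f}")
```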
## Recommendations

### 1. Switch model to claude-haiku-4-5
Estimated savings: ~$0.17/run (~73%) — highest-impact change by far.
Cache writes are charged at $3.75/M for Sonnet vs $1.00/M for Haiku. Cache reads: $0.30/M vs $0.08/M. Since this workflow does nothing that requires Sonnet-level reasoning (list 2 PRs, navigate to a URL, write a file, add a comment), Haiku is fully capable.
Cost projection with Haiku:
| Token Type | Avg/Run | Haiku Rate | Cost/Run |
|---|---|---|---|
| Cache write | 46,603 | $1.00/M | $0.0466 |
| Cache read | 149,687 | $0.08/M | $0.0120 |
| Output | 801 | $4.00/M | $0.0032 |
| **Total** | | | $0.0618 |
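Re-pricing the same token profile at the Haiku rates yields the $0.0618 figure and the ~73% saving (rates as quoted; the 6 input tokens are negligible and omitted):

```python
# Same per-run token profile, Haiku rates ($/M tokens)
tokens = {"cache_write": 46_603, "cache_read": 149_687, "output": 801}
haiku  = {"cache_write": 1.00,   "cache_read": 0.08,    "output": 4.00}

haiku_cost = sum(tokens[k] * haiku[k] / 1e6 for k in tokens)
sonnet_cost = 0.2317                  # current cost/run from the breakdown above
savings = sonnet_cost - haiku_cost
print(f"${haiku_cost:.4f}/run, saves ${savings:.4f} ({savings / sonnet_cost:.0%})")
```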
Implementation — add one line to `.github/workflows/smoke-claude.md`:

```diff
 engine:
   id: claude
   max-turns: 12
+  model: claude-haiku-4-5
```
Then recompile and post-process:

```sh
gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts
```
Risk: Low. Haiku handles multi-step, parallel tool dispatch reliably. The test tasks are intentionally simple. Validate with 2–3 triggered runs before relying on it.
### 2. Drop repos from GitHub toolsets

Estimated savings: ~12.5K tokens off the turn-1 write (~75K tokens/run including cached re-reads) → ~$0.02/run after the Haiku switch
Despite specifying toolsets: [repos, pull_requests], 52 GitHub tools are loaded in the agent's context (verified from --allowed-tools in the agent command). Only list_pull_requests is ever used (9/10 runs) plus search_pull_requests once.
The repos toolset likely contributes a large fraction of those 52 tools. Dropping it should cut tool schema tokens by ~12,500 (estimate based on ~25 tools × ~500 tokens/schema) from every Turn 1 cache write:
```diff
 tools:
   github:
-    toolsets: [repos, pull_requests]
+    toolsets: [pull_requests]
```
After making this change, check agent-stdio.log for the --allowed-tools flag to confirm tool count dropped. If the Smoke test fails because a needed tool was in repos, add it back selectively.
Note: If your gh-aw version maps toolsets to a specific tool whitelist, you could also use a more granular override to expose only list_pull_requests and search_pull_requests.
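The ~12.5K estimate compounds across turns, because every subsequent turn re-reads the turn-1 schemas from cache. A sketch of the arithmetic (tool count, tokens per schema, and re-read count are the report's rough estimates, not measurements):

```python
# Rough savings from dropping the `repos` toolset
tools_dropped = 25          # assumed repos share of the 52 GitHub tools
tokens_per_schema = 500     # rough per-tool schema size
reads_per_run = 5           # later turns re-read the turn-1 cache (6-turn avg run)

write_saved = tools_dropped * tokens_per_schema       # off the turn-1 cache write
total_saved = write_saved * (1 + reads_per_run)       # write + cached re-reads

# Dollar impact at Haiku rates ($1.00/M write, $0.08/M read)
dollars = (write_saved * 1.00 + write_saved * reads_per_run * 0.08) / 1e6
print(write_saved, total_saved, round(dollars, 4))
```

That lands at roughly the ~$0.02/run quoted above.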
### 3. Reduce max-turns from 12 to 8
Estimated savings: Prevents runaway cost; no direct token reduction per normal run.
Observed max turns across 10 runs: 7. Avg: 6. Setting max-turns: 8 caps worst-case spend without affecting normal operation:
```diff
 engine:
   id: claude
+  model: claude-haiku-4-5
-  max-turns: 12
+  max-turns: 8
```
At Haiku pricing, the marginal cost of extra turns is small but real:

- A 12-turn runaway adds ~6 turns over the 6-turn average × ~23K tokens/turn × $0.08/M cache read = +$0.011
- Capping at 8 limits that overrun to ~$0.004/run (2 extra turns) and rules out 12-turn runaway scenarios
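The turn-cap arithmetic as a sketch (the ~23K tokens/turn figure is the report's estimate of cache reads added per extra turn):

```python
# Marginal cost of extra turns at the Haiku cache-read rate
TOKENS_PER_TURN = 23_000    # approx. cache read added per extra turn
READ_RATE = 0.08            # $/M, Haiku cache read

def extra_turn_cost(extra_turns: int) -> float:
    """Dollar cost of re-reading the cached context for N extra turns."""
    return extra_turns * TOKENS_PER_TURN * READ_RATE / 1e6

print(round(extra_turn_cost(6), 4))   # 12-turn runaway vs the 6-turn average
print(round(extra_turn_cost(2), 4))   # 8-turn cap vs the 6-turn average
```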
### 4. Move the mkdir pre-step + trim the prompt (minor)

Estimated savings: ~800 tokens/run → <$0.001/run
The current pre-agent step only creates a Playwright log directory. The workflow prompt (5,321 chars / ~1,600 tokens) could be tightened by ~50%. Specific cuts:

- Remove the meta-instruction "IMPORTANT: Keep all outputs extremely short..." (the model infers this from context)
- Consolidate the Output section with the Test Requirements section
- Remove redundant path hints like "(create the directory if it doesn't exist)" — Bash handles that
Example condensed prompt (~700 tokens instead of 1,600):

```markdown
# Smoke Test: Claude Engine Validation

Test all four capabilities and report results as a compact PR comment (max 5–10 lines).

1. **GitHub MCP**: `list_pull_requests` on `${{ github.repository }}` (`perPage: 2`, `state: closed`)
2. **Playwright**: navigate to `https://github.com`, verify the page title contains "GitHub"
3. **File Write**: `echo "Smoke test passed for Claude at $(date)" > /tmp/gh-aw/agent/smoke-test-claude-${{ github.run_id }}.txt`
4. **Bash verify**: `cat` the file back

**On PR trigger**: `add_comment` (✅/❌ per test + PASS/FAIL) + `add_labels: [smoke-claude]` if all pass.
**On schedule/dispatch**: `noop` with summary.
```
## Cache Analysis (Anthropic-Specific)

Per-turn breakdown for a typical 6-turn run (run #24548864481):
| Turn | Input | Output | Cache Read | Cache Write | Notes |
|---|---|---|---|---|---|
| 1 | 3 | 6 | 0 | ~43,975 | All tool schemas written; 3 parallel tool calls dispatched |
| 2 | 1 | 1 | ~43,975 | ~2,177 | Tool results received; "All tests passed" reasoning |
| 3 | 1 | 1 | ~43,975 | ~2,177 | add_comment dispatched |
| 4 | 1 | 63 | ~46,152 | ~279 | add_labels dispatched |
| 5 | 1 | 708 | ~46,431 | ~103 | Final result |
| **Total** | 6 | 779 | ~136,558 | ~46,534 | |
Cache write amortization: The Turn 1 write of ~43,975 tokens (tool schemas) is reused 4× within the same run as cache reads. That's good amortization per run. However, a fresh 43,975-token cache write occurs at the start of every new run (TTL ~5 min between separate PR-triggered runs). With 10 token-producing runs, ~440K tokens were written to cache = $1.65 in cache-write cost alone.
Cache cost vs benefit: Within a run, caching reduces what would otherwise be 5× full context re-reads. Beneficial. The cost is dominated by the initial write, not the reads.
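The cross-run write cost scales linearly with run count; a sketch using the figures above:

```python
# Cross-run cost of re-writing the tool-schema cache every run (Sonnet rates)
WRITE_TOKENS = 43_975        # turn-1 cache write in a typical run
SONNET_WRITE_RATE = 3.75     # $/M, Sonnet cache write

runs = 10
total_written = WRITE_TOKENS * runs
cost = total_written * SONNET_WRITE_RATE / 1e6
print(f"{total_written:,} tokens written -> ${cost:.2f}")
```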
## Expected Impact

| Metric | Current | After Rec 1 (Haiku) | After Rec 1+2 | Savings |
|---|---|---|---|---|
| Cost/run | $0.2317 | $0.0618 | ~$0.044 | −81% |
| Cache write tokens/run | 46,603 | 46,603 | ~34,000 | −27% |
| Total tokens/run | ~197K | ~197K | ~148K | −25% |
| LLM turns | avg 6 | avg 6 | avg 5–6 | — |
| Cost for 10 runs | $2.32 | $0.62 | ~$0.44 | −$1.88 |
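A final sanity check on the headline numbers (values taken from the impact table):

```python
# Verify the headline savings in the impact table
current, after_recs_1_2 = 0.2317, 0.044
pct_saved = 1 - after_recs_1_2 / current
ten_run_savings = (current - after_recs_1_2) * 10
print(f"{pct_saved:.0%} per run, ${ten_run_savings:.2f} saved over 10 runs")
```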
## Implementation Checklist
- [ ] Add `model: claude-haiku-4-5` to the `engine:` block in `.github/workflows/smoke-claude.md`
- [ ] Change `toolsets: [repos, pull_requests]` → `toolsets: [pull_requests]` in `tools.github`
- [ ] Change `max-turns: 12` → `max-turns: 8`
- [ ] Run `gh aw compile .github/workflows/smoke-claude.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Verify `--allowed-tools` in `agent-stdio.log` to confirm the GitHub tool count dropped

Generated by Daily Claude Token Optimization Advisor