
⚡ Claude Token Optimization 2026-04-17 — Smoke Claude #2055

@github-actions

Description


Target Workflow: smoke-claude.md

Source report: #2053
Estimated cost per run: $0.23
Total tokens per run: ~197K raw (46.6K cache-write + 149.7K cache-read + 0.8K output)
Cache read rate: 76% (149.7K / 197K)
Cache write rate: 24% (46.6K / 197K)
LLM turns: avg 6 / run (range: 5–7)


Current Configuration

| Setting | Value |
|---|---|
| Engine | claude (claude-sonnet-4-6) |
| Max turns | 12 |
| Tools loaded | GitHub MCP: 52 tools · Playwright MCP: 21 tools · Built-in: ~14 |
| Tools actually used | list_pull_requests (9/10 runs) · browser_navigate (10/10) · Bash (10/10) · safeoutputs (10/10) = ~5 unique tools |
| GitHub toolsets | [repos, pull_requests] — currently loads 52 tools despite restriction |
| Network groups | defaults, github, playwright |
| Pre-agent steps | ✅ Yes — mkdir /tmp/gh-aw/mcp-logs/playwright only |
| Prompt size | 5,321 chars (~1,600 tokens) |

Tool Schema Token Budget (Turn 1 cache write = 46,600 tokens)

| Source | Est. Tools | Est. Tokens | Share |
|---|---|---|---|
| GitHub MCP (repos+pull_requests) | ~52 | ~26,000 | 56% |
| Playwright MCP | ~21 | ~10,500 | 23% |
| Built-in Claude Code tools | ~14 | ~2,800 | 6% |
| System prompt + workflow prompt | — | ~4,600 | 10% |
| Other (Claude infra) | — | ~2,700 | 6% |
| **Total** | ~87 | ~46,600 | 100% |

Cost Breakdown

Cache writes dominate cost at 75% of spend:

| Token Type | Avg/Run | Sonnet Rate | Cost/Run | % |
|---|---|---|---|---|
| Cache write | 46,603 | $3.75/M | $0.1748 | 75% |
| Cache read | 149,687 | $0.30/M | $0.0449 | 19% |
| Output | 801 | $15.00/M | $0.0120 | 5% |
| Input | 6 | $3.00/M | <$0.0001 | 0% |
| **Total** | 197,097 | | $0.2317 | |
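As a sanity check, the table's arithmetic can be reproduced in a few lines (token counts and rates are copied from the table above; nothing here is new data):

```python
# Reproduce the per-run cost from the averages above; rates are the
# Sonnet per-million-token prices quoted in the table.
RATES = {"cache_write": 3.75, "cache_read": 0.30, "output": 15.00, "input": 3.00}
TOKENS = {"cache_write": 46_603, "cache_read": 149_687, "output": 801, "input": 6}

cost = {k: TOKENS[k] * RATES[k] / 1_000_000 for k in TOKENS}
total = sum(cost.values())

for k, c in cost.items():
    print(f"{k:>11}: ${c:.4f} ({c / total:.0%})")
print(f"      total: ${total:.4f}")  # ≈ $0.2317
```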

Recommendations

1. Switch model to claude-haiku-4-5

Estimated savings: ~$0.17/run (~73%) — highest-impact change by far.

Cache writes are charged at $3.75/M for Sonnet vs $1.00/M for Haiku. Cache reads: $0.30/M vs $0.08/M. Since this workflow does nothing that requires Sonnet-level reasoning (list 2 PRs, navigate to a URL, write a file, add a comment), Haiku is fully capable.

Cost projection with Haiku:

| Token Type | Avg/Run | Haiku Rate | Cost/Run |
|---|---|---|---|
| Cache write | 46,603 | $1.00/M | $0.0466 |
| Cache read | 149,687 | $0.08/M | $0.0120 |
| Output | 801 | $4.00/M | $0.0032 |
| **Total** | | | $0.0618 |
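The ~73% headline follows directly from the rate differences. A quick comparison, pricing the same token mix at both sets of rates (the 6 input tokens are omitted as negligible):

```python
# Same per-run token mix priced at Sonnet vs. Haiku rates ($/M tokens).
tokens = {"cache_write": 46_603, "cache_read": 149_687, "output": 801}
sonnet = {"cache_write": 3.75, "cache_read": 0.30, "output": 15.00}
haiku = {"cache_write": 1.00, "cache_read": 0.08, "output": 4.00}

cost_sonnet = sum(tokens[k] * sonnet[k] for k in tokens) / 1_000_000
cost_haiku = sum(tokens[k] * haiku[k] for k in tokens) / 1_000_000
savings_pct = 100 * (1 - cost_haiku / cost_sonnet)

print(f"Sonnet ${cost_sonnet:.4f}/run -> Haiku ${cost_haiku:.4f}/run "
      f"(~{savings_pct:.0f}% saved)")
```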

Implementation — add one line to .github/workflows/smoke-claude.md:

```diff
 engine:
   id: claude
   max-turns: 12
+  model: claude-haiku-4-5
```

Then recompile and post-process:

```shell
gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts
```

Risk: Low. Haiku handles multi-step, parallel tool dispatch reliably. The test tasks are intentionally simple. Validate with 2–3 triggered runs before relying on it.


2. Drop repos from GitHub toolsets

Estimated savings: ~12.5K tokens off the turn-1 cache write (~75K tokens/run including re-reads) → ~$0.02/run after switching to Haiku

Despite specifying toolsets: [repos, pull_requests], 52 GitHub tools are loaded in the agent's context (verified from --allowed-tools in the agent command). Only list_pull_requests is ever used (9/10 runs) plus search_pull_requests once.

The repos toolset likely contributes a large fraction of those 52 tools. Dropping it should cut tool schema tokens by ~12,500 (estimate based on ~25 tools × ~500 tokens/schema) from every Turn 1 cache write:

```diff
 tools:
   github:
-    toolsets: [repos, pull_requests]
+    toolsets: [pull_requests]
```

After making this change, check agent-stdio.log for the --allowed-tools flag to confirm tool count dropped. If the Smoke test fails because a needed tool was in repos, add it back selectively.

Note: If your gh-aw version maps toolsets to a specific tool whitelist, you could also use a more granular override to expose only list_pull_requests and search_pull_requests.
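The ~$0.02/run estimate can be reconstructed from the stated assumptions (~25 `repos` tool schemas at ~500 tokens each, with the turn-1 write re-read roughly five times per run; both figures are estimates from this report, not measurements):

```python
# Estimated token/cost impact of dropping the `repos` toolset.
# Assumptions (estimates, not measured): ~25 schemas at ~500 tokens
# each, re-read ~5x per run after the turn-1 cache write.
schema_tokens = 25 * 500            # ~12,500 fewer turn-1 write tokens
reads_per_run = 5
tokens_saved = schema_tokens * (1 + reads_per_run)   # write + re-reads

# Cost delta at Haiku rates: $1.00/M cache write, $0.08/M cache read.
cost_saved = (schema_tokens * 1.00
              + schema_tokens * reads_per_run * 0.08) / 1_000_000
print(f"~{tokens_saved // 1000}K tokens/run, ~${cost_saved:.4f}/run saved")
```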


3. Reduce max-turns from 12 to 8

Estimated savings: Prevents runaway cost; no direct token reduction per normal run.

Observed max turns across 10 runs: 7. Avg: 6. Setting max-turns: 8 caps worst-case spend without affecting normal operation:

```diff
 engine:
   id: claude
+  model: claude-haiku-4-5
-  max-turns: 12
+  max-turns: 8
```

At Haiku cache-read pricing, each extra turn re-reads the ~23K-token context:

  • A runaway 12-turn run vs. the 6-turn average: +6 extra turns × ~23K tokens/turn × $0.08/M cache read ≈ +$0.011
  • Capping at 8 bounds the overrun to 2 extra turns (~$0.004) and rules out 12-turn runaway scenarios
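The marginal-turn arithmetic above, as a small sketch (the ~23K tokens/turn figure is the observed per-turn cache-read growth from the cache analysis):

```python
# Marginal cost of extra turns at Haiku cache-read pricing: each turn
# past the 6-turn average re-reads roughly the full ~23K-token context.
TOKENS_PER_TURN = 23_000
HAIKU_READ_RATE = 0.08 / 1_000_000   # $/token

def overrun_cost(extra_turns: int) -> float:
    """Cache-read cost of the given number of extra turns."""
    return extra_turns * TOKENS_PER_TURN * HAIKU_READ_RATE

print(f"12-turn runaway vs. avg 6: +${overrun_cost(6):.3f}")
print(f" 8-turn cap vs. avg 6:     +${overrun_cost(2):.3f}")
```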

4. Trim the prompt; keep the minimal mkdir pre-step (minor)

Estimated savings: ~800 tokens/run → <$0.001/run

The current pre-agent step only creates a playwright log directory. The workflow prompt (5,321 chars / ~1,600 tokens) could be tightened by ~50%. Specific cuts:

  • Remove the meta-instruction "IMPORTANT: Keep all outputs extremely short..." (the model infers this from context)
  • Consolidate the Output section with the Test Requirements section
  • Remove redundant path hints like (create the directory if it doesn't exist) — Bash handles that

Example condensed prompt (~700 tokens instead of 1,600):

```markdown
# Smoke Test: Claude Engine Validation

Test all four capabilities and report results as a compact PR comment (max 5–10 lines).

1. **GitHub MCP**: `list_pull_requests` on `${{ github.repository }}` (`perPage: 2`, `state: closed`)
2. **Playwright**: navigate to `https://github.com`, verify page title contains "GitHub"
3. **File Write**: `echo "Smoke test passed for Claude at $(date)" > /tmp/gh-aw/agent/smoke-test-claude-${{ github.run_id }}.txt`
4. **Bash verify**: `cat` the file back

**On PR trigger**: `add_comment` (✅/❌ per test + PASS/FAIL) + `add_labels: [smoke-claude]` if all pass.
**On schedule/dispatch**: `noop` with summary.
```

Cache Analysis (Anthropic-Specific)

Per-turn breakdown for a typical 6-turn run (run #24548864481):

| Turn | Input | Output | Cache Read | Cache Write | Notes |
|---|---|---|---|---|---|
| 1 | 3 | 6 | 0 | ~43,975 | All tool schemas written; 3 parallel tool calls dispatched |
| 2 | 1 | 1 | ~43,975 | ~2,177 | Tool results received; "All tests passed" reasoning |
| 3 | 1 | 1 | ~43,975 | ~2,177 | add_comment dispatched |
| 4 | 1 | 63 | ~46,152 | ~279 | add_labels dispatched |
| 5 | 1 | 708 | ~46,431 | ~103 | Final result |
| **Total** | 6 | 779 | ~136,558 | ~46,534 | |

Cache write amortization: The Turn 1 write of ~43,975 tokens (tool schemas) is reused 4× within the same run as cache reads. That's good amortization per run. However, a fresh 43,975-token cache write occurs at the start of every new run (TTL ~5 min between separate PR-triggered runs). With 10 token-producing runs, ~440K tokens were written to cache = $1.65 in cache-write cost alone.
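The $1.65 figure is simply the per-run write repeated across runs (figures from the paragraph above):

```python
# Fresh cache writes across runs: the ~44K-token schema write recurs
# every run once the ~5-minute cache TTL lapses (Sonnet write rate).
WRITE_TOKENS = 43_975
RUNS = 10
SONNET_WRITE_RATE = 3.75 / 1_000_000   # $/token

total_written = WRITE_TOKENS * RUNS    # ~440K tokens
total_cost = total_written * SONNET_WRITE_RATE
print(f"{total_written / 1000:.0f}K tokens written, "
      f"${total_cost:.2f} in cache-write cost")
```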

Cache cost vs benefit: Within a run, caching reduces what would otherwise be 5× full context re-reads. Beneficial. The cost is dominated by the initial write, not the reads.


Expected Impact

| Metric | Current | After Rec 1 (Haiku) | After Rec 1+2 | Savings |
|---|---|---|---|---|
| Cost/run | $0.2317 | $0.0618 | ~$0.044 | −81% |
| Cache write tokens/run | 46,603 | 46,603 | ~34,000 | −27% |
| Total tokens/run | ~197K | ~197K | ~148K | −25% |
| LLM turns | avg 6 | avg 6 | avg 5–6 | — |
| Cost for 10 runs | $2.32 | $0.62 | ~$0.44 | −$1.88 |

Implementation Checklist

  • Add model: claude-haiku-4-5 to engine: block in .github/workflows/smoke-claude.md
  • Change toolsets: [repos, pull_requests] → toolsets: [pull_requests] in tools.github
  • Change max-turns: 12 → max-turns: 8
  • Optionally tighten the prompt body to ~700 tokens
  • Recompile: gh aw compile .github/workflows/smoke-claude.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Trigger a test run on a PR and verify: Haiku completes all 4 tests ✅
  • Check --allowed-tools in agent-stdio.log to confirm GitHub tool count dropped
  • Compare token usage vs this baseline after 3–5 runs

Generated by Daily Claude Token Optimization Advisor
