
⚡ Claude Token Optimization 2026-04-17 — Smoke Claude #2055

@github-actions

Description


Target Workflow: smoke-claude.md

Source report: #2053
Estimated cost per run: $0.23
Total tokens per run: ~197K raw (46.6K cache-write + 149.7K cache-read + 0.8K output)
Cache read rate: 76% (149.7K / 197K)
Cache write rate: 24% (46.6K / 197K)
LLM turns: avg 6 / run (range: 5–7)


Current Configuration

| Setting | Value |
|---|---|
| Engine | claude (claude-sonnet-4-6) |
| Max turns | 12 |
| Tools loaded | GitHub MCP: 52 tools · Playwright MCP: 21 tools · Built-in: ~14 |
| Tools actually used | list_pull_requests (9/10 runs) · browser_navigate (10/10) · Bash (10/10) · safeoutputs (10/10) = ~5 unique tools |
| GitHub toolsets | [repos, pull_requests] — currently loads 52 tools despite restriction |
| Network groups | defaults, github, playwright |
| Pre-agent steps | ✅ Yes — mkdir /tmp/gh-aw/mcp-logs/playwright only |
| Prompt size | 5,321 chars (~1,600 tokens) |

Tool Schema Token Budget (Turn 1 cache write = 46,600 tokens)

| Source | Est. Tools | Est. Tokens | Share |
|---|---|---|---|
| GitHub MCP (repos+pull_requests) | ~52 | ~26,000 | 56% |
| Playwright MCP | ~21 | ~10,500 | 23% |
| Built-in Claude Code tools | ~14 | ~2,800 | 6% |
| System prompt + workflow prompt | — | ~4,600 | 10% |
| Other (Claude infra) | — | ~2,700 | 6% |
| **Total** | ~87 | ~46,600 | 100% |

Cost Breakdown

Cache writes dominate cost at 75% of spend:

| Token Type | Avg/Run | Sonnet Rate | Cost/Run | % |
|---|---|---|---|---|
| Cache write | 46,603 | $3.75/M | $0.1748 | 75% |
| Cache read | 149,687 | $0.30/M | $0.0449 | 19% |
| Output | 801 | $15.00/M | $0.0120 | 5% |
| Input | 6 | $3.00/M | <$0.0001 | 0% |
| **Total** | 197,097 | | $0.2317 | |
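As a sanity check, the table's arithmetic can be reproduced in a few lines (token counts and rates are copied from the table above; nothing here is new data):

```python
# Reproduce the per-run cost from the averages above; rates are the
# Sonnet per-million-token prices quoted in the table.
RATES = {"cache_write": 3.75, "cache_read": 0.30, "output": 15.00, "input": 3.00}
TOKENS = {"cache_write": 46_603, "cache_read": 149_687, "output": 801, "input": 6}

cost = {k: TOKENS[k] * RATES[k] / 1_000_000 for k in TOKENS}
total = sum(cost.values())

for k, c in cost.items():
    print(f"{k:>11}: ${c:.4f} ({c / total:.0%})")
print(f"      total: ${total:.4f}")  # ≈ $0.2317
```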

Recommendations

1. Switch model to claude-haiku-4-5

Estimated savings: ~$0.17/run (~73%) — highest-impact change by far.

Cache writes are charged at $3.75/M for Sonnet vs $1.00/M for Haiku. Cache reads: $0.30/M vs $0.08/M. Since this workflow does nothing that requires Sonnet-level reasoning (list 2 PRs, navigate to a URL, write a file, add a comment), Haiku is fully capable.

Cost projection with Haiku:

| Token Type | Avg/Run | Haiku Rate | Cost/Run |
|---|---|---|---|
| Cache write | 46,603 | $1.00/M | $0.0466 |
| Cache read | 149,687 | $0.08/M | $0.0120 |
| Output | 801 | $4.00/M | $0.0032 |
| **Total** | | | $0.0618 |
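The ~73% headline follows directly from the rate differences. A quick comparison, pricing the same token mix at both sets of rates (the 6 input tokens are omitted as negligible):

```python
# Same per-run token mix priced at Sonnet vs. Haiku rates ($/M tokens).
tokens = {"cache_write": 46_603, "cache_read": 149_687, "output": 801}
sonnet = {"cache_write": 3.75, "cache_read": 0.30, "output": 15.00}
haiku = {"cache_write": 1.00, "cache_read": 0.08, "output": 4.00}

cost_sonnet = sum(tokens[k] * sonnet[k] for k in tokens) / 1_000_000
cost_haiku = sum(tokens[k] * haiku[k] for k in tokens) / 1_000_000
savings_pct = 100 * (1 - cost_haiku / cost_sonnet)

print(f"Sonnet ${cost_sonnet:.4f}/run -> Haiku ${cost_haiku:.4f}/run "
      f"(~{savings_pct:.0f}% saved)")
```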

Implementation — add one line to .github/workflows/smoke-claude.md:

```diff
 engine:
   id: claude
   max-turns: 12
+  model: claude-haiku-4-5
```

Then recompile and post-process:

```shell
gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts
```

Risk: Low. Haiku handles multi-step, parallel tool dispatch reliably. The test tasks are intentionally simple. Validate with 2–3 triggered runs before relying on it.


2. Drop repos from GitHub toolsets

Estimated savings: ~12.5K tokens off the turn-1 cache write (~75K tokens/run including re-reads) → ~$0.02/run after switching to Haiku

Despite specifying toolsets: [repos, pull_requests], 52 GitHub tools are loaded in the agent's context (verified from --allowed-tools in the agent command). Only list_pull_requests is ever used (9/10 runs) plus search_pull_requests once.

The repos toolset likely contributes a large fraction of those 52 tools. Dropping it should cut tool schema tokens by ~12,500 (estimate based on ~25 tools × ~500 tokens/schema) from every Turn 1 cache write:

```diff
 tools:
   github:
-    toolsets: [repos, pull_requests]
+    toolsets: [pull_requests]
```

After making this change, check agent-stdio.log for the --allowed-tools flag to confirm tool count dropped. If the Smoke test fails because a needed tool was in repos, add it back selectively.

Note: If your gh-aw version maps toolsets to a specific tool whitelist, you could also use a more granular override to expose only list_pull_requests and search_pull_requests.
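The ~$0.02/run estimate can be reconstructed from the stated assumptions (~25 `repos` tool schemas at ~500 tokens each, with the turn-1 write re-read roughly five times per run; both figures are estimates from this report, not measurements):

```python
# Estimated token/cost impact of dropping the `repos` toolset.
# Assumptions (estimates, not measured): ~25 schemas at ~500 tokens
# each, re-read ~5x per run after the turn-1 cache write.
schema_tokens = 25 * 500            # ~12,500 fewer turn-1 write tokens
reads_per_run = 5
tokens_saved = schema_tokens * (1 + reads_per_run)   # write + re-reads

# Cost delta at Haiku rates: $1.00/M cache write, $0.08/M cache read.
cost_saved = (schema_tokens * 1.00
              + schema_tokens * reads_per_run * 0.08) / 1_000_000
print(f"~{tokens_saved // 1000}K tokens/run, ~${cost_saved:.4f}/run saved")
```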


3. Reduce max-turns from 12 to 8

Estimated savings: Prevents runaway cost; no direct token reduction per normal run.

Observed max turns across 10 runs: 7. Avg: 6. Setting max-turns: 8 caps worst-case spend without affecting normal operation:

```diff
 engine:
   id: claude
+  model: claude-haiku-4-5
-  max-turns: 12
+  max-turns: 8
```

At Haiku cache-read pricing, each extra turn re-reads the ~23K-token context:

  • A runaway 12-turn run vs. the 6-turn average: +6 extra turns × ~23K tokens/turn × $0.08/M cache read ≈ +$0.011
  • Capping at 8 bounds the overrun to 2 extra turns (~$0.004) and rules out 12-turn runaway scenarios
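The marginal-turn arithmetic above, as a small sketch (the ~23K tokens/turn figure is the observed per-turn cache-read growth from the cache analysis):

```python
# Marginal cost of extra turns at Haiku cache-read pricing: each turn
# past the 6-turn average re-reads roughly the full ~23K-token context.
TOKENS_PER_TURN = 23_000
HAIKU_READ_RATE = 0.08 / 1_000_000   # $/token

def overrun_cost(extra_turns: int) -> float:
    """Cache-read cost of the given number of extra turns."""
    return extra_turns * TOKENS_PER_TURN * HAIKU_READ_RATE

print(f"12-turn runaway vs. avg 6: +${overrun_cost(6):.3f}")
print(f" 8-turn cap vs. avg 6:     +${overrun_cost(2):.3f}")
```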

4. Trim the prompt; keep the minimal mkdir pre-step (minor)

Estimated savings: ~800 tokens/run → <$0.001/run

The current pre-agent step only creates a playwright log directory. The workflow prompt (5,321 chars / ~1,600 tokens) could be tightened by ~50%. Specific cuts:

  • Remove the meta-instruction "IMPORTANT: Keep all outputs extremely short..." (the model infers this from context)
  • Consolidate the Output section with the Test Requirements section
  • Remove redundant path hints like (create the directory if it doesn't exist) — Bash handles that

Example condensed prompt (~700 tokens instead of 1,600):

```markdown
# Smoke Test: Claude Engine Validation

Test all four capabilities and report results as a compact PR comment (max 5–10 lines).

1. **GitHub MCP**: `list_pull_requests` on `${{ github.repository }}` (`perPage: 2`, `state: closed`)
2. **Playwright**: navigate to `https://github.com`, verify page title contains "GitHub"
3. **File Write**: `echo "Smoke test passed for Claude at $(date)" > /tmp/gh-aw/agent/smoke-test-claude-${{ github.run_id }}.txt`
4. **Bash verify**: `cat` the file back

**On PR trigger**: `add_comment` (✅/❌ per test + PASS/FAIL) + `add_labels: [smoke-claude]` if all pass.
**On schedule/dispatch**: `noop` with summary.
```

Cache Analysis (Anthropic-Specific)

Per-turn breakdown for a typical 6-turn run (run #24548864481):

| Turn | Input | Output | Cache Read | Cache Write | Notes |
|---|---|---|---|---|---|
| 1 | 3 | 6 | 0 | ~43,975 | All tool schemas written; 3 parallel tool calls dispatched |
| 2 | 1 | 1 | ~43,975 | ~2,177 | Tool results received; "All tests passed" reasoning |
| 3 | 1 | 1 | ~43,975 | ~2,177 | add_comment dispatched |
| 4 | 1 | 63 | ~46,152 | ~279 | add_labels dispatched |
| 5 | 1 | 708 | ~46,431 | ~103 | Final result |
| **Total** | 6 | 779 | ~136,558 | ~46,534 | |

Cache write amortization: The Turn 1 write of ~43,975 tokens (tool schemas) is reused 4× within the same run as cache reads. That's good amortization per run. However, a fresh 43,975-token cache write occurs at the start of every new run (TTL ~5 min between separate PR-triggered runs). With 10 token-producing runs, ~440K tokens were written to cache = $1.65 in cache-write cost alone.
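The $1.65 figure is simply the per-run write repeated across runs (figures from the paragraph above):

```python
# Fresh cache writes across runs: the ~44K-token schema write recurs
# every run once the ~5-minute cache TTL lapses (Sonnet write rate).
WRITE_TOKENS = 43_975
RUNS = 10
SONNET_WRITE_RATE = 3.75 / 1_000_000   # $/token

total_written = WRITE_TOKENS * RUNS    # ~440K tokens
total_cost = total_written * SONNET_WRITE_RATE
print(f"{total_written / 1000:.0f}K tokens written, "
      f"${total_cost:.2f} in cache-write cost")
```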

Cache cost vs benefit: Within a run, caching reduces what would otherwise be 5× full context re-reads. Beneficial. The cost is dominated by the initial write, not the reads.


Expected Impact

| Metric | Current | After Rec 1 (Haiku) | After Rec 1+2 | Savings |
|---|---|---|---|---|
| Cost/run | $0.2317 | $0.0618 | ~$0.044 | −81% |
| Cache write tokens/run | 46,603 | 46,603 | ~34,000 | −27% |
| Total tokens/run | ~197K | ~197K | ~148K | −25% |
| LLM turns | avg 6 | avg 6 | avg 5–6 | — |
| Cost for 10 runs | $2.32 | $0.62 | ~$0.44 | −$1.88 |

Implementation Checklist

  • Add model: claude-haiku-4-5 to engine: block in .github/workflows/smoke-claude.md
  • Change toolsets: [repos, pull_requests] → toolsets: [pull_requests] in tools.github
  • Change max-turns: 12 → max-turns: 8
  • Optionally tighten the prompt body to ~700 tokens
  • Recompile: gh aw compile .github/workflows/smoke-claude.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Trigger a test run on a PR and verify: Haiku completes all 4 tests ✅
  • Check --allowed-tools in agent-stdio.log to confirm GitHub tool count dropped
  • Compare token usage vs this baseline after 3–5 runs

Generated by Daily Claude Token Optimization Advisor
