
⚡ Claude Token Optimization 2026-04-13 — Secret Digger (Claude) #1968

@github-actions

Description


Target Workflow: secret-digger-claude.md

Source report: #1967
Estimated cost per run: $0.51
Total tokens per run: ~126K (77K raw; token_usage field includes overhead multiplier)
Cache read rate: ~49% of context (within-run only; no cross-run reuse)
Cache write rate: ~50% of context
LLM turns: 3 (1 main agent + 2 threat detection)


Current Configuration

| Setting | Value |
| --- | --- |
| Tools loaded | bash, cache-memory (minimal — already optimal) |
| Network groups | defaults only |
| Pre-agent steps | None |
| Prompt size | ~5,850 chars (user content) + ~145K chars framework injections → ~38K token system prompt |
| Models | Main agent: claude-haiku-4-5-20251001 (1 turn) · Threat detection: claude-sonnet-4-6 (2 turns) |

Turn Architecture

This workflow runs two separate claude subprocess invocations:

  1. Main agent (claude --model claude-haiku-4-5-20251001 --max-turns 4): Uses Haiku ✅ — runs the security investigation (bash commands, file reads). Makes 1 turn.
  2. Threat detection (claude — no --model flag): Defaults to claude-sonnet-4-6 ❌ — runs post-hoc validation. Makes 2 turns.

The GH_AW_MODEL_AGENT_CLAUDE: "claude-haiku-4-5-20251001" env var in engine.env correctly controls the main agent. The threat detection step reads its model from ${{ vars.GH_AW_MODEL_DETECTION_CLAUDE || '' }} — a repository variable, not the frontmatter env. Since that variable is unset, the detection step falls back to Sonnet.
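The `:+` parameter expansion behind this fallback can be sketched in plain shell — a minimal demonstration of the mechanism, not the actual lock-file code (the `MODEL` variable here is a stand-in):

```shell
# ${VAR:+word} expands to word only when VAR is set and non-empty;
# otherwise it expands to nothing, so no --model flag is emitted.
MODEL="claude-haiku-4-5-20251001"
echo "claude${MODEL:+ --model $MODEL}"   # flag appended when the variable is set

MODEL=""
echo "claude${MODEL:+ --model $MODEL}"   # bare invocation → engine default model
```

This is why an unset repository variable silently yields the Sonnet default: the expansion produces an empty string rather than an error.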


Token Analysis: The Cache Write Problem

Every run pays a cache write penalty of ~38K tokens for the full system prompt. Because runs are spaced ~1 hour apart and the Anthropic cache TTL is ~5 minutes, zero cross-run cache reuse occurs — each run is a cold start.

Within a single run, however, the cache works efficiently: the detection step's Turn 1 writes ~38K tokens, and Turn 2 reads them back (≈1.0× within-run efficiency).

Per-Turn Cache Breakdown (avg over 5 runs)

| Turn | Model | Input | Output | Cache Read | Cache Write | Naive Cost |
| --- | --- | --- | --- | --- | --- | --- |
| T1 (main agent) | Haiku | 123 | ~87 | 0 | 0 | ~$0.001 |
| T2 (detection) | Sonnet | ~2 | ~450 | 0 | ~38,990 | ~$0.153 |
| T3 (detection) | Sonnet | ~2 | ~400 | ~37,927 | ~0 | ~$0.017 |
| Total | | ~127 | ~937 | ~37,927 | ~38,990 | ~$0.171 |

Note: Reported cost ($0.51) is ~3× the naive calculation — consistent with a cost estimator applying higher effective rates for the context overhead in token_usage (125,840) vs the raw summary sum (77,728). The 3× ratio is stable across all 5 runs (σ < 1%), so savings estimates below apply proportionally.

Cache write amortization: Turn 2's 38K cache write is read once by Turn 3. That's a 1:1 write-to-read ratio within the run. Since no cross-run reads occur, each run's cache write is effectively single-use.

Cache cost vs benefit (Sonnet): Writing 38K tokens at $3.75/M = ~$0.145. Reading them at $0.30/M = ~$0.011. Break-even requires ~12.5× reads per write. With only 1 intra-run read, caching saves ~$0.011 but costs ~$0.145 — a net loss of ~$0.134 vs uncached if the output could be regenerated more cheaply. The saving grace: moving to Haiku reduces the cache write cost by 3.75×.
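The break-even arithmetic above can be reproduced with a quick awk sketch; the rates are the Sonnet prices assumed in this report, not authoritative figures:

```shell
awk 'BEGIN {
  write_rate = 3.75 / 1e6   # $/token, Sonnet cache write (assumed)
  read_rate  = 0.30 / 1e6   # $/token, Sonnet cache read (assumed)
  tokens     = 38000        # approx. system-prompt size cached per run
  printf "write cost: $%.3f\n", tokens * write_rate
  printf "read cost:  $%.3f\n", tokens * read_rate
  printf "break-even: %.1f reads per write\n", write_rate / read_rate
}'
```

With one intra-run read against a 12.5-read break-even, the write is priced like an investment that never pays off across runs.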


Recommendations

1. Set GH_AW_MODEL_DETECTION_CLAUDE repository variable

Estimated savings: ~$0.37/run (~73%)

The threat detection step (2 of 3 API calls) uses Sonnet by default because vars.GH_AW_MODEL_DETECTION_CLAUDE is unset in the repository. The task — running bash commands and analyzing sandbox observations — does not require frontier reasoning; Haiku handles it equally well.

No code change required. Set a repository variable:

```
Repository Settings → Secrets and variables → Actions → Variables
Name:  GH_AW_MODEL_DETECTION_CLAUDE
Value: claude-haiku-4-5-20251001
```
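If the GitHub CLI is available, the same variable can be set from a terminal — a sketch assuming `gh` ≥ 2.31, admin access, and `OWNER/REPO` as a placeholder for the actual repository:

```shell
# Set the Actions repository variable that the detection step reads.
gh variable set GH_AW_MODEL_DETECTION_CLAUDE \
  --body "claude-haiku-4-5-20251001" \
  --repo OWNER/REPO

# Confirm it took effect.
gh variable list --repo OWNER/REPO | grep GH_AW_MODEL_DETECTION_CLAUDE
```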

All Anthropic token rates scale by exactly 1/3.75 when switching from Sonnet to Haiku:

  • Cache write: $3.75/M → $1.00/M
  • Cache read: $0.30/M → $0.08/M
  • Output: $15.00/M → $4.00/M

Projected cost per run: $0.51 × (1/3.75) ≈ $0.14/run
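Under the uniform 1/3.75 scaling assumption, the projected figure checks out:

```shell
awk 'BEGIN {
  current = 0.51      # reported $/run with Sonnet detection
  scale   = 1 / 3.75  # Haiku rates relative to Sonnet (assumed uniform)
  printf "projected: $%.2f/run\n", current * scale
}'
```

This is an upper bound on savings only if the detection step dominates cost, which the per-turn breakdown above supports (T2+T3 account for nearly all naive cost).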

2. Remove Task tool from both claude invocations

Estimated savings: Risk mitigation (prevents Sonnet sub-agent spawns)

The compiled lock file's --allowed-tools includes Task for both the main agent and threat detection steps. The Task tool allows Claude to spawn sub-agents — which may use the default model (Sonnet), bypassing the --model flag. The Secret Digger doesn't need sub-agent spawning; it's a linear investigation workflow.

Fix: In .github/workflows/shared/secret-audit.md, the current tools block is:

```yaml
tools:
  cache-memory: true
  bash: true
```

The Task tool is included by gh-aw as a default Claude tool, not via the tools: block, so it cannot currently be excluded there. Until gh-aw exposes an option to disable it (e.g. a tools.task: false override, if one is added), monitor whether Task is ever invoked in runs — it has not appeared in the tool-usage logs for the 5 analyzed runs, suggesting Claude isn't using it.

3. Condense secret-audit.md prompt

Estimated savings: ~$0.002/run (<1%) — low priority

The 10-item investigation areas list (5,366 chars) is verbose. The Emergency Exit Rule section (turn budget guidance) is particularly long relative to its value. Trimming to ~2,500 chars saves ~720 tokens.

At Haiku cache write rate ($1.00/M): 720 tokens × $1.00/M × 3 (multiplier) = ~$0.002/run. Negligible.
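The same back-of-envelope, for reference — token savings priced at the assumed Haiku cache-write rate with the observed ~3× reported-vs-naive cost multiplier:

```shell
awk 'BEGIN {
  tokens     = 720        # tokens saved by trimming the prompt
  rate       = 1.00 / 1e6 # $/token, Haiku cache write (assumed)
  multiplier = 3          # observed reported-vs-naive cost ratio
  printf "savings: $%.4f/run\n", tokens * rate * multiplier
}'
```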

Only worthwhile if the prompt is causing model confusion (e.g., if the agent is ignoring the "ONE deep area" focus instruction and spreading across all 10 areas — which poor_agentic_control flags have suggested in other Claude workflows).

4. Remove version-reporting.md import

Estimated savings: ~$0.0004/run — trivial

488 chars / ~120 tokens. The version information (cli_version from the lock file) doesn't add investigation value for a security scan. Remove this import to shrink the system prompt slightly.


Expected Impact

| Metric | Current | Projected | Savings |
| --- | --- | --- | --- |
| Cost/run | $0.51 | ~$0.14 | −73% |
| Cost per 5-run session | $2.55 | ~$0.69 | −$1.86 |
| Monthly cost (est. 20 runs) | ~$10.20 | ~$2.76 | −$7.44 |
| Cache write cost/run | ~$0.145 | ~$0.039 | −73% |
| Model quality | Sonnet detection | Haiku detection | ✅ Adequate for bash-based investigation |

Implementation Checklist

  • Primary fix: Set repo variable GH_AW_MODEL_DETECTION_CLAUDE = claude-haiku-4-5-20251001 in repository settings (no code change needed)
  • Verify subsequent run uses Haiku for all 3 turns in token report
  • Optional: Trim secret-audit.md to remove the verbose 10-item area list and collapse into 5 focus areas
  • Optional: Remove version-reporting.md import from secret-digger-claude.md
  • If prompt changes are made: recompile with gh aw compile .github/workflows/secret-digger-claude.md and post-process with npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Compare token usage on next run vs baseline ($0.51/run)

Technical Notes

The GH_AW_MODEL_AGENT_CLAUDE: "claude-haiku-4-5-20251001" env var in the frontmatter already correctly overrides the main agent model. The pattern ${GH_AW_MODEL_AGENT_CLAUDE:+ --model "$GH_AW_MODEL_AGENT_CLAUDE"} in the compiled lock file appends --model claude-haiku-4-5-20251001 to the main claude invocation. The threat detection invocation uses the analogous ${GH_AW_MODEL_DETECTION_CLAUDE:+ --model "$GH_AW_MODEL_DETECTION_CLAUDE"} pattern — but reads its value from ${{ vars.GH_AW_MODEL_DETECTION_CLAUDE || '' }} (a GitHub Actions variable, not the frontmatter env), which evaluates to the empty string when the variable is unset.

Generated by Daily Claude Token Optimization Advisor
