⚡ Claude Token Optimization 2026-04-12: Secret Digger (Claude) #1953

@github-actions

Target Workflow: secret-digger-claude.md

Source report: #1951
Estimated cost per run: $0.51 avg (range $0.48–$0.54)
Total cost this period: $10.71 (21 runs on 2026-04-05)
Total tokens per run: ~126K reported (itemized: 38K cache write + 38K cache read + ~1K output + ~127 input ≈ 78K; the remainder is not itemized here)
Cache read rate: 49% (0.97× read/write ratio; effectively no cross-run reuse)
Cache write rate: 100% of context per run
LLM turns: 3.0 avg (1 Haiku triage + 2 Sonnet main)
Model: claude-sonnet-4-6 (main) + claude-haiku-4-5-20251001 (triage)
Share of total period cost: 63% ($10.71 of $16.86)
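
The summary figures above can be re-derived from the raw counts. The sketch below is a sanity check only; it assumes that "cache read rate" means cache reads divided by cache reads plus cache writes, which is a guess at the report's definition rather than something the report states.

```python
# Per-run token counts summed over the three turns (taken from the per-turn table in this issue).
cache_read = 37_927
cache_write = 38_731 + 500
period_cost, runs, all_workflows_cost = 10.71, 21, 16.86

cost_per_run = period_cost / runs                      # about $0.51
read_rate = cache_read / (cache_read + cache_write)    # assumed definition, about 0.49
read_write_ratio = cache_read / cache_write            # about 0.97
share = period_cost / all_workflows_cost               # about 0.635

print(f"{cost_per_run:.2f} {read_rate:.0%} {read_write_ratio:.2f} {share:.1%}")
```

Note that 0.97 is reads divided by writes: a ratio just under 1 means each run reads back slightly less than it wrote.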


Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | 19 (Bash, BashOutput, Edit, `Edit(/cache-memory/*)`, ExitPlanMode, Glob, Grep, KillBash, LS, MultiEdit, `MultiEdit(/cache-memory/*)`, NotebookEdit, NotebookRead, Read, `Read(/cache-memory/*)`, Task, TodoWrite, Write, `Write(/cache-memory/*)`) |
| Tools actually used | Unknown (no tool_usage data); likely Bash, Read, `Write(/cache-memory/*)`, `Read(/cache-memory/*)` |
| Network groups | defaults (inherited from shared/secret-audit.md) |
| Pre-agent steps | No |
| Prompt size | ~5,900 bytes user content (secret-audit.md + version-reporting.md); ~38K tokens total context (dominated by framework system prompts + tool schemas) |
| max-turns | 8 (actual usage: always 3) |

Root Cause Analysis

The cost profile is dominated by Anthropic cache write charges on every run:

| Turn | Model | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|---|
| 1 | Haiku (triage) | 123 | ~91 | 0 | 0 |
| 2 | Sonnet 4.6 | ~4 | ~430 | 0 | ~38,731 |
| 3 | Sonnet 4.6 | ~0 | ~430 | 37,927 | ~500 |

Turn 2 writes the full 38K-token context (framework system prompts + tool schemas + user prompt) to Anthropic's prompt cache. Turn 3 reads it back. This within-run reuse is the only caching that occurs.

Why no cross-run cache reuse? Anthropic's default prompt cache TTL is about 5 minutes, and the Secret Digger schedule runs hourly, so every run finds a cold cache and pays the full cache write cost again. The 0.97× read/write ratio in the report is consistent with this: each run writes ~39K tokens and reads back only ~38K, all from the same run's Turn 2 rather than from prior runs.
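
The TTL-vs-schedule mismatch can be stated as a one-line check. This is a simplified sketch: it ignores run duration and any TTL refresh on cache reads, and the function name `cache_survives` is made up for illustration.

```python
from datetime import timedelta

CACHE_TTL = timedelta(minutes=5)     # default TTL cited above
RUN_INTERVAL = timedelta(hours=1)    # hourly schedule


def cache_survives(interval: timedelta, ttl: timedelta = CACHE_TTL) -> bool:
    """Return True if an entry written by one run is still live when the next run starts."""
    return interval < ttl


print(cache_survives(RUN_INTERVAL))            # hourly cadence: cache is cold
print(cache_survives(timedelta(minutes=3)))    # a hypothetical 3-minute cadence would hit
```

Under this model, cross-run reuse only appears if runs are spaced closer than the TTL, which is far from the current hourly cadence.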

Implied cache write cost for claude-sonnet-4-6: back-solving from the observed $0.51/run, of which about $0.43 is attributed to the 38.7K cache write tokens, implies **$11.12/M tokens**, approximately 3× the claude-3.5-sonnet rate of $3.75/M. This is an inferred figure rather than a published price, and it is the primary cost driver.
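
The back-solve is a single division. The sketch below uses the rounded $0.43 cache write cost quoted elsewhere in this issue, so it lands near $11.1/M rather than exactly on $11.12/M; the helper name is illustrative.

```python
def implied_rate_per_million(cost_usd: float, tokens: int) -> float:
    """USD per million tokens implied by a cost and a token count."""
    return cost_usd / tokens * 1_000_000


rate = implied_rate_per_million(0.43, 38_731)   # roughly 11.1
multiple = rate / 3.75                          # versus the $3.75/M reference rate

print(f"implied rate ~ ${rate:.2f}/M, about {multiple:.1f}x the reference rate")
```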


Recommendations

1. Switch main agent to Haiku (96% cost reduction)

Estimated savings: ~$0.49/run (~$10.33 per 21-run period)

The Secret Digger task is bash-based security exploration: running env, ps aux, find, inspecting /proc, reading files. This is read-only shell forensics — it does not require Sonnet-level reasoning. Haiku already handles the triage turn; it can handle the full investigation too.

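Haiku cache write pricing is ~37× cheaper than the implied Sonnet 4.6 rate ($11.12 / $0.30 ≈ 37):

  • Sonnet 4.6 cache write: ~$11.12/M (implied from data)
  • Haiku cache write: ~$0.30/M (assumed; this matches the published Claude 3 Haiku rate, so confirm the current rate for claude-haiku-4-5-20251001 before relying on the projection)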

Implementation — Option A: workflow-level override (preferred, scoped to this workflow only):

Edit .github/workflows/secret-digger-claude.md:

engine:
  id: claude
  max-turns: 4        # also reduce from 8 (see rec #2)
  env:
    BASH_DEFAULT_TIMEOUT_MS: "1800000"
    BASH_MAX_TIMEOUT_MS: "1800000"
    GH_AW_MODEL_AGENT_CLAUDE: "claude-haiku-4-5-20251001"

Note: the compiled lock file reads GH_AW_MODEL_AGENT_CLAUDE as ${GH_AW_MODEL_AGENT_CLAUDE:+ --model "$GH_AW_MODEL_AGENT_CLAUDE"}, so the --model flag is added only when the variable is set and non-empty. Setting it via engine.env injects it into the agent environment. Verify that this takes precedence over the vars.GH_AW_MODEL_AGENT_CLAUDE repo variable in the compiled lock.
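
For readers less familiar with the shell's `${VAR:+word}` form, the following sketch mimics its behavior in Python. It is an illustration of the expansion semantics only, not gh-aw code, and the function name `model_flag` is hypothetical.

```python
def model_flag(env: dict) -> list:
    """Mimic ${GH_AW_MODEL_AGENT_CLAUDE:+ --model "$GH_AW_MODEL_AGENT_CLAUDE"}."""
    value = env.get("GH_AW_MODEL_AGENT_CLAUDE", "")
    # ":+" substitutes the alternate text only when the variable is set AND non-empty.
    return ["--model", value] if value else []


print(model_flag({}))                                   # unset: no flag, engine default applies
print(model_flag({"GH_AW_MODEL_AGENT_CLAUDE": ""}))     # empty: treated the same as unset
print(model_flag({"GH_AW_MODEL_AGENT_CLAUDE": "claude-haiku-4-5-20251001"}))
```

The practical consequence is that an empty value silently falls back to the default model, so a misconfigured override would not fail loudly.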

Implementation — Option B: repo variable (affects all Claude workflows):

Set the GitHub Actions repository variable GH_AW_MODEL_AGENT_CLAUDE to claude-haiku-4-5-20251001. This is simpler but affects Smoke Claude and Security Guard too. Use Option A if you want this scoped.

Quality consideration: The workflow prompt in shared/secret-audit.md already instructs 6–8 focused tool calls per run and uses cache-memory to maintain state across runs. Haiku is well-suited to this structured, bash-heavy task. The final report is written to a GitHub Issue — review a few Haiku-generated issues to confirm quality meets the bar before committing.


2. Lower max-turns from 8 to 4

Estimated savings: $0 (no token savings; turns are bounded by task completion, not by the limit)
Risk reduction: caps runaway turn escalation at 4 turns instead of 8 (all 21 runs completed in exactly 3 turns)

In .github/workflows/secret-digger-claude.md:

engine:
  id: claude
  max-turns: 4    # was 8; actual usage is always 3

3. Remove unused tools: NotebookEdit, NotebookRead, Task

Estimated savings: ~1,500–2,100 tokens/run from smaller tool schema (~4–5% of cache write)

Security scanning never needs Jupyter notebook editing or sub-agent spawning. These tools are included by the bash: true tool config but add schema tokens to every prompt.

Check whether gh-aw supports a tool exclusion syntax (not currently documented in secret-digger-claude.md). If supported, add to the workflow:

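tools:
  cache-memory: true
  bash: true
  bash-exclude:           # hypothetical; verify syntax in gh-aw docs
    - NotebookEdit
    - NotebookRead
    - Task
    - TodoWrite           # extra candidate beyond the three tools named above
    - ExitPlanMode        # extra candidate beyond the three tools named above
  github: false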

If exclusion isn't supported, this is a framework feature request. Each removed tool saves ~500–700 schema tokens; for the three tools named above that is 3 × 700 × 2 turns × 21 runs = up to ~88K tokens/period.
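
The schema savings estimate works out as follows. The per-tool token range is the estimate from this issue, not a measured value, and the turn count assumes only the two Sonnet turns carry the full tool schema.

```python
TOKENS_PER_TOOL = (500, 700)   # low/high schema size estimate per tool
TOOLS_REMOVED = 3              # NotebookEdit, NotebookRead, Task
TURNS = 2                      # Sonnet turns assumed to carry the tool schema
RUNS = 21                      # runs in the reporting period

per_run = tuple(t * TOOLS_REMOVED for t in TOKENS_PER_TOOL)
per_period = tuple(t * TURNS * RUNS for t in per_run)

print(f"per run: {per_run[0]:,} to {per_run[1]:,} tokens")
print(f"per period: {per_period[0]:,} to {per_period[1]:,} tokens")
```

The ~88K figure is therefore the upper bound for all three tools together, not a per-tool saving.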


4. Trim shared/secret-audit.md (minor)

Estimated savings: ~600 tokens/run (~1.5% of cache write)

The investigation prompt is 5,366 bytes (~1,340 tokens) with 10 numbered investigation areas plus emergency exit rules, workflow steps, and reporting instructions. Consolidate redundant sections:

  • Merge steps 1–3 of "Investigation Workflow" into one sentence (they restate the prompt body)
  • Abbreviate the 10 investigation areas into a compact list (agent already knows these techniques)
  • Remove the "Emergency Exit Rule" section — the max-turns config enforces this structurally

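A 40–50% size reduction of secret-audit.md removes roughly 540–670 tokens from the prompt, consistent with the ~600 tokens/run estimate above. Counted across the 2 Sonnet turns that carry the prompt, that is up to ~1,340 fewer tokens processed per run.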
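
The byte-to-token conversion behind these numbers is a rough heuristic. The sketch below assumes about 4 bytes per token, which is an approximation for English text and not a tokenizer measurement.

```python
PROMPT_BYTES = 5_366
BYTES_PER_TOKEN = 4            # rough heuristic (assumption); real tokenizers vary

prompt_tokens = PROMPT_BYTES / BYTES_PER_TOKEN             # about 1,340
saved = [prompt_tokens * cut for cut in (0.40, 0.50)]      # savings at a 40% and 50% trim

print(f"prompt ~ {prompt_tokens:.0f} tokens; trim saves {saved[0]:.0f} to {saved[1]:.0f}")
```

Measuring the actual prompt with the provider's token counting would give a firmer number before anyone invests time in rewriting the prompt.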


Cache Analysis (Anthropic-Specific)

| Turn | Model | Input | Output | Cache Read | Cache Write | Net New |
|---|---|---|---|---|---|---|
| 1 | Haiku | 123 | ~91 | 0 | 0 | 214 |
| 2 | Sonnet 4.6 | ~4 | ~430 | 0 | 38,731 | 38,735 |
| 3 | Sonnet 4.6 | ~0 | ~430 | 37,927 | ~500 | ~500 |

Cache write amortization: Turn 2's 38,731 cache write tokens are read back once, by Turn 3. Cost/benefit per run: write cost ≈ $0.43 (at the implied $11.12/M), read savings ≈ $0.033 (at the implied read price of $0.88/M). The cache costs about 13× more per run than it saves. It is purely a within-run mechanism, and the TTL-vs-schedule mismatch makes it structurally inefficient at this cadence.
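
The cost/benefit comparison can be reproduced from the token counts and the two implied rates. Both rates are inferred from the cost data in this issue, not taken from a price list, and "read savings" follows the issue's own framing of tokens read times the read price.

```python
WRITE_TOKENS = 38_731
READ_TOKENS = 37_927
WRITE_RATE = 11.12    # USD per million tokens, implied
READ_RATE = 0.88      # USD per million tokens, implied

write_cost = WRITE_TOKENS * WRITE_RATE / 1_000_000
read_savings = READ_TOKENS * READ_RATE / 1_000_000
ratio = write_cost / read_savings

print(f"write ${write_cost:.3f}, read ${read_savings:.3f}, ratio {ratio:.1f}x")
```

With a single read per write, the ratio is essentially the ratio of the two prices, so more within-run turns or a warm cross-run cache are the only ways to improve it.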

Cache write grows across runs: cache_write ranges from 38,526 to 40,088 across the 21 runs (min→max), suggesting cache-memory state accumulation adds ~1–2K tokens to context over time. This will continue growing slowly as investigation history accumulates.


Expected Impact

| Metric | Current | Projected (Haiku + max-turns 4) | Savings |
|---|---|---|---|
| Cost/run | $0.51 | ~$0.018 | -96% |
| Cache write cost/run | ~$0.43 | ~$0.012 | -97% |
| Cost for 21-run period | $10.71 | ~$0.38 | ~$10.33 |
| LLM turns | 3 | 3 (unchanged) | 0 |
| Max runaway turns | 8 | 4 | -50% |
| Token volume | 126K/run | ~39K effective | -69% |
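
The projected rows follow from the per-run figures. This sketch recomputes them; the projected $0.018 and $0.012 inputs depend on the assumed Haiku rate, so the outputs are only as reliable as that assumption.

```python
RUNS = 21
current_cost, projected_cost = 0.51, 0.018        # USD per run
current_write, projected_write = 0.43, 0.012      # USD per run, cache write only
current_tokens, projected_tokens = 126, 39        # thousands of tokens per run


def reduction(before: float, after: float) -> float:
    """Fractional reduction from before to after."""
    return 1 - after / before


period_saving = (current_cost - projected_cost) * RUNS

print(f"cost/run -{reduction(current_cost, projected_cost):.0%}")
print(f"cache write/run -{reduction(current_write, projected_write):.0%}")
print(f"token volume -{reduction(current_tokens, projected_tokens):.0%}")
print(f"period saving ~ ${period_saving:.2f}")
```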

Implementation Checklist

  • Decide on model approach: workflow-level env override (Option A) vs repo variable (Option B)
  • Edit .github/workflows/secret-digger-claude.md: set Haiku model, lower max-turns to 4
  • Recompile: gh aw compile .github/workflows/secret-digger-claude.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Trigger 2–3 manual runs (workflow_dispatch) and verify investigation quality in created issues
  • Compare estimated_cost in next token usage report vs $0.51 baseline
  • (Optional) Investigate tool exclusion syntax with gh-aw team; trim shared/secret-audit.md
  • (Optional) Investigate growing cache_write trend — consider pruning old cache-memory entries if context grows beyond ~45K tokens

Generated by Daily Claude Token Optimization Advisor · ● 856.9K ·
