
⚡ Claude Token Optimization 2026-04-12 - security-guard #1939

@github-actions


Target Workflow: security-guard

Source report: #1920
Estimated cost per run: $0.22 (report 7-day avg) / $0.29 (recent 11-run avg)
Total tokens per run: ~187K (report avg)
Cache read rate: 82% ✅
Cache write rate: 22% of total tokens per run
LLM turns: avg 4.8 (report) / 5.6 (recent) — range 0–11


Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | github: toolsets: [pull_requests, repos] (~10–15 tools) |
| Tools actually used | github_pull_request_read, github_get_file_contents, github_list_pull_requests |
| Network groups | github only |
| Pre-agent steps | ✅ Yes: fetches PR diff (8 KB cap) |
| Post-agent steps | ❌ No |
| Prompt size | 5,371 bytes / 712 words |
| max-turns | 25 (dangerously high) |
| timeout-minutes | 10 |

Cost Breakdown (per run)

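Cache writes dominate at ~60% of total cost per run, making them the primary optimization lever.

| Cost Component | Tokens (avg) | Rate | Cost/run | % of Total |
|---|---|---|---|---|
| Cache write | ~47K | $3.75/M | ~$0.176 | ~60% |
| Output | ~4.6K | $15.00/M | ~$0.069 | ~23% |
| Cache read | ~165K | $0.30/M | ~$0.050 | ~17% |
| Input (net-new) | ~7 | $3.00/M | ~$0.000 | ~0% |

These components sum to ~$0.295 per run, in line with the recent 11-run average ($0.29).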

Cache reads are cheap — the 82% cache reuse rate is healthy. The problem is cache writes are expensive and happen on every run because the Anthropic cache TTL (~5 min) rarely spans consecutive PR events.
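As a quick arithmetic check, the per-component figures can be recomputed from the token averages and per-million-token rates in the cost table. This is a minimal shell/awk sketch; the helper names component_cost and share_pct are hypothetical, and the inputs are the rounded table averages rather than exact per-run data.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the cost table.

# Dollar cost of one token bucket: tokens x (dollars per million tokens) / 1,000,000.
component_cost() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "%.5f\n", t * r / 1000000 }'
}

# Percentage share of one component in the per-run total (one decimal place).
share_pct() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# Rounded averages and rates taken from the cost table.
cache_write=$(component_cost 47000 3.75)
output=$(component_cost 4600 15.00)
cache_read=$(component_cost 165000 0.30)
input=$(component_cost 7 3.00)

total=$(awk -v a="$cache_write" -v b="$output" -v c="$cache_read" -v d="$input" \
  'BEGIN { printf "%.5f\n", a + b + c + d }')

echo "total per run: \$${total}"
for name in cache_write output cache_read input; do
  # ${!name} is bash indirect expansion: the value of the variable named by $name.
  echo "${name}: \$${!name} ($(share_pct "${!name}" "$total")%)"
done
```

With these inputs the total comes to about $0.295, with cache writes at roughly 60% of it, output at about 23%, and cache reads at about 17%.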


Recommendations

1. 🔴 Lower max-turns from 25 to 8

Estimated savings: ~$0.46 per outlier run (~20–30% of period cost)

The observability insights flag execution drift: Security Guard varied from 0 to 11 turns, averaging 5.1. With max-turns: 25, a runaway agent loop can run unchecked. Evidence:

  • Outlier run §24290111812: 11 turns, 481K tokens, $0.68 — 3× the average
  • Report also documents a 24-turn, 981K-token run (§24271368232) at $0.57

At 11 turns, cache writes grow to ~120K tokens vs ~43K on a clean 4-turn run, roughly 11K of additional writes per extra turn as the accumulated context is re-cached. Capping at 8 turns leaves headroom for complex PRs while bounding runaway cost.
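A straight line through the two runs quoted above (4 turns with ~43K writes, 11 turns with ~120K writes) gives a rough feel for how much write cost remains at the proposed cap. This is a sketch under a deliberately simplifying assumption: write volume is treated as linear in turn count, which real runs with uneven tool outputs will not follow exactly. est_write_k and write_cost are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical linear model (an assumption, not measured behavior): cache-write
# tokens in K as a function of turn count, fitted through two quoted runs:
#   4 turns -> 43K writes, 11 turns -> 120K writes.
est_write_k() {
  awk -v n="$1" 'BEGIN {
    slope = (120 - 43) / (11 - 4)   # = 11K extra writes per additional turn
    printf "%.0f\n", 43 + (n - 4) * slope
  }'
}

# Dollar cost of K thousand cache-write tokens at $3.75 per million.
write_cost() {
  awk -v k="$1" 'BEGIN { printf "%.2f\n", k * 1000 * 3.75 / 1000000 }'
}

for turns in 4 8 11; do
  k=$(est_write_k "$turns")
  echo "${turns} turns: ~${k}K cache writes, ~\$$(write_cost "$k") in write cost"
done
```

Under this model an 8-turn run still writes roughly 87K tokens (about $0.33 in write cost alone), so the cap bounds the worst case rather than making long runs cheap.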

Implementation — in .github/workflows/security-guard.md frontmatter:

```yaml
engine:
  id: claude
  max-turns: 8   # was: 25
```

2. 🔴 Add security-relevance pre-step + early-exit instruction

Estimated savings: ~$0.15–0.22 per skipped run (full run avoided); potential 30–50% run cost reduction if many PRs touch non-security files

Many PRs likely only change documentation, tests, CI config, or unrelated TypeScript — none of which require security analysis. Currently the agent receives the full prompt and tool budget regardless. Adding a pre-step that detects security-relevant files and passing the result to the agent lets it noop immediately on irrelevant PRs.

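Step 1: add the step marked # NEW to the steps: section of security-guard.md. The "Fetch PR changed files" step is the existing pre-step and is shown for context: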

```yaml
steps:
  - name: Fetch PR changed files
    id: pr-diff
    if: github.event.pull_request.number
    run: |
      DELIM="GHAW_PR_FILES_$(date +%s)"
      {
        echo "PR_FILES<<${DELIM}"
        gh api "repos/${GH_REPO}/pulls/${PR_NUMBER}/files" \
          --paginate --jq '.[] | "### " + .filename + " (+" + (.additions|tostring) + "/-" + (.deletions|tostring) + ")\n" + (.patch // "") + "\n"' \
          | head -c 8000 || true
        echo ""
        echo "${DELIM}"
      } >> "$GITHUB_OUTPUT"
    env:
      GH_TOKEN: ${{ github.token }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
      GH_REPO: ${{ github.repository }}

  # NEW: count security-critical file changes
  - name: Check security relevance
    id: security-relevance
    if: github.event.pull_request.number
    run: |
      SECURITY_RE="host-iptables|setup-iptables|squid-config|docker-manager|seccomp-profile|domain-patterns|entrypoint\.sh|Dockerfile|containers/"
      # grep -c already prints "0" when nothing matches (and exits 1), so only
      # mask the exit status; `|| echo "0"` would emit a second "0" line and
      # write a malformed entry to $GITHUB_OUTPUT.
      COUNT=$(gh api "repos/${GH_REPO}/pulls/${PR_NUMBER}/files" \
        --paginate --jq '.[].filename' \
        | grep -cE "$SECURITY_RE" || true)
      echo "security_files_changed=$COUNT" >> "$GITHUB_OUTPUT"
    env:
      GH_TOKEN: ${{ github.token }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
      GH_REPO: ${{ github.repository }}
```
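The SECURITY_RE pattern can be exercised locally before committing the step. The sketch below wraps the same grep -cE call in a hypothetical count_security_files function; the file paths fed to it are made-up examples, not files from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical local harness for the "Check security relevance" step.
# Reads one filename per line on stdin and prints how many match the pattern.
count_security_files() {
  local security_re="host-iptables|setup-iptables|squid-config|docker-manager|seccomp-profile|domain-patterns|entrypoint\.sh|Dockerfile|containers/"
  # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps that
  # single "0" and stops the non-zero status from failing the caller.
  grep -cE "$security_re" || true
}

# Made-up paths: documentation and CLI changes only (prints 0).
printf '%s\n' README.md docs/usage.md src/cli.ts | count_security_files
# Made-up paths: one container file, one iptables module, one unrelated test (prints 2).
printf '%s\n' containers/agent/Dockerfile src/host-iptables.ts tests/cli.test.ts | count_security_files
```

Because the pattern is a substring match, a file such as tests/host-iptables.test.ts would also count as security-relevant. That errs toward running the full review, which is the safer failure mode for a security check.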

Step 2 — add to the agent prompt body (near the top, after the heading):

```markdown
## Security Relevance Check

**Security-critical files changed in this PR:** ${{ steps.security-relevance.outputs.security_files_changed }}

> If this value is `0`, no security-critical files were modified. Use `noop` immediately without further analysis; this PR does not require a security review.
```

3. 🟡 Add conciseness constraint to reduce output tokens on multi-turn runs

Estimated savings: ~$0.05–0.12 on complex PRs (reduces per-run output from 5–17K to 1–3K tokens)

The outlier run generated 17,607 output tokens vs a normal run's 1,791–5,475. Verbose agent explanations cascade: each large output grows the context, inflating cache writes on subsequent turns. A tight output budget breaks this loop.
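In dollar terms, output tokens are billed at $15.00 per million, so the gap between the outlier and a normal run is easy to quantify. A minimal sketch with a hypothetical output_cost helper; the report-size line assumes roughly 4 tokens per 3 English words, a rule of thumb rather than a measured value for this workflow. The final report is also only part of a run's output, since tool-call arguments the model generates count as output tokens too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dollar cost of N output tokens at $15.00 per million.
output_cost() {
  awk -v t="$1" 'BEGIN { printf "%.3f\n", t * 15 / 1000000 }'
}

echo "outlier run (17,607 tokens): \$$(output_cost 17607)"
echo "normal run, high (5,475):    \$$(output_cost 5475)"
echo "normal run, low (1,791):     \$$(output_cost 1791)"

# Upper bound on the final report under the new limit: 5 findings x 150 words,
# converted at an assumed ~4 tokens per 3 words (integer arithmetic).
report_tokens=$(( 5 * 150 * 4 / 3 ))
echo "max report (~${report_tokens} tokens): \$$(output_cost "$report_tokens")"
```

The outlier's output alone cost about $0.26, against roughly $0.03–0.08 for normal runs and about $0.015 for a maximal report under the new limit.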

Add to the "Output Format" section of security-guard.md:

```markdown
## Output Format

**IMPORTANT: Be concise.** Report each security finding in ≤ 150 words. Maximum 5 findings total.
If you find security concerns:
...
```

4. 🟡 Investigate and resolve the error/cancellation rate

Estimated savings: eliminates ~30% of wasted compute (cancelled runs)

The 7-day report shows 6 cancelled + 2 errored runs out of 20 = 40% non-success rate. Cancellations likely stem from the 10-minute timeout-minutes limit being hit on PRs with large diffs or complex tool call chains. Each cancelled run may still incur partial token costs.

Investigate:

  • Check if cancelled runs correlate with PRs that have large diffs (diff fetched via head -c 8000 may still trigger slow tool calls)
  • Apply max-turns: 8 (rec #1) first; fewer turns means faster completion, which should also reduce cancellation risk

Consider increasing timeout slightly if needed after turn cap:

```yaml
timeout-minutes: 15   # was: 10; headroom after max-turns reduction
```

Cache Analysis (Anthropic-Specific)

Per-run breakdown (all 11 runs from JSON data):

| Run | Turns | Cache Read | Cache Write | Output | Cost | Reuse% |
|---|---|---|---|---|---|---|
| §24289182705 | 4 | 105K | 21K | 1.8K | $0.14 | 84% |
| §24289175288 | 6 | 171K | 46K | 2.7K | $0.26 | 79% |
| §24290111812 | 11 | 409K | 120K | 17.6K | $0.68 | 77% |
| §24290215023 | 5 | 178K | 45K | 7.0K | $0.33 | 80% |
| §24291039056 |  | 81K | 42K | 1.0K |  | 66% |
| §24291085402 | 5 | 150K | 25K | 3.2K | $0.18 | 86% |
| §24292050150 | 6 | 185K | 53K | 5.1K | $0.33 | 78% |
| §24292134863 | 6 | 191K | 23K | 3.1K | $0.19 | 89% |
| §24292329640 | 5 | 180K | 55K | 5.5K | $0.34 | 77% |
| §24310558551 | 4 | 81K | 43K | 2.0K | $0.22 | 65% |
| §24312346928 | 4 | 84K | 45K | 1.4K | $0.22 | 65% |

Cache write amortization: Runs with 4 turns show 65–84% reuse (65% in the two most recent, with 3 reads per Turn-1 write), while 6-turn runs show 78–89% reuse (5 reads per Turn-1 write). Cache writes ARE amortized within a run; the problem is that the absolute write volume (~47K avg) is high due to tool schema overhead.
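The Reuse% column is consistent with cache read / (cache read + cache write) for each row, which makes the amortization pattern easy to recompute. A small sketch over a few rows of the table; reuse_pct is a hypothetical helper, and because the inputs are rounded to the nearest thousand tokens, a recomputed value can differ from the table by a point (the first row, for example, computes to 83% rather than 84%).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reuse % = cache read / (cache read + cache write), rounded.
reuse_pct() {
  awk -v r="$1" -v w="$2" 'BEGIN { printf "%.0f\n", 100 * r / (r + w) }'
}

# Columns: turns, cache read (K), cache write (K), taken from the per-run table.
while read -r turns reads writes; do
  echo "${turns} turns: $(reuse_pct "$reads" "$writes")% reuse"
done <<'EOF'
4 81 43
4 84 45
6 171 46
6 185 53
6 191 23
11 409 120
EOF
```

The recomputed values show that turn count alone does not set reuse: among 6-turn runs, the one with a 23K Turn-1 write reaches 89%, while those with 46–53K writes sit near 78–79%.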

Cache cost vs benefit: At 47K avg writes × $3.75/M = $0.18/write. Those same 47K tokens read back over 5 turns × $0.30/M = $0.07. Without caching they'd cost $3.00/M × 47K × 5 turns = $0.71. Caching saves $0.46/run — highly justified. The write cost is unavoidable given the context size.
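The same comparison can be written as a break-even condition. As a simplification for illustration (not the report's exact accounting), assume a prefix of T tokens is written to the cache once at $3.75/M and read back on n later turns at $0.30/M, versus being sent as regular input at $3.00/M on every turn, the first plus the n later ones:

$$
\begin{aligned}
C_{\text{cache}}(n) &= \frac{T}{10^6}\,(3.75 + 0.30\,n), \qquad
C_{\text{none}}(n) = \frac{T}{10^6}\,3.00\,(n + 1) \\
C_{\text{cache}}(n) < C_{\text{none}}(n)
&\iff 3.75 + 0.30\,n < 3.00 + 3.00\,n
\iff 0.75 < 2.70\,n
\iff n > \tfrac{0.75}{2.70} \approx 0.28
\end{aligned}
$$

Under these rates caching pays for itself as soon as the cached prefix is reused once (n ≥ 1), and every further reuse saves another $2.70 per million cached tokens.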

Cross-run caching: Not reliable. Security Guard averages ~2.9 runs per day (20 runs / 7 days), so consecutive runs are usually more than 5 minutes apart and most runs start with a cold cache. Runs with low cache_write (21K) likely benefit from a warm cache when two PRs open in quick succession.


Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run (avg) | 187K | ~140K | –25% |
| Cost/run (avg) | $0.22 | ~$0.15 | –32% |
| Outlier cost cap | $0.68+ | ≤$0.40 | –41% |
| Period cost (20 runs) | $4.47 | ~$3.00 | –33% |
| LLM turns (avg) | 4.8 | ~3.5 | –27% |
| Non-success rate | 40% | ~15% | –25pp |

Projections assume: 30% of PRs skip analysis (rec #2), max-turns enforced at 8 (rec #1), and output verbosity reduced (rec #3).
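The Savings column is the relative change, (projected minus current) divided by current, except for the non-success row, which is a percentage-point difference. A quick sketch to recompute the relative rows (pct_change is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: relative change from current to projected, in whole percent.
pct_change() {
  awk -v cur="$1" -v proj="$2" 'BEGIN { printf "%.0f%%\n", 100 * (proj - cur) / cur }'
}

pct_change 187 140     # total tokens/run
pct_change 0.22 0.15   # cost/run
pct_change 0.68 0.40   # outlier cost cap
pct_change 4.47 3.00   # period cost (20 runs)
pct_change 4.8 3.5     # LLM turns
```

Each line reproduces the corresponding Savings entry (-25%, -32%, -41%, -33%, -27%).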


Implementation Checklist

  • Lower max-turns from 25 to 8 in security-guard.md frontmatter
  • Add Check security relevance pre-step to security-guard.md
  • Add security relevance variable reference at top of agent prompt
  • Add conciseness constraint to "Output Format" section
  • Recompile: gh aw compile .github/workflows/security-guard.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Open a PR and verify CI passes
  • Compare token usage on next 5 runs vs baseline ($0.22 avg)
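For the last checklist item, a small helper can average the per-run cost of the next five runs and compare it with the $0.22 baseline. This is a sketch: avg_vs_baseline and the BASELINE variable are hypothetical names, the per-run costs have to be collected separately (for example, from the same run data the daily report uses), and the values in the example are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average the per-run costs given as arguments and report
# the relative change against a baseline (default $0.22, the report's 7-day average).
avg_vs_baseline() {
  if [ "$#" -eq 0 ]; then
    echo "no runs" >&2
    return 1
  fi
  printf '%s\n' "$@" | awk -v base="${BASELINE:-0.22}" '
    { sum += $1; n++ }
    END {
      avg = sum / n
      printf "avg $%.3f vs baseline $%.2f (%+.0f%%)\n", avg, base, 100 * (avg - base) / base
    }'
}

# Made-up costs for five runs after the changes.
avg_vs_baseline 0.15 0.18 0.12 0.20 0.10
```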

Generated by Daily Claude Token Optimization Advisor · ● 1.2M ·
