Cost Breakdown (per run)
Cache writes dominate at ~75% of total cost per run, making them the primary optimization lever.

| Cost Component | Tokens (avg) | Rate | Cost/run | % of Total |
|---|---|---|---|---|
| Cache write | ~47K | $3.75/M | ~$0.176 | 75% |
| Output | ~4.6K | $15.00/M | ~$0.069 | 14% |
| Cache read | ~165K | $0.30/M | ~$0.050 | 11% |
| Input (net-new) | ~7 | $3.00/M | ~$0.000 | ~0% |
Cache reads are cheap, and the 82% cache reuse rate is healthy. The problem is that cache writes are expensive and recur on every run, because the Anthropic cache TTL (~5 min) rarely spans consecutive PR events.
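The Cost/run column above is tokens multiplied by the per-million rate. A minimal Python sketch of that arithmetic, using the averaged token counts and rates from the breakdown; the dictionary layout and function name are illustrative and not part of the workflow.

```python
# Average tokens per run and price per million tokens, copied from the cost breakdown.
COMPONENTS = {
    "cache_write": (47_000, 3.75),
    "output": (4_600, 15.00),
    "cache_read": (165_000, 0.30),
    "input": (7, 3.00),
}


def cost_per_run(components):
    """Return the dollar cost of each component for a single run."""
    return {
        name: tokens * rate / 1_000_000
        for name, (tokens, rate) in components.items()
    }


costs = cost_per_run(COMPONENTS)
for name, cost in costs.items():
    print(f"{name:12s} ${cost:.3f}")
print(f"{'total':12s} ${sum(costs.values()):.3f}")
```

The total comes out close to the recent 11-run average cost per run quoted in the workflow summary.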
Recommendations
1. 🔴 Lower max-turns from 25 to 8
Estimated savings: ~$0.46 per outlier run (~20–30% of period cost)
The observability insights flag execution drift: Security Guard varied from 0 to 11 turns, averaging 5.1. With `max-turns: 25`, a runaway agent loop can run unchecked. Evidence:
- Outlier run §24290111812: 11 turns, 481K tokens, $0.68 (3× the average)
- The report also documents a 24-turn, 981K-token run (§24271368232) at $0.57
At 11 turns, the cache grows to ~120K write tokens vs ~43K on a clean 4-turn run. Each extra turn adds ~15K accumulated context to re-cache. Capping at 8 turns provides headroom for complex PRs while eliminating runaway cost.
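Implementation: in the `.github/workflows/security-guard.md` frontmatter, set `max-turns: 8` (currently `max-turns: 25`).
2. 🔴 Add security-relevance pre-step + early-exit instruction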
Estimated savings: ~$0.15–0.22 per skipped run (full run avoided); potential 30–50% run cost reduction if many PRs touch non-security files
Many PRs likely change only documentation, tests, CI config, or unrelated TypeScript, none of which requires security analysis. Currently the agent receives the full prompt and tool budget regardless. A pre-step that detects security-relevant files and passes the result to the agent lets it `noop` immediately on irrelevant PRs.
Step 1: add a pre-step with id `security-relevance` to the `steps:` section of `security-guard.md`. It must expose the number of changed security-critical files as the step output `security_files_changed`.
Step 2: add to the agent prompt body (near the top, after the heading):

    ## Security Relevance Check

    **Security-critical files changed in this PR:** ${{ steps.security-relevance.outputs.security_files_changed }}

    > If this value is `0`, no security-critical files were modified. Use `noop` immediately without further analysis; this PR does not require a security review.
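A minimal sketch of the detection logic such a pre-step could run, written in Python for illustration. The path patterns and both function names are hypothetical and would need to match the repository's real layout. Only the output name `security_files_changed` comes from the prompt snippet above, and `GITHUB_OUTPUT` is the standard GitHub Actions file for step outputs.

```python
import fnmatch
import os

# Hypothetical patterns: replace with the paths that count as security-critical.
SECURITY_PATTERNS = [
    "src/auth/*",
    "containers/*",
    "Dockerfile*",
    "*.pem",
]


def count_security_files(changed_paths, patterns=SECURITY_PATTERNS):
    """Count changed paths that match at least one security-critical pattern."""
    return sum(
        1
        for path in changed_paths
        if any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns)
    )


def write_step_output(name, value, output_path=None):
    """Append `name=value` to the step output file (GITHUB_OUTPUT by default)."""
    output_path = output_path or os.environ.get("GITHUB_OUTPUT")
    if output_path is None:
        return  # not running inside GitHub Actions; nothing to write
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}={value}\n")


changed = ["docs/readme.md", "src/auth/login.ts", "Dockerfile.dev"]
write_step_output("security_files_changed", count_security_files(changed))
```

With these example paths the count is 2, so the agent would proceed with a full review; a docs-only PR would yield 0 and trigger the early `noop`.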
3. 🟡 Add conciseness constraint to reduce output tokens on multi-turn runs
Estimated savings: ~$0.05–0.12 on complex PRs (reduces per-run output from 5–17K to 1–3K tokens)
The outlier run generated 17,607 output tokens vs a normal run's 1,791–5,475. Verbose agent explanations cascade: each large output grows the context, inflating cache writes on subsequent turns. A tight output budget breaks this loop.
Add to the "Output Format" section of `security-guard.md`:

    ## Output Format

    **IMPORTANT: Be concise.** Report each security finding in ≤ 150 words. Maximum 5 findings total.

    If you find security concerns:
    ...
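To put the output figures in dollar terms, the direct output cost is output tokens times the $15.00/M rate from the cost breakdown. A short Python sketch; the function name is illustrative, and it covers only the direct output charge, not the knock-on growth in cache writes on later turns.

```python
OUTPUT_RATE_PER_M = 15.00  # dollars per million output tokens, from the cost breakdown


def output_cost(tokens, rate_per_m=OUTPUT_RATE_PER_M):
    """Direct dollar cost of the output tokens generated in one run."""
    return tokens * rate_per_m / 1_000_000


# Output token counts quoted above: the outlier run and the normal range.
for label, tokens in [
    ("outlier", 17_607),
    ("normal (high)", 5_475),
    ("normal (low)", 1_791),
]:
    print(f"{label:14s} {tokens:>6,d} tokens  ${output_cost(tokens):.3f}")
```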
4. 🟡 Investigate and resolve the error/cancellation rate
Estimated savings: avoids the compute wasted on cancelled runs (~30% of runs)
The 7-day report shows 6 cancelled + 2 errored runs out of 20 = 40% non-success rate. Cancellations likely stem from the 10-minute `timeout-minutes` limit being hit on PRs with large diffs or complex tool call chains. Each cancelled run may still incur partial token costs.
Investigate:
- Check whether cancelled runs correlate with PRs that have large diffs (a diff fetched via `head -c 8000` may still trigger slow tool calls)
- Apply `max-turns: 8` (rec #1) first: reducing turns also reduces cancellation risk, since fewer turns means faster completion
- Consider increasing the timeout slightly if it is still needed after the turn cap:

    timeout-minutes: 15  # was: 10; headroom after max-turns reduction
Cache Analysis (Anthropic-Specific)
Per-run breakdown (all 11 runs from JSON data):
| Run | Turns | Cache Read | Cache Write | Output | Cost | Reuse% |
|---|---|---|---|---|---|---|
| §24289182705 | 4 | 105K | 21K | 1.8K | $0.14 | 84% |
| §24289175288 | 6 | 171K | 46K | 2.7K | $0.26 | 79% |
| §24290111812 | 11 | 409K | 120K | 17.6K | $0.68 | 77% |
| §24290215023 | 5 | 178K | 45K | 7.0K | $0.33 | 80% |
| §24291039056 | n/a | 81K | 42K | 1.0K | n/a | 66% |
| §24291085402 | 5 | 150K | 25K | 3.2K | $0.18 | 86% |
| §24292050150 | 6 | 185K | 53K | 5.1K | $0.33 | 78% |
| §24292134863 | 6 | 191K | 23K | 3.1K | $0.19 | 89% |
| §24292329640 | 5 | 180K | 55K | 5.5K | $0.34 | 77% |
| §24310558551 | 4 | 81K | 43K | 2.0K | $0.22 | 65% |
| §24312346928 | 4 | 84K | 45K | 1.4K | $0.22 | 65% |
Cache write amortization: Two of the three 4-turn runs show 65% reuse (3 reads per Turn-1 write), while 5- and 6-turn runs reach 77–89% (5 reads per Turn-1 write at 6 turns). Cache writes are amortized within a run; the problem is that the absolute write volume (~47K avg) is high due to tool schema overhead.
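The Reuse% figures appear to be cache reads divided by cache reads plus cache writes. That formula is inferred from the per-run values rather than stated in the report, so treat the Python sketch below as an assumption that happens to reproduce three of the rows.

```python
def reuse_rate(cache_read, cache_write):
    """Share of cache traffic that was read back rather than newly written."""
    return cache_read / (cache_read + cache_write)


# (turns, cache_read, cache_write) in thousands of tokens, from the per-run breakdown.
RUNS = {
    "24289175288": (6, 171, 46),
    "24292134863": (6, 191, 23),
    "24310558551": (4, 81, 43),
}

for run_id, (turns, read_k, write_k) in RUNS.items():
    print(f"{run_id}: {turns} turns, reuse {reuse_rate(read_k, write_k):.0%}")
```

Because the inputs are rounded to the nearest thousand tokens, a recomputed value can differ from the reported one by a percentage point.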
Cache cost vs benefit: At 47K avg writes × $3.75/M = $0.18 per run. Those same 47K tokens read back over 5 turns × $0.30/M = $0.07. Without caching they'd cost $3.00/M × 47K × 5 turns = $0.71. Caching therefore saves $0.71 − ($0.18 + $0.07) = $0.46/run, which is well justified. The write cost is unavoidable given the context size.
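The comparison above can be reproduced with a few lines of Python. This is a simplified sketch that assumes a single cache write of the full prefix followed by one cached read per turn, as the paragraph does; the function name is illustrative.

```python
WRITE_RATE = 3.75  # $/M tokens, cache write
READ_RATE = 0.30   # $/M tokens, cache read
INPUT_RATE = 3.00  # $/M tokens, uncached input


def caching_savings(prefix_tokens, turns):
    """Compare one cache write plus `turns` cached reads with `turns` uncached reads."""
    cached = prefix_tokens * (WRITE_RATE + turns * READ_RATE) / 1_000_000
    uncached = prefix_tokens * turns * INPUT_RATE / 1_000_000
    return cached, uncached, uncached - cached


cached, uncached, saved = caching_savings(47_000, 5)
print(f"cached ${cached:.3f}, uncached ${uncached:.3f}, saved ${saved:.3f}")
```

Under this model caching breaks even once the prefix is read back at least twice, since one write plus two reads ($4.35/M) costs less than two uncached reads ($6.00/M).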
Cross-run caching: Not reliable. Security Guard runs are typically spaced more than 5 minutes apart (20 runs / 7 days ≈ 2.9/day), so most runs start with a cold cache. Runs with low cache_write (21K) likely benefit from a warm cache when two PRs open in quick succession.
Expected Impact
| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run (avg) | 187K | ~140K | –25% |
| Cost/run (avg) | $0.22 | ~$0.15 | –32% |
| Outlier cost cap | $0.68+ | ≤$0.40 | –41% |
| Period cost (20 runs) | $4.47 | ~$3.00 | –33% |
| LLM turns (avg) | 4.8 | ~3.5 | –27% |
| Non-success rate | 40% | ~15% | –25pp |
Projections assume: 30% of PRs skip analysis (rec #2), max-turns enforced at 8 (rec #1), and output verbosity reduced (rec #3).
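The Savings column is the relative change from the current to the projected value. A small Python sketch that recomputes it from the figures above; the metric keys are illustrative names. The non-success rate is left out because it is reported in percentage points rather than as a relative change.

```python
# (current, projected) pairs taken from the Expected Impact figures.
METRICS = {
    "tokens_per_run_k": (187, 140),
    "cost_per_run": (0.22, 0.15),
    "outlier_cost": (0.68, 0.40),
    "period_cost": (4.47, 3.00),
    "llm_turns": (4.8, 3.5),
}


def relative_change(current, projected):
    """Fractional change from current to projected (negative means a reduction)."""
    return (projected - current) / current


for name, (current, projected) in METRICS.items():
    print(f"{name:18s} {relative_change(current, projected):+.0%}")
```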
Implementation Checklist
- [ ] Lower `max-turns` from 25 to 8 in `security-guard.md` frontmatter
- [ ] Add `Check security relevance` pre-step to `security-guard.md`
- [ ] Add security relevance variable reference at top of agent prompt
- [ ] Add conciseness constraint to "Output Format" section
- [ ] Recompile: `gh aw compile .github/workflows/security-guard.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
Target Workflow: `security-guard`
Source report: #1920
- Estimated cost per run: $0.22 (report 7-day avg) / $0.29 (recent 11-run avg)
- Total tokens per run: ~187K (report avg)
- Cache read rate: 82% ✅
- Cache write rate: 22% of total tokens per run
- LLM turns: avg 4.8 (report) / 5.6 (recent); range 0–11
Current Configuration
- `github: toolsets: [pull_requests, repos]` (~10–15 tools)
- `github_pull_request_read`, `github_get_file_contents`, `github_list_pull_requests`
- `github` only