⚡ Claude Token Optimization 2026-04-12 — security-guard #1939

## Target Workflow: `security-guard`

**Source report:** #1920
**Estimated cost per run:** $0.22 (report 7-day avg) / $0.29 (recent 11-run avg)
**Total tokens per run:** ~187K (report avg)
**Cache read rate:** 82% ✅
**Cache write rate:** 22% of total tokens per run
**LLM turns:** avg 4.8 (report) / 5.6 (recent) — range **0–11**

---

## Current Configuration

| Setting | Value |
|---------|-------|
| Tools loaded | `github: toolsets: [pull_requests, repos]` (~10–15 tools) |
| Tools actually used | `github_pull_request_read`, `github_get_file_contents`, `github_list_pull_requests` |
| Network groups | `github` only |
| Pre-agent steps | ✅ Yes — fetches PR diff (8KB cap) |
| Post-agent steps | ❌ No |
| Prompt size | 5,371 bytes / 712 words |
| max-turns | **25** (dangerously high) |
| timeout-minutes | 10 |

---

## Cost Breakdown (per run)

Cache writes dominate at ~60% of total cost per run (about $0.176 of ~$0.295), which makes them the primary optimization lever.

| Cost Component | Tokens (avg) | Rate | Cost/run | % of Total |
|---|---:|---|---:|---:|
| Cache write | ~47K | $3.75/M | ~$0.176 | **~60%** |
| Output | ~4.6K | $15.00/M | ~$0.069 | **~23%** |
| Cache read | ~165K | $0.30/M | ~$0.050 | **~17%** |
| Input (net-new) | ~7 | $3.00/M | ~$0.000 | ~0% |

> Cache reads are cheap — the 82% cache reuse rate is healthy. The problem is **cache writes are expensive** and happen on every run because the Anthropic cache TTL (~5 min) rarely spans consecutive PR events.
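As a cross-check, the Cost/run column can be recomputed from the average token counts and the per-million rates in the table. The sketch below is illustrative only: the dictionary and function names are made up for this example, and the inputs are the rounded averages from the table, so results can differ slightly from billing data.

```python
# Rates in USD per million tokens, copied from the cost table.
RATES_PER_M = {"cache_write": 3.75, "output": 15.00, "cache_read": 0.30, "input": 3.00}

# Average tokens per run, copied from the cost table (rounded values).
AVG_TOKENS = {"cache_write": 47_000, "output": 4_600, "cache_read": 165_000, "input": 7}


def cost_breakdown(tokens, rates_per_m):
    """Return {component: (cost_usd, share_of_total)} for one run."""
    costs = {name: tokens[name] * rates_per_m[name] / 1_000_000 for name in tokens}
    total = sum(costs.values())
    return {name: (cost, cost / total) for name, cost in costs.items()}


if __name__ == "__main__":
    for name, (cost, share) in cost_breakdown(AVG_TOKENS, RATES_PER_M).items():
        print(f"{name:12} ${cost:.3f}  {share:.0%}")
```

Swapping in the token counts of a single run gives that run's own breakdown.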

---

## Recommendations

### 1. 🔴 Lower `max-turns` from 25 to 8

**Estimated savings:** ~$0.46 per outlier run (~20–30% of period cost)

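The observability insights flag **execution drift**: Security Guard varied from 0 to 11 turns, averaging 5.1. (That average appears to count the run with no recorded turns as 0, i.e. 56 turns over 11 runs; excluding it gives the 5.6 recent average quoted above.) With `max-turns: 25`, a runaway agent loop can run unchecked. Evidence: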

- Outlier run [§24290111812](https://github.com/github/gh-aw-firewall/actions/runs/24290111812): **11 turns, 481K tokens, $0.68** — 3× the average
- Report also documents a 24-turn, 981K-token run ([§24271368232](https://github.com/github/gh-aw-firewall/actions/runs/24271368232)) at $0.57

At 11 turns, the cache grows to ~120K write tokens vs ~43K on a clean 4-turn run. Each extra turn adds ~15K accumulated context to re-cache. Capping at 8 turns provides headroom for complex PRs while eliminating runaway cost.

**Implementation** — in `.github/workflows/security-guard.md` frontmatter:

```yaml
engine:
  id: claude
  max-turns: 8   # was: 25
```

---

### 2. 🔴 Add security-relevance pre-step + early-exit instruction

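**Estimated savings:** up to ~$0.15–0.22 per skipped run (the agent still spends a turn reading the prompt and calling `noop`, so the saving is somewhat below the full run cost); potential 30–50% run cost reduction if many PRs touch only non-security files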

Many PRs likely only change documentation, tests, CI config, or unrelated TypeScript — none of which require security analysis. Currently the agent receives the full prompt and tool budget regardless. Adding a pre-step that detects security-relevant files and passing the result to the agent lets it noop immediately on irrelevant PRs.

**Step 1 — add to `steps:` section of `security-guard.md`:**

```yaml
steps:
  - name: Fetch PR changed files
    id: pr-diff
    if: github.event.pull_request.number
    run: |
      DELIM="GHAW_PR_FILES_$(date +%s)"
      {
        echo "PR_FILES<<${DELIM}"
        gh api "repos/${GH_REPO}/pulls/${PR_NUMBER}/files" \
          --paginate --jq '.[] | "### " + .filename + " (+" + (.additions|tostring) + "/-" + (.deletions|tostring) + ")\n" + (.patch // "") + "\n"' \
          | head -c 8000 || true
        echo ""
        echo "${DELIM}"
      } >> "$GITHUB_OUTPUT"
    env:
      GH_TOKEN: ${{ github.token }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
      GH_REPO: ${{ github.repository }}

  # NEW: count security-critical file changes
  - name: Check security relevance
    id: security-relevance
    if: github.event.pull_request.number
    run: |
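      SECURITY_RE="host-iptables|setup-iptables|squid-config|docker-manager|seccomp-profile|domain-patterns|entrypoint\.sh|Dockerfile|containers/"
      # grep -c already prints "0" when nothing matches, but exits with status 1.
      # "|| true" absorbs that status; "|| echo 0" would append a second "0"
      # and produce an invalid line in $GITHUB_OUTPUT.
      COUNT=$(gh api "repos/${GH_REPO}/pulls/${PR_NUMBER}/files" \
        --paginate --jq '.[].filename' \
        | grep -cE "$SECURITY_RE" || true)
      echo "security_files_changed=$COUNT" >> "$GITHUB_OUTPUT"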
    env:
      GH_TOKEN: ${{ github.token }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
      GH_REPO: ${{ github.repository }}
```

**Step 2 — add to the agent prompt body** (near the top, after the heading):

```markdown
## Security Relevance Check

**Security-critical files changed in this PR:** ${{ steps.security-relevance.outputs.security_files_changed }}

> If this value is `0`, no security-critical files were modified. Use `noop` immediately without further analysis — this PR does not require a security review.
```
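To sanity-check the pattern before wiring it into the workflow, the matching logic of the relevance pre-step can be reproduced locally. This is a minimal sketch, assuming the pattern is used exactly as written in `SECURITY_RE`; the function name and the sample paths are hypothetical and not taken from the repository.

```python
import re

# Same alternation as SECURITY_RE in the pre-step; Python's `re` accepts it unchanged.
SECURITY_RE = re.compile(
    r"host-iptables|setup-iptables|squid-config|docker-manager|seccomp-profile"
    r"|domain-patterns|entrypoint\.sh|Dockerfile|containers/"
)


def count_security_files(filenames):
    """Count paths containing any security-critical fragment (what `grep -cE` counts)."""
    return sum(1 for name in filenames if SECURITY_RE.search(name))


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    docs_only = ["README.md", "docs/usage.md"]
    mixed = ["src/host-iptables.ts", "containers/agent/entrypoint.sh", "README.md"]
    print(count_security_files(docs_only))  # prints 0
    print(count_security_files(mixed))      # prints 2
```

Because the pattern is unanchored, any path that merely contains one of the fragments (for example a documentation file with `Dockerfile` in its name) counts as security-relevant. That errs on the side of running the review rather than skipping it.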

---

### 3. 🟡 Add conciseness constraint to reduce output tokens on multi-turn runs

**Estimated savings:** ~$0.05–0.12 on complex PRs (reduces per-run output from 5–17K to 1–3K tokens)

The outlier run generated **17,607 output tokens** vs a normal run's 1,791–5,475. Verbose agent explanations cascade: each large output grows the context, inflating cache writes on subsequent turns. A tight output budget breaks this loop.

**Add to the "Output Format" section of `security-guard.md`:**

```markdown
## Output Format

**IMPORTANT: Be concise.** Report each security finding in ≤ 150 words. Maximum 5 findings total.
If you find security concerns:
...
```

---

### 4. 🟡 Investigate and resolve the error/cancellation rate

**Estimated savings:** recovers the compute currently wasted on cancelled runs (6 of 20 runs, ~30%)

The 7-day report shows 6 cancelled + 2 errored runs out of 20 = **40% non-success rate**. Cancellations likely stem from the 10-minute `timeout-minutes` limit being hit on PRs with large diffs or complex tool call chains. Each cancelled run may still incur partial token costs.

**Investigate:**
- Check whether cancelled runs correlate with PRs that have large diffs (the pre-fetched diff is capped by `head -c 8000`, so the agent may still make slow tool calls to read the rest)
- Apply `max-turns: 8` (rec #1) first: fewer turns means faster completion, which should also reduce cancellation risk

**Consider increasing timeout slightly if needed after turn cap:**

```yaml
timeout-minutes: 15   # was: 10 — headroom after max-turns reduction
```

---

## Cache Analysis (Anthropic-Specific)

Per-run breakdown (all 11 runs from JSON data):

| Run | Turns | Cache Read | Cache Write | Output | Cost | Reuse% |
|-----|------:|-----------:|------------:|-------:|-----:|-------:|
| §24289182705 | 4 | 105K | 21K | 1.8K | $0.14 | 84% |
| §24289175288 | 6 | 171K | 46K | 2.7K | $0.26 | 79% |
| §24290111812 | **11** | 409K | 120K | 17.6K | **$0.68** | 77% |
| §24290215023 | 5 | 178K | 45K | 7.0K | $0.33 | 80% |
| §24291039056 | — | 81K | 42K | 1.0K | — | 66% |
| §24291085402 | 5 | 150K | 25K | 3.2K | $0.18 | 86% |
| §24292050150 | 6 | 185K | 53K | 5.1K | $0.33 | 78% |
| §24292134863 | 6 | 191K | 23K | 3.1K | $0.19 | 89% |
| §24292329640 | 5 | 180K | 55K | 5.5K | $0.34 | 77% |
| §24310558551 | 4 | 81K | 43K | 2.0K | $0.22 | 65% |
| §24312346928 | 4 | 84K | 45K | 1.4K | $0.22 | 65% |

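**Cache write amortization:** Reuse generally rises with turn count, because each later turn re-reads the prefix written earlier. The two most recent 4-turn runs show 65% reuse (3 reads per Turn-1 write), while the 6-turn runs show 78–89% (5 reads per Turn-1 write). One 4-turn run (§24289182705) reaches 84% with only 21K written, so turn count is not the only factor. Cache writes ARE amortized within a run; the problem is that the absolute write volume (~47K avg) is high due to tool schema overhead.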
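The Reuse% column is consistent with cache reads divided by cache reads plus cache writes. That definition is inferred from the table rather than stated in the report. A short sketch, with hypothetical names, that recomputes it from the rounded K values and groups the results by turn count:

```python
# (turns, cache_read_K, cache_write_K) per run, copied from the table.
# turns is None for the run whose turn count is not recorded.
RUNS = [
    (4, 105, 21), (6, 171, 46), (11, 409, 120), (5, 178, 45),
    (None, 81, 42), (5, 150, 25), (6, 185, 53), (6, 191, 23),
    (5, 180, 55), (4, 81, 43), (4, 84, 45),
]


def reuse_pct(read_k, write_k):
    """Assumed definition: share of cached tokens that were reads, in percent."""
    return 100 * read_k / (read_k + write_k)


def reuse_by_turns(runs):
    """Group rounded reuse percentages by turn count, sorted within each group."""
    grouped = {}
    for turns, read_k, write_k in runs:
        grouped.setdefault(turns, []).append(round(reuse_pct(read_k, write_k)))
    return {turns: sorted(values) for turns, values in grouped.items()}


if __name__ == "__main__":
    for turns, values in reuse_by_turns(RUNS).items():
        print(turns, values)
```

Since the inputs are rounded to the nearest thousand tokens, a recomputed value can differ from the table by one percentage point.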

**Cache cost vs benefit:** Writing the ~47K-token average costs 47K × $3.75/M ≈ $0.18 per run. Reading those same tokens back on 5 turns costs 47K × 5 × $0.30/M ≈ $0.07. Without caching they would cost 47K × 5 × $3.00/M ≈ $0.71, so **caching saves ~$0.46/run** and is well justified. The write cost is unavoidable given the context size.
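The same comparison can be written as a small calculation. This is a sketch of the accounting used in the paragraph above (one cache write, then a number of cached reads, compared against paying the plain input rate for each read); the function names are hypothetical.

```python
# USD per million tokens, as quoted in the cost breakdown.
WRITE_RATE = 3.75
READ_RATE = 0.30
INPUT_RATE = 3.00


def caching_savings(prefix_tokens, reads):
    """Return (cached_cost, uncached_cost, savings) in USD for one run."""
    cached = prefix_tokens * (WRITE_RATE + reads * READ_RATE) / 1_000_000
    uncached = prefix_tokens * reads * INPUT_RATE / 1_000_000
    return cached, uncached, uncached - cached


def break_even_reads():
    """Smallest number of reads at which caching is cheaper under this model."""
    reads = 1
    while caching_savings(1_000_000, reads)[2] <= 0:
        reads += 1
    return reads


if __name__ == "__main__":
    cached, uncached, saved = caching_savings(47_000, 5)
    print(f"cached ${cached:.2f}, uncached ${uncached:.2f}, saved ${saved:.2f}")
    print("break-even reads:", break_even_reads())
```

Under this model the write pays for itself from the second read onward, which is why even the 4-turn runs come out ahead.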

**Cross-run caching:** Not reliable. Security Guard runs are typically spaced more than 5 minutes apart (20 runs / 7 days ≈ 2.9/day), so most runs start from a cold cache. Runs with low cache_write (21K) likely benefit from a warm cache when two PRs open in quick succession.

---

## Expected Impact

| Metric | Current | Projected | Savings |
|--------|---------|-----------|---------|
| Total tokens/run (avg) | 187K | ~140K | **–25%** |
| Cost/run (avg) | $0.22 | ~$0.15 | **–32%** |
| Outlier cost cap | $0.68+ | ≤$0.40 | **–41%** |
| Period cost (20 runs) | $4.47 | ~$3.00 | **–33%** |
| LLM turns (avg) | 4.8 | ~3.5 | **–27%** |
| Non-success rate | 40% | ~15% | **–25pp** |

> Projections assume: 30% of PRs skip analysis (rec #2), max-turns enforced at 8 (rec #1), and output verbosity reduced (rec #3).
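The Savings column follows from the Current and Projected columns as a relative change. A minimal sketch with hypothetical names, using the values from the table; the non-success rate is left out because it is expressed in percentage points (40% to ~15% is a 25-point drop) rather than as a relative change.

```python
# (current, projected) pairs copied from the Expected Impact table.
IMPACT = {
    "Total tokens/run (K)": (187, 140),
    "Cost/run ($)": (0.22, 0.15),
    "Outlier cost cap ($)": (0.68, 0.40),
    "Period cost, 20 runs ($)": (4.47, 3.00),
    "LLM turns (avg)": (4.8, 3.5),
}


def pct_change(current, projected):
    """Relative change from current to projected, in percent (negative = savings)."""
    return 100 * (projected - current) / current


if __name__ == "__main__":
    for metric, (current, projected) in IMPACT.items():
        print(f"{metric:26} {pct_change(current, projected):+.0f}%")
```

Re-running this with measured values after the next five runs gives a like-for-like comparison against the projections.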

---

## Implementation Checklist

- [ ] Lower `max-turns` from 25 to 8 in `security-guard.md` frontmatter
- [ ] Add `Check security relevance` pre-step to `security-guard.md`
- [ ] Add security relevance variable reference at top of agent prompt
- [ ] Add conciseness constraint to "Output Format" section
- [ ] Recompile: `gh aw compile .github/workflows/security-guard.md`
- [ ] Post-process: `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Open a PR and verify CI passes
- [ ] Compare token usage on next 5 runs vs baseline ($0.22 avg)




> Generated by [Daily Claude Token Optimization Advisor](https://github.com/github/gh-aw-firewall/actions/runs/24312479338/agentic_workflow) · ● 1.2M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw-firewall+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw-firewall%2Fclaude-token-optimizer%22&type=issues)
