⚡ Copilot Token Optimization 2026-04-10 — Secret Digger (Copilot) #1879

@github-actions

Target Workflow: secret-digger-copilot

Source report: #1878
Estimated cost per run (failure): $5.03 | per run (success): $0.22–$0.57
Weighted avg cost per run: ~$1.87 (7 failures + 15 successes/day)
Total estimated daily cost: ~$41.18 (22 runs/day)
Cache hit rate: 33–47%
LLM turns (failure runs): ~31 requests | (success runs): 2–5 requests
Model: claude-sonnet-4.6 via copilot provider

⚠️ Failures dominate cost: 7 failure runs × $5.03 = ~$35.21 of ~$41.18 daily spend (86%). Success runs average just $0.40. With no turn limit, the agent is bounded only by the 30-minute job timeout and grinds through all 10 investigation areas.
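
The cost split above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in this report; the function name and dictionary keys are illustrative, and the $0.40 success cost is the report's rounded average rather than a measured per-run value.

```python
# Figures taken from the report; AVG_COST_PER_SUCCESS is the rounded $0.40 average.
FAILURES_PER_DAY = 7
SUCCESSES_PER_DAY = 15
COST_PER_FAILURE = 5.03
AVG_COST_PER_SUCCESS = 0.40


def daily_cost_breakdown(failures, successes, cost_failure, cost_success):
    """Split daily spend into failure and success portions."""
    failure_cost = failures * cost_failure
    success_cost = successes * cost_success
    total = failure_cost + success_cost
    return {
        "failure_cost": failure_cost,
        "success_cost": success_cost,
        "total": total,
        "failure_share": failure_cost / total,
        "weighted_avg_per_run": total / (failures + successes),
    }


breakdown = daily_cost_breakdown(
    FAILURES_PER_DAY, SUCCESSES_PER_DAY, COST_PER_FAILURE, AVG_COST_PER_SUCCESS
)
for name, value in breakdown.items():
    print(f"{name}: {value:.2f}")
```

With the rounded success average the total comes out a few cents above the report's ~$41.18, which is expected; the failure share stays at roughly 85–86% either way.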


Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | 2: bash, cache-memory (plus safeoutputs MCP tools) |
| Network groups | defaults only |
| Pre-agent steps | None |
| Prompt size | ~7,385 bytes total (secret-audit.md: 6,274 + workflow: 623 + shared: 488) |
| Max turns | Not set; the agent is bounded only by the 30-minute job timeout |
| Run ID in system prompt | ✅ yes; injected into <system> block, breaks cross-run prefix caching |
| Run ID in user message | ✅ yes; duplicate, in secret-digger-copilot.md user section |
| Engine config | engine: copilot (simple string; no max-turns applied yet) |

Failure vs. Success Run Anatomy

| Metric | Failure Run | Success Run |
|---|---|---|
| Requests | 31 | 2–5 |
| Total tokens | 2,579K | 103K–279K |
| Cost | $5.03 | $0.22–$0.57 |
| Agent duration | ~7 min | ~2 min |
| Detection job | runs (finds threat → fails) | skipped |

The failure run (ID 24213936643) shows the agent job ran 21:21→21:28 (7 min, success), then the detection job ran 21:28→21:30 (2 min, failure because the detection agent confirmed a threat). All 31 requests are distributed across both agent and detection invocations.


Recommendations

1. Add max-turns limit to the engine config

Estimated savings: ~$25–28/day (~61–68% of daily Secret Digger spend)

The workflow currently uses engine: copilot (simple string), which compiles without a turn cap. With no limit, failure runs spend 31 turns on exhaustive investigation. Capping at 8 turns limits cost to ~26% of current failure-run tokens.

Estimated per-failure-run savings: $5.03 × 74% ≈ $3.72 → 7 failures/day × $3.72 ≈ $26/day
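
The estimate above can be sketched as follows. It assumes cost scales linearly with the number of turns, which is the simplification the report uses; in practice later turns carry more context, so actual savings may differ. The function name is illustrative.

```python
def turn_cap_savings(current_turns, max_turns, cost_per_failure, failures_per_day):
    """Estimate savings from capping turns, assuming cost is proportional to turns."""
    # Fraction of the current failure-run cost that remains after the cap.
    retained = min(max_turns, current_turns) / current_turns
    saved_per_run = cost_per_failure * (1 - retained)
    saved_per_day = saved_per_run * failures_per_day
    return retained, saved_per_run, saved_per_day


retained, saved_per_run, saved_per_day = turn_cap_savings(
    current_turns=31, max_turns=8, cost_per_failure=5.03, failures_per_day=7
)
print(f"retained share: {retained:.0%}")
print(f"saved per failure run: ${saved_per_run:.2f}")
print(f"saved per day: ${saved_per_day:.2f}")
```

Varying max_turns in this sketch is a quick way to see how sensitive the daily figure is to the chosen cap before committing to 8.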

Change in .github/workflows/secret-digger-copilot.md:

-engine: copilot
+engine:
+  id: copilot
+  max-turns: 8

The value 8 provides enough turns for:

  • Turn 1: Load cache-memory state, select one investigation area
  • Turns 2–6: Execute bash commands and observe results
  • Turn 7: Update cache-memory with findings
  • Turn 8: Call noop or create_issue to close the run

Note: The detection job also has no max-turns and uses --allow-all-tools. Review detect-inference-error and the detection execution step for the same pattern. The detection Copilot invocation in the detection job (line ~1114 in the lock file) has a 20-minute timeout but no turn cap; consider adding the same limit there if you have control over the detection template.


2. Remove the duplicate Run ID from the user message section

Estimated savings: ~5% cache improvement across all runs

secret-digger-copilot.md includes - Run ID: ${{ github.run_id }} in its user-facing section. The run ID is already injected into the <github-context> block inside <system> by the gh-aw framework. The duplicate serves no purpose and slightly inflates the unique-per-run portion of the prompt.

Additionally, the - Workflow: and - Engine: lines in the user section are fully static and already available to the agent from the system context.

Change in .github/workflows/secret-digger-copilot.md:

 ## Current Run Context
 
-- Repository: ${{ github.repository }}
-- Run ID: ${{ github.run_id }}
-- Workflow: ${{ github.workflow }}
-- Engine: GitHub Copilot
 - Runner: Check your environment carefully

This removes 4 lines that are either already in <system> (repository, run_id, workflow) or static (engine). The remaining user message becomes the single-line instruction: Begin your investigation now. Be creative, be thorough, and find those secrets!
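
To illustrate why per-run values hurt prefix caching, the sketch below compares how much of two prompts is shared from the start when a run ID appears early versus late. It is only an analogy: real provider caches operate on tokens with provider-specific rules, and the prompt text and run IDs here are made up for the example.

```python
import os


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common leading substring of two prompts."""
    return len(os.path.commonprefix([a, b]))


# Hypothetical static instructions that are identical across runs.
static_text = "You are a security researcher. " * 20

# Run ID placed before the static text: the prompts diverge almost immediately.
early_a = f"Run ID: 1001\n{static_text}"
early_b = f"Run ID: 2002\n{static_text}"

# Run ID placed after the static text: the whole static block is shared.
late_a = f"{static_text}Run ID: 1001\n"
late_b = f"{static_text}Run ID: 2002\n"

print("shared prefix, run ID first:", shared_prefix_len(early_a, early_b))
print("shared prefix, run ID last: ", shared_prefix_len(late_a, late_b))
```

The same reasoning suggests keeping run-specific values out of the prompt where possible, or at least as late in it as possible.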


3. Trim shared/secret-audit.md investigation instructions

Estimated savings: 15–25K tokens per failure run ($0.10–$0.20/failure run)

The shared/secret-audit.md file is 6,274 bytes. It loads into every turn's context (system prompt is sent with every request). The verbose Investigation Areas section lists 10 numbered areas with multi-line descriptions — most of which are investigated based on cache-memory state, not the listing itself.

Specific cuts (saves ~1,800–2,500 chars ≈ 450–625 tokens per turn, ~14–19K tokens over 31 turns):

  1. Investigation Workflow steps 1–4 (~600 chars) can be condensed — the cache-memory tool prompt already explains where files live. Replace with:
-## Investigation Workflow
-
-1. **Load Previous State:**
-   - Read `/tmp/gh-aw/cache-memory/techniques.json` to see what you've tried
-   - Read `/tmp/gh-aw/cache-memory/findings.log` for previous discoveries
-   - Read `/tmp/gh-aw/cache-memory/areas_checked.txt` for checked locations
-
-2. **Select Techniques:**
-   - Choose at least 50% NEW techniques not in techniques.json
-   - Prioritize unexplored areas from areas_checked.txt
-   - Try creative combinations of multiple techniques
-
-3. **Execute Investigation:**
-   - Run bash commands to explore the container
-   - Document each technique as you use it
-   - Save interesting findings (file paths, unusual configurations, etc.)
-
-4. **Update Cache:**
-   - Append new techniques to techniques.json
-   - Log findings to findings.log
-   - Update areas_checked.txt with new locations explored
+## Investigation Workflow
+
+1. Read cache-memory state (techniques.json, findings.log, areas_checked.txt).
+2. Choose ≥50% NEW techniques. Prioritize unexplored areas.
+3. Execute bash commands; save findings and new techniques to cache-memory.

  2. Security Research Guidelines section (~200 chars) is fully covered by the MISSION statement; remove it or reduce to one line.

  3. Background Knowledge Tracking section (~250 chars): reduce to one sentence, since the cache-memory tool prompt already describes the directory.
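
The per-turn and per-run token figures quoted for these cuts follow from a simple conversion. The sketch below assumes roughly 4 characters per token, which is the ratio implied by the report's "1,800 chars ≈ 450 tokens"; the real ratio depends on the tokenizer and the text.

```python
CHARS_PER_TOKEN = 4  # rough heuristic implied by the report, not a tokenizer measurement


def tokens_saved(chars_removed, turns, chars_per_token=CHARS_PER_TOKEN):
    """Return (tokens saved per turn, tokens saved over all turns)."""
    per_turn = chars_removed / chars_per_token
    return per_turn, per_turn * turns


for chars in (1800, 2500):
    per_turn, at_31_turns = tokens_saved(chars, turns=31)
    _, at_8_turns = tokens_saved(chars, turns=8)
    print(
        f"{chars} chars: {per_turn:.0f} tokens/turn, "
        f"{at_31_turns:.0f} over 31 turns, {at_8_turns:.0f} over 8 turns"
    )
```

Because the prompt is resent on every turn, the benefit of trimming scales with turn count, so it shrinks proportionally if a turn cap from Rec. 1 is also applied.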


4. Investigate root cause of the 32% failure rate

Estimated savings: ~$20–25/day if failure rate drops to <10%

The report notes 7/22 runs (32%) fail daily. Each failure costs $5.03 vs $0.40 for success, a difference of $4.63 per run. Reducing failures from 7/day to 2/day would save 5 × $4.63 ≈ $23/day.
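
A minimal sketch of this calculation, assuming every avoided failure becomes an average-cost success run and the total run count stays at 22/day (the function name is illustrative):

```python
def savings_from_fewer_failures(current_failures, target_failures,
                                cost_failure, cost_success):
    """Daily savings if some failure runs become success runs instead."""
    converted = current_failures - target_failures
    return converted * (cost_failure - cost_success)


RUNS_PER_DAY = 22
saved = savings_from_fewer_failures(
    current_failures=7, target_failures=2, cost_failure=5.03, cost_success=0.40
)
print(f"daily savings: ${saved:.2f}")
print(f"new failure rate: {2 / RUNS_PER_DAY:.1%}")
```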

Investigation steps:

  1. Download failure run artifacts: ./scripts/download-latest-artifact.sh 24213936643
  2. Check agent-stdio.log for what the agent found that triggered create_issue
  3. Check detection/detection.log for why parse_threat_detection_results concluded "threat found"
  4. Determine if the findings are true security boundary violations or false positives

If these are recurring false positives (e.g., the agent files an issue every time it sees a known-benign file), the fix would be in the investigation prompt — telling the agent to cross-reference against a known-baseline before filing. This would eliminate the detection job trigger and keep runs in the 2–5 request success profile.
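
One way to picture the baseline cross-reference is a simple set difference over findings. This is a hypothetical sketch, not the workflow's actual mechanism: the report does not specify how a baseline would be stored or matched, and the paths below are invented examples.

```python
def novel_findings(findings, baseline):
    """Return findings that are not already in the known-benign baseline."""
    baseline_set = set(baseline)
    # Preserve the original order so the most recent discovery stays last.
    return [item for item in findings if item not in baseline_set]


# Hypothetical data: paths the agent flagged vs. paths already reviewed as benign.
known_benign = ["/etc/ssl/certs/ca-certificates.crt", "/usr/share/doc/example/README"]
flagged = [
    "/etc/ssl/certs/ca-certificates.crt",
    "/tmp/unexpected-token.txt",
]

to_report = novel_findings(flagged, known_benign)
print(to_report if to_report else "nothing new: prefer noop over create_issue")
```

Under this assumption, an issue would only be filed when the filtered list is non-empty, which keeps repeat sightings of known-benign files from triggering the detection job.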


Expected Impact

| Metric | Current | After Rec. 1+2+3 | Savings |
|---|---|---|---|
| Tokens/failure run | 2,579K | ~680K | −74% |
| Cost/failure run | $5.03 | ~$1.36 | −73% |
| Daily cost (22 runs) | ~$41.18 | ~$16.18 | −$25/day (−61%) |
| Cache hit rate | 33–47% | 38–52% (est.) | +5–10% |
| LLM turns (failure) | 31 | ≤8 | −74% |
| Session time (failure) | ~9 min total | ~3–4 min total | −55% |

Rec. 1 alone accounts for essentially all of the ~$25/day total savings. Recs. 2 and 3 add modest additional improvement.


Implementation Checklist

  • [Highest impact] Change engine: copilot to engine: {id: copilot, max-turns: 8} in secret-digger-copilot.md
  • Remove duplicate Run ID, Workflow, Engine, Repository lines from user message in secret-digger-copilot.md
  • Condense Investigation Workflow steps 1–4 in shared/secret-audit.md (shared with other Secret Digger variants)
  • Remove Security Research Guidelines section from shared/secret-audit.md
  • Investigate whether the 32% daily failure rate is caused by true positives or repeating false positives (see Rec. 4)
  • Recompile: gh aw compile .github/workflows/secret-digger-copilot.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Trigger a manual run and compare request count vs. baseline (target: ≤8 requests even on failure path)
  • After 24h, re-run the token usage analyzer and verify daily cost drops to roughly $16

Generated from report #1878 — 2026-04-10 daily analysis

Generated by Daily Copilot Token Optimization Advisor · ● 2.2M
