
[copilot-token-optimizer] Daily Syntax Error Quality Check: 9.3M tokens/run from blocked compile tool #24911

@github-actions

πŸ” Optimization Target: Daily Syntax Error Quality Check

Selected because: Highest token consumer in current audit snapshot (9.35M tokens and 129 turns in the latest run)
Analysis period: 2026-04-04 to 2026-04-06
Runs analyzed: 2 runs (consistent pattern across both)


📊 Token Usage Profile

| Metric | Value |
| --- | --- |
| Total tokens (7d) | 9,349,044 |
| Avg tokens/run | ~9,292,016 |
| Avg turns/run | 130 |
| Cache efficiency | 49.6% |
| Input tokens | 9,305,247 |
| Output tokens | 43,797 |
| Input:Output ratio | 212:1 |
| Avg tokens/turn | ~72,000 |
| Action minutes | 21–22 min |
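
The derived rows in the profile follow directly from the raw counts. A quick check in Python, using only the figures above; the variable names are illustrative, and the 129-turn figure is assumed to be the turn count of the latest audited run:

```python
# Raw figures copied from the token usage profile.
input_tokens = 9_305_247
output_tokens = 43_797
turns_latest_run = 129  # assumption: turn count of the latest audited run

total_tokens = input_tokens + output_tokens
io_ratio = input_tokens / output_tokens
tokens_per_turn = total_tokens / turns_latest_run

print(f"total tokens:    {total_tokens:,}")
print(f"input:output:    {io_ratio:.0f}:1")
print(f"avg tokens/turn: ~{tokens_per_turn:,.0f}")
```

The 212:1 ratio shows that almost all of the cost is re-sent context rather than generated text, which is why reducing the number of turns matters more than shortening outputs.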

🚨 Root Cause: gh aw compile * Blocks Absolute Paths

Both analyzed runs show the exact same failure pattern:

The workflow allows "gh aw compile *" as a bash tool, but the agent copies test files to /tmp/ and then tries to compile them with absolute paths like gh aw compile /tmp/test-syntax-a.md. The glob * does not match paths containing /, so every compile attempt is rejected:

✗ Test Category A - invalid YAML syntax (shell)
  └ Permission denied and could not request permission from user

✗ Test Category B - invalid engine name typo (shell)
  └ Permission denied and could not request permission from user

✗ Test Category C - negative timeout (shell)
  └ Permission denied and could not request permission from user

This triggers a costly fallback: the agent reads compiler source code directly (64 source-reading turns) to evaluate error message quality without ever running the compiler. The result is ~130 turns per run instead of the ~15–20 a functioning workflow would require.
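
The rejection can be reproduced with a small model of the matching rule described above. This is only a sketch: it assumes `*` matches any run of characters except `/` and that the whole command must match. It is not the actual matcher used by the tool allow-list, and `pattern_to_regex` and `is_allowed` are hypothetical helper names.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile an allow-list pattern where `*` matches any run of non-`/` characters."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile("[^/]*".join(parts))

def is_allowed(command: str, allowed_patterns: list[str]) -> bool:
    """Return True if the whole command matches at least one allowed pattern."""
    return any(pattern_to_regex(p).fullmatch(command) for p in allowed_patterns)

# Patterns as configured today, and with an explicit /tmp/ pattern added.
before = ["gh aw compile *"]
after = ["gh aw compile *", "gh aw compile /tmp/*.md"]
command = "gh aw compile /tmp/test-syntax-a.md"

print(is_allowed(command, before))  # the absolute path contains `/`, so `*` cannot cover it
print(is_allowed(command, after))   # the explicit /tmp/ pattern covers it
```

Under this model a relative invocation such as `gh aw compile test.md` still passes the original pattern, which is consistent with only the `/tmp/` invocations being denied.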

Turn Breakdown (2/2 runs)

| Category | Turns | Notes |
| --- | --- | --- |
| Source code analysis (fallback) | 64 | Reading compiler, schema, console files |
| Actual work (selection, editing, reporting) | 40 | The only productive turns |
| Binary/extension search | 11 | Trying to locate gh-aw before giving up |
| Firewall/sandbox debugging | 8 | Investigating why compile was blocked |
| Compile attempts (all denied) | 5 | The root cause |
| Other | 11 | Workspace inspection |
| Total | 139 | vs ~15–20 expected |

🔧 Recommendations

1. Fix the gh aw compile tool pattern. Est. savings: ~7.9M tokens/run (84%)

Evidence: 5 compile attempts in each of 2 runs, all blocked with "Permission denied". Triggered 64-turn source-code fallback.

The workflow copies test files to /tmp/ but only gh aw compile * (relative-path glob) is allowed. Fix: add an explicit /tmp/ path pattern.

Action: In daily-syntax-error-quality.md, change the bash tools config:

# Before
tools:
  bash:
    - "gh aw compile *"

# After
tools:
  bash:
    - "gh aw compile *"
    - "gh aw compile /tmp/*.md"

Or restrict compilation to a dedicated temp subdirectory, so the allowed pattern is as narrow as possible:

tools:
  bash:
    - "gh aw compile /tmp/syntax-error-tests/*.md"

Expected result: Agent completes testing in ~15–20 turns (~1.1–1.4M tokens) instead of ~130 turns (~9.3M tokens).
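
The headline savings figure can be recovered from the numbers in the expected result. A minimal sketch, assuming the ~9.3M baseline and the 1.1–1.4M post-fix range quoted above; `savings` is a hypothetical helper:

```python
def savings(baseline: float, post_fix: float) -> tuple[float, float]:
    """Return (absolute tokens saved, fraction of baseline saved)."""
    saved = baseline - post_fix
    return saved, saved / baseline

baseline_tokens = 9.3e6           # ~130 turns, from the audited runs
post_fix_range = (1.1e6, 1.4e6)   # ~15-20 turns, the estimate above

results = [savings(baseline_tokens, post_fix) for post_fix in post_fix_range]
for post_fix, (saved, fraction) in zip(post_fix_range, results):
    print(f"post-fix {post_fix / 1e6:.1f}M -> saves {saved / 1e6:.1f}M ({fraction:.1%})")
```

The conservative end of the range (1.4M tokens after the fix) corresponds to the ~7.9M tokens/run quoted for this recommendation.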


2. Remove unused GitHub toolsets. Est. savings: ~100–200K tokens/run

Evidence: tool_breadth: narrow in behavior fingerprint. Exactly 1 MCP call across the entire run (the final noop). The workflow does local file operations and compile; it has no need for GitHub issue/PR/repo tools.

Action: Remove the github toolset from the workflow:

# Before
tools:
  github:
    toolsets:
      - default
  bash:
    - ...

# After
tools:
  bash:
    - ...

The GitHub tool schema (all tool definitions for the default toolset) is loaded into the system prompt on every turn. Removing it saves ~1–2K tokens × 130 turns = 130–260K tokens. At 20 turns post-fix: 20–40K tokens.
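
The arithmetic behind these ranges is a fixed per-turn cost multiplied by the number of turns. A small sketch, where `schema_overhead` is a hypothetical helper and the 1–2K per-turn size is the estimate given above:

```python
def schema_overhead(tokens_per_turn: int, turns: int) -> int:
    """Tokens spent re-sending a fixed-size tool schema on every turn."""
    return tokens_per_turn * turns

for turns in (130, 20):
    low = schema_overhead(1_000, turns)
    high = schema_overhead(2_000, turns)
    print(f"{turns} turns: {low:,} to {high:,} tokens")
# prints:
# 130 turns: 130,000 to 260,000 tokens
# 20 turns: 20,000 to 40,000 tokens
```

Because the saving scales with turn count, most of its value disappears once the compile pattern is fixed and runs drop to ~20 turns.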


3. Trim the prompt's verbose examples. Est. savings: ~40–60K tokens/run (post-fix)

Evidence: The workflow prompt is 704 lines / 6,118 tokens with 56 code blocks and 48 section headers. The "Issue Structure" section alone contains a full 400-line example markdown issue template with nested code blocks. Much of this is example scaffolding the agent largely ignores (it ended up calling noop anyway).

Action: Replace the inline example issue template (Phase 6 section) with a concise reference:

## Phase 6: Create Issue with Suggestions

Only create if average score < 70 or any test case scores < 55.

Use the standard issue format:
- h3 sections for Summary, Test Results, Recommendations
- Collapsible `<details>` for per-test-case details
- Priority table with estimated impact

Include: test configs, compiler outputs, scores, strengths/weaknesses.

Estimated savings: ~3K tokens/turn × 20 turns = 60K tokens/run after fix. Lower priority but reduces maintenance burden.


4. Pre-copy test variants in a frontmatter run: step. Est. savings: ~150–200K tokens/run (post-fix)

Evidence: The agent spends 11 turns selecting workflows and copying/editing them. This is deterministic data-gathering that could be moved to a pre-agent bash step.

Action: Add a step that pre-selects 3 diverse workflows as the basis for the error variants:

steps:
  - name: Prepare test cases
    run: |
      mkdir -p /tmp/syntax-error-tests
      # Select 3 diverse workflows (non-daily, non-test)
      WORKFLOWS=$(find .github/workflows -name '*.md' ! -name 'daily-*.md' ! -name '*-test.md' | shuf | head -3)
      echo "$WORKFLOWS" > /tmp/syntax-error-tests/selected.txt
      cat /tmp/syntax-error-tests/selected.txt

Then pass the list to the agent via environment or inject into the prompt. This saves 8–11 turns of shell exploration.
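
One way to inject the list into the prompt is to render `selected.txt` as a short bullet list. The sketch below is only an illustration: `render_selection` and the heading text are hypothetical, and how the fragment actually reaches the agent depends on the workflow setup.

```python
from pathlib import Path

def render_selection(selected_text: str) -> str:
    """Turn the newline-separated list written by the prepare step into a prompt fragment."""
    paths = [line.strip() for line in selected_text.splitlines() if line.strip()]
    bullets = "\n".join(f"- {p}" for p in paths)
    return f"Pre-selected workflows to test:\n{bullets}"

if __name__ == "__main__":
    # Path written by the "Prepare test cases" step; skipped if the step has not run.
    selected = Path("/tmp/syntax-error-tests/selected.txt")
    if selected.exists():
        print(render_selection(selected.read_text()))
```

Keeping the fragment to one line per workflow avoids adding back the prompt bulk that recommendation 3 removes.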


Tool Usage Matrix

| Tool | Configured? | Used in N/2 runs | Avg calls/run | Recommendation |
| --- | --- | --- | --- | --- |
| `shell(find *.md)` | ✅ | 2/2 | ~3 | Keep |
| `shell(cat .github/workflows/*.md)` | ✅ | 2/2 | ~5 | Keep |
| `shell(head -n * *.md)` | ✅ | 2/2 | ~2 | Keep |
| `shell(cp .github/workflows/*.md /tmp/*.md)` | ✅ | 0/2 | 0 | Revise: agent used write tool instead |
| `shell(cat /tmp/*.md)` | ✅ | 2/2 | ~4 | Keep |
| `shell(gh aw compile *)` | ✅ | 2/2 (denied) | ~5 | Fix pattern: all calls blocked |
| `shell(grep)`, `shell(ls)`, etc. | ✅ | 2/2 | ~10 | Keep |
| `shell(yq)` | ✅ | 0/2 | 0 | Consider removing |
| `github: toolsets: default` | ✅ | 0/2 | 0 | Remove: never used |
| safeoutputs | ✅ | 2/2 (noop) | 1 | Keep |

Audited Runs Detail

| Run | Date | Tokens | Turns | Conclusion | Cache Efficiency |
| --- | --- | --- | --- | --- | --- |
| §24028699317 | 2026-04-06 | 9,349,044 | 129 | ✅ success | 49.6% |
| §23977056913 | 2026-04-04 | 9,234,988 | 131 | ✅ success | ~49% (est.) |

Pattern: Both runs followed an identical failure path: compile blocked → binary search → firewall debug → source code analysis fallback → noop.


📈 Expected Impact Summary

| Recommendation | Est. Savings/Run | Confidence | Effort |
| --- | --- | --- | --- |
| Fix gh aw compile tool pattern | ~7.9M tokens (84%) | High | Low |
| Remove GitHub toolsets | ~150K tokens | High | Low |
| Trim prompt examples | ~60K tokens (post-fix) | Medium | Medium |
| Pre-compute test variants | ~200K tokens (post-fix) | Medium | Medium |
| Total (all applied) | ~8.3M tokens (~89%) | | |
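
The total row is the plain sum of the four estimates, expressed against the latest run. A quick check, assuming the savings are additive (the post-fix estimates are already scaled to the shorter run) and using the 9,349,044-token run as the baseline:

```python
# Per-recommendation estimates copied from the summary.
savings_per_run = {
    "Fix gh aw compile tool pattern": 7_900_000,
    "Remove GitHub toolsets": 150_000,
    "Trim prompt examples": 60_000,
    "Pre-compute test variants": 200_000,
}
baseline_tokens = 9_349_044  # latest audited run

total_saved = sum(savings_per_run.values())
share = total_saved / baseline_tokens
print(f"total: ~{total_saved / 1e6:.1f}M tokens (~{share:.0%})")
# prints: total: ~8.3M tokens (~89%)
```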

⚠️ Caveats

  • Analysis is based on 2 runs over 2 days; however, the pattern is identical in both, indicating a structural issue, not a fluke
  • Token counts per turn grow with conversation context length, so fixing the compile tool will have a compounding benefit beyond the linear estimate
  • The source-code analysis fallback did produce a valid output (82.7/100 score, noop), so correctness is preserved with the current workaround, but at extreme cost
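
The compounding effect in the second caveat can be illustrated with a toy model in which every turn re-sends a context that grows by a fixed amount. The `BASE` and `GROWTH` values below are hypothetical and chosen only for illustration; they are not measured from the audited runs.

```python
def total_input_tokens(turns: int, base: int, growth_per_turn: int) -> int:
    """Sum of per-turn input sizes when each turn re-sends a linearly growing context."""
    return sum(base + growth_per_turn * t for t in range(turns))

BASE = 50_000   # hypothetical fixed prompt + schema size per turn
GROWTH = 350    # hypothetical tokens added to the context by each turn

long_run = total_input_tokens(130, BASE, GROWTH)
short_run = total_input_tokens(20, BASE, GROWTH)

print(f"130 turns: {long_run:,} tokens")
print(f" 20 turns: {short_run:,} tokens")
print(f"turn ratio {130 / 20:.1f}x, token ratio {long_run / short_run:.1f}x")
```

In this model the token ratio exceeds the turn ratio because the growth term scales with the square of the turn count, which is the sense in which cutting turns saves more than a linear estimate suggests.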

Generated by Copilot Token Usage Optimizer · ● 1.9M

  • expires on Apr 13, 2026, 3:08 PM UTC
