[copilot-token-optimizer] Daily Syntax Error Quality Check — 9.3M tokens/run from blocked compile tool

### 🔍 Optimization Target: Daily Syntax Error Quality Check

**Selected because**: Highest token consumer in current audit snapshot (~9.3M tokens/run, ~130 turns avg)  
**Analysis period**: 2026-04-04 to 2026-04-06  
**Runs analyzed**: 2 runs (consistent pattern across both)

---

### 📊 Token Usage Profile

| Metric | Value |
|---|---|
| Total tokens (7d) | 9,349,044 |
| Avg tokens/run | ~9,292,016 |
| Avg turns/run | 130 |
| Cache efficiency | 49.6% |
| Input tokens | 9,305,247 |
| Output tokens | 43,797 |
| Input:Output ratio | 212:1 |
| Avg tokens/turn | ~72,000 |
| Action minutes | 21–22 min |

---

### 🚨 Root Cause: `gh aw compile *` Blocks Absolute Paths

**Both analyzed runs show the exact same failure pattern:**

The workflow allows `"gh aw compile *"` as a bash tool, but the agent copies test files to `/tmp/` and then tries to compile them with absolute paths like `gh aw compile /tmp/test-syntax-a.md`. The glob `*` does not match paths containing `/`, so every compile attempt is rejected:

```
✗ Test Category A - invalid YAML syntax (shell)
  └ Permission denied and could not request permission from user

✗ Test Category B - invalid engine name typo (shell)
  └ Permission denied and could not request permission from user

✗ Test Category C - negative timeout (shell)
  └ Permission denied and could not request permission from user
```

This triggers a costly fallback: unable to run the compiler, the agent reads compiler source code directly (64 source-reading turns) to evaluate error message quality. The result is ~130 turns per run instead of the ~15–20 a functioning workflow would require.
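A minimal sketch of the matching behavior described above, assuming the permission check treats `*` as "any run of characters except `/`". The helper names `pattern_to_regex` and `is_allowed` are hypothetical and are not part of `gh aw` or the agent runtime; the real matcher may differ in detail.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate an allowlist pattern so that `*` matches anything except `/` (assumed semantics)."""
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile("[^/]*".join(literal_parts))

def is_allowed(command: str, patterns: list[str]) -> bool:
    """Return True if the full command matches at least one allowlist pattern."""
    return any(pattern_to_regex(p).fullmatch(command) for p in patterns)

current = ["gh aw compile *"]
proposed = current + ["gh aw compile /tmp/*.md"]

for cmd in ("gh aw compile test-syntax-a.md", "gh aw compile /tmp/test-syntax-a.md"):
    print(cmd, "| current:", is_allowed(cmd, current), "| proposed:", is_allowed(cmd, proposed))
```

Under this assumption, the relative-path command passes the current allowlist while the `/tmp/` command passes only once a pattern containing the `/tmp/` prefix is added.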

#### Turn Breakdown (2/2 runs)

| Category | Turns | Notes |
|---|---|---|
| Source code analysis (fallback) | 64 | Reading compiler, schema, console files |
| Actual work (selection, editing, reporting) | 40 | The only productive turns |
| Binary/extension search | 11 | Trying to locate `gh-aw` before giving up |
| Firewall/sandbox debugging | 8 | Investigating why compile was blocked |
| Compile attempts (all denied) | 5 | The root cause |
| Other | 11 | Workspace inspection |
| **Total** | **139** | vs ~15–20 expected |

---

### 🔧 Recommendations

#### 1. Fix the `gh aw compile` tool pattern — Est. savings: **~7.9M tokens/run (84%)**

**Evidence**: 5 compile attempts in each of 2 runs, all blocked with "Permission denied", which triggered the 64-turn source-code fallback.

The workflow copies test files to `/tmp/` but only `gh aw compile *` (relative-path glob) is allowed. Fix: add an explicit `/tmp/` path pattern.

**Action**: In `daily-syntax-error-quality.md`, change the bash tools config:

```yaml
# Before
tools:
  bash:
    - "gh aw compile *"

# After
tools:
  bash:
    - "gh aw compile *"
    - "gh aw compile /tmp/*.md"
```

Or consolidate by compiling inside a dedicated temp subdirectory and allowing only that path:

```yaml
tools:
  bash:
    - "gh aw compile /tmp/syntax-error-tests/*.md"
```

**Expected result**: Agent completes testing in ~15–20 turns (~1.1–1.4M tokens) instead of ~130 turns (~9.3M tokens).
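The expected result can be reproduced with a back-of-the-envelope calculation. This sketch assumes a flat ~72,000 tokens per turn (the current average from the profile table). Per-turn cost grows with context length, so a shorter run would likely cost less per turn and these figures are probably conservative.

```python
AVG_TOKENS_PER_RUN = 9_292_016   # current average from the profile table
AVG_TOKENS_PER_TURN = 72_000     # assumption: per-turn cost stays flat after the fix

def projected_savings(turns: int) -> tuple[int, int, float]:
    """Return (projected tokens, tokens saved, percent saved) for a run of `turns` turns."""
    projected = turns * AVG_TOKENS_PER_TURN
    saved = AVG_TOKENS_PER_RUN - projected
    return projected, saved, 100 * saved / AVG_TOKENS_PER_RUN

for turns in (15, 20):
    projected, saved, pct = projected_savings(turns)
    print(f"{turns} turns: ~{projected / 1e6:.2f}M tokens, saves ~{saved / 1e6:.2f}M ({pct:.1f}%)")
```

The 20-turn case is the conservative end of the range and corresponds to the headline estimate of roughly 7.9M tokens saved per run.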

---

#### 2. Remove unused GitHub toolsets — Est. savings: **~100–200K tokens/run**

**Evidence**: `tool_breadth: narrow` in behavior fingerprint. Exactly 1 MCP call across the entire run (the final `noop`). The workflow does local file operations and compile — it has no need for GitHub issue/PR/repo tools.

**Action**: Remove the `github` toolset from the workflow:

```yaml
# Before
tools:
  github:
    toolsets:
      - default
  bash:
    - ...

# After
tools:
  bash:
    - ...
```

The GitHub tool schema (all tool definitions for `default` toolset) is loaded into the system prompt on every turn. Removing it saves ~1–2K tokens × 130 turns = 130–260K tokens. At 20 turns post-fix: 20–40K tokens.
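The arithmetic here is a fixed per-turn overhead multiplied by the turn count. A small sketch, taking the ~1–2K tokens per turn schema size as given (it is an estimate in this report, not a measured value):

```python
def schema_overhead(tokens_per_turn: int, turns: int) -> int:
    """Tokens spent re-sending a fixed-size tool schema on every turn."""
    return tokens_per_turn * turns

for label, turns in (("current (~130 turns)", 130), ("post-fix (~20 turns)", 20)):
    low = schema_overhead(1_000, turns)
    high = schema_overhead(2_000, turns)
    print(f"{label}: {low:,} to {high:,} tokens")
```

The saving shrinks in proportion to the turn count, which is why this recommendation matters most while the compile pattern is still broken.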

---

#### 3. Trim verbose prompt examples — Est. savings: **~40–60K tokens/run (post-fix)**

**Evidence**: The workflow prompt is 704 lines / 6,118 tokens with 56 code blocks and 48 section headers. The "Issue Structure" section alone contains a full 400-line example markdown issue template with nested code blocks. Much of this is example scaffolding the agent largely ignores (it ended up calling `noop` anyway).

**Action**: Replace the inline example issue template (Phase 6 section) with a concise reference:

```markdown
## Phase 6: Create Issue with Suggestions

Only create if average score < 70 or any test case scores < 55.

Use the standard issue format:
- h3 sections for Summary, Test Results, Recommendations
- Collapsible `<details>` for per-test-case details
- Priority table with estimated impact

Include: test configs, compiler outputs, scores, strengths/weaknesses.
```

Estimated savings: ~3K tokens/turn × 20 turns = 60K tokens/run after fix. Lower priority but reduces maintenance burden.

---

#### 4. Pre-copy test variants in a frontmatter `run:` step — Est. savings: **~150–200K tokens/run (post-fix)**

**Evidence**: The agent spends 11 turns selecting workflows and copying/editing them. This is deterministic data-gathering that could be moved to a pre-agent bash step.

**Action**: Add a step that pre-selects 3 diverse workflows (the sketch below covers selection only; creating the error variants would be added to the same step):

```yaml
steps:
  - name: Prepare test cases
    run: |
      mkdir -p /tmp/syntax-error-tests
      # Select 3 diverse workflows (non-daily, non-test)
      WORKFLOWS=$(find .github/workflows -name '*.md' ! -name 'daily-*.md' ! -name '*-test.md' | shuf | head -3)
      echo "$WORKFLOWS" > /tmp/syntax-error-tests/selected.txt
      cat /tmp/syntax-error-tests/selected.txt
```

Then pass the list to the agent via environment or inject into the prompt. This saves 8–11 turns of shell exploration.
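Creating the error variants themselves could also be scripted. The sketch below is one hypothetical way to do it for the three test categories seen in the logs (invalid YAML, engine name typo, negative timeout). The frontmatter field names `engine` and `timeout-minutes`, the typo value, and the sample workflow are assumptions for illustration and should be checked against the actual workflow schema.

```python
import re

# Hypothetical sample; a real step would read each file listed in selected.txt.
SAMPLE = """---
on: push
engine: copilot
timeout-minutes: 10
---
# Sample workflow
"""

def set_field(text: str, key: str, value: str) -> str:
    """Replace the first `key: ...` line, or insert one after the opening `---` if absent."""
    line = f"{key}: {value}"
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(lambda _match: line, text, count=1)
    return text.replace("---\n", f"---\n{line}\n", 1)

def make_variants(text: str) -> dict[str, str]:
    """Build one broken copy of the workflow per test category."""
    return {
        # Category A: unclosed flow sequence makes the frontmatter invalid YAML
        "a-invalid-yaml": text.replace("---\n", "---\nbroken: [unclosed\n", 1),
        # Category B: misspelled engine name (assumed field name)
        "b-engine-typo": set_field(text, "engine", "copiliot"),
        # Category C: negative timeout (assumed field name)
        "c-negative-timeout": set_field(text, "timeout-minutes", "-5"),
    }

for name, body in make_variants(SAMPLE).items():
    print(f"--- {name} ---\n{body}")
```

Writing each variant to `/tmp/syntax-error-tests/` before the agent starts would leave only compilation and scoring for the agent to do.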

---

<details>
<summary><b>Tool Usage Matrix</b></summary>

| Tool | Configured? | Used in N/2 runs | Avg calls/run | Recommendation |
|---|---|---|---|---|
| `shell(find *.md)` | ✅ | 2/2 | ~3 | Keep |
| `shell(cat .github/workflows/*.md)` | ✅ | 2/2 | ~5 | Keep |
| `shell(head -n * *.md)` | ✅ | 2/2 | ~2 | Keep |
| `shell(cp .github/workflows/*.md /tmp/*.md)` | ✅ | 0/2 | 0 | Revise — agent used `write` tool instead |
| `shell(cat /tmp/*.md)` | ✅ | 2/2 | ~4 | Keep |
| `shell(gh aw compile *)` | ✅ | 2/2 (denied) | ~5 | **Fix pattern** — all calls blocked |
| `shell(grep)`, `shell(ls)`, etc. | ✅ | 2/2 | ~10 | Keep |
| `shell(yq)` | ✅ | 0/2 | 0 | Consider removing |
| `github: toolsets: default` | ✅ | 0/2 | 0 | **Remove** — never used |
| `safeoutputs` | ✅ | 2/2 (noop) | 1 | Keep |

</details>

<details>
<summary><b>Audited Runs Detail</b></summary>

| Run | Date | Tokens | Turns | Conclusion | Cache Efficiency |
|---|---|---|---|---|---|
| [§24028699317](https://github.com/github/gh-aw/actions/runs/24028699317) | 2026-04-06 | 9,349,044 | 129 | ✅ success | 49.6% |
| [§23977056913](https://github.com/github/gh-aw/actions/runs/23977056913) | 2026-04-04 | 9,234,988 | 131 | ✅ success | ~49% (est.) |

**Pattern**: Both runs followed the same failure path: compile blocked → binary search → firewall debug → source code analysis fallback → noop.

</details>

---

### 📈 Expected Impact Summary

| Recommendation | Est. Savings/Run | Confidence | Effort |
|---|---|---|---|
| Fix `gh aw compile` tool pattern | ~7.9M tokens (84%) | High | Low |
| Remove GitHub toolsets | ~150K tokens | High | Low |
| Trim prompt examples | ~60K tokens (post-fix) | Medium | Medium |
| Pre-compute test variants | ~200K tokens (post-fix) | Medium | Medium |
| **Total (all applied)** | **~8.3M tokens (~89%)** | | |

---

### ⚠️ Caveats

- Analysis is based on 2 runs over 2 days; however, the pattern is identical in both, indicating a structural issue rather than a fluke
- Token counts per turn grow with conversation context length, so fixing the compile tool will have a compounding benefit beyond the linear estimate
- The source-code analysis fallback did produce a valid output (82.7/100 score, noop) — correctness is preserved with the current workaround, but at extreme cost

**References:**
- [§24028699317](https://github.com/github/gh-aw/actions/runs/24028699317) — Apr 6 run
- [§23977056913](https://github.com/github/gh-aw/actions/runs/23977056913) — Apr 4 run







> Generated by [Copilot Token Usage Optimizer](https://github.com/github/gh-aw/actions/runs/24036650732/agentic_workflow) · ● 1.9M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fcopilot-token-optimizer%22&type=issues)
> - [x] expires  on Apr 13, 2026, 3:08 PM UTC
