[cli-tools-test] Daily CLI Tools Testing: 10 workflow compilation failures + MCP metrics gap [2026-03-22] #22336

### Summary

Daily exploratory testing of the `audit`, `logs`, and `compile` MCP tools on 2026-03-22 found two significant issues, **10 workflow compilation failures** and **MCP tool call metrics that are not captured** (always 0), plus one low-severity issue: an opaque error when auditing an invalid run ID.

---

### ✅ What Worked Correctly

- **`logs` tool**: Basic download, engine filter (`claude`, `copilot`), workflow-name filter, date range, count limit — all functional
- **`logs` edge case**: Non-existent workflow returns a helpful error with suggestions to check the `status` tool
- **`logs` old date**: Returns empty results (not an error) for queries with no data ✅
- **`audit` successful run**: Full report with jobs, tool usage, firewall analysis, created items — all populated correctly (tested: Issue Monster [§23415308371](https://github.com/github/gh-aw/actions/runs/23415308371), Sergo [§23413720096](https://github.com/github/gh-aw/actions/runs/23413720096))
- **`audit` with URL**: Supports full GitHub Actions run URLs as input ✅
- **`compile` individual workflow**: Compiles `issue-monster.md` successfully ✅
- **`compile` strict=false**: Correctly compiles workflows with internal fields when strict mode disabled ✅
- **Log files**: 163 directories, 415 MB of logs downloaded — structure intact; agent-stdio.log, aw_info.json, detection.log, run_summary.json all present

---

### 🔴 Issue 1: 10 Workflows Fail to Compile (Critical)

Running `compile` (default `strict: true`) against all 177 workflows reveals **10 compilation failures**:

<details>
<summary><b>4 smoke-* workflows: `sandbox.mcp.container` blocked in strict mode</b></summary>

```
smoke-copilot.md: strict mode: 'sandbox.mcp.container' is not allowed because it is an
  internal implementation detail. Remove 'sandbox.mcp.container' or set 'strict: false'
```

Affected files:
- `smoke-copilot.md`
- `smoke-codex.md`
- `smoke-copilot-arm.md`
- `smoke-claude.md`

These workflows use `sandbox.mcp.container: "ghcr.io/github/gh-aw-mcpg"` but are missing `strict: false` in their frontmatter. Compiling each individually with `strict: false` succeeds. **Fix**: add `strict: false` to these four smoke workflow frontmatters.
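
The fix for the four smoke workflows could be applied mechanically. The following is a minimal sketch, not part of the tested tooling: `add_strict_false` is a hypothetical helper, and it assumes the frontmatter is the block between the first two `---` lines of the workflow file.

```python
def add_strict_false(markdown: str) -> str:
    """Return the workflow text with `strict: false` added to its frontmatter.

    Assumption: the frontmatter is delimited by the first two `---` lines.
    The text is returned unchanged if there is no frontmatter or if a
    top-level `strict:` key is already present.
    """
    lines = markdown.split("\n")
    if not lines or lines[0].strip() != "---":
        return markdown  # no frontmatter block at the top of the file
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return markdown  # unterminated frontmatter; leave the file alone
    if any(line.startswith("strict:") for line in lines[1:end]):
        return markdown  # an explicit top-level setting already exists
    # Insert the key just before the closing delimiter.
    return "\n".join(lines[:end] + ["strict: false"] + lines[end:])
```

Running it over the text of `smoke-copilot.md`, `smoke-codex.md`, `smoke-copilot-arm.md`, and `smoke-claude.md`, then recompiling, would confirm whether the strict-mode error goes away.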

</details>

<details>
<summary><b>6 workflows: Missing `vulnerability-alerts: read` permission for `dependabot` toolset</b></summary>

```
Missing required permissions for GitHub toolsets:
  - vulnerability-alerts: read (required by dependabot)
```

Affected files:
- `daily-firewall-report.md`
- `deep-report.md`
- `dependabot-go-checker.md`
- `github-mcp-structural-analysis.md`
- `github-mcp-tools-report.md`
- `security-review.md`

These workflows use the `dependabot` GitHub toolset but don't declare `vulnerability-alerts: read` in their permissions block. This likely became a required permission in a recent update. **Fix**: add `vulnerability-alerts: read` to each workflow's `permissions` block, or remove the `dependabot` toolset if not needed.
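
A pre-compile check for this class of failure might look like the sketch below. It works on frontmatter that has already been parsed into a dictionary. The `tools.github.toolsets` location and the helper name `missing_permissions` are assumptions made for illustration. Only the `dependabot` requirement comes from the compiler message above.

```python
# Toolset -> required permissions. Only the dependabot entry is taken from
# the compiler error quoted above; any further entries would be assumptions.
REQUIRED_PERMISSIONS = {"dependabot": {"vulnerability-alerts": "read"}}


def missing_permissions(frontmatter: dict) -> list[str]:
    """List permissions required by the declared toolsets but not declared.

    Assumption: toolsets live under tools.github.toolsets and permissions
    are a mapping of scope -> level. Levels are compared by exact match.
    """
    tools = frontmatter.get("tools") or {}
    github = tools.get("github") or {}
    toolsets = github.get("toolsets") or []
    declared = frontmatter.get("permissions") or {}
    if not isinstance(declared, dict):
        declared = {}  # shorthand strings are not interpreted by this sketch
    missing = []
    for toolset in toolsets:
        for scope, level in REQUIRED_PERMISSIONS.get(toolset, {}).items():
            if declared.get(scope) != level:
                missing.append(f"{scope}: {level} (required by {toolset})")
    return missing
```

An empty result means the workflow declares everything this table knows about. It does not guarantee that the real compiler will accept the file.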

</details>

---

### 🟡 Issue 2: MCP Tool Call Sizes and Durations Always Zero (Medium)

In both `logs` and `audit` output, all MCP server tool calls show `input_size: 0`, `output_size: 0`, and `avg_duration: "0ns"` — regardless of the server (github, safeoutputs, serena) or tool called.

**Example from `audit` of run 23413720096:**
```json
{
  "server_name": "serena",
  "tool_name": "onboarding",
  "call_count": 1,
  "total_input_size": 0,
  "total_output_size": 0,
  "max_input_size": 0,
  "max_output_size": 0
}
```

By contrast, native tool calls (bash, Read, etc.) do show real input/output sizes. This is a tracking gap: users cannot tell how much data was transferred to/from MCP servers, making it harder to diagnose performance issues or unexpected MCP behavior.
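
To make this gap detectable in future test runs, entries like the one above could be scanned automatically. This is a rough sketch under stated assumptions: the field names are taken from the example, while the helper `zeroed_mcp_calls` and the idea that the audit output exposes these entries as a list are hypothetical.

```python
# Size fields as they appear in the example entry above.
SIZE_FIELDS = (
    "total_input_size",
    "total_output_size",
    "max_input_size",
    "max_output_size",
)


def zeroed_mcp_calls(tool_usage: list[dict]) -> list[str]:
    """Return "server/tool" names that were called but report no data sizes."""
    flagged = []
    for entry in tool_usage:
        called = entry.get("call_count", 0) > 0
        all_zero = all(entry.get(field, 0) == 0 for field in SIZE_FIELDS)
        if called and all_zero:
            flagged.append(f"{entry.get('server_name')}/{entry.get('tool_name')}")
    return flagged
```

A tool that was called at least once and still reports zero for every size field is a likely sign that the metrics were never recorded, rather than that no data was exchanged.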

---

### 🟡 Issue 3: Audit of Invalid Run ID Returns Opaque Error (Low)

```
McpError: MCP error -32603: calling "tools/call": failed to fetch run metadata
```

When auditing a non-existent run ID (e.g., `99999999999`), the error code `-32603` (generic internal error) and the message "failed to fetch run metadata" do not clearly indicate that the run ID was not found. A user-facing message like "Run 99999999999 not found: verify the run ID exists in this repository" would be more helpful.
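
One way to produce the clearer message is to branch on the status of the underlying lookup. The sketch below is illustrative only: `describe_run_fetch_error` is a hypothetical helper, and it assumes the failed metadata request surfaces an HTTP status code, with 404 meaning the run does not exist.

```python
from typing import Optional


def describe_run_fetch_error(run_id: int, http_status: Optional[int]) -> str:
    """Build a user-facing message for a failed run metadata lookup.

    Assumption: http_status is the status of the underlying API request,
    or None when no response was received.
    """
    if http_status == 404:
        return f"Run {run_id} not found: verify the run ID exists in this repository"
    # Keep the original wording for every other failure, with added context.
    suffix = f" (HTTP {http_status})" if http_status is not None else ""
    return f"failed to fetch run metadata for run {run_id}{suffix}"
```

Only the not-found case gets a new message here. Other failures keep the existing wording so that unrelated errors are not misreported as a missing run.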

---

### 📊 Test Metrics

| Phase | Tests Run | Pass | Fail |
|-------|-----------|------|------|
| Phase 1: Discovery | 2 | 2 | 0 |
| Phase 2: Logs | 7 | 7 | 0 |
| Phase 3: Audit | 4 | 4 | 0 |
| Phase 4: Compile | 4 | 3 | 1 |
| Phase 5: Edge Cases | 4 | 3 | 1 |

**Resources:**
- 177 workflows discovered, 167 compile successfully
- 163 log directories, 415 MB of log data
- Logs download speed: ~10 runs in < 10s ✅
- Audit duration: ~5s per run ✅
- Compile (all 177): ~5s ✅

**References:**
- [§23415460310](https://github.com/github/gh-aw/actions/runs/23415460310) — this test run
- [§23415308371](https://github.com/github/gh-aw/actions/runs/23415308371) — Issue Monster (tested successful audit)
- [§23413720096](https://github.com/github/gh-aw/actions/runs/23413720096) — Sergo Claude (tested complex audit)




> Generated by [Daily CLI Tools Exploratory Tester](https://github.com/github/gh-aw/actions/runs/23415460310) · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-cli-tools-tester%22&type=issues)
> - [x] expires on Mar 29, 2026, 11:58 PM UTC
