### Summary
Daily exploratory testing of the `audit`, `logs`, and `compile` MCP tools on 2026-03-22. Two significant issues found: 10 workflow compilation failures, and MCP tool call metrics that are never captured (always 0).
### ✅ What Worked Correctly
- `logs` tool: basic download, engine filter (`claude`, `copilot`), workflow-name filter, date range, and count limit are all functional
- `logs` edge case: a non-existent workflow returns a helpful error with suggestions to check the `status` tool
- `logs` old date: returns empty results (not an error) for queries with no data ✅
- `audit` successful run: full report with jobs, tool usage, firewall analysis, and created items, all populated correctly (tested: Issue Monster §23415308371, Sergo §23413720096)
- `audit` with URL: supports full GitHub Actions run URLs as input ✅
- `compile` individual workflow: compiles `issue-monster.md` successfully ✅
- `compile` strict=false: correctly compiles workflows with internal fields when strict mode is disabled ✅
- Log files: 163 directories, 415 MB of logs downloaded, structure intact; `agent-stdio.log`, `aw_info.json`, `detection.log`, and `run_summary.json` all present
### 🔴 Issue 1: 10 Workflows Fail to Compile (Critical)
Running `compile` (default `strict: true`) against all 177 workflows reveals 10 compilation failures:
<details>
<summary><b>4 smoke-* workflows: `sandbox.mcp.container` blocked in strict mode</b></summary>

```
smoke-copilot.md: strict mode: 'sandbox.mcp.container' is not allowed because it is an
internal implementation detail. Remove 'sandbox.mcp.container' or set 'strict: false'
```

Affected files:
- `smoke-copilot.md`
- `smoke-codex.md`
- `smoke-copilot-arm.md`
- `smoke-claude.md`

These workflows set `sandbox.mcp.container: "ghcr.io/github/gh-aw-mcpg"` but do not set `strict: false` in their frontmatter. Compiling each one individually with `strict: false` succeeds. **Fix**: add `strict: false` to the frontmatter of these four smoke workflows.
</details>
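
The strict-mode rule described above can be sketched as a small pre-check. This is only an illustration, not the compiler's actual logic: it assumes the frontmatter has already been parsed into a nested dictionary, that the dotted path `sandbox.mcp.container` maps to nested keys, and that `strict` defaults to true. The function name `needs_strict_false` is hypothetical.

```python
def needs_strict_false(frontmatter: dict) -> bool:
    """Return True if this frontmatter would trip the strict-mode check.

    Assumption: 'sandbox.mcp.container' corresponds to nested keys, and a
    missing 'strict' key means strict mode is on (the reported default).
    """
    container = frontmatter.get("sandbox", {}).get("mcp", {}).get("container")
    strict_enabled = frontmatter.get("strict", True) is not False
    return container is not None and strict_enabled


# Hypothetical parsed frontmatter for a smoke workflow, before and after the fix.
before_fix = {"sandbox": {"mcp": {"container": "ghcr.io/github/gh-aw-mcpg"}}}
after_fix = {**before_fix, "strict": False}

print(needs_strict_false(before_fix))  # True
print(needs_strict_false(after_fix))   # False
```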
<details>
<summary><b>6 workflows: Missing `vulnerability-alerts: read` permission for `dependabot` toolset</b></summary>

```
Missing required permissions for GitHub toolsets:
- vulnerability-alerts: read (required by dependabot)
```

Affected files:
- `daily-firewall-report.md`
- `deep-report.md`
- `dependabot-go-checker.md`
- `github-mcp-structural-analysis.md`
- `github-mcp-tools-report.md`
- `security-review.md`

These workflows use the `dependabot` GitHub toolset but don't declare `vulnerability-alerts: read` in their permissions block. This likely became a required permission in a recent update. **Fix**: add `vulnerability-alerts: read` to each workflow's `permissions` block, or remove the `dependabot` toolset if it is not needed.
</details>
### 🟡 Issue 2: MCP Tool Call Sizes and Durations Always Zero (Medium)
In both `logs` and `audit` output, all MCP server tool calls show `input_size: 0`, `output_size: 0`, and `avg_duration: "0ns"`, regardless of the server (github, safeoutputs, serena) or the tool called.
Example from `audit` of run 23413720096:
```json
{
  "server_name": "serena",
  "tool_name": "onboarding",
  "call_count": 1,
  "total_input_size": 0,
  "total_output_size": 0,
  "max_input_size": 0,
  "max_output_size": 0
}
```
By contrast, native tool calls (bash, Read, etc.) do show real input/output sizes. This is a tracking gap: users cannot tell how much data was transferred to/from MCP servers, making it harder to diagnose performance issues or unexpected MCP behavior.
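
One way to spot this gap automatically is to flag any entry that was called at least once yet reports zero for every size field. The sketch below is a rough illustration that assumes the tool-usage entries are available as a list of JSON objects with the field names shown in the excerpt above. The second sample entry (`example-server` / `echo`) is invented purely for contrast, and `find_untracked_calls` is a hypothetical helper name.

```python
import json

# First entry mirrors the audit excerpt; the second is a made-up contrast case.
SAMPLE = json.loads("""
[
  {"server_name": "serena", "tool_name": "onboarding", "call_count": 1,
   "total_input_size": 0, "total_output_size": 0,
   "max_input_size": 0, "max_output_size": 0},
  {"server_name": "example-server", "tool_name": "echo", "call_count": 2,
   "total_input_size": 120, "total_output_size": 340,
   "max_input_size": 80, "max_output_size": 200}
]
""")

SIZE_FIELDS = (
    "total_input_size",
    "total_output_size",
    "max_input_size",
    "max_output_size",
)


def find_untracked_calls(entries):
    """Return entries with call_count > 0 whose size fields are all zero."""
    return [
        entry
        for entry in entries
        if entry.get("call_count", 0) > 0
        and all(entry.get(field, 0) == 0 for field in SIZE_FIELDS)
    ]


for entry in find_untracked_calls(SAMPLE):
    print(f'{entry["server_name"]}/{entry["tool_name"]}: sizes not tracked')
```

A check like this cannot tell a genuinely empty payload from a missing measurement, so it is best read as a hint that tracking is absent when every MCP entry in a run is flagged.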
---
### 🟡 Issue 3: Audit of Invalid Run ID Returns Opaque Error (Low)
```
McpError: MCP error -32603: calling "tools/call": failed to fetch run metadata
```
When auditing a non-existent run ID (e.g., `99999999999`), the error code `-32603` (generic internal error) and the message "failed to fetch run metadata" do not clearly indicate that the run ID was not found. A user-facing message like "Run 99999999999 not found. Verify the run ID exists in this repository" would be more helpful.
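
As a rough sketch of the suggested behavior, the snippet below maps a failed metadata fetch to a clearer message. It assumes, without confirmation from the report, that the underlying fetch surfaces an HTTP status and that a missing run appears as a 404. The function name `describe_fetch_failure` is hypothetical, and this is not how the tool is actually implemented.

```python
def describe_fetch_failure(run_id: int, http_status: int) -> str:
    """Build a user-facing message for a failed run-metadata fetch.

    Assumption: a non-existent run ID shows up as HTTP 404 from the fetch.
    """
    if http_status == 404:
        return (
            f"Run {run_id} not found. "
            "Verify the run ID exists in this repository."
        )
    # Any other status keeps the generic wording but names the run and status.
    return f"Failed to fetch run metadata for run {run_id} (HTTP {http_status})."


print(describe_fetch_failure(99999999999, 404))
```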
### 📊 Test Metrics
| Phase | Tests Run | Pass | Fail |
|---|---|---|---|
| Phase 1: Discovery | 2 | 2 | 0 |
| Phase 2: Logs | 7 | 7 | 0 |
| Phase 3: Audit | 4 | 4 | 0 |
| Phase 4: Compile | 4 | 3 | 1 |
| Phase 5: Edge Cases | 4 | 3 | 1 |

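
The per-phase counts can be totalled with a few lines of Python. This is only a convenience sketch over the numbers in the table; the variable names are illustrative.

```python
# (tests run, pass, fail) per phase, copied from the Test Metrics table.
results = {
    "Phase 1: Discovery": (2, 2, 0),
    "Phase 2: Logs": (7, 7, 0),
    "Phase 3: Audit": (4, 4, 0),
    "Phase 4: Compile": (4, 3, 1),
    "Phase 5: Edge Cases": (4, 3, 1),
}

# Sanity check: every row should satisfy run == pass + fail.
assert all(run == ok + bad for run, ok, bad in results.values())

total_run = sum(run for run, _, _ in results.values())
total_pass = sum(ok for _, ok, _ in results.values())
total_fail = sum(bad for _, _, bad in results.values())

print(f"{total_pass}/{total_run} passed ({total_pass / total_run:.1%}), {total_fail} failed")
# prints "19/21 passed (90.5%), 2 failed"
```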
Resources:
- 177 workflows discovered, 167 compile successfully
- 163 log directories, 415 MB of log data
- Logs download speed: ~10 runs in < 10s ✅
- Audit duration: ~5s per run ✅
- Compile (all 177): ~5s ✅
Generated by Daily CLI Tools Exploratory Tester