Closed
Description
Exploratory testing of the `audit` and `logs` MCP tools on 2026-03-13 revealed a consistent metric accuracy bug: `safe-output-items.jsonl` is always empty and `SafeItemsCount` is always 0, even when safe output actions (e.g., `add_labels`) are clearly executed and recorded in `safeoutputs.jsonl`.
Evidence
Across 10 workflow runs checked today, every run with safe output activity had:
- `safeoutputs.jsonl`: populated (contains the actual safe output actions)
- `safe-output-items.jsonl`: empty (0B)
- `SafeItemsCount` in `run_summary.json`: always 0
Specific reproduction case
Run: 23074558467 (Auto-Triage Issues, success)
`safeoutputs.jsonl` (populated):
```
{"integrity":"high","item_number":20875,"labels":["bug","cli"],"secrecy":"public","type":"add_labels"}
```
`safe-output-items.jsonl` (**empty; bug**):
```
(no content)
```
`run_summary.json` metrics (`SafeItemsCount` wrong):
```
{
  "SafeItemsCount": 0,
  "Turns": 0,
  "ToolCalls": null
}
```
The `agent_output.json` file correctly contains the validated item, confirming that the safe output was executed:
```
{"items":[{"integrity":"high","item_number":20875,"labels":["bug","cli"],"secrecy":"public","type":"add_labels"}],"errors":[]}
```
All Affected Runs
| Run | safeoutputs.jsonl | safe-output-items.jsonl | SafeItemsCount |
|---|---|---|---|
| run-23074558467 | 103B | 0B | 0 |
| run-23072165186 | 219B | 0B | 0 |
| run-23059678935 | 318B | 0B | 0 |
| run-23064798754 | 258B | 0B | 0 |
| run-23059960694 | 330B | 0B | 0 |
| run-23073052777 | 106B | 0B | 0 |
| run-23071864635 | 106B | 0B | 0 |
| run-23070912219 | 106B | 0B | 0 |
| run-23051588927 | 131B | 0B | 0 |
| run-23070335592 | 106B | 0B | 0 |
Related Metrics Also Broken
- `Turns`: always 0. The `audit` tool reports "Completed in 0 turns" even when 102k tokens were used. This may be specific to the Copilot engine, but it is misleading.
- `ToolCalls`: always `null`. Individual MCP tool call details are never stored in `run_summary.json`; only aggregate summaries appear in `mcp_tool_usage`.
Impact
- `gh aw audit` reports incorrect safe output activity (it shows no safe outputs even when items were created).
- The `gh aw logs` summary value `total_safe_items: 0` is always wrong.
- Audit trail completeness is compromised: users and monitoring tools cannot rely on safe output counts.
- Severity: High (core observability/compliance feature)
Steps to Reproduce
- Trigger any workflow that uses safe outputs (e.g., Auto-Triage Issues).
- After it completes, run `gh aw audit <run_id>`.
- Observe `SafeItemsCount: 0` in the metrics.
- Check `safe-output-items.jsonl` in the logs: it is empty even though `safeoutputs.jsonl` contains entries.
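The comparison in the last step can be scripted. The sketch below is a hypothetical check, not part of `gh aw`: it counts the JSON lines in `safeoutputs.jsonl` and `safe-output-items.jsonl` and compares both with `SafeItemsCount` from `run_summary.json`. It assumes all three files sit directly in one downloaded run directory and that `SafeItemsCount` appears either at the top level of `run_summary.json` or under a `metrics` key; the function names are illustrative and the paths may need adjusting to the real log layout.

```python
import json
import tempfile
from pathlib import Path


def count_jsonl_items(text: str) -> int:
    """Count the JSON objects in JSONL text, skipping blank lines."""
    count = 0
    for line in text.splitlines():
        if line.strip():
            json.loads(line)  # raises ValueError if a line is not valid JSON
            count += 1
    return count


def check_safe_output_counts(run_dir: Path) -> list[str]:
    """Return a description of each count mismatch found in one run's logs.

    Hypothetical layout (assumption): safeoutputs.jsonl,
    safe-output-items.jsonl and run_summary.json sit directly in run_dir.
    """
    executed = count_jsonl_items((run_dir / "safeoutputs.jsonl").read_text())
    recorded = count_jsonl_items((run_dir / "safe-output-items.jsonl").read_text())
    summary = json.loads((run_dir / "run_summary.json").read_text())
    # Assumption: SafeItemsCount is either top-level or under a "metrics" key.
    metrics = summary.get("metrics", summary)
    reported = metrics.get("SafeItemsCount", 0)

    problems = []
    if recorded != executed:
        problems.append(f"safe-output-items.jsonl has {recorded} items, expected {executed}")
    if reported != executed:
        problems.append(f"SafeItemsCount is {reported}, expected {executed}")
    return problems


if __name__ == "__main__":
    # Rebuild the run 23074558467 case from this issue in a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        run = Path(tmp)
        (run / "safeoutputs.jsonl").write_text(
            '{"integrity":"high","item_number":20875,"labels":["bug","cli"],'
            '"secrecy":"public","type":"add_labels"}\n'
        )
        (run / "safe-output-items.jsonl").write_text("")
        (run / "run_summary.json").write_text('{"SafeItemsCount": 0}')
        for problem in check_safe_output_counts(run):
            print(problem)
```

Fed the data from run 23074558467, the check reports both symptoms: the item file holds 0 items where 1 was expected, and `SafeItemsCount` is 0 where 1 was expected. Counting lines rather than comparing file sizes keeps the check independent of JSON formatting.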
Expected Behavior
- `safe-output-items.jsonl` should be populated with the safe output items that were executed.
- `SafeItemsCount` should reflect the actual number of safe output actions taken.
- `audit` should report these items in its analysis.
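One way to obtain the expected count is sketched below, under the assumption that the validated `items` array in `agent_output.json` is a suitable source of truth. This is an illustration, not gh-aw's implementation, and the helper name `safe_items_from_agent_output` is hypothetical. It counts the entries in `items`, ignores `errors`, and breaks the total down by `type`.

```python
import json
from collections import Counter


def safe_items_from_agent_output(raw: str) -> tuple[int, Counter]:
    """Count validated safe output items in agent_output.json content.

    Hypothetical helper: assumes the {"items": [...], "errors": [...]} shape
    quoted in this issue. Returns the total number of items and a per-type
    breakdown; entries under "errors" are not counted.
    """
    data = json.loads(raw)
    items = data.get("items") or []
    by_type = Counter(item.get("type", "unknown") for item in items)
    return len(items), by_type


if __name__ == "__main__":
    # agent_output.json content from run 23074558467, as quoted in this issue.
    sample = (
        '{"items":[{"integrity":"high","item_number":20875,'
        '"labels":["bug","cli"],"secrecy":"public","type":"add_labels"}],'
        '"errors":[]}'
    )
    total, by_type = safe_items_from_agent_output(sample)
    print(total, dict(by_type))  # 1 {'add_labels': 1}
```

For the `agent_output.json` content quoted in the reproduction case, this yields a total of 1 with a single `add_labels` item, which matches the one line in `safeoutputs.jsonl` and is the value `SafeItemsCount` would be expected to hold.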
Environment
- Repository: github/gh-aw
- Workflow Run ID: 23075154599
- Testing Date: 2026-03-13
- CLI Version: 2a255d2 (from `run_summary.json`)
Generated by Daily CLI Tools Exploratory Tester. Expires on Mar 21, 2026, 12:04 AM UTC.