Parent epic: #22735
Depends on: #22755 (firewall policy enrichment)
Summary
Add `gh aw audit report [--workflow <name>] [--last <N>]` to generate comprehensive audit reports across multiple workflow runs. Designed for security reviews, compliance checks, and feeding debugging/optimization agents.
Output
MVP (Phase 3a)
- Executive summary — total runs analyzed, overall denial rate, unique domains across all runs
- Domain inventory — all domains contacted across runs, with per-run allow/deny status
- Per-run breakdown — summary row per run with key metrics
Follow-up (Phase 3b)
- Anomaly detection — runs with unusual patterns (denial spikes, new domains)
- Recommendations — frequently denied domains (candidates for allowlisting), unused allowed domains (candidates for removal)
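Both recommendation rules reduce to simple set and count operations. This sketch assumes the report has already counted, per domain, in how many runs it was denied, and has collected the allowed domains from the policy manifests. The function names, and the choice to count runs rather than individual requests, are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// suggestAllowlist returns domains denied in at least minRuns runs.
// deniedRuns maps domain -> number of runs in which it was denied.
func suggestAllowlist(deniedRuns map[string]int, minRuns int) []string {
	var out []string
	for domain, n := range deniedRuns {
		if n >= minRuns {
			out = append(out, domain)
		}
	}
	sort.Strings(out) // deterministic report order
	return out
}

// suggestRemoval returns allowed domains (from the policy manifest) that
// never appear in observed traffic.
func suggestRemoval(allowed []string, observed map[string]bool) []string {
	var out []string
	for _, domain := range allowed {
		if !observed[domain] {
			out = append(out, domain)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(suggestAllowlist(map[string]int{"pypi.org": 4, "example.com": 1}, 3))
	fmt.Println(suggestRemoval([]string{"api.github.com", "registry.npmjs.org"},
		map[string]bool{"api.github.com": true}))
}
```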
Output: Markdown by default (suitable for security reviews, piping to files, and `$GITHUB_STEP_SUMMARY`). JSON is also supported via `--json`.
Usage
```bash
# Report on the last 10 runs of a workflow
gh aw audit report --workflow "agent-task" --last 10

# Report on all recent runs (default: last 20)
gh aw audit report

# JSON for dashboards
gh aw audit report --workflow "agent-task" --last 5 --json
```
Implementation Notes
From the expert review on #22736:
Parallel Artifact Downloads
Downloading artifacts for 10-20 runs serially will be slow. Use parallel goroutines with a concurrency limit (e.g., 5). The existing `downloadRunArtifacts()` in `logs_download.go` is synchronous, so it needs a goroutine wrapper.
Rate Limiting
Downloading artifacts for 20 runs may hit GitHub API rate limits. Retry rate-limited requests with backoff.
Cross-Run Aggregation Types
Create a new `CrossRunFirewallAnalysis` type rather than reusing `FirewallAnalysis`; the semantics are different (union of domains across runs, per-run breakdown).
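To make the "union plus per-run breakdown" semantics concrete, here is one possible shape for the type and its aggregation. Every field name, `RunFirewallSummary`, `DomainStatus`, and `aggregateRuns` are hypothetical; only the type name `CrossRunFirewallAnalysis` comes from this issue. The totals also yield the overall denial rate for the executive summary.

```go
package main

import (
	"fmt"
	"sort"
)

// RunFirewallSummary is a hypothetical per-run input: request counts per domain.
type RunFirewallSummary struct {
	RunID   int64
	Allowed map[string]int // domain -> allowed requests
	Denied  map[string]int // domain -> denied requests
}

// DomainStatus records how one domain was treated in one run.
type DomainStatus struct {
	Allowed int
	Denied  int
}

// CrossRunFirewallAnalysis aggregates many runs (illustrative fields).
type CrossRunFirewallAnalysis struct {
	RunsAnalyzed int
	TotalAllowed int
	TotalDenied  int
	Domains      []string                          // sorted union across runs
	PerRun       map[string]map[int64]DomainStatus // domain -> run ID -> status
}

// DenialRate is denied / (allowed + denied), or 0 when there were no requests.
func (a CrossRunFirewallAnalysis) DenialRate() float64 {
	total := a.TotalAllowed + a.TotalDenied
	if total == 0 {
		return 0
	}
	return float64(a.TotalDenied) / float64(total)
}

// aggregateRuns builds the union of domains and the per-run status table.
func aggregateRuns(runs []RunFirewallSummary) CrossRunFirewallAnalysis {
	a := CrossRunFirewallAnalysis{RunsAnalyzed: len(runs), PerRun: map[string]map[int64]DomainStatus{}}
	record := func(domain string, runID int64, allowed, denied int) {
		if a.PerRun[domain] == nil {
			a.PerRun[domain] = map[int64]DomainStatus{}
		}
		s := a.PerRun[domain][runID]
		s.Allowed += allowed
		s.Denied += denied
		a.PerRun[domain][runID] = s
	}
	for _, r := range runs {
		for d, n := range r.Allowed {
			record(d, r.RunID, n, 0)
			a.TotalAllowed += n
		}
		for d, n := range r.Denied {
			record(d, r.RunID, 0, n)
			a.TotalDenied += n
		}
	}
	for d := range a.PerRun {
		a.Domains = append(a.Domains, d)
	}
	sort.Strings(a.Domains)
	return a
}

func main() {
	a := aggregateRuns([]RunFirewallSummary{
		{RunID: 1, Allowed: map[string]int{"api.github.com": 3}, Denied: map[string]int{"example.com": 1}},
		{RunID: 2, Allowed: map[string]int{"api.github.com": 2}},
	})
	fmt.Println(a.RunsAnalyzed, a.Domains, a.DenialRate())
}
```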
Workflow Filter
The `--workflow` flag should accept either a workflow name or a workflow filename and resolve it via the GitHub API, since a workflow's name is not the same as its filename.
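GitHub's "List repository workflows" REST endpoint returns each workflow's `id`, `name`, and `path`, so resolution can be a local match over that list. This sketch assumes the list has already been fetched; the `Workflow` struct, `resolveWorkflow`, and the exact matching rules (name, file basename, or basename without extension) are assumptions.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Workflow mirrors the id, name, and path fields of the REST API response.
type Workflow struct {
	ID   int64
	Name string
	Path string // e.g. ".github/workflows/ci.yml"
}

// resolveWorkflow matches query against each workflow's display name,
// its file name, or its file name without the extension.
func resolveWorkflow(query string, workflows []Workflow) (Workflow, error) {
	var matches []Workflow
	for _, w := range workflows {
		base := path.Base(w.Path)
		stem := strings.TrimSuffix(base, path.Ext(base))
		if w.Name == query || base == query || stem == query {
			matches = append(matches, w)
		}
	}
	switch len(matches) {
	case 0:
		return Workflow{}, fmt.Errorf("no workflow matches %q", query)
	case 1:
		return matches[0], nil
	default:
		return Workflow{}, fmt.Errorf("%q is ambiguous: %d workflows match", query, len(matches))
	}
}

func main() {
	wfs := []Workflow{{ID: 1, Name: "CI", Path: ".github/workflows/ci.yml"}}
	fmt.Println(resolveWorkflow("ci.yml", wfs))
}
```

Reporting ambiguity as an error, rather than silently picking one match, avoids auditing the wrong workflow.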
Recommendation Engine (Phase 3b)
"Frequently denied → suggest allowlist" is a simple count threshold; keep it simple. "Unused allowed → suggest removal" requires comparing the policy manifest's allowed domains against observed traffic. That is doable but requires the manifest for every run, so keep it as a stretch goal.
Performance Bounds
Define a maximum run limit (suggested: 50) to bound download time and memory usage.
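A sketch of how `--last` could be bounded, using the default of 20 from the usage examples and the suggested cap of 50. Clamping (rather than rejecting) oversized values, treating a non-positive value as "flag not set", and the name `effectiveRunLimit` are all assumptions.

```go
package main

import "fmt"

const (
	defaultLastRuns = 20 // default from the usage examples
	maxLastRuns     = 50 // suggested upper bound
)

// effectiveRunLimit converts the --last value into the number of runs to
// analyze. The second result reports whether the value was clamped.
func effectiveRunLimit(requested int) (int, bool) {
	switch {
	case requested <= 0: // assumed to mean "flag not set"
		return defaultLastRuns, false
	case requested > maxLastRuns:
		return maxLastRuns, true
	default:
		return requested, false
	}
}

func main() {
	for _, n := range []int{0, 10, 100} {
		limit, clamped := effectiveRunLimit(n)
		fmt.Println(n, limit, clamped)
	}
}
```

Returning the clamped flag lets the CLI warn the user instead of silently analyzing fewer runs than requested.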
Schema Stability
AWF's `PolicyManifest` has a `version: 1` field. Check this version when parsing to handle future format changes gracefully.
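A minimal version gate, assuming the manifest is serialized as JSON with a top-level `version` key (the serialization format is an assumption). Only the version field is decoded; `policyManifestHeader` and `checkManifestVersion` are hypothetical names. Returning an error lets the caller skip the policy section for that run instead of misreading an unknown format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const supportedManifestVersion = 1

// policyManifestHeader decodes only the version field of the manifest.
type policyManifestHeader struct {
	Version int `json:"version"`
}

// checkManifestVersion rejects manifests that fail to parse or whose
// version is not the one this code understands (a missing field reads as 0).
func checkManifestVersion(data []byte) error {
	var h policyManifestHeader
	if err := json.Unmarshal(data, &h); err != nil {
		return fmt.Errorf("parse policy manifest: %w", err)
	}
	if h.Version != supportedManifestVersion {
		return fmt.Errorf("unsupported policy manifest version %d (expected %d)",
			h.Version, supportedManifestVersion)
	}
	return nil
}

func main() {
	fmt.Println(checkManifestVersion([]byte(`{"version": 1}`)))
	fmt.Println(checkManifestVersion([]byte(`{"version": 2}`)))
}
```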
Fallback Chain
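`audit.jsonl` → `access.log` → no firewall data: prefer `audit.jsonl`, fall back to `access.log`, and report that no firewall data is available if neither exists. Older runs may lack the JSONL format.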
Tasks
Phase 3a (MVP)
- `audit report` subcommand with `--workflow` and `--last` flags
- `CrossRunFirewallAnalysis` type)
- `pkg/cli/testdata/`

Phase 3b (Follow-up)