diff --git a/docs/src/content/docs/reference/artifacts.md b/docs/src/content/docs/reference/artifacts.md new file mode 100644 index 00000000000..ac4103f23a0 --- /dev/null +++ b/docs/src/content/docs/reference/artifacts.md @@ -0,0 +1,173 @@ +--- +title: Artifacts +description: Complete reference for artifact names, directory structures, and download patterns used by GitHub Agentic Workflows. +sidebar: + order: 298 +--- + +GitHub Agentic Workflows upload several artifacts during workflow execution. This reference documents every artifact name, its contents, and how to access the data — especially for downstream workflows that use `gh run download` directly instead of `gh aw logs`. + +## Quick Reference + +| Artifact Name | Constant | Type | Description | +|---------------|----------|------|-------------| +| `agent` | `constants.AgentArtifactName` | Multi-file | Unified agent job outputs (logs, safe outputs, token usage summary) | +| `activation` | `constants.ActivationArtifactName` | Multi-file | Activation job output (`aw_info.json`, `prompt.txt`, rate limits) | +| `firewall-audit-logs` | `constants.FirewallAuditArtifactName` | Multi-file | AWF firewall audit/observability logs (token usage, network policy, audit trail) | +| `detection` | `constants.DetectionArtifactName` | Single-file | Threat detection log (`detection.log`) | +| `safe-output` | `constants.SafeOutputArtifactName` | Legacy/back-compat | Historical standalone safe output artifact (`safe_output.jsonl`); in current compiled workflows this content is included in the unified `agent` artifact instead | +| `agent-output` | `constants.AgentOutputArtifactName` | Legacy/back-compat | Historical standalone agent output artifact (`agent_output.json`); in current compiled workflows this content is included in the unified `agent` artifact instead | +| `aw-info` | — | Single-file | Engine configuration (`aw_info.json`) | +| `prompt` | — | Single-file | Generated prompt (`prompt.txt`) | +| `safe-outputs-items` | 
`constants.SafeOutputItemsArtifactName` | Single-file | Safe output items manifest | +| `code-scanning-sarif` | `constants.SarifArtifactName` | Single-file | SARIF file for code scanning results | + +## Artifact Sets + +The `gh aw logs` and `gh aw audit` commands support `--artifacts` to download only specific artifact groups: + +| Set Name | Artifacts Downloaded | Use Case | +|----------|---------------------|----------| +| `all` | Everything | Full analysis (default) | +| `agent` | `agent` | Agent logs and outputs | +| `activation` | `activation` | Activation data (`aw_info.json`, `prompt.txt`) | +| `firewall` | `firewall-audit-logs` | Network policy and firewall audit data | +| `mcp` | `firewall-audit-logs` | MCP gateway traffic logs | +| `detection` | `detection` | Threat detection output | +| `github-api` | `activation`, `agent` | GitHub API rate limit logs | + +```bash +# Download only firewall artifacts +gh aw logs --artifacts firewall + +# Download agent and firewall artifacts +gh aw logs --artifacts agent --artifacts firewall + +# Download everything (default) +gh aw logs +``` + +## `firewall-audit-logs` + +The `firewall-audit-logs` artifact is uploaded by **all firewall-enabled workflows**. It contains AWF (Agent Workflow Firewall) structured audit and observability logs. + +> **⚠️ Important:** This artifact is **separate** from the `agent` artifact. Token usage data (`token-usage.jsonl`) lives here, not in the `agent` artifact. 
+ +### Directory Structure + +``` +firewall-audit-logs/ +├── api-proxy-logs/ +│ └── token-usage.jsonl ← Token usage data (input/output/cache tokens per API request) +├── squid-logs/ +│ └── access.log ← Network policy log (domain allow/deny decisions) +├── audit.jsonl ← Firewall audit trail (policy matches, rule evaluations) +└── policy-manifest.json ← Policy configuration snapshot +``` + +### Accessing Token Usage Data + +**Recommended: Use `gh aw logs`** + +```bash +# Download and analyze firewall data +gh aw logs --artifacts firewall + +# Output as JSON for scripting +gh aw logs --artifacts firewall --json +``` + +**Direct download with `gh run download`:** + +```bash +# Download the artifact into ./firewall-audit-logs (a single named artifact is extracted into the current directory unless -D is given) +gh run download -n firewall-audit-logs -D firewall-audit-logs + +# Token usage data is at: +cat firewall-audit-logs/api-proxy-logs/token-usage.jsonl + +# Network access log is at: +cat firewall-audit-logs/squid-logs/access.log + +# Audit trail is at: +cat firewall-audit-logs/audit.jsonl + +# Policy manifest is at: +cat firewall-audit-logs/policy-manifest.json +``` + +### Common Mistake + +Downstream workflows sometimes download `agent-artifacts` or `agent` expecting to find `token-usage.jsonl`. The file is not there: per-request token usage data is only in the `firewall-audit-logs` artifact. + +```bash +# ❌ WRONG — token-usage.jsonl is NOT in the agent artifact +gh run download -n agent -D agent +cat agent/token-usage.jsonl # File not found! + +# ✅ CORRECT — download from firewall-audit-logs +gh run download -n firewall-audit-logs -D firewall-audit-logs +cat firewall-audit-logs/api-proxy-logs/token-usage.jsonl +``` + +## `agent` + +The unified `agent` artifact contains all agent job outputs. 
+ +### Contents + +- Agent execution logs +- Safe output data (`agent_output.json`) +- GitHub API rate limit logs (`github_rate_limits.jsonl`) +- Token usage summary (`agent_usage.json`) — aggregated totals only; per-request data is in `firewall-audit-logs` + +## `activation` + +The `activation` artifact contains activation job outputs. + +### Contents + +- `aw_info.json` — Engine configuration and workflow metadata +- `prompt.txt` — The generated prompt sent to the AI agent +- `github_rate_limits.jsonl` — Rate limit data from the activation job + +## `detection` + +The `detection` artifact contains threat detection output. + +### Contents + +- `detection.log` — Threat detection analysis results + +Legacy name: `threat-detection.log` (still supported for backward compatibility). + +## Naming Compatibility + +Artifact names changed between upload-artifact v4 and v5. The `gh aw logs` and `gh aw audit` commands handle both naming schemes transparently: + +| Old Name (pre-v5) | New Name (v5+) | File Inside | +|--------------------|----------------|-------------| +| `aw_info.json` | `aw-info` | `aw_info.json` | +| `safe_output.jsonl` | `safe-output` | `safe_output.jsonl` | +| `agent_output.json` | `agent-output` | `agent_output.json` | +| `prompt.txt` | `prompt` | `prompt.txt` | +| `threat-detection.log` | `detection` | `detection.log` | + +Single-file artifacts are automatically flattened to root level regardless of their artifact directory name. Multi-file artifacts (`firewall-audit-logs`, `agent`, `activation`) retain their directory structure. + +## Workflow Call Prefixes + +When workflows are invoked via `workflow_call`, GitHub Actions prepends a short hash to artifact names (e.g., `abc123-firewall-audit-logs`). The CLI handles this automatically by matching artifact names that end with `-{base-name}`. 
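
The suffix rule described above can be sketched in shell. This is only an illustration of the matching as documented here, not the CLI's actual implementation; the function name `matches_artifact` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gh aw): succeeds when an artifact name
# is either the base name itself or ends with "-<base name>".
matches_artifact() {
  local name="$1" base="$2"
  case "$name" in
    "$base" | *-"$base") return 0 ;;
    *) return 1 ;;
  esac
}

# Classify a few sample names against the firewall artifact's base name.
for name in firewall-audit-logs abc123-firewall-audit-logs agent; do
  if matches_artifact "$name" firewall-audit-logs; then
    echo "$name: firewall artifact"
  else
    echo "$name: other"
  fi
done
```

A pure suffix test like this one would also accept any unrelated name that happens to end in `-firewall-audit-logs`. Whether the CLI applies a stricter check is not stated here.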
+ +```bash +# Both of these are recognized as the firewall artifact: +# - firewall-audit-logs (direct invocation) +# - abc123-firewall-audit-logs (workflow_call invocation) +``` + +## Related Documentation + +- [Audit Commands](/gh-aw/reference/audit/) — Download and analyze workflow run artifacts +- [Cost Management](/gh-aw/reference/cost-management/) — Track token usage and inference spend +- [Network](/gh-aw/reference/network/) — Firewall and domain allow/deny configuration +- [Compilation Process](/gh-aw/reference/compilation-process/) — How workflows are compiled including artifact upload steps diff --git a/docs/src/content/docs/reference/compilation-process.md b/docs/src/content/docs/reference/compilation-process.md index 23962e23873..6aa00885fdf 100644 --- a/docs/src/content/docs/reference/compilation-process.md +++ b/docs/src/content/docs/reference/compilation-process.md @@ -252,10 +252,27 @@ Workflows generate several artifacts during execution: | **agent_output.json** | `/tmp/gh-aw/safeoutputs/` | AI agent output with structured safe output data (create_issue, add_comment, etc.) 
| Uploaded by agent job, downloaded by safe output jobs, auto-deleted after 90 days | | **agent_usage.json** | `/tmp/gh-aw/` | Aggregated token counts: `{"input_tokens":…,"output_tokens":…,"cache_read_tokens":…,"cache_write_tokens":…}` | Bundled in the unified agent artifact when the firewall is enabled; accessible to third-party tools without parsing step summaries | | **prompt.txt** | `/tmp/gh-aw/aw-prompts/` | Generated prompt sent to AI agent (includes markdown instructions, imports, context variables) | Retained for debugging and reproduction | -| **firewall-logs/** | `/tmp/gh-aw/firewall-logs/` | Network access logs in Squid format (when `network.firewall:` enabled) | Analyzed by `gh aw logs` command | +| **firewall-audit-logs** | See structure below | Dedicated artifact for AWF audit/observability logs (token usage, network policy, audit trail) | Uploaded by all firewall-enabled workflows; analyzed by `gh aw logs --artifacts firewall` | +| **firewall-logs/** | `/tmp/gh-aw/sandbox/firewall/logs/` | Network access logs in Squid format (when `network.firewall:` enabled) | Analyzed by `gh aw logs` command | | **cache-memory/** | `/tmp/gh-aw/cache-memory/` | Persistent agent memory across runs (when `tools.cache-memory:` configured) | Restored at start, saved at end via GitHub Actions cache | | **patches/**, **sarif/**, **metadata/** | Various | Safe output data (git patches, SARIF files, metadata JSON) | Temporary, cleaned after processing | +### `firewall-audit-logs` Artifact Structure + +The `firewall-audit-logs` artifact is a dedicated multi-file artifact uploaded by all firewall-enabled workflows. It is **separate** from the unified `agent` artifact. Downstream workflows that need token usage data or firewall audit logs must download this artifact specifically. 
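
A downstream step may want to confirm that the download is complete before reading from it. The following is a minimal sketch, assuming the artifact has already been extracted into a directory passed as the first argument; the helper name `check_firewall_artifact` is hypothetical, and the file list is taken from the structure documented for this artifact.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gh aw): verify that a directory holds the
# files expected in the firewall-audit-logs artifact. Prints each missing
# path and returns 1 if any file is absent.
check_firewall_artifact() {
  local dir="$1" missing=0 rel
  for rel in \
    api-proxy-logs/token-usage.jsonl \
    squid-logs/access.log \
    audit.jsonl \
    policy-manifest.json
  do
    if [ ! -f "$dir/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  return "$missing"
}

# Demo against a throwaway directory that mimics a partial download.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/api-proxy-logs"
: > "$demo_dir/api-proxy-logs/token-usage.jsonl"
check_firewall_artifact "$demo_dir" || echo "artifact incomplete"
rm -rf "$demo_dir"
```

If the directory was produced with `gh run download -n firewall-audit-logs -D firewall-audit-logs`, the argument would be `firewall-audit-logs`.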
+ +``` +firewall-audit-logs/ +├── api-proxy-logs/ +│ └── token-usage.jsonl ← Token usage data per request +├── squid-logs/ +│ └── access.log ← Network policy log (allow/deny) +├── audit.jsonl ← Firewall audit trail +└── policy-manifest.json ← Policy configuration snapshot +``` + +> **Tip:** Use `gh aw logs --artifacts firewall` to download and analyze firewall data instead of `gh run download` directly. The CLI handles artifact naming and backward compatibility automatically. See the [Artifacts reference](/gh-aw/reference/artifacts/) for the complete artifact naming guide. + ## MCP Server Integration Model Context Protocol (MCP) servers provide tools to AI agents. Compilation generates `mcp-config.json` from workflow configuration. diff --git a/scratchpad/artifact-naming-compatibility.md b/scratchpad/artifact-naming-compatibility.md index 60bebeca86b..c55f6a47db1 100644 --- a/scratchpad/artifact-naming-compatibility.md +++ b/scratchpad/artifact-naming-compatibility.md @@ -31,12 +31,71 @@ The `gh aw logs` and `gh aw audit` commands maintain full backward and forward c ## Compatibility Matrix +### Single-File Artifacts + +These artifacts contain exactly one file and are flattened to the root directory by `flattenSingleFileArtifacts()`: + | Artifact Name (Old) | Artifact Name (New) | File in Artifact | After Flattening | CLI Expects | |---------------------|---------------------|------------------|------------------|-------------| | `aw_info.json` | `aw-info` | `aw_info.json` | `aw_info.json` | ✅ | | `safe_output.jsonl` | `safe-output` | `safe_output.jsonl` | `safe_output.jsonl` | ✅ | | `agent_output.json` | `agent-output` | `agent_output.json` | `agent_output.json` | ✅ | | `prompt.txt` | `prompt` | `prompt.txt` | `prompt.txt` | ✅ | +| `threat-detection.log` | `detection` | `detection.log` | `detection.log` | ✅ | + +### Multi-File Artifacts + +These artifacts are initially downloaded by `gh run download` as directory trees that retain their internal structure. 
However, unlike the single-file artifact handling above, `gh aw logs` / `gh aw audit` may perform additional post-processing for some multi-file artifacts (notably `agent` and `activation`) to move expected files into the final layout used by the CLI. + +| Artifact Name | Constant | Contents | Notes | +|---------------|----------|----------|-------| +| `firewall-audit-logs` | `constants.FirewallAuditArtifactName` | AWF structured audit/observability logs | Uploaded by all firewall-enabled workflows; retains directory structure after download | +| `agent` | `constants.AgentArtifactName` | Unified agent job outputs (logs, safe outputs, token usage summary) | Downloaded as a directory tree, then post-processed by CLI flattening/reorganization helpers | +| `activation` | `constants.ActivationArtifactName` | Activation job output (`aw_info.json`, `prompt.txt`) | Downloaded as a directory tree, then post-processed by CLI flattening helpers for downstream use | + +#### `firewall-audit-logs` Directory Structure + +The `firewall-audit-logs` artifact (constant: `constants.FirewallAuditArtifactName`) is uploaded by all firewall-enabled agentic workflows. It is **separate** from the `agent` artifact and must be downloaded independently. 
+ +``` +firewall-audit-logs/ +├── api-proxy-logs/ +│ └── token-usage.jsonl ← Token usage data (input/output/cache tokens per request) +├── squid-logs/ +│ └── access.log ← Network policy log (domain allow/deny decisions) +├── audit.jsonl ← Firewall audit trail (policy matches, rule evaluations) +└── policy-manifest.json ← Policy configuration snapshot +``` + +**Downloading firewall audit logs with `gh run download`:** + +```bash +# Download only the firewall-audit-logs artifact (-D keeps it in its own directory; a single named artifact is otherwise extracted into the current directory) +gh run download -n firewall-audit-logs -D firewall-audit-logs + +# The data is then at: +# firewall-audit-logs/api-proxy-logs/token-usage.jsonl +# firewall-audit-logs/squid-logs/access.log +# firewall-audit-logs/audit.jsonl +# firewall-audit-logs/policy-manifest.json +``` + +**Recommended: Use `gh aw logs` instead of `gh run download`:** + +The `gh aw logs` command knows the correct artifact names and handles backward compatibility automatically: + +```bash +# Download and analyze all logs (including firewall data) +gh aw logs + +# Download only firewall artifacts +gh aw logs --artifacts firewall + +# Output as JSON for programmatic use +gh aw logs --artifacts firewall --json +``` + +> **⚠️ Common mistake:** Downloading `agent-artifacts` or `agent` and expecting to find `token-usage.jsonl` there. Token usage data lives in the `firewall-audit-logs` artifact, not in the agent artifact. ## Testing