[awf] CLI: add --diagnostic-logs flag to capture container logs on failure #2011

@lpcox

Description

Problem

When AWF containers fail to start (e.g., Squid crashes on startup in DinD environments), no diagnostic logs are captured: docker logs output, container exit codes, and mount info are all lost. Debugging requires customers to manually add diagnostic steps to their workflows.

Context

Original report: github/gh-aw#25548

A customer on ARC runners with DinD sidecars hit a Squid container crash (exit code 1) where the Squid access logs were empty (Squid never started). Diagnosing the root cause required multiple back-and-forth rounds.

Root Cause

The current cleanup lifecycle in src/cli.ts and src/docker-manager.ts calls docker compose down -v and removes the work directory, discarding all docker logs output and container state. There is no mechanism to capture container-level operational logs (stdout/stderr from entrypoints) before teardown.

The --keep-containers flag preserves containers but requires the user to know in advance they need it, and doesn't help for automated CI failures.

Proposed Solution

Add a --diagnostic-logs flag (off by default) to src/cli.ts that, when enabled and AWF exits with a non-zero code, collects container-level diagnostics before teardown:

  1. In src/cli.ts: Add --diagnostic-logs option to the Commander program definition.
  2. In src/docker-manager.ts: Add a collectDiagnostics(workDir, auditDir) function that runs:
    • docker logs awf-squid → diagnostics/squid-container.log
    • docker logs awf-agent → diagnostics/agent-container.log
    • docker logs awf-iptables-init → diagnostics/iptables-init.log (if exists)
    • docker inspect --format '{{.State.ExitCode}}' awf-squid awf-agent → diagnostics/exit-codes.txt
    • docker inspect --format '{{json .Mounts}}' awf-agent → diagnostics/agent-mounts.json
    • Sanitized docker-compose.yml (strip env var values containing tokens/keys) → diagnostics/docker-compose-sanitized.yml
  3. Call collectDiagnostics in the error/signal cleanup path (src/cli.ts:95-103, 122-126) when --diagnostic-logs is set and exit code is non-zero.
  4. Write to ${workDir}/diagnostics/ (preserved if --keep-containers, or moved alongside squid-logs).
  5. Security: Never collect raw env var values, full docker inspect JSON, or /etc/shadow; sanitize the compose file before writing.
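
The capture plan and the sanitization rule above could be sketched roughly as follows. This is an illustration only, not the actual AWF implementation: `DiagnosticStep`, `planDiagnostics`, `sanitizeEnvironment`, and the secret-name pattern are hypothetical names introduced here, and the sketch assumes the compose `environment` section has already been parsed into a name-to-value map. It only builds the list of `docker` invocations and redacts values; running the commands and writing the files is left out.

```typescript
// One docker invocation and the file its output would be written to.
// (Hypothetical type, introduced for this sketch.)
interface DiagnosticStep {
  args: string[];    // arguments to pass to the `docker` binary
  outFile: string;   // destination under <workDir>/diagnostics/
  optional: boolean; // true when the container may not exist
}

// Hypothetical helper: turns the bullet list from the proposal into data.
function planDiagnostics(workDir: string): DiagnosticStep[] {
  const dir = `${workDir}/diagnostics`;
  const logs = (container: string, file: string, optional = false): DiagnosticStep => ({
    args: ["logs", container],
    outFile: `${dir}/${file}`,
    optional,
  });
  return [
    logs("awf-squid", "squid-container.log"),
    logs("awf-agent", "agent-container.log"),
    logs("awf-iptables-init", "iptables-init.log", true),
    {
      args: ["inspect", "--format", "{{.State.ExitCode}}", "awf-squid", "awf-agent"],
      outFile: `${dir}/exit-codes.txt`,
      optional: false,
    },
    {
      args: ["inspect", "--format", "{{json .Mounts}}", "awf-agent"],
      outFile: `${dir}/agent-mounts.json`,
      optional: false,
    },
  ];
}

// Assumption: a variable is treated as sensitive when its NAME looks like a
// credential. The pattern is illustrative, not taken from the AWF code base.
const SECRET_NAME_PATTERN = /(token|secret|password|key|credential)/i;

// Returns a copy of the environment map with sensitive values replaced.
function sanitizeEnvironment(env: Record<string, string>): Record<string, string> {
  const sanitized: Record<string, string> = {};
  for (const name of Object.keys(env)) {
    sanitized[name] = SECRET_NAME_PATTERN.test(name) ? "[REDACTED]" : env[name];
  }
  return sanitized;
}
```

Keeping the plan as plain data would let the cleanup path iterate over it and skip optional containers without special cases. A stricter reading of step 5 would redact every value rather than only those whose names match the pattern.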

Generated by Firewall Issue Dispatcher · ● 2.1M ·