
[awf] CLI: Add --diagnostic-logs flag to collect Docker container logs on failure #1928

@lpcox

Description


Problem

When AWF containers fail to start (e.g., Squid crashes during initialization in DinD environments), application-level logs (access.log, audit.jsonl) are never written because the service never becomes healthy. Diagnosing these failures currently requires asking users to add manual debug steps to their workflow.

A concrete example: a customer on ARC runners with DinD sidecars hit a Squid container crash (exit code 1) with empty access logs, which took multiple support rounds to diagnose. See github/gh-aw#18385.

Context

Root Cause

The AWF cleanup path (the catch blocks and signal handlers in src/cli.ts, and the cleanup function in src/docker-manager.ts) does not run docker logs <container> on failure. As a result, low-level Docker operational state (container stdout/stderr, exit codes, mount bindings) is lost when the containers are stopped.

Proposed Solution

Add a --diagnostic-logs flag (already partially planned per the issue):

  1. src/types.ts: Add diagnosticLogs?: boolean to WrapperConfig.

  2. src/cli.ts: Add a --diagnostic-logs option to the commander program and wire it into WrapperConfig. Note: --diagnostic-logs already appears at line 1554 of src/docker-manager.ts in the compose generation path; verify whether it is already scaffolded there, and hook it into the cleanup path.

  3. src/docker-manager.ts, performCleanup() and error handlers:

    • On a non-zero exit with the flag enabled, run docker logs --tail 200 for each of awf-squid, awf-agent, awf-api-proxy, and awf-iptables-init, and write the output to ${workDir}/diagnostics/.
    • Run docker inspect --format '{{.State.ExitCode}}' and docker inspect --format '{{json .Mounts}}' for each container.
    • Emit a sanitized docker-compose.yml (redact env var values matching token|key|secret|password case-insensitively).
  4. Bundle into audit artifact: Write diagnostics to ${auditDir}/diagnostics/ when --audit-dir is set, else to ${workDir}/diagnostics/.

  5. Graceful skipping: If a container no longer exists (already cleaned up), skip that container without error.
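The cleanup-path collection in steps 3 to 5 could look roughly like the following TypeScript sketch. It is not AWF code: DiagnosticsConfig stands in for the relevant slice of WrapperConfig (only diagnosticLogs comes from this proposal; auditDir and workDir are assumed field names), and collectDiagnostics, resolveDiagnosticsDir and the injectable DockerRunner are hypothetical names chosen for illustration. It shells out to the real docker logs and docker inspect commands, treats a failing docker inspect as "container already gone" and skips it, and writes one file per container and data type.

```typescript
import { spawnSync } from "node:child_process";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical subset of WrapperConfig. Only diagnosticLogs is from the proposal;
// auditDir/workDir are assumed to mirror --audit-dir and the existing work directory.
interface DiagnosticsConfig {
  diagnosticLogs?: boolean;
  auditDir?: string;
  workDir: string;
}

interface RunResult {
  status: number | null; // null if docker could not be spawned at all
  stdout: string;
  stderr: string;
}

// Injectable so the collector can be exercised without a Docker daemon.
type DockerRunner = (args: string[]) => RunResult;

// Default runner: invokes the docker CLI directly (no shell), so the Go
// templates passed to --format reach docker verbatim.
const runDocker: DockerRunner = (args) => {
  const r = spawnSync("docker", args, { encoding: "utf8" });
  return { status: r.status, stdout: r.stdout ?? "", stderr: r.stderr ?? "" };
};

const AWF_CONTAINERS = ["awf-squid", "awf-agent", "awf-api-proxy", "awf-iptables-init"];

// Step 4: prefer the audit directory when --audit-dir is set.
function resolveDiagnosticsDir(config: DiagnosticsConfig): string {
  return path.join(config.auditDir ?? config.workDir, "diagnostics");
}

// Steps 3 and 5: returns the names of the containers that were collected.
function collectDiagnostics(
  config: DiagnosticsConfig,
  exitCode: number,
  run: DockerRunner = runDocker,
): string[] {
  if (!config.diagnosticLogs || exitCode === 0) return [];
  const outDir = resolveDiagnosticsDir(config);
  fs.mkdirSync(outDir, { recursive: true });
  const collected: string[] = [];
  for (const name of AWF_CONTAINERS) {
    // docker inspect exits non-zero for a missing container: skip it quietly.
    const state = run(["inspect", "--format", "{{.State.ExitCode}}", name]);
    if (state.status !== 0) continue;
    // docker logs replays the container's stdout and stderr on its own
    // stdout and stderr; concatenating them loses the interleaving order.
    const logs = run(["logs", "--tail", "200", name]);
    const mounts = run(["inspect", "--format", "{{json .Mounts}}", name]);
    fs.writeFileSync(path.join(outDir, `${name}.log`), logs.stdout + logs.stderr);
    fs.writeFileSync(path.join(outDir, `${name}.exitcode`), state.stdout.trim() + "\n");
    fs.writeFileSync(path.join(outDir, `${name}.mounts.json`), mounts.stdout);
    collected.push(name);
  }
  return collected;
}
```

Because docker logs and docker inspect need the containers to still exist, a call such as collectDiagnostics(config, exitCode) would have to run in the catch blocks and signal handlers before performCleanup() tears the containers down. Injecting the runner keeps the function testable without Docker.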

What NOT to collect

  • Raw environment variables (may contain API keys)
  • Full docker inspect output (contains env vars)
  • Host filesystem contents
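Applying the redaction rule from step 3 to the generated compose configuration, instead of collecting raw environment variables, could be sketched as a pure function. This assumes AWF builds the compose file as an in-memory object before serializing it to YAML, and reads "matching token|key|secret|password" as matching the variable name; ComposeFile, ComposeService, redactEnv and sanitizeCompose are hypothetical names. The Compose file format accepts environment either as a map or as a list of NAME=value strings, so both forms are handled.

```typescript
// Hypothetical minimal shape of a generated compose file; AWF's real type may differ.
type EnvValue = string | number | boolean | null;
type EnvBlock = Record<string, EnvValue> | string[];

interface ComposeService {
  environment?: EnvBlock;
  [key: string]: unknown;
}

interface ComposeFile {
  services: Record<string, ComposeService>;
  [key: string]: unknown;
}

// Case-insensitive, no /g flag, so .test() keeps no state between calls.
const SENSITIVE_NAME = /token|key|secret|password/i;
const REDACTED = "[REDACTED]";

function redactEnv(env: EnvBlock): EnvBlock {
  if (Array.isArray(env)) {
    // List form: "NAME=value"; a bare "NAME" carries no value to hide.
    return env.map((entry) => {
      const eq = entry.indexOf("=");
      if (eq === -1) return entry;
      const name = entry.slice(0, eq);
      return SENSITIVE_NAME.test(name) ? `${name}=${REDACTED}` : entry;
    });
  }
  // Map form: { NAME: value }.
  const out: Record<string, EnvValue> = {};
  for (const [name, value] of Object.entries(env)) {
    out[name] = SENSITIVE_NAME.test(name) ? REDACTED : value;
  }
  return out;
}

// Shallow-copies each service and replaces only its environment block,
// so the object used to start the containers is left untouched.
function sanitizeCompose(compose: ComposeFile): ComposeFile {
  const services: Record<string, ComposeService> = {};
  for (const [svcName, svc] of Object.entries(compose.services)) {
    services[svcName] =
      svc.environment === undefined
        ? { ...svc }
        : { ...svc, environment: redactEnv(svc.environment) };
  }
  return { ...compose, services };
}
```

The pattern over-redacts names such as MONKEY_PATCH, which is the safer failure mode for an artifact that may be attached to a support thread. It does not scrub secrets passed through command or entrypoint arguments, so those would need separate handling if credentials ever appear there.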

Generated by Firewall Issue Dispatcher
