Problem
When AWF containers fail to start (e.g., Squid crashes during initialization in DinD environments), application-level logs (access.log, audit.jsonl) are never written because the service never became healthy. Diagnosing these failures requires asking users to manually add debug steps to their workflow.
A concrete example: a customer on ARC runners with DinD sidecars hit a Squid container crash (exit code 1) with empty access logs, requiring multiple support rounds. See github/gh-aw#18385.
Context
Root Cause
The AWF cleanup path (src/cli.ts catch blocks and signal handlers, src/docker-manager.ts cleanup function) does not run docker logs <container> on failure. Low-level Docker operational state (container stderr/stdout, exit codes, mount bindings) is discarded when containers are stopped.
Proposed Solution
Add a --diagnostic-logs flag (already partially planned per the issue):
- src/types.ts: Add diagnosticLogs?: boolean to WrapperConfig.
- src/cli.ts: Add a --diagnostic-logs option to the commander program and wire it into WrapperConfig. Note: --diagnostic-logs already appears at line 1554 of src/docker-manager.ts, in the compose generation path. Verify whether it is already scaffolded there, then hook it into the cleanup path.
- src/docker-manager.ts, performCleanup() / error handlers:
  - On non-zero exit with the flag enabled, run docker logs --tail 200 for each of awf-squid, awf-agent, awf-api-proxy, and awf-iptables-init, and write the output to ${workDir}/diagnostics/.
  - Run docker inspect --format '{{.State.ExitCode}}' and docker inspect --format '{{json .Mounts}}' for each container.
  - Emit a sanitized docker-compose.yml (redact env var values matching token|key|secret|password case-insensitively).
- Bundle into audit artifact: Write diagnostics to ${auditDir}/diagnostics/ when --audit-dir is set, else ${workDir}/diagnostics/.
- Graceful skipping: If a container no longer exists (already cleaned up), skip that container without error.
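The collection steps above could be sketched roughly as follows. This is a minimal illustration, not the existing AWF code: DiagnosticsConfig, resolveDiagnosticsDir, runDocker, collectDiagnostics, and the output file names are all hypothetical, and only the docker logs / docker inspect invocations and the container names come from the list above.

```typescript
import { spawnSync } from "child_process";
import * as fs from "fs";
import * as path from "path";

// Container names taken from the proposal above.
const CONTAINERS = ["awf-squid", "awf-agent", "awf-api-proxy", "awf-iptables-init"];

// Hypothetical subset of WrapperConfig: only the fields this sketch needs.
interface DiagnosticsConfig {
  diagnosticLogs?: boolean;
  workDir: string;
  auditDir?: string;
}

// Prefer the audit directory when --audit-dir is set, else fall back to workDir.
function resolveDiagnosticsDir(config: DiagnosticsConfig): string {
  return path.join(config.auditDir || config.workDir, "diagnostics");
}

// Run a docker subcommand and return stdout followed by stderr, or null if the
// command failed (for example because the container no longer exists).
function runDocker(args: string[]): string | null {
  const result = spawnSync("docker", args, { encoding: "utf8" });
  if (result.error || result.status !== 0) return null;
  return `${result.stdout}${result.stderr}`;
}

// Collect logs, exit code, and mounts per container; returns the files written.
function collectDiagnostics(config: DiagnosticsConfig, exitCode: number): string[] {
  // Opt-in only, and only when the run failed.
  if (!config.diagnosticLogs || exitCode === 0) return [];

  const dir = resolveDiagnosticsDir(config);
  fs.mkdirSync(dir, { recursive: true });

  const written: string[] = [];
  for (const name of CONTAINERS) {
    // Assumed file naming; the issue does not specify one.
    const outputs: Array<[string, string | null]> = [
      [`${name}.log`, runDocker(["logs", "--tail", "200", name])],
      [`${name}.exit-code.txt`, runDocker(["inspect", "--format", "{{.State.ExitCode}}", name])],
      [`${name}.mounts.json`, runDocker(["inspect", "--format", "{{json .Mounts}}", name])],
    ];
    for (const [file, content] of outputs) {
      if (content === null) continue; // graceful skipping: container gone or docker unavailable
      const target = path.join(dir, file);
      fs.writeFileSync(target, content);
      written.push(target);
    }
  }
  return written;
}
```

Using targeted --format templates rather than full docker inspect output keeps environment variables out of the artifact. A failed docker call is treated as "nothing to collect" so that diagnostics never mask the original failure.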
What NOT to collect
- Raw environment variables (may contain API keys)
- Full docker inspect output (contains env vars)
- Host filesystem contents
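To keep raw environment values out of the sanitized docker-compose.yml, the redaction rule could look something like the sketch below. It assumes the compose file is available as a plain object before it is serialized, and it reads the token|key|secret|password pattern as applying to variable names; both are assumptions. The type and function names (ComposeFile, redactEnvironment, sanitizeCompose) are hypothetical.

```typescript
// Pattern from the proposal, applied case-insensitively to variable names (assumption).
const SENSITIVE_NAME = /token|key|secret|password/i;
const REDACTED = "[REDACTED]";

type EnvValue = string | number | boolean | null;
// Compose allows both map form ({ NAME: value }) and list form (["NAME=value"]).
type Environment = Record<string, EnvValue> | string[];

interface ComposeService {
  environment?: Environment;
  [key: string]: unknown;
}

interface ComposeFile {
  services?: Record<string, ComposeService>;
  [key: string]: unknown;
}

function redactEnvironment(env: Environment): Environment {
  if (Array.isArray(env)) {
    return env.map((entry) => {
      const eq = entry.indexOf("=");
      if (eq === -1) return entry; // bare "NAME" carries no value to redact
      const name = entry.slice(0, eq);
      return SENSITIVE_NAME.test(name) ? `${name}=${REDACTED}` : entry;
    });
  }
  const out: Record<string, EnvValue> = {};
  for (const [name, value] of Object.entries(env)) {
    out[name] = SENSITIVE_NAME.test(name) ? REDACTED : value;
  }
  return out;
}

// Returns a copy with sensitive environment values replaced; the input is not mutated.
function sanitizeCompose(compose: ComposeFile): ComposeFile {
  const services: Record<string, ComposeService> = {};
  for (const [name, service] of Object.entries(compose.services ?? {})) {
    services[name] = service.environment
      ? { ...service, environment: redactEnvironment(service.environment) }
      : { ...service };
  }
  return { ...compose, services };
}
```

The substring match is deliberately broad, so a harmless name that happens to contain "key" is redacted as well. For a diagnostics artifact, over-redaction is the safer failure mode.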