[WIP] Add diagnostic logs collection for Docker failures #25559

Closed
Copilot wants to merge 1 commit into main from copilot/add-diagnostic-logs-collection


Copilot AI commented Apr 10, 2026

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.


This section details the original issue you should resolve.

<issue_title>feat: collect Docker operational logs on failure for AWF diagnostics</issue_title>
<issue_description>## Summary

When AWF containers fail to start (e.g., Squid crashes on startup in DinD environments), we currently have no diagnostic information because application-level logs (access.log, audit.jsonl) are never written. This makes debugging customer issues require multiple rounds of back-and-forth to gather basic info like docker logs output.

Motivation: A customer running ARC runners with DinD sidecars hit a Squid container crash (exit code 1) where the root cause was invisible — the squid access logs were empty because Squid never started. Diagnosing this required asking the customer to manually add debug steps to their workflow. See #18385.

## Proposal

Add a --diagnostic-logs flag (off by default) that collects Docker operational logs on failure and includes them in the firewall-audit-logs artifact under a diagnostics/ subdirectory.

### What to collect on failure

| Data | Command | Why |
|------|---------|-----|
| Container logs | `docker logs <container>` for squid, agent, api-proxy, iptables-init | Captures entrypoint stderr/stdout; shows WHY a container crashed |
| Container exit codes | `docker inspect --format '{{.State.ExitCode}}'` | Quick triage signal |
| Mount inspection | `docker inspect --format '{{json .Mounts}}'` | Shows what Docker actually mounted vs. what was requested (critical for DinD debugging) |
| Sanitized docker-compose.yml | Strip env vars containing tokens/keys | Shows the full container config without leaking secrets |
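
The per-container commands in the table could be assembled as argument arrays, roughly as in the minimal sketch below. The helper name `buildDiagnosticCommands`, the output file names, and the assumption that the service names map directly to Docker container names are all hypothetical; the real AWF container names may differ. The `--tail 200` cap is taken from the implementation notes in this issue.

```typescript
// Hypothetical helper, not part of AWF: builds `docker` argument lists
// for one container. Container names are ASSUMED to equal the service
// names listed in the table above.
const CONTAINERS: string[] = ["squid", "agent", "api-proxy", "iptables-init"];

interface DiagnosticCommand {
  outputFile: string; // file name under diagnostics/ (naming is an assumption)
  args: string[];     // arguments to pass to the `docker` binary
}

function buildDiagnosticCommands(container: string, tail: number = 200): DiagnosticCommand[] {
  return [
    // Last `tail` lines of the container's stdout/stderr
    { outputFile: `${container}.log`, args: ["logs", "--tail", String(tail), container] },
    // Exit code only, never the full inspect output (which includes env vars)
    { outputFile: `${container}.exitcode`, args: ["inspect", "--format", "{{.State.ExitCode}}", container] },
    // Mounts as JSON, to compare against what was requested
    { outputFile: `${container}.mounts.json`, args: ["inspect", "--format", "{{json .Mounts}}", container] },
  ];
}

// Flatten the commands for every container into one list.
const allCommands: DiagnosticCommand[] = [];
for (const name of CONTAINERS) {
  for (const cmd of buildDiagnosticCommands(name)) {
    allCommands.push(cmd);
  }
}
```

Keeping the arguments as an array (rather than one shell string) means the Go-template braces in `--format` need no shell quoting if the list is later handed to something like Node's `execFile`.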

### What NOT to collect (even with the flag)

  • Raw environment variables (may contain API keys)
  • Full docker inspect output (contains env vars)
  • Host filesystem contents

### Feature flag behavior

  • --diagnostic-logs: Opt-in flag, off by default
  • When enabled and AWF exits with a non-zero code, collect the above and write to ${auditDir}/diagnostics/ or ${workDir}/diagnostics/
  • When disabled (default), no additional data is collected — current behavior preserved
  • Consider making this default-on in a future release once validated
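
The flag behavior above amounts to a small decision function, sketched below. The names `DiagnosticsOptions` and `resolveDiagnosticsDir` are hypothetical, and preferring `auditDir` over `workDir` when both are available is an assumption; the issue only says the output goes to one or the other.

```typescript
// Hypothetical option shape; field names are assumptions for illustration.
interface DiagnosticsOptions {
  diagnosticLogs: boolean; // value of the --diagnostic-logs flag
  exitCode: number;        // AWF's exit code
  auditDir?: string;       // audit directory, if one is configured
  workDir: string;         // working directory fallback
}

// Returns the directory to write diagnostics into, or null when nothing
// should be collected (flag off, or AWF exited successfully).
function resolveDiagnosticsDir(opts: DiagnosticsOptions): string | null {
  if (!opts.diagnosticLogs || opts.exitCode === 0) {
    return null; // default behavior preserved: no extra data collected
  }
  // ASSUMPTION: prefer auditDir when set, otherwise fall back to workDir.
  const base = opts.auditDir !== undefined ? opts.auditDir : opts.workDir;
  // Trim trailing slashes so the result has exactly one separator.
  return `${base.replace(/\/+$/, "")}/diagnostics`;
}
```

Returning `null` for the disabled and success cases keeps the caller's error path to a single check before any Docker command is run.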

### Implementation notes

  • Collection should happen in the cleanup/error path (src/cli.ts catch block and signal handlers)
  • Use docker logs with --tail 200 to cap output size
  • Sanitize docker-compose.yml by redacting any env var value containing token, key, secret, password (case-insensitive)
  • If a container doesn't exist (already cleaned up), skip gracefully
  • Bundle into existing firewall-audit-logs artifact upload path
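
The redaction rule in the notes above could look something like the following minimal sketch. It operates on an already-parsed `environment` section (both the map and the list form that Compose accepts) so that no YAML library is needed here. The names `sanitizeEnvironment` and `REDACTED` are hypothetical, and matching the keywords against the variable *name* rather than its value is one interpretation of the rule, not something the issue confirms.

```typescript
// Keywords from the implementation notes, matched case-insensitively.
const SENSITIVE = /token|key|secret|password/i;
// Placeholder text is an assumption; any fixed marker would do.
const REDACTED = "***REDACTED***";

// Compose allows `environment` as a map or as a list of "NAME=value" strings.
type ComposeEnvironment = Record<string, string> | string[];

// Hypothetical helper: returns a copy with sensitive values replaced.
// ASSUMPTION: sensitivity is decided by the variable name.
function sanitizeEnvironment(env: ComposeEnvironment): ComposeEnvironment {
  if (Array.isArray(env)) {
    return env.map((entry) => {
      const eq = entry.indexOf("=");
      if (eq === -1) return entry; // bare name, no inline value to redact
      const name = entry.slice(0, eq);
      return SENSITIVE.test(name) ? `${name}=${REDACTED}` : entry;
    });
  }
  const out: Record<string, string> = {};
  for (const name of Object.keys(env)) {
    out[name] = SENSITIVE.test(name) ? REDACTED : env[name];
  }
  return out;
}
```

For example, `sanitizeEnvironment({ GITHUB_TOKEN: "abc", HTTP_PROXY: "http://squid:3128" })` keeps `HTTP_PROXY` unchanged and replaces the `GITHUB_TOKEN` value with the placeholder. A name-based match can miss secrets stored under innocuous names, so a stricter variant might redact every value by default and allow-list known-safe names.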

### Acceptance criteria

  • --diagnostic-logs flag added to AWF CLI
  • On failure with flag enabled: container logs, exit codes, mount info, and sanitized compose config collected
  • Output written to diagnostics/ subdirectory alongside existing audit artifacts
  • No secrets leaked in collected diagnostics
  • Works in both standard and DinD environments
  • Documentation updated</issue_description>

Comments on the Issue (you are @copilot in this section)

@pelikhan @copilot add a features flag in the frontmatter to enable this:

features:
  awf-diagnostic-logs: true
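
One way the requested frontmatter feature could be translated into the CLI flag is sketched below. This is only an illustration under stated assumptions: the `Frontmatter` shape and the `awfArgsFromFeatures` helper are hypothetical, and the actual workflow compiler may wire features to AWF arguments differently.

```typescript
// Hypothetical shape of the parsed workflow frontmatter.
interface Frontmatter {
  features?: Record<string, unknown>;
}

// Hypothetical helper: appends --diagnostic-logs only when the feature
// is explicitly set to `true`; any other value leaves the args unchanged.
function awfArgsFromFeatures(fm: Frontmatter, baseArgs: string[] = []): string[] {
  const features = fm.features !== undefined ? fm.features : {};
  const enabled = features["awf-diagnostic-logs"] === true;
  return enabled ? baseArgs.concat(["--diagnostic-logs"]) : baseArgs.slice();
}
```

Requiring a strict `true` keeps the feature off by default, which matches the opt-in behavior described in the issue.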

Copilot AI linked an issue Apr 10, 2026 that may be closed by this pull request
6 tasks
@pelikhan pelikhan closed this Apr 10, 2026
Copilot AI requested a review from pelikhan April 10, 2026 00:17
Copilot stopped work on behalf of pelikhan due to an error April 10, 2026 00:17
@github-actions github-actions Bot deleted the copilot/add-diagnostic-logs-collection branch April 17, 2026 02:59
