Problem
The awf-api-proxy sidecar container intermittently fails its health check during docker compose up, causing the agent to never start. This has been reported on gh-aw v0.71.1 with firewall image 0.25.28 and matches a prior pattern from #27888 (now closed).
The compose health check configuration is:
healthcheck:
  test: ["CMD", "curl", "-f", "(localhost/redacted)"]
  interval: 1s
  timeout: 1s
  retries: 5
  start_period: 2s
The total window is only ~7 seconds (2s start_period + 5 retries × 1s interval), which may be insufficient on loaded GitHub-hosted runners.
Context
- gh-aw v0.71.1, firewall 0.25.28
- ubuntu24, ImageVersion: 20260426.100.1
- Repository: elastic/docs-content
Root Cause
Two compounding problems:
- Insufficient health check grace period. start_period: 2s with 5 retries at a 1s interval gives a maximum window of ~7s for the Node.js api-proxy process to start, bind its port, and pass the /health check. On a resource-constrained or busy runner, container startup alone can exceed this window.
- Missing log capture on failure. When docker compose up fails due to an unhealthy container, the api-proxy logs are not captured before the containers are removed. The api-proxy-logs mount directory is absent from uploaded artifacts, making the failure undiagnosable.
Relevant source files:
- src/docker-manager.ts — generates the Docker Compose config, including the health check parameters
- src/cli.ts — stopContainers() cleans up after failure; does not capture api-proxy logs before teardown
- containers/api-proxy/ — the api-proxy container itself
Proposed Solution
- Increase health check tolerance in src/docker-manager.ts: raise start_period to 5s, retries to 10, and timeout to 3s, matching the Squid health check's robustness.
- Capture api-proxy container logs on failure in src/cli.ts: before calling docker compose down, run docker compose logs api-proxy and write the output to the work directory's api-proxy-logs/ folder so it is included in uploaded artifacts.
- Always include api-proxy-logs/ in artifacts (even if empty) to make its absence explicit and aid triage.
- Optionally retry docker compose up once if only the api-proxy health check fails, before treating the whole run as failed.
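For the first item, the relaxed health check emitted by the generated compose config could look like this (values taken from the proposal above; the test command mirrors the current one):

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "(localhost/redacted)"]
  interval: 1s
  timeout: 3s      # was 1s
  retries: 10      # was 5
  start_period: 5s # was 2s
```

This widens the maximum window from ~7s to ~15s (5s start_period + 10 retries × 1s interval).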
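For the log-capture item, a minimal sketch of what src/cli.ts could do before teardown. captureApiProxyLogs is a hypothetical helper name, and the exact compose invocation and work-directory layout are assumptions, not the project's actual API:

```typescript
import { execFileSync } from "node:child_process";
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: call before `docker compose down` in stopContainers().
// Always creates api-proxy-logs/ so the artifact is present even when empty.
export function captureApiProxyLogs(workDir: string): string {
  const logDir = join(workDir, "api-proxy-logs");
  mkdirSync(logDir, { recursive: true });
  let logs = "";
  try {
    // --no-color keeps the captured artifact readable as plain text
    logs = execFileSync(
      "docker",
      ["compose", "logs", "--no-color", "api-proxy"],
      { encoding: "utf8" }
    );
  } catch {
    // Containers may already be gone or docker may be unavailable;
    // write an empty log file so the absence of output is explicit.
  }
  const logFile = join(logDir, "api-proxy.log");
  writeFileSync(logFile, logs);
  return logFile;
}
```

Writing the (possibly empty) file unconditionally also covers the third item: api-proxy-logs/ always appears in the uploaded artifacts.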
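For the optional retry, a small generic wrapper is one way to express "try docker compose up once more before failing the run". This is a sketch: the real implementation would also need to inspect which service failed so that only api-proxy health-check failures trigger a retry, which is omitted here:

```typescript
// Hypothetical wrapper: run an action, retrying exactly once on failure.
// In src/cli.ts this would wrap the `docker compose up` invocation.
export function withOneRetry<T>(fn: () => T): T {
  try {
    return fn();
  } catch {
    // First attempt failed; try once more before giving up for good.
    return fn();
  }
}
```

Usage would be along the lines of withOneRetry(() => execFileSync("docker", ["compose", "up", "-d", "--wait"])), ideally combined with the log capture above so the first failed attempt is still diagnosable.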
Generated by Firewall Issue Dispatcher