Problem
The awf-api-proxy sidecar container intermittently fails its Docker health check during docker compose up, causing the entire AWF agent startup to fail with exit code 1 before any agent turns run. Affected workflows include Smoke CI, Sub-Issue Closer, and Daily Team Evolution Insights.
Failure signature:
Container awf-api-proxy Waiting
Container awf-squid Healthy
Container awf-api-proxy Error
dependency failed to start: container awf-api-proxy is unhealthy
[ERROR] Failed to start containers: Command failed with exit code 1: docker compose up -d --pull never
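Context
- Failing runs: 25049338576, 25049605437, 25052667955 (all 2026-04-28)
- Relevant code: containers/api-proxy/; health check and Docker Compose integration are configured in src/docker-manager.ts (the api-proxy service block)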
Root Cause
On some runners (resource contention, cold start), the awf-api-proxy Node.js HTTP server takes longer to bind than the Docker health check allows, so the probe keeps failing past its start_period and retries. Because the Docker Compose configuration in src/docker-manager.ts declares depends_on: api-proxy: condition: service_healthy, an unhealthy api-proxy makes docker compose up fail and the entire stack never starts.
Specifically:
- The HEALTHCHECK in containers/api-proxy/Dockerfile likely uses the default start_period (0s) and low retries, which makes it intolerant of runner load variance
- The error path in src/docker-manager.ts does not capture docker logs awf-api-proxy, so the actual container failure reason is hidden
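To make the timing argument concrete, the following is a rough sketch of how much time a container gets before Docker can mark it unhealthy. It assumes Docker's documented behaviour that probe failures during the start period are not counted and that `retries` consecutive failures are needed afterwards. The type and function names are hypothetical and do not come from the AWF code base, and the result is an estimate rather than an exact bound.

```typescript
// Hypothetical helper for illustration; not part of src/docker-manager.ts.
interface HealthCheckTiming {
  intervalSec: number;    // --interval
  timeoutSec: number;     // --timeout
  startPeriodSec: number; // --start-period
  retries: number;        // --retries
}

// Approximate window the server has to start answering before Docker
// marks the container unhealthy.
function unhealthyWindowSec(t: HealthCheckTiming): { min: number; max: number } {
  // Fastest case: every probe fails immediately (e.g. connection refused).
  const min = t.startPeriodSec + t.retries * t.intervalSec;
  // Slowest case: every probe hangs until it hits the timeout.
  const max = t.startPeriodSec + t.retries * (t.intervalSec + t.timeoutSec);
  return { min, max };
}

// Values taken from the HEALTHCHECK line proposed in this issue.
const proposedWindow = unhealthyWindowSec({
  intervalSec: 3,
  timeoutSec: 5,
  startPeriodSec: 15,
  retries: 5,
});
console.log(proposedWindow); // { min: 30, max: 55 }
```

Under these assumptions, the proposed settings give the proxy roughly 30 to 55 seconds to bind. With a start period of 0s and the same interval, the lower bound drops to retries × interval.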
Proposed Solution
- containers/api-proxy/Dockerfile: Increase the HEALTHCHECK start_period to at least 15s and set retries=5 to tolerate slow cold starts:
    HEALTHCHECK --interval=3s --timeout=5s --start-period=15s --retries=5 \
      CMD curl -f (localhost/redacted) || exit 1
- src/docker-manager.ts: In the error path for docker compose up failure, add a step to capture and log docker logs awf-api-proxy so the failure reason is surfaced in agent-stdio.log.
- src/docker-manager.ts: Consider adding a retry loop (up to 2 retries of docker compose up) specifically for transient health check failures, guarded by checking the exit message.
- src/docker-manager.ts: Review the api-proxy service depends_on block. Either decide that service_started (instead of service_healthy) is appropriate for non-critical proxy paths, or keep service_healthy and fix the health check timing.
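The log-capture and retry items above could be combined along the following lines. This is a minimal sketch, not the existing code in src/docker-manager.ts: `Runner`, `defaultRunner`, `isHealthCheckFailure`, and `composeUpWithRetry` are hypothetical names. The compose arguments are copied from the failure signature, and routing output to agent-stdio.log is assumed to be handled by the injected `log` callback.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Outcome of one command: success flag plus combined output text.
interface CmdResult {
  ok: boolean;
  output: string;
}

// Injectable command runner so the retry logic can be tested without Docker.
type Runner = (cmd: string, args: string[]) => Promise<CmdResult>;

// Default runner: executes the command and folds stdout/stderr into one string.
const defaultRunner: Runner = async (cmd, args) => {
  try {
    const { stdout, stderr } = await execFileAsync(cmd, args);
    return { ok: true, output: `${stdout}${stderr}` };
  } catch (err) {
    const e = err as { stdout?: string; stderr?: string; message?: string };
    return { ok: false, output: `${e.stdout ?? ""}${e.stderr ?? ""}${e.message ?? ""}` };
  }
};

// Matches the "dependency failed to start" line from the failure signature.
function isHealthCheckFailure(output: string): boolean {
  return /dependency failed to start: container \S+ is unhealthy/.test(output);
}

// Runs docker compose up, logging the sidecar's output on every failure and
// retrying only when the failure looks like a transient health check timeout.
async function composeUpWithRetry(
  run: Runner = defaultRunner,
  maxRetries = 2,
  log: (msg: string) => void = console.error,
): Promise<boolean> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const up = await run("docker", ["compose", "up", "-d", "--pull", "never"]);
    if (up.ok) return true;

    // Surface the container's own logs so the real failure reason is visible.
    const logs = await run("docker", ["logs", "awf-api-proxy"]);
    log(`[attempt ${attempt + 1}] awf-api-proxy logs:\n${logs.output}`);

    // Any other kind of failure is not retried.
    if (!isHealthCheckFailure(up.output)) return false;
  }
  return false;
}
```

Calling `composeUpWithRetry()` with no arguments would use the real `docker` CLI. Passing a fake `Runner` keeps the retry policy testable in isolation, which matters here because the failure is intermittent and hard to reproduce on demand.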
Success criteria: Smoke CI passes 5 consecutive runs without awf-api-proxy Error.
Generated by Firewall Issue Dispatcher