Parent investigation: #28947
Problem Statement
The awf-api-proxy sidecar container intermittently fails its Docker health check during docker compose up, causing the entire agent startup to fail with exit code 1. When this happens, no agent turns run at all — the workflow exits immediately with an opaque exitCode: 1 error.
Affected Workflows & Runs
| Run |
Workflow |
Engine |
Time (UTC) |
| §25049338576 |
Sub-Issue Closer |
copilot |
2026-04-28 11:10 |
| §25049605437 |
Daily Team Evolution Insights |
claude |
2026-04-28 11:14 |
| §25052667955 |
Smoke CI |
copilot |
2026-04-28 12:25 |
Failure Signature
Container awf-api-proxy Waiting
Container awf-squid Healthy
Container awf-api-proxy Error
dependency failed to start: container awf-api-proxy is unhealthy
[ERROR] Failed to start containers: Command failed with exit code 1: docker compose up -d --pull never
Root Cause (Probable)
The awf-api-proxy container starts and passes the Docker Starting/Started phases but then fails its health check before the timeout expires. The failure is intermittent — successful runs on the same day (e.g., §25048535353) show Container awf-api-proxy Healthy. This points to:
- Slow health check endpoint startup — the proxy HTTP server takes longer to bind on some runners, exceeding the health check
start_period
- Resource contention — shared runner load causes the container to start slower intermittently
- Health check configuration — insufficient
retries or interval for the observed startup variance
Proposed Remediation
- Inspect
awf-api-proxy Dockerfile HEALTHCHECK settings (interval, timeout, retries, start-period)
- Increase
start_period (e.g., from default 0 to 10s) and/or retries to tolerate slower cold starts
- Add container-level health check failure log capture (docker logs
awf-api-proxy) in the error path to surface the actual failure reason
- Consider adding a retry loop at the
docker compose up level for transient health check failures
Success Criteria
Generated by [aw] Failure Investigator (6h) · ● 267.1K · ◷
Parent investigation: #28947
Problem Statement
The
awf-api-proxysidecar container intermittently fails its Docker health check duringdocker compose up, causing the entire agent startup to fail with exit code 1. When this happens, no agent turns run at all — the workflow exits immediately with an opaqueexitCode: 1error.Affected Workflows & Runs
Failure Signature
Root Cause (Probable)
The
awf-api-proxycontainer starts and passes the DockerStarting/Startedphases but then fails its health check before the timeout expires. The failure is intermittent — successful runs on the same day (e.g., §25048535353) showContainer awf-api-proxy Healthy. This points to:start_periodretriesorintervalfor the observed startup varianceProposed Remediation
awf-api-proxyDockerfileHEALTHCHECKsettings (interval,timeout,retries,start-period)start_period(e.g., from default 0 to 10s) and/orretriesto tolerate slower cold startsawf-api-proxy) in the error path to surface the actual failure reasondocker compose uplevel for transient health check failuresSuccess Criteria
awf-api-proxy Errorin logsawf-api-proxycontainer logs are surfaced in agent-stdio.log when health check failsRelated to [aw-failures] P0/P1 Failure Report (2026-04-28 07:00–13:00 UTC): awf-api-proxy health check + copilot node-not-found #28947