Conversation
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (1 files)
Coverage comparison generated by |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR improves reliability and debuggability of the awf-api-proxy sidecar startup by relaxing its Docker health check thresholds and adding a one-time retry flow when Docker Compose reports the api-proxy container as unhealthy, along with clearer failure messaging and targeted log dumping.
Changes:
- Increased
awf-api-proxyDocker healthcheck tolerances (interval/timeout/retries/start_period) to reduce flakiness on slow runners. - Added
startContainers()retry-once behavior specifically forawf-api-proxyhealthcheck failures, including best-effort teardown between attempts. - Added helper functions to detect api-proxy unhealthy errors and to dump last 50 lines of container logs for diagnosis, plus improved final error message after double failure.
Show a summary per file
| File | Description |
|---|---|
src/docker-manager.ts |
Relaxes api-proxy healthcheck config; adds api-proxy-specific unhealthy detection, container log dumping, and a retry-once startup flow with clearer terminal error on repeated failure. |
src/docker-manager.test.ts |
Updates healthcheck expectation assertions and adds tests covering retry behavior, final error message on double failure, and “no retry” for other unhealthy containers. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 2
| try { | ||
| const result = await execa('docker', ['logs', '--tail', '50', containerName], { | ||
| reject: false, | ||
| env: getLocalDockerEnv(), | ||
| }); | ||
| const combined = [result.stdout, result.stderr].filter(Boolean).join('\n').trim(); | ||
| if (combined) { | ||
| logger.error(`${containerName} container logs (last 50 lines):\n${combined}`); | ||
| } |
| // Other error during retry — fall through to generic handler below | ||
| logger.error('Failed to start containers (retry):', retryError); | ||
| throw retryError; | ||
| } |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@copilot address the review feedback |
Done in 8295ec4. Both review issues are addressed:
Two new tests cover these cases: the exit-code suppression and the retry-to-Squid-diagnostic routing. |
This comment has been minimized.
This comment has been minimized.
Smoke Test Results✅ GitHub MCP: chore: optimize test-coverage-improver workflow for ~50% token reduction PASS
|
🔥 Smoke Test Results
PR: fix: retry awf-api-proxy health check failures and improve startup error messaging Overall: PARTIAL — MCP ✅; HTTP/file tests skipped due to unresolved
|
Smoke TestPRs: "fix: retry awf-api-proxy health check failures and improve startup error messaging"; "feat: preflight binary check for codex in AWF agent container" Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "registry.npmjs.org"See Network Configuration for more information.
|
🧪 Chroot Version Comparison Results
Overall: ❌ Not all versions match — Python and Node.js versions differ between host and chroot environments.
|
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
|
Smoke Test Results
Overall: FAIL —
|
|
Smoke Test: Copilot BYOK (Offline) Mode — Run #25054697630
Running in BYOK offline mode ( Overall: PASS — triggered by
|
awf-api-proxyintermittently fails its Docker health check on slow/busy runners (observed ~33% on Azure-hosted runners), hard-failing the entire AWF step. When this happens, the misleading downstream error blames the model for never producing output rather than surfacing the actual container startup failure.Changes
Health check tolerances (
src/docker-manager.ts)start_period:10s→30sretries:10→15interval:1s→2s,timeout:2s→3sThese give slower runners more headroom before declaring the container unhealthy.
Retry on transient api-proxy failure
When
docker compose upfails withawf-api-proxy is unhealthy,startContainersnow tears down and retries once. Most transient port-binding delays recover on the second attempt with no user-visible failure.Clearer error on double failure
If api-proxy fails both attempts, a precise error is thrown instead of the generic compose error:
The last 50 lines of
awf-api-proxycontainer logs are dumped to stderr before the retry and on final failure.Non-api-proxy health check failures
All other health check failures (e.g. squid-proxy) continue through the existing Squid log inspection path unchanged — no retry is attempted.
New helpers
isApiProxyUnhealthyError(msg)— detects Docker Compose errors specific toawf-api-proxylogContainerLogsToStderr(containerName)— dumps last 50 lines of a container's logs to stderr for diagnosis