api-proxy healthcheck flaps during startup, causing Docker Compose to fail

## Problem

The `api-proxy` container healthcheck can flap during startup, causing Docker Compose to mark it as unhealthy. Since the agent container has `depends_on: api-proxy: service_healthy`, this cascades into a full startup failure with:

```
dependency failed to start: container awf-api-proxy is unhealthy
```

This is an intermittent failure — the api-proxy Node.js server may not have fully bound to port 10000 before the healthcheck fires, or transient resource contention on CI runners causes a brief unresponsiveness window.

## Current healthcheck config

From `src/docker-manager.ts`:

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:10000/health"]
  interval: 1s
  timeout: 1s
  retries: 5
  start_period: 2s
```

The `start_period: 2s` gives only 2 seconds of grace before failures count. Combined with `interval: 1s`, `timeout: 1s`, and only `retries: 5`, a slow Node.js startup can exhaust retries quickly.

## Impact

This causes intermittent failures in gh-aw agentic workflows that use `--enable-api-proxy`. A [workaround PR](https://github.com/github/gh-aw/pull/27909) was opened in gh-aw to add a retry wrapper around AWF startup, but the root cause should be fixed here.

## Possible fixes

1. **Increase `start_period`** — give the Node.js server more time to bind (e.g., `5s` or `10s`)
2. **Increase `retries`** — allow more attempts before marking unhealthy (e.g., `10` or `15`)
3. **Increase `timeout`** — give each health probe more time (e.g., `2s` or `3s`)
4. **Add startup readiness signal** — have the api-proxy write a ready file after all ports are bound, and check for that in the healthcheck instead of (or in addition to) curling the endpoint
5. **AWF-level retry** — if Docker Compose reports an unhealthy dependency, AWF itself could retry startup (similar to what the gh-aw PR does externally)

## References

- gh-aw workaround PR: https://github.com/github/gh-aw/pull/27909
- Healthcheck config: `src/docker-manager.ts` (~line 1773)
- api-proxy health endpoint: `containers/api-proxy/server.js` (~line 908)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

api-proxy healthcheck flaps during startup, causing Docker Compose to fail #2154

Problem

Current healthcheck config

Impact

Possible fixes

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

api-proxy healthcheck flaps during startup, causing Docker Compose to fail #2154

Description

Problem

Current healthcheck config

Impact

Possible fixes

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions