
[awf] Agent + API Proxy: P0/P1 failure report — health check intermittent + node not found in copilot container #2275

@lpcox

Problem

This report covers two active failure clusters in AWF workflows observed on 2026-04-28, 07:00–13:00 UTC (28 of 40 total runs failed or were cancelled):

  1. P0 (awf-api-proxy health check intermittent failure): 3+ confirmed runs failed before any agent turns because the awf-api-proxy container health check timed out during docker compose up
  2. P1 (node: command not found in copilot agent container): 2+ confirmed runs failed at agent invocation because node is not on PATH in the copilot engine container

27 PR-triggered cancellations are normal concurrency-cancel behavior.

Context

Root Cause

P0 (api-proxy health check): See the companion tracking issue for details. The health check start_period and retries settings appear insufficient for runner load variance.

P1 (node not found): The copilot engine requires node on PATH inside the agent container chroot. The agent container (containers/agent/) selectively bind-mounts host binaries under /host/ (see the bind mounts section of src/docker-manager.ts), using the whitelisted mount paths /usr, /bin, /sbin, /lib, /lib64, and /opt. A node installed elsewhere on the host (e.g., via nvm under /home/runner/.nvm/) may not be covered by those mounts.

If node is installed under /home/runner/.nvm/ or another non-standard path not included in the bind mounts, it is unavailable inside the chroot. The entrypoint.sh in containers/agent/ sets up the chroot but does not add PATH entries for non-standard node installations.
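To confirm this on a runner, a quick diagnostic can report whether the host's node resolves to a location under one of the mount prefixes listed above. This is a minimal sketch, not code from the AWF repository: the helper name is_under_mounted_prefix and the MOUNTED_PREFIXES variable are hypothetical, and the check is a plain string-prefix match that does not resolve symlinks.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic; not part of the AWF codebase.
# Prefixes copied from the mount paths named in this issue.
MOUNTED_PREFIXES="/usr /bin /sbin /lib /lib64 /opt"

# Return 0 if the given path sits under one of the mounted prefixes.
# Plain prefix match only; symlinks are not resolved.
is_under_mounted_prefix() {
  local path="$1" prefix
  for prefix in $MOUNTED_PREFIXES; do
    case "$path" in
      "$prefix"/*) return 0 ;;
    esac
  done
  return 1
}

# Report where node lives on the host, if anywhere.
node_path="$(command -v node || true)"
if [ -z "$node_path" ]; then
  echo "node not on host PATH"
elif is_under_mounted_prefix "$node_path"; then
  echo "node at $node_path is under a mounted prefix"
else
  echo "node at $node_path is outside the mounted prefixes"
fi
```

A node under /home/runner/.nvm/ would land in the last branch, which matches the failure described here.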

Proposed Solution

For P1 (node: command not found):

  1. containers/agent/entrypoint.sh: Detect common Node.js installation paths on the host (/home/runner/.nvm/versions/node/*/bin, /usr/local/bin/node, /opt/hostedtoolcache/node/*/x64/bin) and, if found, add them to the PATH exported into the chroot environment.

  2. src/docker-manager.ts: Add /home/runner/.nvm (or the resolved NVM_DIR) to the selective bind mounts list (read-only) so that Node.js installed via nvm is accessible inside the container at the same path.

  3. containers/agent/Dockerfile: Alternatively, install node directly in the agent container image (add nodejs to the apt-get install line) so the copilot engine always has a reliable node regardless of host configuration.

Option 3 (install node in the image) is the most robust fix since it doesn't depend on host layout.
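For comparison, option 1 could look roughly like the sketch below. It is an illustration under assumptions, not the actual entrypoint.sh: the function names find_node_bin_dirs and prepend_node_to_path are hypothetical, the candidate directories are the ones listed in option 1, and it assumes the host filesystem is visible under a root prefix such as /host at the point where the script runs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option 1; names are illustrative only.

# Print each candidate directory (relative to the root prefix "$1")
# that contains an executable `node`. Unmatched globs stay literal
# and simply fail the -x test.
find_node_bin_dirs() {
  local root="${1:-}" dir
  for dir in \
    "$root"/home/runner/.nvm/versions/node/*/bin \
    "$root"/usr/local/bin \
    "$root"/opt/hostedtoolcache/node/*/x64/bin; do
    if [ -x "$dir/node" ]; then
      # Strip the root prefix so the entry is valid inside the chroot.
      printf '%s\n' "${dir#"$root"}"
    fi
  done
}

# Prepend every detected directory to PATH, skipping duplicates.
prepend_node_to_path() {
  local root="${1:-}" dir
  while IFS= read -r dir; do
    [ -n "$dir" ] || continue
    case ":$PATH:" in
      *":$dir:"*) ;;                 # already present
      *) PATH="$dir:$PATH" ;;
    esac
  done <<EOF
$(find_node_bin_dirs "$root")
EOF
  export PATH
}

# Example call (assumes the host root is mounted at /host):
# prepend_node_to_path /host
```

Because each match is prepended, the last directory found wins when several node versions are present. The detected directories also still have to be bind-mounted (option 2) for the PATH entries to resolve, which is part of why option 3 is simpler.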

For P0: See companion tracking issue for awf-api-proxy health check fixes.

Generated by Firewall Issue Dispatcher
