When running an agentic workflow on a self-hosted runner deployed via actions-runner-controller (ARC) on Kubernetes, with the recommended Docker-in-Docker (dind) sidecar pattern, the MCP gateway fails to start with Docker socket not found at /var/run/docker.sock — even though a Unix Docker socket is correctly exposed on the runner pod and DOCKER_HOST=unix:///var/run/docker.sock is set on the runner container.
Workflow execution aborts at the gateway startup step.
The runner pod uses the dind sidecar layout described in the ARC documentation for Docker-in-Docker. From the runner container's perspective, /var/run/docker.sock is a real Unix socket and docker commands work.
Observed behaviour
The compiled workflow logs the gateway launch command. The gateway then crashes during initialization:
[INFO] Starting MCP Gateway in containerized mode...
[INFO] Auto-detected baked-in WASM guards at /guards
[INFO] MCP_GATEWAY_WASM_GUARDS_DIR=/guards
[INFO] Running in containerized environment
[WARN] Invalid container ID format: arc-gaw-xzpj8-runner-8lthc
[WARN] Could not determine container ID
Error: Docker socket not found at /var/run/docker.sock
Error: Mount the Docker socket: -v /var/run/docker.sock:/var/run/docker.sock
Error: Process completed with exit code 1.
Root-cause analysis
We traced the failure to three independent issues in gh-aw and gh-aw-mcpg. All three are specific to Kubernetes/ARC environments and would not surface on GitHub-hosted runners.
1. Container-ID detection rejects K8s/containerd cgroup names
gh-aw-mcpg/run_containerized.sh (lines 49-53) parses the cgroup hierarchy and validates the extracted ID against the Docker hash format:
# Container IDs must be 12-64 hex characters only
if ! echo "$cid" | grep -qE '^[a-f0-9]{12,64}$'; then
    log_warn "Invalid container ID format: $cid"
    return 1
fi
On EKS with containerd (and on dind with default cgroup namespacing), the gateway's /proc/self/cgroup view contains the K8s pod name — arc-gaw-xzpj8-runner-8lthc in our log — not a Docker container hash. The regex rejects it and the script falls back to defaults.
This is logged as a [WARN] and is not directly fatal, but combined with #2 below it leads to an avoidable failure path.
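A more tolerant check could classify the extracted value rather than reject everything that is not a Docker hash. The following is only a sketch under that assumption: the function name classify_container_id and its three output labels are hypothetical and do not exist in gh-aw-mcpg. It reuses the regex quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of run_containerized.sh: classify the value
# extracted from /proc/self/cgroup instead of failing on non-hex names.
classify_container_id() {
    local cid="$1"
    if [ -z "$cid" ]; then
        echo "unknown"
    elif printf '%s\n' "$cid" | grep -qE '^[a-f0-9]{12,64}$'; then
        echo "docker"        # Docker-style hex ID (same regex as the existing check)
    else
        echo "non-docker"    # e.g. a Kubernetes pod name; informational only
    fi
}

# The pod name from the log above is classified, not rejected.
classify_container_id "arc-gaw-xzpj8-runner-8lthc"
```

With a classification like this, the caller can log the result at info level and continue, so a Kubernetes pod name no longer pushes the script onto a fallback path.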
2. DOCKER_HOST is set on the runner but not propagated to the gateway container
The same script (lines 87-94) does honour DOCKER_HOST when present:
local socket_path="${DOCKER_HOST:-/var/run/docker.sock}"
socket_path="${socket_path#unix://}"
if [ ! -S "$socket_path" ]; then
    log_error "Docker socket not found at $socket_path"
    log_error "Mount the Docker socket: -v /var/run/docker.sock:/var/run/docker.sock"
    exit 1
fi
…but the docker run command generated by gh-aw to launch the gateway does not pass -e DOCKER_HOST from the runner to the gateway container. So even when the runner exports DOCKER_HOST=unix:///var/run/docker.sock (or tcp://… for TCP-only dind), the gateway always falls back to the hardcoded /var/run/docker.sock.
In our specific case the fallback path happens to be the same as the configured one, so this is not what triggers the failure. It is still a latent bug that prevents any custom socket path or TCP daemon from ever working.
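To illustrate what propagation could look like, here is a minimal sketch of a helper that emits the extra docker run arguments. The function name gateway_docker_host_args is hypothetical, and the real command is generated by gh-aw rather than by a shell function. Only the standard -e and -v flags of docker run are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print extra `docker run` arguments, one per line,
# so the gateway container sees the same DOCKER_HOST as the runner.
gateway_docker_host_args() {
    local host="${DOCKER_HOST:-}"
    [ -n "$host" ] || return 0    # nothing to forward; the gateway keeps its default
    printf '%s\n' "-e" "DOCKER_HOST=$host"
    case "$host" in
        unix://*)
            # A Unix socket must also be mounted at the same path in the container.
            local path="${host#unix://}"
            printf '%s\n' "-v" "$path:$path"
            ;;
    esac
}
```

A caller could read the lines into an array (for example with mapfile -t) and splice them into the docker run invocation. For tcp:// values only the environment variable is forwarded, since there is no socket file to mount.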
3. The bind-mounted socket isn't visible as a socket inside the gateway
The [ -S /var/run/docker.sock ] test fails inside the gateway container, even though the source path is a real Unix socket on the dind sidecar's filesystem and the bind mount is generated correctly by gh-aw.
We have not fully root-caused this yet. Plausible causes:
GID mismatch on the bind-mounted socket. The dind daemon creates the socket with ownership root:123 (custom DOCKER_GROUP_GID, common in docker:dind). The gateway runs as --user 1001:1001 --group-add 0. The hardcoded 0 matches the root group on GitHub-hosted runners (the v0.68.6 fix from #26750, "Add Docker socket supplementary group to MCP gateway container command", and #26771, "fix: compute Docker socket GID separately for shell expansion") but is ineffective here. While [ -S ] should only require directory traversal permission, the actual stat() may fail under certain mount-namespace propagation modes when the file is unreadable to the calling user.
Mount-namespace propagation oddity when dockerd bind-mounts its own listening socket into a child container while both processes share the same emptyDir parent mount.
--user 1001:1001 clashing with the file ownership in a way that makes Docker silently substitute an empty directory (a behaviour that can occur when a bind mount target conflicts with image content).
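A small diagnostic run inside the gateway container could help tell these hypotheses apart, in particular whether the mount target is a socket, an empty directory, or missing. This is a sketch only: describe_path is a hypothetical helper name, and stat -c assumes GNU coreutils or BusyBox.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: report how a path looks to the current user.
describe_path() {
    local p="$1"
    if [ -S "$p" ]; then
        echo "type=socket"
    elif [ -d "$p" ]; then
        echo "type=directory"    # would support the "empty directory substituted" theory
    elif [ -e "$p" ]; then
        echo "type=other"
    else
        echo "type=missing-or-unreadable"
    fi
    # Numeric owner, group and octal mode; fails if stat() itself is denied.
    stat -c 'uid=%u gid=%g mode=%a' "$p" 2>/dev/null || echo "stat failed"
    # Effective uid and all group IDs of the calling process.
    echo "caller uid=$(id -u) groups=$(id -G)"
}
```

Calling describe_path /var/run/docker.sock in both the runner and the gateway container, and comparing the reported gid with the caller's groups, would show whether the GID mismatch or the mount itself is at fault.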
Suggested directions
This is not a single bug — it's a class of incompatibilities with K8s-based self-hosted runners. We see three orthogonal improvements that would unblock a wide range of ARC setups:
Propagate DOCKER_HOST to the MCP gateway container. Add -e DOCKER_HOST to the generated docker run invocation. Smallest change, biggest impact for self-hosted users.
Make --group-add configurable / auto-detected. Either add a CLI flag (e.g. --docker-group-gid) or detect the docker socket's GID at startup (stat -c '%g' "$socket_path") and pass it through. Hardcoding 0 only works on GitHub-hosted runners.
Make container-ID detection in gh-aw-mcpg robust to non-Docker cgroup formats. Add a containerd/Kubernetes path, or treat detection failure as informational rather than letting it influence downstream logic.
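The GID auto-detection suggested above can be sketched as follows. The helper name socket_group_args is hypothetical, and the sketch assumes a stat that supports -c (GNU coreutils or BusyBox). It only derives the --group-add argument and does not reflect how gh-aw currently builds the command.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive --group-add from the socket's owning group
# instead of hardcoding 0.
socket_group_args() {
    local socket_path="$1" gid
    # Abort without output if the path cannot be stat()ed.
    gid="$(stat -c '%g' "$socket_path" 2>/dev/null)" || return 1
    printf '%s\n' "--group-add" "$gid"
}
```

On a dind sidecar where the socket is owned by root:123, this would yield --group-add 123. On GitHub-hosted runners, where the source says the group is root, it would yield --group-add 0, matching the current hardcoded value.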
A documentation page on "running gh-aw on ARC" (analogous to self-hosted runners) covering the dind sidecar pattern, the required --group-add GID, and the DOCKER_HOST propagation knob, would also be very welcome.
We're happy to test any patch on our ARC + EKS stack.
Environment
gh-aw CLI: v0.71.1
gh-aw-mcpg image: ghcr.io/github/gh-aw-mcpg:v0.3.0
gha-runner-scale-set Helm chart 0.13.1 (OCI registry: ghcr.io/actions/actions-runner-controller-charts)
ghcr.io/actions/actions-runner:2.333.1
containerd runtime
pod-security.kubernetes.io/enforce: privileged
Runner pod configuration (relevant excerpt)
Standard ARC dind sidecar pattern with K8s native sidecars (restartPolicy: Always on an initContainer) and a shared emptyDir for the Docker socket.
Related
… gh aw workflows): closed by #26750 ("Add Docker socket supplementary group to MCP gateway container command") and #26771 ("fix: compute Docker socket GID separately for shell expansion") in v0.68.6, which addresses Unix socket GID on GitHub-hosted runners only.
gh-aw-firewall/src/docker-manager.ts: getLocalDockerEnv() intentionally strips TCP DOCKER_HOST values to force usage of a local Unix socket, the same design assumption that breaks on ARC.