Skip to content

fix: enforce shared net namespace for child containers#1721

Closed
Mossaka wants to merge 3 commits intomainfrom
fix/130-child-container-nat-rules
Closed

fix: enforce shared net namespace for child containers#1721
Mossaka wants to merge 3 commits intomainfrom
fix/130-child-container-nat-rules

Conversation

@Mossaka
Copy link
Copy Markdown
Collaborator

@Mossaka Mossaka commented Apr 6, 2026

Summary

Closes #130

  • When --enable-dind is enabled, child containers spawned by the agent get their own network namespace on awf-net but do NOT inherit NAT rules, allowing proxy bypass
  • Converts docker-stub.sh into a dual-mode wrapper: blocks Docker when DinD is disabled (existing behavior), intercepts docker run/docker create when DinD is enabled
  • The wrapper strips user-specified --network/--net flags and injects --network container:awf-agent to force child containers to share the agent's network namespace
  • Also blocks docker network connect and docker compose to prevent network escapes
  • entrypoint.sh installs the wrapper at /tmp/awf-lib/docker and prepends it to PATH in the chroot script, so it intercepts docker commands before the real binary

Files changed

  • containers/agent/docker-stub.sh — converted from error-only stub to network-enforcing wrapper
  • src/docker-manager.ts — sets AWF_DIND_ENABLED=1 and AWF_AGENT_CONTAINER=awf-agent when DinD enabled
  • containers/agent/entrypoint.sh — installs wrapper in chroot, exports env vars, adds to PATH
  • src/docker-manager.test.ts — 2 new tests for DinD env var presence/absence

Test plan

  • npm run lint passes (0 errors)
  • npm test passes (1313 tests, +2 new)
  • Manual test: sudo awf --enable-dind --allow-domains github.com 'docker run --rm alpine wget -q -O- https://evil.com' should fail
  • Manual test: sudo awf --enable-dind --allow-domains github.com 'docker run --network host --rm alpine wget -q -O- https://evil.com' should strip --network host and use agent namespace
  • Manual test: sudo awf --allow-domains github.com 'docker ps' should still show DinD-disabled error

🤖 Generated with Claude Code

When --enable-dind is active, child containers spawned by the agent get their
own network namespace and don't inherit NAT rules, allowing proxy bypass. This
fix converts docker-stub.sh into a dual-mode wrapper that intercepts docker
run/create commands and injects --network container:awf-agent, forcing child
containers to share the agent's network namespace where all iptables rules apply.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 6, 2026 21:07
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

✅ Coverage Check Passed

Overall Coverage

Metric Base PR Delta
Lines 86.20% 86.30% 📈 +0.10%
Statements 86.07% 86.17% 📈 +0.10%
Functions 87.41% 87.41% ➡️ +0.00%
Branches 78.57% 78.62% 📈 +0.05%
📁 Per-file Coverage Changes (1 files)
File Lines (Before → After) Statements (Before → After)
src/docker-manager.ts 86.6% → 87.0% (+0.42%) 86.1% → 86.5% (+0.41%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a firewall bypass when Docker-in-Docker (--enable-dind) is enabled by attempting to force all agent-spawned child containers to share the agent container’s network namespace so the agent’s NAT-to-Squid rules apply uniformly.

Changes:

  • Add DinD-specific environment variables to the agent service so the Docker stub can switch into “wrapper” mode.
  • Install and activate a Docker command wrapper in chroot mode that rewrites docker run/create to use --network container:<agent> and blocks certain network escape commands.
  • Add unit tests asserting DinD env var presence/absence in the generated compose config.
Show a summary per file
File Description
src/docker-manager.ts Sets AWF_DIND_ENABLED / AWF_AGENT_CONTAINER when DinD is enabled.
src/docker-manager.test.ts Adds tests for the new DinD environment variables.
containers/agent/entrypoint.sh Copies wrapper into chroot-accessible path and prepends it to PATH in the generated chroot script.
containers/agent/docker-stub.sh Converts Docker stub into a dual-mode wrapper that rewrites docker run/create and blocks some commands when DinD is enabled.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 4/4 changed files
  • Comments generated: 5

Comment on lines +40 to +72
# Get the subcommand (first non-flag argument)
get_subcommand() {
for arg in "$@"; do
case "$arg" in
-*) continue ;;
*) echo "$arg"; return ;;
esac
done
}

SUBCOMMAND=$(get_subcommand "$@")

# Block commands that could attach containers to other networks
case "$SUBCOMMAND" in
"network")
# Check for 'docker network connect' which could bypass firewall
# Allow 'docker network ls', 'docker network inspect', etc.
shift # remove 'network'
NETWORK_SUBCMD=$(get_subcommand "$@")
if [ "$NETWORK_SUBCMD" = "connect" ]; then
echo "ERROR: 'docker network connect' is blocked by AWF firewall." >&2
echo "Child containers must share the agent's network namespace for security." >&2
exit 1
fi
exec "$REAL_DOCKER" network "$@"
;;

"run"|"create")
# Intercept 'docker run' and 'docker create' to enforce shared network namespace
# This ensures child containers use the agent's NAT rules (traffic -> Squid proxy)
CMD="$1"
shift # remove 'run' or 'create'

Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_subcommand() treats the first non-dash token as the Docker subcommand, but Docker global options often take values (e.g., --context foo, -H unix:///...). In those cases the value (e.g. foo) will be mis-identified as the subcommand, and the later shift/CMD="$1" logic will rewrite or forward arguments incorrectly, breaking docker run/create/network handling and potentially skipping the network enforcement. Parse global options (including options-with-values) to find the actual COMMAND position, preserve global opts separately, and only then apply the run/create/network-specific rewriting.

Copilot uses AI. Check for mistakes.
exit 127
fi

AGENT_CONTAINER="${AWF_AGENT_CONTAINER:-awf-agent}"
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWF_AGENT_CONTAINER is taken directly from the environment, which means user code can set it to any existing container name on the host and then run docker run to join that container’s network namespace (potentially one without AWF NAT/proxy rules). To prevent this bypass, don’t accept an arbitrary container name from the user environment—hardcode the expected agent container name or validate that the value matches the known agent container (and fail closed otherwise).

Suggested change
AGENT_CONTAINER="${AWF_AGENT_CONTAINER:-awf-agent}"
EXPECTED_AGENT_CONTAINER="awf-agent"
if [ -n "${AWF_AGENT_CONTAINER:-}" ] && [ "${AWF_AGENT_CONTAINER}" != "${EXPECTED_AGENT_CONTAINER}" ]; then
echo "ERROR: AWF_AGENT_CONTAINER must be '${EXPECTED_AGENT_CONTAINER}' when DinD is enabled." >&2
exit 1
fi
AGENT_CONTAINER="${EXPECTED_AGENT_CONTAINER}"

Copilot uses AI. Check for mistakes.
Comment on lines +717 to +722
# SECURITY: When DinD is enabled, prepend /tmp/awf-lib to PATH so the Docker wrapper
# is found before the real docker binary. Also export AWF env vars needed by the wrapper.
if [ "$AWF_DOCKER_WRAPPER_INSTALLED" = "true" ]; then
echo "# AWF Docker wrapper: enforce shared network namespace for child containers" >> "/host${SCRIPT_FILE}"
echo "export PATH=\"/tmp/awf-lib:\$PATH\"" >> "/host${SCRIPT_FILE}"
echo "export AWF_REAL_DOCKER=\"${AWF_REAL_DOCKER}\"" >> "/host${SCRIPT_FILE}"
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only prepending /tmp/awf-lib to PATH doesn’t prevent bypass: user code can invoke the real Docker binary by absolute path, and exporting AWF_REAL_DOCKER into the user environment makes that path trivial to discover. If the goal is to enforce shared netns, consider bind-mounting the wrapper over the real docker path inside /host (so /usr/bin/docker in the chroot is the wrapper), and keep the real binary at a separate non-user-controlled path used only by the wrapper.

Suggested change
# SECURITY: When DinD is enabled, prepend /tmp/awf-lib to PATH so the Docker wrapper
# is found before the real docker binary. Also export AWF env vars needed by the wrapper.
if [ "$AWF_DOCKER_WRAPPER_INSTALLED" = "true" ]; then
echo "# AWF Docker wrapper: enforce shared network namespace for child containers" >> "/host${SCRIPT_FILE}"
echo "export PATH=\"/tmp/awf-lib:\$PATH\"" >> "/host${SCRIPT_FILE}"
echo "export AWF_REAL_DOCKER=\"${AWF_REAL_DOCKER}\"" >> "/host${SCRIPT_FILE}"
# SECURITY: When DinD is enabled, enforce the Docker wrapper by bind-mounting it
# over the real Docker binary path inside /host. This prevents bypass via absolute
# paths and avoids exposing the real Docker path in the user-visible environment.
if [ "$AWF_DOCKER_WRAPPER_INSTALLED" = "true" ]; then
AWF_DOCKER_WRAPPER_PATH="/tmp/awf-lib/docker"
AWF_HOST_DOCKER_PATH="/host${AWF_REAL_DOCKER}"
if [ ! -x "$AWF_DOCKER_WRAPPER_PATH" ]; then
echo "[entrypoint][ERROR] Docker wrapper not found or not executable: $AWF_DOCKER_WRAPPER_PATH"
exit 1
fi
if [ ! -e "$AWF_HOST_DOCKER_PATH" ]; then
echo "[entrypoint][ERROR] Host Docker binary not found: $AWF_HOST_DOCKER_PATH"
exit 1
fi
mount --bind "$AWF_DOCKER_WRAPPER_PATH" "$AWF_HOST_DOCKER_PATH"
echo "# AWF Docker wrapper: enforce shared network namespace for child containers" >> "/host${SCRIPT_FILE}"

Copilot uses AI. Check for mistakes.
Comment on lines +536 to +537
# Make AWF_DIND_ENABLED readonly to prevent tampering by user code
readonly AWF_DIND_ENABLED
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly AWF_DIND_ENABLED here only affects the entrypoint shell; it doesn’t make the variable immutable for user code running later inside the chroot (which can still export AWF_DIND_ENABLED=...). If you intend tamper resistance, enforce it in the docker wrapper itself (e.g., ignore env overrides) or emit readonly in the generated chroot script (though even that can be bypassed by running a different shell).

Suggested change
# Make AWF_DIND_ENABLED readonly to prevent tampering by user code
readonly AWF_DIND_ENABLED
# Do not rely on AWF_DIND_ENABLED for tamper resistance here: a readonly variable
# only affects this shell. DinD/network enforcement is provided by the installed
# Docker wrapper itself, not by making this environment variable readonly.

Copilot uses AI. Check for mistakes.
// Set DinD environment variables so the docker-stub wrapper enforces
// shared network namespace for child containers (prevents proxy bypass)
environment.AWF_DIND_ENABLED = '1';
environment.AWF_AGENT_CONTAINER = 'awf-agent';
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWF_AGENT_CONTAINER is set to the hard-coded string 'awf-agent' even though this file already defines AGENT_CONTAINER_NAME = 'awf-agent'. Using the constant here avoids drift if the container name ever changes and keeps compose generation/tests in sync.

Suggested change
environment.AWF_AGENT_CONTAINER = 'awf-agent';
environment.AWF_AGENT_CONTAINER = AGENT_CONTAINER_NAME;

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Smoke test results — run 24051295776

✅ GitHub MCP: "⚡ Token optimization for pelis-agent-factory-advisor", "Optimize security-review workflow"
✅ Playwright: GitHub page title contains "GitHub"
✅ File write: /tmp/gh-aw/agent/smoke-test-claude-24051295776.txt created and verified
✅ Bash: file content confirmed

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Smoke test results:

  • GitHub MCP (last 2 merged PR titles): ✅ fix: enable color output when --tty flag is set; Optimize security-review workflow: ~67% cost reduction, 55% fewer LLM turns
  • safeinputs-gh PR query (2 PRs): ❌ (safeinputs-gh unavailable)
  • Playwright title contains "GitHub": ❌ (MCP EACCES writing playwright logs)
  • Tavily search results returned: ❌ (Tavily MCP unavailable)
  • File write /tmp/gh-aw/agent/smoke-test-codex-24051295868.txt: ✅
  • Bash cat verification: ✅
  • Discussion mystical comment posted: ✅ (discussion [Security Review] Daily Security Review and Threat Modeling – 2026-03-30 #1503)
  • Build (npm ci && npm run build): ✅
    Overall status: FAIL

🔮 The oracle has spoken through Smoke Codex

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

🔥 Smoke Test Results

Test Result
GitHub MCP connectivity
GitHub.com HTTP connectivity ❌ (pre-step data unavailable)
File write/read ❌ (pre-step data unavailable)

PR: "fix: enforce shared net namespace for child containers" — @Mossaka
Reviewers: @Copilot

Overall: FAIL — smoke-data step outputs not injected into prompt (template variables unresolved).

📰 BREAKING: Report filed by Smoke Copilot

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Chroot Version Comparison Results

Runtime Host Version Chroot Version Match?
Python Python 3.12.13 Python 3.12.3
Node.js v24.14.1 v20.20.2
Go go1.22.12 go1.22.12

Overall: ❌ Not all versions match — Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Smoke Test: GitHub Actions Services Connectivity

Check Status Notes
Redis PING (port 6379) ❌ Failed redis-cli not installed; port unreachable
pg_isready (port 5432) ❌ Failed no response
psql SELECT 1 (smoketest db) ❌ Failed Port closed/unreachable

All 3 checks failed. host.docker.internal resolves to 172.17.0.1 but both ports 6379 and 5432 are unreachable from this environment. The GitHub Actions service containers (Redis, PostgreSQL) do not appear to be running or accessible.

🔌 Service connectivity validated by Smoke Services

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Security Review — PR #1721

The direction here is correct (child containers must share the agent's network namespace to inherit NAT rules). However, the implementation exposes two high-severity bypass paths that should be fixed before merge.


🔴 HIGH — AWF_REAL_DOCKER exported to user environment enables full wrapper bypass

File: containers/agent/entrypoint.sh (line ~727)
File: containers/agent/entrypoint.sh (line ~719, startup script)

# entrypoint.sh — wrapper installation
export AWF_REAL_DOCKER="$REAL_DOCKER_PATH"   # e.g. /usr/bin/docker

# startup script written into the chroot
echo "export AWF_REAL_DOCKER=\"${AWF_REAL_DOCKER}\"" >> "/host${SCRIPT_FILE}"

AWF_REAL_DOCKER is exported into the user's shell environment, giving user code a direct pointer to the real Docker binary. Any agent or script can bypass the wrapper entirely:

# Inside the agent command — calls real docker, no wrapper filtering
$AWF_REAL_DOCKER run --network host --rm alpine wget -q -O- (evil.com/redacted)
# or directly by path if it's /usr/bin/docker:
/usr/bin/docker run --network host --rm alpine wget -q -O- (evil.com/redacted)

Suggested fix: Do not export AWF_REAL_DOCKER into the user environment. The wrapper needs it, but it should only be visible to the wrapper process itself. One approach is to hard-code the path into the wrapper script at install time (e.g., sed -i the path into the script), rather than passing it as a runtime env var.


🔴 HIGH — AWF_AGENT_CONTAINER is mutable; setting it to awf-squid bypasses all domain filtering

File: containers/agent/docker-stub.sh (lines 36–37)

AGENT_CONTAINER="${AWF_AGENT_CONTAINER:-awf-agent}"
...
exec "$REAL_DOCKER" "$CMD" --network "container:${AGENT_CONTAINER}" "${FILTERED_ARGS[@]}"

File: containers/agent/entrypoint.sh (startup script, line ~730)

echo "export AWF_AGENT_CONTAINER=\"${AWF_AGENT_CONTAINER:-awf-agent}\"" >> "/host${SCRIPT_FILE}"

AWF_AGENT_CONTAINER is a regular (non-readonly) env var in the user's shell. The Squid proxy container (awf-squid) at 172.30.0.10 has unconditional outbound ACCEPT in iptables. User code can redirect child containers to share squid's network namespace:

export AWF_AGENT_CONTAINER=awf-squid
docker run --rm alpine wget -q -O- (evil.com/redacted)  # unrestricted outbound

The wrapper dutifully injects --network container:awf-squid, and the child container inherits squid's fully-permissive outbound access, bypassing all domain filtering.

Note: AWF_DIND_ENABLED is protected with readonly in the entrypoint, but that readonly attribute only applies to the entrypoint's bash process — it does not carry through to child processes or the script written to disk. The env var in the user's shell is a plain export, fully mutable.

Suggested fix: Hardcode the container name in the wrapper script at install time (same approach as the AWF_REAL_DOCKER fix above — bake it in with sed or template substitution), so it cannot be overridden at runtime.


🟡 MEDIUM — docker build / docker buildx pass through without network enforcement

File: containers/agent/docker-stub.sh (lines 97–100)

  *)
    # All other commands (ps, logs, inspect, exec, build, images, etc.) pass through
    exec "$REAL_DOCKER" "$@"
    ;;

docker build creates intermediate containers for each RUN step via BuildKit. These build containers are spawned by the Docker daemon in their own network namespace and do not inherit the agent container's NAT rules (which redirect port 80/443 to Squid). A Dockerfile with:

RUN wget -q -O- (evil.com/redacted)

would succeed because the build container has direct outbound access, not subject to the Squid domain allowlist.

Suggested fix: Block docker build and docker buildx build explicitly, or document this as a known limitation of DinD mode. Blocking is straightforward:

  "build"|"buildx")
    echo "ERROR: 'docker build' is blocked by AWF firewall in DinD mode." >&2
    echo "Build containers cannot inherit the agent's NAT rules." >&2
    exit 1
    ;;

Summary

# Severity Issue Needs fix before merge?
1 🔴 HIGH AWF_REAL_DOCKER in user env = wrapper bypass Yes
2 🔴 HIGH AWF_AGENT_CONTAINER mutable → squid namespace takeover Yes
3 🟡 MEDIUM docker build bypasses namespace enforcement Yes (or documented)

The underlying approach — shared network namespace via --network container:awf-agent — is sound. The fixes needed are: bake AWF_REAL_DOCKER and AWF_AGENT_CONTAINER into the wrapper script at install time rather than passing them as runtime env vars, and block docker build.

Generated by Security Guard for issue #1721 · ● 108.7K ·

- Fix get_subcommand() parsing: match against known Docker subcommands
  instead of assuming first non-flag token is the subcommand, preventing
  misidentification of global option values (e.g., --context foo)
- Hardcode agent container name 'awf-agent' in docker-stub.sh instead of
  reading from AWF_AGENT_CONTAINER env var to prevent namespace hijacking
- Stop exporting AWF_REAL_DOCKER to user environment; write the real
  Docker path to /tmp/awf-lib/.docker-path file that only the wrapper reads
- Add comment acknowledging that readonly AWF_DIND_ENABLED only protects
  the entrypoint shell, not subshells — real enforcement is the wrapper
- Use AGENT_CONTAINER_NAME constant instead of string literal in docker-manager.ts
- Block docker build/buildx commands to prevent BuildKit containers from
  bypassing NAT rules with unrestricted network access

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Mossaka
Copy link
Copy Markdown
Collaborator Author

Mossaka commented Apr 6, 2026

Review feedback addressed in f18e2a0

All 5 Copilot review comments have been addressed, plus one additional security fix:

1. get_subcommand() parsing bug (docker-stub.sh)

Replaced the "first non-flag token" heuristic with a check against a list of known Docker subcommands (run, create, build, network, compose, etc.). Unknown tokens are now skipped, so global option values like --context foo are no longer misidentified as subcommands.

2. AWF_AGENT_CONTAINER tamperable from user code (docker-stub.sh)

Hardcoded the container name awf-agent directly in the script instead of reading from the environment variable. User code can no longer set AWF_AGENT_CONTAINER to join an arbitrary container's network namespace.

3. AWF_REAL_DOCKER exposure (entrypoint.sh + docker-stub.sh)

AWF_REAL_DOCKER is no longer exported to the user environment. Instead, the real Docker binary path is written to a private file (/tmp/awf-lib/.docker-path) during wrapper installation, and docker-stub.sh reads it from there. This prevents user code from discovering and invoking the real Docker binary directly.

4. readonly AWF_DIND_ENABLED limitation (entrypoint.sh)

Added a comment explaining that shell readonly only affects the current process and does not propagate to subshells/user code. The comment clarifies that the real security enforcement is the Docker wrapper's interception of docker commands via PATH precedence, not the env var.

5. Hardcoded 'awf-agent' vs constant (docker-manager.ts)

Changed environment.AWF_AGENT_CONTAINER = 'awf-agent' to use the existing AGENT_CONTAINER_NAME constant.

6. docker build bypasses namespace enforcement (docker-stub.sh) — bonus fix

Added docker build and docker buildx to the blocked commands list, since BuildKit intermediate containers get their own network namespace and are not subject to NAT rules, allowing unrestricted outbound requests that bypass the Squid proxy.

All changes pass lint (0 errors) and all 1313 unit tests.

@Mossaka
Copy link
Copy Markdown
Collaborator Author

Mossaka commented Apr 6, 2026

Security Review (Post-Fix)

Reviewed all 4 changed files in the context of the existing entrypoint, docker-manager, and container architecture. This is a well-structured fix for a real proxy bypass vulnerability. Below are findings ranked by severity.


[MEDIUM] Real Docker binary path readable by user code via /tmp/awf-lib/.docker-path

  • File: containers/agent/docker-stub.sh:37-38 and containers/agent/entrypoint.sh:525-526
  • Code:
    echo "$REAL_DOCKER_PATH" > /host/tmp/awf-lib/.docker-path
    chmod 444 /host/tmp/awf-lib/.docker-path
  • Explanation: The .docker-path file is chmod 444 (world-readable). User code running as the unprivileged host user can cat /tmp/awf-lib/.docker-path to discover the real Docker binary path (e.g., /usr/bin/docker.real or /usr/local/bin/docker). Since the Docker socket is mounted read-write (/var/run/docker.sock:/host/var/run/docker.sock:rw) when DinD is enabled, the user can invoke the real binary directly: $(/tmp/awf-lib/.docker-path) run --network host alpine wget evil.com.
  • Impact: Complete bypass of the Docker wrapper. The wrapper only works because PATH precedence puts /tmp/awf-lib/docker first, but if the user knows the real path, they can call it directly.
  • Suggested action: Make the file readable only by root (chmod 400). The wrapper runs the cat before capsh --user drops to the unprivileged user, so root-only permissions work. Alternatively, delete the file after reading it in the wrapper (but this breaks subsequent docker calls). The best approach is chmod 400 combined with the wrapper executing as a SUID-root helper, but given the entrypoint already runs as root and privilege drop happens later, chmod 400 alone should suffice if the wrapper is invoked before privilege drop. However, in the current flow the wrapper runs after privilege drop (user code calls docker run), so the wrapper itself runs as the unprivileged user and cannot read a root-only file. This means chmod 400 alone would break the wrapper. A better approach: have entrypoint.sh embed the real docker path directly into the wrapper script (e.g., sed -i "s|REAL_DOCKER_PLACEHOLDER|$REAL_DOCKER_PATH|" /host/tmp/awf-lib/docker) instead of using an external config file. Then there is no file to read.

[MEDIUM] get_subcommand() skips global options with values, enabling bypass for unknown global flags

  • File: containers/agent/docker-stub.sh:60-78
  • Code:
    get_subcommand() {
      for arg in "$@"; do
        case "$arg" in
          -*) continue ;;
          *)
            for cmd in $KNOWN_SUBCOMMANDS; do
              if [ "$arg" = "$cmd" ]; then echo "$arg"; return; fi
            done
            continue ;;
        esac
      done
    }
  • Explanation: Docker global options like --context, --host/-H, --config, --tls* take values. The function skips flag-like tokens (-*) and skips unknown non-flag tokens (assuming they are values to global options). This is reasonable but has an edge case: docker --host unix:///var/run/docker.sock run --network host alpine sh. Here unix:///var/run/docker.sock is correctly skipped as unknown, and run is found. This works. However, consider: docker -H run alpine sh where -H expects a value and run is consumed as that value by Docker. The wrapper would see run as a known subcommand and intercept, but Docker itself would fail because -H run is invalid. This is a false positive (safe side), not a bypass.
  • Actual concern: If Docker adds new global options in the future that take values matching known subcommand names, or if a Docker plugin introduces a subcommand not in KNOWN_SUBCOMMANDS, the wrapper would pass it through unintercepted. The *) exec "$REAL_DOCKER" "$@" catch-all passes unknown subcommands directly to the real Docker binary without any network enforcement.
  • Suggested action: Consider a deny-by-default approach: instead of passing unknown subcommands through, block them with an error. Only allow explicitly known-safe subcommands (ps, logs, inspect, images, exec, rm, rmi, tag, stop, start, kill, wait, top, stats, cp, diff, export, history, import, load, save, port, rename, version, info, events). This prevents future Docker subcommands from silently bypassing the wrapper.

[MEDIUM] shift in network case operates on wrong positional parameters

  • File: containers/agent/docker-stub.sh:87-95
  • Code:
    "network")
      shift  # remove 'network'
      NETWORK_SUBCMD=$(get_subcommand "$@")
  • Explanation: The shift here operates on the script's positional parameters (the full docker argument list), not on a local scope. When the script is invoked as docker network connect foo bar, $@ is network connect foo bar. The shift removes network, leaving $@ as connect foo bar, then get_subcommand correctly finds connect. However, exec "$REAL_DOCKER" network "$@" then calls docker network connect foo bar which is correct because network was shifted off and re-added. This actually works correctly. No issue.

[LOW] AWF_AGENT_CONTAINER env var exported but unused by wrapper

  • File: src/docker-manager.ts (new line) and containers/agent/docker-stub.sh:51
  • Code:
    environment.AWF_AGENT_CONTAINER = AGENT_CONTAINER_NAME;
    AGENT_CONTAINER="awf-agent"  # hardcoded in wrapper
  • Explanation: docker-manager.ts sets AWF_AGENT_CONTAINER=awf-agent in the container environment, but the wrapper hardcodes AGENT_CONTAINER="awf-agent" and never reads the env var. The env var is dead code that leaks implementation details. While not a bypass (the wrapper ignores it), it is unnecessary information exposure.
  • Suggested action: Remove environment.AWF_AGENT_CONTAINER = AGENT_CONTAINER_NAME; from docker-manager.ts since the wrapper intentionally hardcodes the value. Also remove the corresponding test assertion.

[LOW] Wrapper can be bypassed via direct Docker API calls over the socket

  • File: containers/agent/entrypoint.sh (general architecture)
  • Explanation: The wrapper intercepts the docker CLI binary via PATH precedence. But when DinD is enabled, the Docker socket (/var/run/docker.sock) is mounted read-write. User code can communicate with the Docker daemon directly via the socket using curl --unix-socket /var/run/docker.sock http://localhost/v1.43/containers/create ... or any HTTP library, completely bypassing the wrapper.
  • Impact: Full proxy bypass. This is an inherent limitation of the wrapper-based approach when the Docker socket is accessible.
  • Suggested action: This is acknowledged as a known limitation of the DinD feature (the PR description says "firewall bypass possible" was the old warning). For defense-in-depth, consider documenting this limitation clearly. A more robust solution would require a Docker authorization plugin (--authorization-plugin) that enforces network namespace sharing at the daemon level, but that is a larger effort. At minimum, add a comment in the code acknowledging this limitation.

[LOW] No TOCTOU concern for wrapper installation

  • File: containers/agent/entrypoint.sh:515-545
  • Explanation: The wrapper is installed before privilege drop and before user code executes (the script file with PATH prepend is written to SCRIPT_FILE, then capsh --user executes it). There is no window where user code runs concurrently with wrapper installation. This is secure.

[INFO] docker exec passes through without network enforcement

  • File: containers/agent/docker-stub.sh:155 (the *) exec "$REAL_DOCKER" "$@" catch-all)
  • Explanation: docker exec into a child container is passed through. This is correct -- exec runs a command in an existing container's namespace, so it inherits whatever network the container already has (which should be the agent's namespace due to docker run interception).

[INFO] --net-alias, --network-alias not stripped

  • File: containers/agent/docker-stub.sh:103-122
  • Explanation: The wrapper strips --network and --net but not --net-alias or --network-alias. These flags set DNS aliases within a network but do not change which network the container joins. Since the wrapper forces --network container:awf-agent, these aliases are irrelevant (container network mode doesn't support aliases). Docker will likely ignore them or error. No bypass.

Summary

The fix correctly addresses the core vulnerability: child containers spawned via docker run/docker create are now forced to share the agent's network namespace, inheriting NAT rules. The approach of intercepting via PATH precedence and stripping --network flags is sound.

Key concern: The .docker-path file is world-readable, allowing user code to discover and directly invoke the real Docker binary, bypassing the wrapper entirely. This should be addressed by embedding the path directly into the wrapper script during installation rather than using an external config file.

Secondary concern: The catch-all *) exec "$REAL_DOCKER" "$@" passes unknown subcommands through without enforcement. A deny-by-default approach would be more robust against future Docker CLI changes.

Architectural limitation: The Docker socket is mounted read-write, so any code that speaks the Docker API directly (via the socket) can bypass the wrapper entirely. This is inherent to the DinD approach and should be documented.

The docker build, docker buildx, docker compose, and docker network connect blocks are appropriate and well-justified. The --network/--net stripping handles both =value and separate-value forms correctly.

Security hardening for the Docker wrapper in DinD mode:

1. Bind-mount wrapper over real Docker binary: The wrapper is now
   bind-mounted over the original Docker binary path (e.g., /usr/bin/docker)
   inside /host, so even absolute-path invocations go through the wrapper.
   Previously, only PATH precedence was used, which could be bypassed.

2. Hide real Docker binary: The real Docker binary is copied to a hidden
   location (/tmp/awf-lib/.docker-real) and the path is hardcoded in the
   wrapper script. No environment variables or world-readable config files
   expose the real binary location.

3. Remove AWF_AGENT_CONTAINER env var: The agent container name is now
   hardcoded in the wrapper script only. Previously it was passed as an
   env var that user code could override to join arbitrary container
   namespaces (e.g., awf-squid which has unrestricted outbound access).

4. Fix global options parsing: Added split_at_subcommand() to properly
   separate Docker global options from subcommand arguments, fixing cases
   like `docker --context foo run` where global option values were
   misidentified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Mossaka
Copy link
Copy Markdown
Collaborator Author

Mossaka commented Apr 6, 2026

Additional security hardening in e8162c8

The previous fix (f18e2a0) addressed the three security findings by storing the real Docker path in a file instead of an env var, hardcoding the agent container name, and blocking docker build/buildx. However, a deeper analysis revealed that the wrapper could still be bypassed by invoking the real Docker binary by absolute path (e.g., /usr/bin/docker). This commit adds defense-in-depth:

1. Bind-mount wrapper over real Docker binary (strongest fix)

The wrapper is now bind-mounted over the original Docker binary path inside /host (e.g., mount --bind /host/tmp/awf-lib/docker /host/usr/bin/docker). This means even if user code calls /usr/bin/docker directly, it hits the wrapper. If the bind-mount fails (e.g., insufficient privileges), PATH precedence is still in effect as a fallback.

2. Hidden real Docker binary with hardcoded path

The real Docker binary is copied to /tmp/awf-lib/.docker-real (a hidden file), and this path is hardcoded in the wrapper script — no config file or environment variable points to it. The previous .docker-path config file (mode 444, world-readable) has been removed entirely.

3. Removed AWF_AGENT_CONTAINER from Docker Compose environment

Since the wrapper already hardcodes awf-agent, passing AWF_AGENT_CONTAINER as an env var was unnecessary and could be misleading. Removed from docker-manager.ts; test updated to assert it's undefined.

4. Fixed global options parsing in wrapper

Added split_at_subcommand() to properly separate Docker global options (before the subcommand) from subcommand arguments (after). This fixes the original Copilot review finding about docker --context foo run being misparsed, ensuring global options are preserved and forwarded correctly.

All 1313 unit tests pass, 0 lint errors.

@Mossaka
Copy link
Copy Markdown
Collaborator Author

Mossaka commented Apr 6, 2026

we should

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] Child containers don't inherit NAT rules - proxy bypass possible

2 participants