feat: remove Docker-in-Docker support #205
Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
Test Coverage Report
Coverage Thresholds
The project has the following coverage thresholds configured:
Coverage report generated by `npm run test:coverage`
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges...
📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident...
Add verification steps to ensure Node.js 22 is installed correctly:
- Remove any existing nodejs packages before installation
- Verify Node.js version is v22.x after installation
- Verify npx is available

This fixes CI failures where NodeSource setup silently failed and fell back to Ubuntu's default Node.js 12, which doesn't include npx.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
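The verification described in this commit lives in the Dockerfile, which is not shown here. As a rough illustration of the same two checks, here is a minimal TypeScript sketch. The function name `checkNodeInstall` and its parameters are hypothetical and do not come from the repository; the caller is assumed to supply the output of `node --version` and whether `npx` was found.

```typescript
// Hypothetical helper (not from the PR): validates the result of a Node.js install.
// `nodeVersionOutput` is assumed to be the raw stdout of `node --version`.
// `npxAvailable` is assumed to be determined by the caller (for example via `command -v npx`).
export function checkNodeInstall(nodeVersionOutput: string, npxAvailable: boolean): string[] {
  const problems: string[] = [];
  const reported = nodeVersionOutput.trim();

  // `node --version` prints something like "v22.11.0"; capture the major version.
  const match = /^v(\d+)\./.exec(reported);
  if (match === null) {
    problems.push(`unrecognized node version output: "${reported}"`);
  } else if (Number(match[1]) !== 22) {
    problems.push(`expected Node.js v22.x, found ${reported}`);
  }

  if (!npxAvailable) {
    problems.push("npx is not available");
  }
  return problems;
}
```

An empty result means both checks passed. A silent fallback to an older distribution package, as described in the commit message, would show up as a version mismatch and a missing `npx`.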
📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...
Smoke Test Results
Last 2 Merged PRs:
Test Results:
Status: PASS
Pull request overview
This PR removes all Docker-in-Docker (DinD) functionality from the firewall, transitioning to an architecture where agents only connect to remote MCP servers and cannot spawn local Docker containers. This change eliminates a significant attack surface by removing Docker socket exposure, container escape vectors, and privilege escalation opportunities via the docker group.
Changes:
- Removed Docker CLI installation, docker-wrapper.sh interception script, and all Docker socket mounting logic
- Deleted docker-egress.test.ts (349 lines) and docker-in-docker.sh example
- Updated 14 documentation files to remove DinD references and related troubleshooting
- Removed test-docker-egress CI job and docker-in-docker example test
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| containers/agent/Dockerfile | Removed Docker CLI installation and wrapper symlink setup |
| containers/agent/entrypoint.sh | Removed Docker socket permissions and docker group configuration |
| containers/agent/docker-wrapper.sh | Deleted Docker command interceptor script |
| src/docker-manager.ts | Removed Docker socket mount, DOCKER_HOST/DOCKER_CONTEXT env vars, and .docker config creation |
| src/docker-manager.test.ts | Removed test assertions for Docker socket mounts and .docker config |
| src/types.ts | Updated JSDoc to remove Docker socket reference |
| src/cli.ts | Updated --env-all help text to remove DOCKER_HOST reference |
| tests/integration/docker-egress.test.ts | Deleted entire DinD security test suite |
| tests/integration/volume-mounts.test.ts | Renumbered tests after removing Docker socket test |
| examples/docker-in-docker.sh | Deleted DinD example script |
| examples/README.md | Removed docker-in-docker.sh from examples table |
| docs/usage.md | Removed Docker-in-Docker examples and --network host warnings |
| docs/troubleshooting.md | Removed DinD troubleshooting section and docker-wrapper.log reference |
| docs/environment.md | Removed DOCKER_HOST/DOCKER_CONTEXT/DOCKER_CONFIG from excluded variables list |
| docs/architecture.md | Removed Docker CLI, socket mount, and wrapper references |
| docs-site/src/content/docs/reference/security-architecture.md | Removed docker-wrapper.sh section and spawned container bypass examples |
| docs-site/src/content/docs/reference/cli-reference.md | Removed Docker socket from default mounts list |
| README.md | Removed Docker-in-Docker support from features list |
| CLAUDE.md | Removed MCP configuration section and Docker wrapper references |
| AGENTS.md | Removed Docker CLI, socket mount, and wrapper references |
| .github/workflows/test-integration.yml | Removed test-docker-egress job |
| .github/workflows/test-examples.yml | Removed docker-in-docker.sh test |
| .github/workflows/security-guard.md | Removed Docker wrapper from security components list |
| .github/workflows/firewall-escape-test.md | Removed docker-wrapper.sh from source files list and wrapper question from architecture section |
| .claude/skills/debug-firewall/SKILL.md | Removed docker-wrapper.log references from troubleshooting |
Comments suppressed due to low confidence (2)
docs-site/src/content/docs/reference/security-architecture.md:65
- The 'Spawned Containers' subgraph should be removed from the architecture diagram as docker-in-docker functionality is being eliminated. This includes the associated edges showing AGENT spawning CHILD and CHILD traffic routing.
subgraph "Spawned Containers"
CHILD["docker run ..."]
end
docs-site/src/content/docs/reference/security-architecture.md:93
- The reference to 'containers we didn't create directly' and 'When the agent runs `docker run`, the spawned container joins `awf-net`' should be removed or reworded. With DinD removal, the agent no longer spawns containers. The description should focus on the DOCKER-USER chain enforcing rules for the `awf-net` network without mentioning spawned containers.
**Host iptables (DOCKER-USER chain)** — The outermost defense. Docker evaluates DOCKER-USER rules *before* container-specific chains, making it the right place to catch traffic from containers we didn't create directly. When the agent runs `docker run`, the spawned container joins `awf-net` and its egress hits DOCKER-USER where we route it through the proxy.
Add tests to verify Docker CLI is not available in agent container after PR #205 removed Docker-in-Docker support.

Tests verify:
- docker command not available
- docker run fails gracefully
- docker-compose not available
- docker socket not mounted

Note: Tests currently fail against pre-built registry images which still have Docker installed. Tests will pass once new images are built from current code, or when using buildLocal (pending NodeSource fix).

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
* Initial plan
* test: add Docker-in-Docker removal regression tests

Add tests to verify Docker CLI is not available in agent container after PR #205 removed Docker-in-Docker support.

Tests verify:
- docker command not available
- docker run fails gracefully
- docker-compose not available
- docker socket not mounted

Note: Tests currently fail against pre-built registry images which still have Docker installed. Tests will pass once new images are built from current code, or when using buildLocal (pending NodeSource fix).

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
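The four regression checks listed in the commit above can be sketched roughly as follows. This is not the test code from the repository: the `ContainerProbe` interface, the function name, and the specific commands are hypothetical stand-ins for whatever the project's integration test runner provides, and `/var/run/docker.sock` is assumed to be the socket path because it is Docker's conventional default.

```typescript
// Hypothetical abstraction over "run something inside the agent container".
// A real test would back this with the project's own test harness.
export interface ContainerProbe {
  /** Runs a shell command in the agent container and returns its exit code. */
  exitCode(command: string): number;
  /** Reports whether a path exists inside the agent container. */
  pathExists(path: string): boolean;
}

// Returns one message per check that still finds working Docker access.
export function dockerRemovalViolations(probe: ContainerProbe): string[] {
  const violations: string[] = [];
  if (probe.exitCode("docker --version") === 0) {
    violations.push("docker command is available");
  }
  if (probe.exitCode("docker run --rm hello-world") === 0) {
    violations.push("docker run succeeded");
  }
  if (probe.exitCode("docker-compose --version") === 0) {
    violations.push("docker-compose is available");
  }
  // Assumed path: Docker's conventional default socket location.
  if (probe.pathExists("/var/run/docker.sock")) {
    violations.push("docker socket is mounted");
  }
  return violations;
}

// Fake probes for illustration: an image built from current code vs. a stale registry image.
export const rebuiltImage: ContainerProbe = { exitCode: () => 127, pathExists: () => false };
export const staleImage: ContainerProbe = { exitCode: () => 0, pathExists: () => true };
```

With these fakes, `dockerRemovalViolations(rebuiltImage)` is empty, while `dockerRemovalViolations(staleImage)` reports all four checks, which mirrors the note that the tests fail against pre-built registry images that still ship Docker.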
Resolved conflicts in containers/agent/Dockerfile:
- Kept BASE_IMAGE argument for custom base image support
- Combined Node.js installation approaches: check if v22 exists first (for runner images), otherwise install with verification from main
- Adopted docker-stub.sh from main (Docker-in-Docker support removed in v0.9.1)
- Removed Docker CLI installation and docker wrapper logic (no longer needed)

Docker-in-Docker support was removed in main via PR #205, so the docker wrapper improvements from this branch are now obsolete. The BASE_IMAGE feature and Node.js handling improvements remain relevant for runner parity.

All tests pass (551 tests).
Investigated failing GitHub MCP and Playwright tests in run #21231821036.

Root cause: MCP servers configured to use Docker containers, but AWF removed Docker-in-Docker support in v0.9.1 (PR #205). The docker-stub.sh inside the agent container returns error 127 for any Docker command.

This is an architecture incompatibility, not a bug in gh-aw-firewall. The fix needs to be made in the gh-aw repository.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n-Docker (#627)

The nightly MCP stress test workflow was blocked with 0% test coverage because it attempted to launch MCP server containers from within AWF, where Docker-in-Docker was removed in v0.9.1.

## Changes

**Workflow configuration** (`.github/workflows/nightly-mcp-stress-test.md`)
- Add `sandbox.mcp` configuration with gateway container and 20 MCP servers
- Remove Go setup step (no longer building gateway in AWF)
- Restrict filesystem mount to `/tmp/mcp-test-fs` subdirectory

**Agent instructions** (`.github/agentics/nightly-mcp-stress-test.md`)
- Remove gateway build/launch commands (`make build`, `./awmg`)
- Update test approach to use MCP tools through pre-configured infrastructure
- Remove gateway lifecycle management steps

## Architecture

**Before:**
```
AWF Container
└─ Agent → build gateway → launch containers → ❌ Docker unavailable
```

**After:**
```
MCP Gateway Container (outside AWF)
└─ Launches 20 MCP servers via Docker ✅

AWF Container
└─ Agent → HTTP/MCP → Gateway ✅
```

The gateway now runs as a trusted external service where Docker is available, while the agent communicates with it via HTTP from within AWF's security boundary.

<!-- START COPILOT ORIGINAL PROMPT -->
<details>
<summary>Original prompt</summary>

----

*This section details on the original issue you should resolve*

<issue_title>[mcp-stress-test] Nightly MCP Stress Test Blocked: Docker-in-Docker Not Available in AWF Environment</issue_title>
<issue_description>## Critical Blocker for Nightly Stress Test Workflow

The nightly MCP server stress test workflow **cannot execute** due to a fundamental environment constraint: Docker-in-Docker support is not available in the AWF firewall container.

### Test Session Details
- **Test Session:** `stress-test-20260204-033819`
- **Test Date:** 2026-02-04T03:42:00Z
- **Workflow:** `.github/workflows/nightly-mcp-stress-test.md`
- **Status:** ❌ **BLOCKED - Cannot Execute**

### Problem Summary

The stress test attempts to launch 20 MCP servers as Docker containers, but **all 20 servers fail immediately** because Docker commands are blocked by AWF.

**Error Message from MCP Gateway:**
```
ERROR: Docker-in-Docker support was removed in AWF v0.9.1

Docker commands are no longer available inside the firewall container.

If you need to:
- Use MCP servers: Migrate to stdio-based MCP servers (see docs)
- Run Docker: Execute Docker commands outside AWF wrapper
- Build images: Run Docker build before invoking AWF

See PR #205: github/gh-aw-firewall#205
```

### Root Cause

1. **AWF Security Policy:** Docker-in-Docker explicitly disabled in AWF v0.9.1 (PR #205)
2. **Test Design:** All 20 MCP servers configured as `container: "mcp/*"` or `container: "ghcr.io/*"`
3. **Gateway Behavior:** Gateway uses `docker run` to launch container-based servers
4. **Environment:** Workflow runs inside AWF firewall container with no Docker access
5. **Result:** Zero servers can launch → zero servers can be tested

### Impact

**Test Coverage:** 0/20 servers tested (0%)

All 20 attempted servers failed with identical Docker availability errors:
- `github` (ghcr.io/github/github-mcp-server:v0.30.2)
- `filesystem` (mcp/filesystem)
- `memory` (mcp/memory)
- `sqlite` (mcp/sqlite)
- `postgres` (mcp/postgres)
- `brave-search` (mcp/brave-search)
- `fetch` (mcp/fetch)
- `puppeteer` (mcp/puppeteer)
- `slack` (mcp/slack)
- `gdrive` (mcp/gdrive)
- `google-maps` (mcp/google-maps)
- `everart` (mcp/everart)
- `sequential-thinking` (mcp/sequential-thinking)
- `aws-kb-retrieval` (mcp/aws-kb-retrieval)
- `linear` (mcp/linear)
- `sentry` (mcp/sentry)
- `raygun` (mcp/raygun)
- `git` (mcp/git)
- `time` (mcp/time)
- `axiom` (mcp/axiom)

### What Actually Worked ✅

The MCP Gateway behaved correctly:
- Binary compiled successfully
- Configuration parsed correctly (20 servers loaded)
- Server started and bound to port 3000
- Detected AWF environment correctly
- Provided clear, actionable error messages

**This is not a gateway bug** - it's an environment incompatibility between the test design and AWF constraints.

## Resolution Options

### Option 1: Run Workflow Outside AWF (Recommended)

**Pros:**
- No code changes needed
- Tests gateway as designed (with container launching)
- Quick to implement

**Cons:**
- Less security isolation
- May require different workflow runner

**Implementation:**
- Modify workflow to run on standard GitHub runner (not AWF container)
- OR: Run workflow on self-hosted runner with Docker access

### Option 2: Use HTTP-Based MCP Servers

**Pros:**
- Servers run outside workflow (no Docker needed)
- Tests gateway's HTTP proxy capabilities
- Maintains security boundary

**Cons:**
- Requires pre-deployed MCP servers
- Doesn't test gateway's container launching
- Complex infrastructure setup

**Implementation:**
- Deploy MCP servers externally (e.g., cloud instances)
- Configure stress test with `type: "http"` and `url` instead of `container`

### Option 3: Use Stdio-Based Non-Container Servers

**Pros:**
- Can run inside AWF
- Tests gateway stdio capabilities
- No Docker dependency

**Cons:**
- Requires rewriting/rebuilding MCP servers as binaries
- Most MCP servers distributed as containers only
- Significant development effort

**Implementation:**
- Build or find stdio-compatible MCP server binaries
- Deploy binaries into workflow environment
- Configure with `command` instead of `container`

### Option 4: Hybrid Approach

**Pros:**
- Partial test coverage better than none
- Incremental improvement possible
- Flexible

**Cons:**
- Incomplete coverage
- Maintains complexity

**Implementation:**
- Identify which servers can run as stdio processes
- Test subset (e.g., 5-10 servers)
- Document remaining servers as "requires Docker"

### Option 5: Disable Stress Test

**Pros:**
- Acknowledges limitation clearly
- Frees up workflow resources
- Simple

**Cons:**
- Zero multi-server test coverage
- No regression detection for scaling issues

**Implementation:**
- Disable `.github/workflows/nightly-mcp-stress-test.md` workflow
- Document as known limitation in README

## Recommendations

### Immediate Actions

1. ✅ **Document blocker** (this issue)
2. 🔲 **Disable workflow** until resol...

</details>

> **Custom agent used: agentic-workflows**
> GitHub Agentic Workflows (gh-aw) - Create, debug, and upgrade AI-powered workflows with intelligent prompt routing

<!-- START COPILOT CODING AGENT SUFFIX -->
- Fixes #626
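The resolution options in the quoted issue turn on how each MCP server is configured: `container` entries need `docker run`, while `type: "http"` with `url`, or `command`, do not. A minimal sketch of that distinction is below. Only the field names `container`, `type`, `url`, and `command` come from the issue text; the `McpServerConfig` shape, the function names, the URL, and the binary name are hypothetical and are not the gateway's actual schema.

```typescript
// Hypothetical config shape; only the field names are taken from the issue text.
export interface McpServerConfig {
  container?: string; // e.g. "mcp/filesystem": launched via `docker run`
  type?: string;      // e.g. "http"
  url?: string;       // endpoint of an externally running server
  command?: string;   // stdio-based server started as a local process
}

// A server needs Docker only when it is configured as a container image.
export function needsDocker(server: McpServerConfig): boolean {
  return server.container !== undefined;
}

// Splits servers into those that could run inside AWF and those blocked by the DinD removal.
export function partitionByDockerNeed(
  servers: Record<string, McpServerConfig>
): { runnableInAwf: string[]; blocked: string[] } {
  const runnableInAwf: string[] = [];
  const blocked: string[] = [];
  for (const name of Object.keys(servers)) {
    (needsDocker(servers[name]) ? blocked : runnableInAwf).push(name);
  }
  return { runnableInAwf, blocked };
}

// Illustrative input only; the URL and binary name are placeholders.
export const exampleServers: Record<string, McpServerConfig> = {
  filesystem: { container: "mcp/filesystem" },
  fetch: { type: "http", url: "http://gateway.example:3000/mcp/fetch" },
  time: { command: "mcp-time-server" },
};
```

Applied to the stress test as originally designed, every one of the 20 servers would land in `blocked`, which matches the 0/20 coverage reported in the issue.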
Complete removal of docker-in-docker functionality. After this change, agents connect only to remote MCP servers and cannot spawn local Docker containers from within the agent container.
Deleted Files
- `containers/agent/docker-wrapper.sh`: Docker command interceptor
- `tests/integration/docker-egress.test.ts`: DinD security tests
- `examples/docker-in-docker.sh`: DinD example
Core Changes
- `DOCKER_HOST`/`DOCKER_CONTEXT` env vars, `.docker` config dir creation
CI/CD
- `test-docker-egress` job from `test-integration.yml`
- `test-examples.yml`
Documentation
Updated 14 files across `docs/`, `docs-site/`, `CLAUDE.md`, `AGENTS.md`, `README.md`, and workflow/skill files to remove all DinD references.
Security Impact
Eliminates attack surface:
Original prompt
Plan: Complete Removal of Docker-in-Docker (DinD) Support
Overview
This plan details the complete removal of docker-in-docker functionality from gh-aw-firewall. After this change, agents will only connect to remote MCP servers and will not be able to spawn local Docker containers from within the agent container.
Scope Summary
Phase 1: Remove Core DinD Implementation
1.1 DELETE `containers/agent/docker-wrapper.sh`
File: `containers/agent/docker-wrapper.sh` (101 lines)
This script intercepts `docker run` commands and injects network/proxy configuration. Remove entirely.
1.2 MODIFY `containers/agent/Dockerfile`
File: `containers/agent/Dockerfile`
Remove Docker CLI installation (lines 19-25):
    # DELETE: Install Docker CLI for MCP servers that run as containers
    install -m 0755 -d /etc/apt/keyrings && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc && \
    ...
    apt-get install -y docker-ce-cli && \
Remove docker-wrapper setup (lines 39-49):
1.3 MODIFY `containers/agent/entrypoint.sh`
File: `containers/agent/entrypoint.sh`
Remove Docker socket setup (lines 101-114):
Remove DISABLE_DOCKER_ACCESS handling and docker group assignment (lines 129-145):
1.4 MODIFY `src/docker-manager.ts`
File: `src/docker-manager.ts`
Remove from EXCLUDED_ENV_VARS set (lines 203-205):
Remove DOCKER_HOST and DOCKER_CONTEXT env vars (lines 224-225):
Remove Docker socket mount (lines 267-268):
Remove clean Docker config mounts (lines 270-273):
Remove Docker config directory creation (lines 381-396 in writeConfigs()):
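Step 1.4 removes entries from an `EXCLUDED_ENV_VARS` set, and the file summary notes that `DOCKER_HOST`, `DOCKER_CONTEXT`, and `DOCKER_CONFIG` were dropped from the excluded variables list. The code in `src/docker-manager.ts` is not shown in this page, so the following is only a sketch of how such an exclusion set typically filters host variables before they are forwarded to a container. The function name, the `SOME_HOST_ONLY_VAR` entry, and the exact forwarding behaviour are assumptions.

```typescript
// Hypothetical sketch of env forwarding with an exclusion set.
// Not the implementation in src/docker-manager.ts.
export function forwardableEnv(
  hostEnv: Record<string, string | undefined>,
  excluded: ReadonlySet<string>
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const name of Object.keys(hostEnv)) {
    const value = hostEnv[name];
    // Skip unset variables and anything on the exclusion list.
    if (value !== undefined && !excluded.has(name)) {
      result[name] = value;
    }
  }
  return result;
}

// Illustrative sets: SOME_HOST_ONLY_VAR is a placeholder for whatever else the real set holds.
export const excludedBefore = new Set(["SOME_HOST_ONLY_VAR", "DOCKER_HOST", "DOCKER_CONTEXT", "DOCKER_CONFIG"]);
export const excludedAfter = new Set(["SOME_HOST_ONLY_VAR"]);

export const sampleHostEnv = {
  HOME: "/home/runner",
  DOCKER_HOST: "unix:///var/run/docker.sock",
  SOME_HOST_ONLY_VAR: "x",
};
```

Under these assumptions, `forwardableEnv(sampleHostEnv, excludedBefore)` keeps only `HOME`, while `forwardableEnv(sampleHostEnv, excludedAfter)` also passes `DOCKER_HOST` through. The plan treats the Docker entries as no longer needing special handling once the Docker CLI and socket are gone from the agent container.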
Phase 2: Remove Tests
2.1 DELETE `tests/integration/docker-egress.test.ts`
File: `tests/in...