Fix nightly MCP stress test: use external gateway instead of Docker-in-Docker #627

Copilot · 2026-02-04T03:48:50Z

The nightly MCP stress test workflow was blocked with 0% test coverage because it attempted to launch MCP server containers from within AWF, where Docker-in-Docker was removed in v0.9.1.

Changes

Workflow configuration (.github/workflows/nightly-mcp-stress-test.md)

Add sandbox.mcp configuration with gateway container and 20 MCP servers
Remove Go setup step (no longer building gateway in AWF)
Restrict filesystem mount to /tmp/mcp-test-fs subdirectory

Agent instructions (.github/agentics/nightly-mcp-stress-test.md)

Remove gateway build/launch commands (make build, ./awmg)
Update test approach to use MCP tools through pre-configured infrastructure
Remove gateway lifecycle management steps

Architecture

Before:

AWF Container
└─ Agent → build gateway → launch containers → ❌ Docker unavailable

After:

MCP Gateway Container (outside AWF)
└─ Launches 20 MCP servers via Docker ✅

AWF Container
└─ Agent → HTTP/MCP → Gateway ✅

The gateway now runs as a trusted external service where Docker is available, while the agent communicates with it via HTTP from within AWF's security boundary.
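The before/after layout above implies a simple preflight the agent could run from inside AWF: prefer the pre-configured gateway and treat a missing gateway plus missing Docker as the blocked state. The following is a minimal sketch, not the workflow's actual logic. It assumes the gateway address is exposed through an environment variable named `MCP_GATEWAY_URL` (a hypothetical name) and that `docker info` fails when Docker is unavailable.

```bash
#!/usr/bin/env bash
# Preflight sketch: report how MCP servers can be reached from this environment.
# MCP_GATEWAY_URL is a hypothetical variable name used for illustration only.
mcp_access_mode() {
  if [ -n "${MCP_GATEWAY_URL:-}" ]; then
    # An external gateway was configured; the agent talks to it over HTTP.
    echo "external-gateway"
  elif docker info >/dev/null 2>&1; then
    # A working Docker daemon is reachable (assumed not to be the case inside AWF).
    echo "local-docker"
  else
    # Neither path is available: the blocked state described in the issue.
    echo "blocked"
  fi
}

# Example: with a gateway URL set, the external path is chosen.
MCP_GATEWAY_URL="http://127.0.0.1:3000" mcp_access_mode   # prints: external-gateway
```

Checking the gateway first keeps the agent from ever attempting a Docker launch when the pre-configured infrastructure is present.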

Original prompt

This section details the original issue you should resolve.

<issue_title>[mcp-stress-test] Nightly MCP Stress Test Blocked: Docker-in-Docker Not Available in AWF Environment</issue_title>
<issue_description>## Critical Blocker for Nightly Stress Test Workflow

The nightly MCP server stress test workflow cannot execute due to a fundamental environment constraint: Docker-in-Docker support is not available in the AWF firewall container.

Test Session Details

Test Session: stress-test-20260204-033819
Test Date: 2026-02-04T03:42:00Z
Workflow: .github/workflows/nightly-mcp-stress-test.md
Status: ❌ BLOCKED - Cannot Execute

Problem Summary

The stress test attempts to launch 20 MCP servers as Docker containers, but all 20 servers fail immediately because Docker commands are blocked by AWF.

Error Message from MCP Gateway:

ERROR: Docker-in-Docker support was removed in AWF v0.9.1

Docker commands are no longer available inside the firewall container.

If you need to:
- Use MCP servers: Migrate to stdio-based MCP servers (see docs)
- Run Docker: Execute Docker commands outside AWF wrapper
- Build images: Run Docker build before invoking AWF

See PR #205: https://github.com/github/gh-aw-firewall/pull/205

Root Cause

AWF Security Policy: Docker-in-Docker explicitly disabled in AWF v0.9.1 (gh-aw-firewall PR #205)
Test Design: All 20 MCP servers configured as container: "mcp/*" or container: "ghcr.io/*"
Gateway Behavior: Gateway uses docker run to launch container-based servers
Environment: Workflow runs inside AWF firewall container with no Docker access
Result: Zero servers can launch → zero servers can be tested

Impact

Test Coverage: 0/20 servers tested (0%)

All 20 attempted servers failed with identical Docker availability errors:

github (ghcr.io/github/github-mcp-server:v0.30.2)
filesystem (mcp/filesystem)
memory (mcp/memory)
sqlite (mcp/sqlite)
postgres (mcp/postgres)
brave-search (mcp/brave-search)
fetch (mcp/fetch)
puppeteer (mcp/puppeteer)
slack (mcp/slack)
gdrive (mcp/gdrive)
google-maps (mcp/google-maps)
everart (mcp/everart)
sequential-thinking (mcp/sequential-thinking)
aws-kb-retrieval (mcp/aws-kb-retrieval)
linear (mcp/linear)
sentry (mcp/sentry)
raygun (mcp/raygun)
git (mcp/git)
time (mcp/time)
axiom (mcp/axiom)
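The coverage figure reported above is plain integer arithmetic over the server list. As a small sketch (the function name `coverage_percent` is made up for illustration), the same number can be derived from a tested count and a total:

```bash
#!/usr/bin/env bash
# coverage_percent TESTED TOTAL -> integer percentage, rounded down.
coverage_percent() {
  local tested=$1 total=$2
  if [ "$total" -eq 0 ]; then
    # Avoid division by zero when no servers are configured.
    echo 0
    return
  fi
  echo $(( tested * 100 / total ))
}

# Reproduce the line from the issue: 0 of 20 servers tested.
printf 'Test Coverage: %d/%d servers tested (%d%%)\n' 0 20 "$(coverage_percent 0 20)"
```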

What Actually Worked ✅

The MCP Gateway behaved correctly:

Binary compiled successfully
Configuration parsed correctly (20 servers loaded)
Server started and bound to port 3000
Detected AWF environment correctly
Provided clear, actionable error messages

This is not a gateway bug - it's an environment incompatibility between the test design and AWF constraints.

Resolution Options

Option 1: Run Workflow Outside AWF (Recommended)

Pros:

No code changes needed
Tests gateway as designed (with container launching)
Quick to implement

Cons:

Less security isolation
May require different workflow runner

Implementation:

Modify workflow to run on standard GitHub runner (not AWF container)
OR: Run workflow on self-hosted runner with Docker access

Option 2: Use HTTP-Based MCP Servers

Pros:

Servers run outside workflow (no Docker needed)
Tests gateway's HTTP proxy capabilities
Maintains security boundary

Cons:

Requires pre-deployed MCP servers
Doesn't test gateway's container launching
Complex infrastructure setup

Implementation:

Deploy MCP servers externally (e.g., cloud instances)
Configure stress test with type: "http" and url instead of container

Option 3: Use Stdio-Based Non-Container Servers

Pros:

Can run inside AWF
Tests gateway stdio capabilities
No Docker dependency

Cons:

Requires rewriting/rebuilding MCP servers as binaries
Most MCP servers distributed as containers only
Significant development effort

Implementation:

Build or find stdio-compatible MCP server binaries
Deploy binaries into workflow environment
Configure with command instead of container

Option 4: Hybrid Approach

Pros:

Partial test coverage better than none
Incremental improvement possible
Flexible

Cons:

Incomplete coverage
Maintains complexity

Implementation:

Identify which servers can run as stdio processes
Test subset (e.g., 5-10 servers)
Document remaining servers as "requires Docker"

Option 5: Disable Stress Test

Pros:

Acknowledges limitation clearly
Frees up workflow resources
Simple

Cons:

Zero multi-server test coverage
No regression detection for scaling issues

Implementation:

Disable .github/workflows/nightly-mcp-stress-test.md workflow
Document as known limitation in README

Recommendations

Immediate Actions

✅ Document blocker (this issue)
🔲 Disable workflow until resol...

Custom agent used: agentic-workflows
GitHub Agentic Workflows (gh-aw) - Create, debug, and upgrade AI-powered workflows with intelligent prompt routing

Fixes [mcp-stress-test] Nightly MCP Stress Test Blocked: Docker-in-Docker Not Available in AWF Environment #626


- Add sandbox.mcp configuration with 20 MCP servers
- Remove gateway build/launch from agent instructions
- Update test approach to use MCP tools directly through gateway
- Remove unnecessary Go setup step
- Simplify test instructions to use pre-configured infrastructure

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>

- Change filesystem mount from /tmp to /tmp/mcp-test-fs
- Prevents filesystem server from accessing sensitive host files
- Maintains required write access in a restricted path

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>

- Use specific format 'stress-test-YYYYMMDD-HHMMSS' instead of generic '{timestamp}'
- Improves clarity for agents following the instructions

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
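The 'stress-test-YYYYMMDD-HHMMSS' format named in the last commit message can be generated and checked mechanically. The sketch below assumes UTC timestamps, which matches the session ID and test date quoted in the issue but is not stated explicitly anywhere in the PR. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Build a session ID in the 'stress-test-YYYYMMDD-HHMMSS' form (UTC assumed).
new_session_id() {
  date -u +"stress-test-%Y%m%d-%H%M%S"
}

# Succeed only if the argument matches that form exactly.
is_valid_session_id() {
  printf '%s\n' "$1" | grep -Eq '^stress-test-[0-9]{8}-[0-9]{6}$'
}

session_id="$(new_session_id)"
is_valid_session_id "$session_id" && echo "session: $session_id"
```

A literal placeholder such as `stress-test-{timestamp}` fails the check, which is the ambiguity the commit set out to remove.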

Copilot

Pull request overview

This PR fixes the nightly MCP stress test workflow, which was blocked because Docker-in-Docker was removed in AWF v0.9.1. The solution runs the MCP Gateway as an external service (outside AWF) where Docker is available, while the agent communicates with it via MCP tools from within the secure AWF environment.

Changes:

Added sandbox.mcp configuration with the MCP Gateway container and 20 MCP server definitions to the workflow
Removed Go setup step and gateway build/launch commands from agent instructions
Updated testing approach to use pre-configured MCP infrastructure instead of manually building and launching the gateway

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| `.github/workflows/nightly-mcp-stress-test.md` | Added sandbox.mcp configuration with gateway container (v0.0.94) and 20 MCP servers including GitHub, filesystem, memory, sqlite, postgres, and 15 other services; removed Go setup step |
| `.github/agentics/nightly-mcp-stress-test.md` | Removed gateway build/launch instructions, updated testing approach to use pre-configured MCP servers, streamlined test execution steps |

Comments suppressed due to low confidence (1)

.github/agentics/nightly-mcp-stress-test.md:160

Incomplete merge or editing error in Step 4 section. Lines 148-158 contain orphaned content from the old version including:

Incomplete sentence at line 148 ("Create a comprehensive test report documenting your findings.")
Orphaned bash command with unclosed code block (lines 150-152)
Numbered step "3. Analyze gateway performance" (lines 154-158) without corresponding steps 1 and 2
Duplicate heading "## Step 5: Generate Test Report" at line 160 (with emoji 📊), when Step 4 already has the same title with emoji 📝 at line 146

This appears to be leftover content from the old version that should have been removed. The section should flow directly from Step 4's title to the "Summary Statistics" subsection without these orphaned fragments.

## Step 4: Generate Test Report 📝

Create a comprehensive test report documenting your findings.
   
   # Parse for errors
   grep -i error /tmp/mcp-stress-test/logs/*.log > /tmp/mcp-stress-results/errors.txt

3. Analyze gateway performance:
- Check for memory leaks
- Measure startup time for each server
- Count total requests and failures
- Identify slowest servers

## Step 5: Generate Test Report 📊
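Both problems flagged in this comment, the unclosed code block and the repeated "Generate Test Report" title, can be caught with a quick lint of the instructions file before merging. This is a rough sketch using only standard tools. The function names are hypothetical, and the heading check assumes step headings follow the `## Step N: Title` shape seen in the excerpt.

```bash
#!/usr/bin/env bash
# has_balanced_fences FILE: succeed if the file has an even number of
# lines that open or close a triple-backtick code fence.
has_balanced_fences() {
  local count
  count=$(grep -c '^[[:space:]]*`\{3\}' "$1")
  [ $(( count % 2 )) -eq 0 ]
}

# duplicate_step_titles FILE: print each step title that appears more than
# once, ignoring the "Step N:" prefix and any trailing emoji.
duplicate_step_titles() {
  LC_ALL=C grep -E '^## Step [0-9]+: ' "$1" \
    | LC_ALL=C sed -E 's/^## Step [0-9]+: //; s/[^ -~]+//g; s/ +$//' \
    | sort | uniq -d
}
```

Running `has_balanced_fences .github/agentics/nightly-mcp-stress-test.md || echo "unclosed code block"` followed by `duplicate_step_titles` on the same file would surface both issues described above.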





Copilot · 2026-02-04T04:16:29Z

.github/agentics/nightly-mcp-stress-test.md

+### Example Test Pattern

-**Authentication Required:**
- Error message contains "authentication", "unauthorized", "token", "API key"
- HTTP 401 status code
- Tool invocation fails due to missing credentials
+For the GitHub server (which has authentication configured):
+```bash
+# You can directly use MCP tools configured in the workflow
+# The MCP gateway handles the routing automatically
+# Example: Use bash to log your testing approach
+echo "Testing github server..."
+```

-**Protocol Error:**
- Invalid JSON-RPC response
- MCP protocol violation
- Malformed request/response
+Then attempt to use a GitHub MCP tool. If it works, record success. If it fails, record the error and category.


The testing instructions lack concrete guidance on how to discover and invoke MCP tools from the configured servers. While lines 79-81 state what the agent should do (discover tools, invoke them, record results), and lines 87-92 suggest which tools to try, the instructions don't explain:

How to programmatically discover what tools each server provides (is there a list_tools function? Should the agent try to introspect?)

The exact syntax/mechanism for invoking MCP tools from bash scripts or directly

How to capture and parse tool responses and errors systematically

Compare this to the removed old version which had explicit curl commands with JSON-RPC requests. The new approach assumes the agent knows how to interact with pre-configured MCP servers, but provides no concrete examples or API reference.

Consider adding a concrete example showing how to invoke at least one MCP tool and check its response, to serve as a pattern the agent can follow for the other 19 servers.

See below for a potential fix:

For the GitHub server (which has authentication configured), follow this concrete pattern:

1. **Set the MCP Gateway URL**

```bash
# MCP gateway base URL (injected by the workflow or use a default)
MCP_GATEWAY_URL="${MCP_GATEWAY_URL:-http://127.0.0.1:3000}"
echo "Testing github server via MCP gateway at ${MCP_GATEWAY_URL}..."
```

2. **Discover available tools for the github server**

This uses a JSON-RPC tools/list call to ask the gateway which tools are exposed for the github server.

```bash
curl -sS "${MCP_GATEWAY_URL}" \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "list-github-tools",
    "method": "tools/list",
    "params": { "server": "github" }
  }' | tee github-tools-response.json

echo "Discovered tools for github server:"
jq '.result.tools // .error' github-tools-response.json
```

If .result.tools is present, the server responded successfully with a tool list.

If .error is present instead, record the error message and categorize it (e.g., auth, timeout, protocol).

3. **Invoke a specific GitHub MCP tool**

After listing tools, pick a simple, read-only tool (replace get_repo and its params with a real tool and arguments from the previous step):

```bash
curl -sS "${MCP_GATEWAY_URL}" \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "call-github-tool",
    "method": "tools/call",
    "params": {
      "server": "github",
      "tool": "get_repo",
      "arguments": { "owner": "octocat", "repo": "hello-world" }
    }
  }' | tee github-tool-call-response.json

echo "Tool call result (or error) for github server:"
jq '{result: .result, error: .error}' github-tool-call-response.json
```

If .result is non-null, record this server as Success and capture a brief summary of the output.

If .error is non-null, record this server as Failure and categorize the error (auth, protocol, timeout, etc.).

Then repeat this pattern for each of the other MCP servers:

Use tools/list with server: "<server-name>" to discover available tools.

Select a simple, read-only tool from the list.

Use tools/call with the chosen tool name and minimal arguments.

Parse responses and errors with jq and record the outcome for your final report.

For servers without authentication:

Attempt to use a tool with tools/call as above.

If it fails due to missing authentication, document the required token or permission in your notes.

Move to the next server and continue testing.
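The "categorize the error" step in the pattern above can be made mechanical. The sketch below reuses the auth and protocol signals listed in the removed instructions ("authentication", "unauthorized", "token", "API key", HTTP 401, JSON-RPC and protocol problems). The timeout keywords, the `other` bucket, the function names, and the tab-separated results format are assumptions added for illustration.

```bash
#!/usr/bin/env bash
# classify_error MESSAGE [HTTP_STATUS]: map an error to a coarse category.
classify_error() {
  local msg status
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  status="${2:-}"
  # An HTTP 401 is treated as an auth failure regardless of the message.
  if [ "$status" = "401" ]; then
    echo "auth"
    return
  fi
  case "$msg" in
    *authentication*|*unauthorized*|*token*|*"api key"*) echo "auth" ;;
    *timeout*|*"timed out"*)                             echo "timeout" ;;   # assumed keywords
    *json-rpc*|*protocol*|*malformed*)                   echo "protocol" ;;
    *)                                                   echo "other" ;;
  esac
}

# record_result FILE SERVER STATUS [CATEGORY]: append one tab-separated row.
record_result() {
  printf '%s\t%s\t%s\n' "$2" "$3" "${4:--}" >> "$1"
}

# Example: one success and one categorized failure.
results="$(mktemp)"
record_result "$results" github success
record_result "$results" slack failure "$(classify_error 'Missing API key for Slack')"
cat "$results"
```

Keeping one row per server makes the final report a matter of counting rows by status and category.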

Copilot · 2026-02-04T04:16:29Z

.github/workflows/nightly-mcp-stress-test.md

@@ -18,18 +18,80 @@ network:
    - defaults
    - go


Network configuration includes "go" but Go setup step was removed from the workflow. Since the workflow no longer builds the gateway locally (it uses a pre-built container), the "go" network allowlist entry is unnecessary. The agent won't be running any Go commands or accessing Go module repositories.

Consider removing "go" from the network allowed list as it's no longer needed after removing the Go setup step.

Suggested change

- go

Copilot · 2026-02-04T04:16:30Z

.github/workflows/nightly-mcp-stress-test.md

+        mounts:
+          - "/tmp/mcp-test-fs:/workspace:rw"


The mounts field is used in the filesystem server configuration but is not documented in the MCP server schema definition (agentic-workflow.json:5740-5880 stdio_mcp_tool). The field is supported by the gateway's StdinServerConfig (internal/config/config_stdin.go:54) and used in the example config.json file, but missing from the JSON schema that validates workflow configurations.

While this won't cause runtime errors (the gateway supports it), it creates a documentation gap where the schema doesn't reflect the actual supported fields. Consider adding a mounts field definition to the stdio_mcp_tool schema in .github/aw/schemas/agentic-workflow.json.
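Until the schema documents `mounts`, a lightweight check of the entry's shape can stand in for validation. This sketch assumes the Docker-style `host:container:mode` form suggested by the value in the diff (`/tmp/mcp-test-fs:/workspace:rw`) and that `mode` is either `ro` or `rw`. Neither is confirmed by the gateway's schema, and `parse_mount` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# parse_mount SPEC: split "host:container:mode" and print the three fields,
# one per line. Fails if a part is missing, an extra part is present,
# or the mode is not "ro"/"rw" (assumed Docker-style semantics).
parse_mount() {
  local host container mode extra
  IFS=: read -r host container mode extra <<EOF
$1
EOF
  [ -n "$host" ] && [ -n "$container" ] && [ -z "${extra:-}" ] || return 1
  case "$mode" in
    ro|rw) ;;
    *) return 1 ;;
  esac
  printf 'host=%s\ncontainer=%s\nmode=%s\n' "$host" "$container" "$mode"
}

# Example with the mount from the workflow diff.
parse_mount "/tmp/mcp-test-fs:/workspace:rw"
```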

Initial plan

442cab4

Copilot AI assigned Copilot and lpcox Feb 4, 2026

Copilot started work on behalf of lpcox February 4, 2026 03:49

Copilot AI and others added 3 commits February 4, 2026 03:53

Clarify timestamp placeholder format in example output

bf77cc5


Copilot AI changed the title ~~[WIP] Fix Docker-in-Docker support for nightly MCP stress test~~ Fix nightly MCP stress test: use external gateway instead of Docker-in-Docker Feb 4, 2026

Copilot AI requested a review from lpcox February 4, 2026 03:57

Copilot finished work on behalf of lpcox February 4, 2026 03:57

lpcox marked this pull request as ready for review February 4, 2026 04:10

Copilot AI review requested due to automatic review settings February 4, 2026 04:10

lpcox merged commit e0a0306 into main Feb 4, 2026

lpcox deleted the copilot/fix-docker-in-docker-support branch February 4, 2026 04:10

Copilot started reviewing on behalf of lpcox February 4, 2026 04:11

Copilot AI reviewed Feb 4, 2026
