Fix nightly MCP stress test: use external gateway instead of Docker-in-Docker #627
- Add sandbox.mcp configuration with 20 MCP servers
- Remove gateway build/launch from agent instructions
- Update test approach to use MCP tools directly through gateway
- Remove unnecessary Go setup step
- Simplify test instructions to use pre-configured infrastructure
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
- Change filesystem mount from /tmp to /tmp/mcp-test-fs
- Prevents filesystem server from accessing sensitive host files
- Maintains required write access in a restricted path
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
- Use specific format 'stress-test-YYYYMMDD-HHMMSS' instead of generic '{timestamp}'
- Improves clarity for agents following the instructions
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
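A minimal sketch of how an identifier in the `stress-test-YYYYMMDD-HHMMSS` format could be produced, assuming a POSIX `date` is available. The helper name `make_session_id` and the choice of UTC are assumptions for illustration; the workflow does not specify either.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a session ID of the form stress-test-YYYYMMDD-HHMMSS.
# UTC is an assumption; the commit message does not state a time zone.
make_session_id() {
  printf 'stress-test-%s\n' "$(date -u +%Y%m%d-%H%M%S)"
}

SESSION_ID="$(make_session_id)"
echo "Session: ${SESSION_ID}"
```

Using a fixed-width timestamp keeps session directories sortable by name.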
Pull request overview
This PR fixes the nightly MCP stress test workflow, which was blocked because Docker-in-Docker was removed in AWF v0.9.1. The fix runs the MCP Gateway as an external service (outside AWF), where Docker is available, while the agent reaches it through MCP tools from within the secure AWF environment.
Changes:
- Added `sandbox.mcp` configuration with the MCP Gateway container and 20 MCP server definitions to the workflow
- Removed Go setup step and gateway build/launch commands from agent instructions
- Updated testing approach to use pre-configured MCP infrastructure instead of manually building and launching the gateway
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `.github/workflows/nightly-mcp-stress-test.md` | Added sandbox.mcp configuration with gateway container (v0.0.94) and 20 MCP servers including GitHub, filesystem, memory, sqlite, postgres, and 15 other services; removed Go setup step |
| `.github/agentics/nightly-mcp-stress-test.md` | Removed gateway build/launch instructions, updated testing approach to use pre-configured MCP servers, streamlined test execution steps |
Comments suppressed due to low confidence (1)
.github/agentics/nightly-mcp-stress-test.md:160
- Incomplete merge or editing error in Step 4 section. Lines 148-158 contain orphaned content from the old version including:
- Incomplete sentence at line 148 ("Create a comprehensive test report documenting your findings.")
- Orphaned bash command with unclosed code block (lines 150-152)
- Numbered step "3. Analyze gateway performance" (lines 154-158) without corresponding steps 1 and 2
- Duplicate heading "## Step 5: Generate Test Report" at line 160 (with emoji 📊), when Step 4 already has the same title with emoji 📝 at line 146
This appears to be leftover content from the old version that should have been removed. The section should flow directly from Step 4's title to the "Summary Statistics" subsection without these orphaned fragments.
## Step 4: Generate Test Report 📝
Create a comprehensive test report documenting your findings.
# Parse for errors
grep -i error /tmp/mcp-stress-test/logs/*.log > /tmp/mcp-stress-results/errors.txt
3. Analyze gateway performance:
   - Check for memory leaks
   - Measure startup time for each server
   - Count total requests and failures
   - Identify slowest servers

## Step 5: Generate Test Report 📊
</details>
### Example Test Pattern

**Authentication Required:**
- Error message contains "authentication", "unauthorized", "token", "API key"
- HTTP 401 status code
- Tool invocation fails due to missing credentials

For the GitHub server (which has authentication configured):
```bash
# You can directly use MCP tools configured in the workflow
# The MCP gateway handles the routing automatically
# Example: Use bash to log your testing approach
echo "Testing github server..."
```

**Protocol Error:**
- Invalid JSON-RPC response
- MCP protocol violation
- Malformed request/response

Then attempt to use a GitHub MCP tool. If it works, record success. If it fails, record the error and category.
Copilot
AI
Feb 4, 2026
The testing instructions lack concrete guidance on how to discover and invoke MCP tools from the configured servers. While lines 79-81 state what the agent should do (discover tools, invoke them, record results), and lines 87-92 suggest which tools to try, the instructions don't explain:
- How to programmatically discover what tools each server provides (is there a list_tools function? Should the agent try to introspect?)
- The exact syntax/mechanism for invoking MCP tools from bash scripts or directly
- How to capture and parse tool responses and errors systematically
Compare this to the removed old version which had explicit curl commands with JSON-RPC requests. The new approach assumes the agent knows how to interact with pre-configured MCP servers, but provides no concrete examples or API reference.
Consider adding a concrete example showing how to invoke at least one MCP tool and check its response, to serve as a pattern the agent can follow for the other 19 servers.
See below for a potential fix:
For the GitHub server (which has authentication configured), follow this concrete pattern:

1. **Set the MCP Gateway URL**

   ```bash
   # MCP gateway base URL (injected by the workflow or use a default)
   MCP_GATEWAY_URL="${MCP_GATEWAY_URL:-http://127.0.0.1:3000}"
   echo "Testing github server via MCP gateway at ${MCP_GATEWAY_URL}..."
   ```

2. **Discover available tools for the `github` server**

   This uses a JSON-RPC `tools/list` call to ask the gateway which tools are exposed for the `github` server.

   ```bash
   curl -sS "${MCP_GATEWAY_URL}" \
     -H 'Content-Type: application/json' \
     -d '{
       "jsonrpc": "2.0",
       "id": "list-github-tools",
       "method": "tools/list",
       "params": { "server": "github" }
     }' | tee github-tools-response.json

   echo "Discovered tools for github server:"
   jq '.result.tools // .error' github-tools-response.json
   ```

   - If `.result.tools` is present, the server responded successfully with a tool list.
   - If `.error` is present instead, record the error message and categorize it (e.g., auth, timeout, protocol).

3. **Invoke a specific GitHub MCP tool**

   After listing tools, pick a simple, read-only tool (replace `get_repo` and its params with a real tool and arguments from the previous step):

   ```bash
   curl -sS "${MCP_GATEWAY_URL}" \
     -H 'Content-Type: application/json' \
     -d '{
       "jsonrpc": "2.0",
       "id": "call-github-tool",
       "method": "tools/call",
       "params": {
         "server": "github",
         "tool": "get_repo",
         "arguments": { "owner": "octocat", "repo": "hello-world" }
       }
     }' | tee github-tool-call-response.json

   echo "Tool call result (or error) for github server:"
   jq '{result: .result, error: .error}' github-tool-call-response.json
   ```

   - If `.result` is non-null, record this server as Success and capture a brief summary of the output.
   - If `.error` is non-null, record this server as Failure and categorize the error (auth, protocol, timeout, etc.).

Then repeat this pattern for each of the other MCP servers:
- Use `tools/list` with `server: "<server-name>"` to discover available tools.
- Select a simple, read-only tool from the list.
- Use `tools/call` with the chosen tool name and minimal arguments.
- Parse responses and errors with `jq` and record the outcome for your final report.

For servers without authentication:
- Attempt to use a tool with `tools/call` as above.
- If it fails due to missing authentication, document the required token or permission in your notes.
- Move to the next server and continue testing.
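
The categorization step in this pattern could be sketched as a small bash helper. This is an illustration only: the function name `classify_error`, the keyword lists, and the `other` fallback are assumptions drawn from the error categories named in this review (auth, protocol, timeout), not part of the workflow or the gateway.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an error message to a report category.
# Keywords are assumptions based on the categories discussed in this PR.
classify_error() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *authentication*|*unauthorized*|*token*|*"api key"*|*401*) echo "auth" ;;
    *timeout*|*"timed out"*) echo "timeout" ;;
    *json-rpc*|*protocol*|*malformed*) echo "protocol" ;;
    *) echo "other" ;;
  esac
}

# Example with a literal message; in practice the message would come from
# the .error field of a saved response.
classify_error "HTTP 401: Unauthorized"
```

Auth patterns are checked first, so a message that mentions both a token and a protocol problem is reported as `auth`. That ordering is a design choice and may need adjusting.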
@@ -18,18 +18,80 @@ network:
- defaults
- go
Copilot
AI
Feb 4, 2026
Network configuration includes "go" but Go setup step was removed from the workflow. Since the workflow no longer builds the gateway locally (it uses a pre-built container), the "go" network allowlist entry is unnecessary. The agent won't be running any Go commands or accessing Go module repositories.
Consider removing "go" from the network allowed list as it's no longer needed after removing the Go setup step.
- go

mounts:
- "/tmp/mcp-test-fs:/workspace:rw"
Copilot
AI
Feb 4, 2026
The mounts field is used in the filesystem server configuration but is not documented in the MCP server schema definition (agentic-workflow.json:5740-5880 stdio_mcp_tool). The field is supported by the gateway's StdinServerConfig (internal/config/config_stdin.go:54) and used in the example config.json file, but missing from the JSON schema that validates workflow configurations.
While this won't cause runtime errors (the gateway supports it), it creates a documentation gap where the schema doesn't reflect the actual supported fields. Consider adding a mounts field definition to the stdio_mcp_tool schema in .github/aw/schemas/agentic-workflow.json.
The nightly MCP stress test workflow was blocked with 0% test coverage because it attempted to launch MCP server containers from within AWF, where Docker-in-Docker was removed in v0.9.1.
Changes
Workflow configuration (`.github/workflows/nightly-mcp-stress-test.md`)
- `sandbox.mcp` configuration with gateway container and 20 MCP servers
- `/tmp/mcp-test-fs` subdirectory

Agent instructions (`.github/agentics/nightly-mcp-stress-test.md`)
- (`make build`, `./awmg`)

Architecture
Before:
After:
The gateway now runs as a trusted external service where Docker is available, while the agent communicates with it via HTTP from within AWF's security boundary.
Original prompt
This section details on the original issue you should resolve
<issue_title>[mcp-stress-test] Nightly MCP Stress Test Blocked: Docker-in-Docker Not Available in AWF Environment</issue_title>
<issue_description>## Critical Blocker for Nightly Stress Test Workflow
The nightly MCP server stress test workflow cannot execute due to a fundamental environment constraint: Docker-in-Docker support is not available in the AWF firewall container.
Test Session Details
- `stress-test-20260204-033819`
- `.github/workflows/nightly-mcp-stress-test.md`

Problem Summary
The stress test attempts to launch 20 MCP servers as Docker containers, but all 20 servers fail immediately because Docker commands are blocked by AWF.
Error Message from MCP Gateway:
Root Cause
- `container: "mcp/*"` or `container: "ghcr.io/*"`
- `docker run` to launch container-based servers

Impact
Test Coverage: 0/20 servers tested (0%)
All 20 attempted servers failed with identical Docker availability errors:
- `github` (`ghcr.io/github/github-mcp-server:v0.30.2`)
- `filesystem` (`mcp/filesystem`)
- `memory` (`mcp/memory`)
- `sqlite` (`mcp/sqlite`)
- `postgres` (`mcp/postgres`)
- `brave-search` (`mcp/brave-search`)
- `fetch` (`mcp/fetch`)
- `puppeteer` (`mcp/puppeteer`)
- `slack` (`mcp/slack`)
- `gdrive` (`mcp/gdrive`)
- `google-maps` (`mcp/google-maps`)
- `everart` (`mcp/everart`)
- `sequential-thinking` (`mcp/sequential-thinking`)
- `aws-kb-retrieval` (`mcp/aws-kb-retrieval`)
- `linear` (`mcp/linear`)
- `sentry` (`mcp/sentry`)
- `raygun` (`mcp/raygun`)
- `git` (`mcp/git`)
- `time` (`mcp/time`)
- `axiom` (`mcp/axiom`)

What Actually Worked ✅
The MCP Gateway behaved correctly:
This is not a gateway bug - it's an environment incompatibility between the test design and AWF constraints.
Resolution Options
Option 1: Run Workflow Outside AWF (Recommended)
Pros:
Cons:
Implementation:
Option 2: Use HTTP-Based MCP Servers
Pros:
Cons:
Implementation:
- `type: "http"` and `url` instead of `container`

Option 3: Use Stdio-Based Non-Container Servers
Pros:
Cons:
Implementation:
- `command` instead of `container`

Option 4: Hybrid Approach
Pros:
Cons:
Implementation:
Option 5: Disable Stress Test
Pros:
Cons:
Implementation:
- `.github/workflows/nightly-mcp-stress-test.md` workflow

Recommendations
Immediate Actions