Description
AI-assisted report
Summary
When an agent spends several minutes on analysis before invoking safe-output tools, the HTTP connection to the safe-outputs MCP server goes idle and is dropped. The agent then fails to call any safe-output tools, producing zero outputs, but the workflow still reports success. This is a silent failure — no labels, assignments, or comments are applied.
Impact
In a sample of the last 20 issue-triage workflow runs in microsoft/vscode-engineering, 7 out of 20 runs (35%) produced zero safe outputs due to this failure. All 7 reported workflow conclusion success. The affected issues remained untriaged with triage-needed still applied.
Reproduction
- Create an agentic workflow with safe-output tools (e.g., add_labels, assign_to_user)
- Give the agent a task that requires multi-step analysis before invoking safe-output tools (reading files, classification, searching references)
- If the agent spends ~5+ minutes analyzing before its first safe-output tool call, the MCP transport closes
Error Sequence (from agent logs)
2026-03-13T13:43:16.749Z [ERROR] MCP client for safeoutputs errored TypeError: fetch failed
2026-03-13T13:43:16.750Z [ERROR] MCP client for safeoutputs errored TypeError: fetch failed
2026-03-13T13:44:29.233Z [ERROR] MCP transport for safeoutputs closed
2026-03-13T13:44:29.233Z [ERROR] MCP client for safeoutputs closed
The agent completes its analysis and attempts to invoke safeoutputs-add_labels, but the transport is already dead. No reconnect is attempted. outputs.jsonl is never written.
Root Cause Analysis
The safe-outputs MCP server communicates over HTTP through the MCP Gateway. During the agent's analysis phase (reading issue data, skill files, working-areas references), no requests are sent to the safe-outputs server. After ~5 minutes of idle time, the HTTP connection is dropped — likely by the Docker network stack, OS TCP keepalive, or the gateway's connection management.
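To illustrate the idle-connection problem, here is a minimal sketch of a heartbeat that sends a request on a fixed interval so the connection never sits idle for the full ~5 minutes. The names `startHeartbeat`, `Ping`, and `Heartbeat` are hypothetical and not part of copilot-agent-runtime. The MCP specification describes a ping request, but whether the runtime's client exposes one, and whether periodic traffic is enough to keep this particular connection open, are assumptions.

```typescript
// Hypothetical helper, not an API from copilot-agent-runtime or the MCP SDK.
// `ping` stands in for whatever lightweight request the client can send.
type Ping = () => Promise<unknown>;

interface Heartbeat {
  stop(): void;
}

function startHeartbeat(
  ping: Ping,
  intervalMs: number,
  onError: (err: unknown) => void = () => {},
): Heartbeat {
  let inFlight = false;

  const tick = async (): Promise<void> => {
    // Skip this tick if the previous ping has not settled, so pings never stack up.
    if (inFlight) return;
    inFlight = true;
    try {
      await ping();
    } catch (err) {
      // A failed ping is reported but does not stop the heartbeat.
      onError(err);
    } finally {
      inFlight = false;
    }
  };

  const timer = setInterval(() => {
    void tick();
  }, intervalMs);

  return { stop: () => clearInterval(timer) };
}
```

An interval well below the observed ~5 minute cutoff (for example 60 seconds) would be a reasonable starting point; the right value depends on which layer is actually dropping the connection.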
Key gaps identified:
| Gap | Location | Detail |
|---|---|---|
| No HTTP keepalive/ping | MCP HTTP transport | No heartbeat mechanism to keep idle connections alive |
| No auto-reconnect | copilot-agent-runtime StreamableHTTPClientTransport | Client does not retry on TypeError: fetch failed |
| timeout config unused | MCPServerConfig in copilot-agent-runtime | Field exists but is never wired to the HTTP transport layer |
| Silent success | Workflow conclusion job | outputs.jsonl missing → no actions taken, but workflow reports success |
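The "Silent success" gap could be closed with a check along these lines. This is a sketch only: `missingOutputsAnnotation` is a hypothetical function, the caller is assumed to have already read outputs.jsonl (passing `null` when the file does not exist), and the `::warning` line follows the GitHub Actions workflow command format. Whether a run with zero outputs can ever be legitimate is a policy question this sketch does not answer.

```typescript
// Hypothetical check for the conclusion job; not taken from gh-aw.
// `outputs` is the content of outputs.jsonl, or null if the file is missing.
function missingOutputsAnnotation(
  agentSucceeded: boolean,
  outputs: string | null,
): string | null {
  // A failed agent job is already visible, so no extra annotation is needed.
  if (!agentSucceeded) return null;

  if (outputs === null) {
    return "::warning title=Safe outputs missing::Agent job completed but outputs.jsonl was never written";
  }

  // Treat a file with only blank lines the same as a file with no entries.
  const entries = outputs.split("\n").filter((line) => line.trim() !== "");
  if (entries.length === 0) {
    return "::warning title=Safe outputs empty::Agent job completed but outputs.jsonl has no entries";
  }

  return null;
}
```

Printing the returned string to standard output in a workflow step would surface it as a warning annotation on the run.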
Suggested Fixes
- Add keepalive/heartbeat to MCP HTTP transport: periodic pings to prevent idle connection closure
- Implement auto-reconnect: detect fetch failed and re-establish the transport before retrying the tool call
- Increase visibility: emit a warning annotation when the safe-outputs artifact is missing despite the agent job completing
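The auto-reconnect suggestion could look roughly like the following. This is a sketch under stated assumptions, not the runtime's implementation: `callWithReconnect` and `isFetchFailed` are hypothetical names, `call` stands in for a tool invocation, and `reconnect` stands in for whatever re-establishes the transport. The error match is based on the `TypeError: fetch failed` lines in the agent logs.

```typescript
// Hypothetical helpers; `call` and `reconnect` are supplied by the caller.
function isFetchFailed(err: unknown): boolean {
  // Matches the error shape seen in the logs: TypeError with message "fetch failed".
  return err instanceof TypeError && err.message === "fetch failed";
}

async function callWithReconnect<T>(
  call: () => Promise<T>,
  reconnect: () => Promise<void>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // Only transport failures are retried; other errors surface immediately.
      if (!isFetchFailed(err) || attempt === maxAttempts) break;
      // Re-establish the transport before the next attempt.
      await reconnect();
    }
  }
  throw lastError;
}
```

Retrying is only safe when the failed request never reached the server or the tool call is idempotent; otherwise a retry could apply the same label or comment twice.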
Environment
- gh-aw compiler: v0.50.0
- Agent runtime: 0.0.415
- AWF: v0.20.2
- MCP Gateway: v0.1.5
- GitHub MCP Server: v0.31.0