Safe-outputs MCP transport silently closes on idle during long agent runs — no outputs produced, workflow reports success

AI-assisted report

### Summary
When an agent spends several minutes on analysis before invoking safe-output tools, the HTTP connection to the safe-outputs MCP server goes idle and is dropped. The agent then fails to call any safe-output tools, producing **zero outputs**, but the workflow still reports **success**. This is a silent failure — no labels, assignments, or comments are applied.
### Impact
In a sample of the last 20 `issue-triage` workflow runs in `microsoft/vscode-engineering`, **7 out of 20 runs (35%)** produced zero safe outputs due to this failure. All 7 reported workflow conclusion `success`. The affected issues remained untriaged with `triage-needed` still applied.
### Reproduction
1. Create an agentic workflow with safe-output tools (e.g., `add_labels`, `assign_to_user`)
2. Give the agent a task that requires multi-step analysis before invoking safe-output tools (reading files, classification, searching references)
3. If the agent spends ~5+ minutes analyzing before its first safe-output tool call, the MCP transport closes
### Error Sequence (from agent logs)
```
2026-03-13T13:43:16.749Z [ERROR] MCP client for safeoutputs errored TypeError: fetch failed
2026-03-13T13:43:16.750Z [ERROR] MCP client for safeoutputs errored TypeError: fetch failed
2026-03-13T13:44:29.233Z [ERROR] MCP transport for safeoutputs closed
2026-03-13T13:44:29.233Z [ERROR] MCP client for safeoutputs closed
```
The agent completes its analysis and attempts to invoke `safeoutputs-add_labels`, but the transport is already dead. No reconnect is attempted. `outputs.jsonl` is never written.
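
As a rough aid for spotting affected runs, the error sequence above can be detected by scanning the agent log. This is a minimal sketch that assumes the log line format shown in the excerpt; `find_transport_failure` and `SAMPLE_LOG` are hypothetical names and not part of any existing tool.

```python
import re
from datetime import datetime

# Log excerpt copied from the error sequence in this report.
SAMPLE_LOG = """\
2026-03-13T13:43:16.749Z [ERROR] MCP client for safeoutputs errored TypeError: fetch failed
2026-03-13T13:43:16.750Z [ERROR] MCP client for safeoutputs errored TypeError: fetch failed
2026-03-13T13:44:29.233Z [ERROR] MCP transport for safeoutputs closed
2026-03-13T13:44:29.233Z [ERROR] MCP client for safeoutputs closed
"""

# Assumed line shape: "<timestamp>Z [ERROR] MCP <client|transport> for <server> <errored|closed>"
LINE_RE = re.compile(
    r"^(\S+Z) \[ERROR\] MCP (client|transport) for (\S+) (errored|closed)"
)


def find_transport_failure(log_text, server="safeoutputs"):
    """Return (first_error_time, transport_closed_time) for `server`.

    Returns None if the transport for `server` is never reported closed.
    first_error_time is None if no "errored" line preceded the close.
    """
    first_error = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match or match.group(3) != server:
            continue
        # Replace the trailing "Z" so older Python versions can parse it.
        ts = datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))
        kind, event = match.group(2), match.group(4)
        if event == "errored" and first_error is None:
            first_error = ts
        if kind == "transport" and event == "closed":
            return (first_error, ts)
    return None


if __name__ == "__main__":
    result = find_transport_failure(SAMPLE_LOG)
    if result is not None:
        first_error, closed = result
        print("transport closed", (closed - first_error).total_seconds(), "s after first error")
```

For the excerpt above, the transport is reported closed roughly 72 seconds after the first `fetch failed` error.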
### Root Cause Analysis
The safe-outputs MCP server communicates over HTTP through the MCP Gateway. During the agent's analysis phase (reading issue data, skill files, working-areas references), **no requests are sent to the safe-outputs server**. After ~5 minutes of idle time, the HTTP connection is dropped — likely by the Docker network stack, OS TCP keepalive, or the gateway's connection management.
Key gaps identified:
| Gap | Location | Detail |
|-----|----------|--------|
| No HTTP keepalive/ping | MCP HTTP transport | No heartbeat mechanism to keep idle connections alive |
| No auto-reconnect | copilot-agent-runtime `StreamableHTTPClientTransport` | Client does not retry on `TypeError: fetch failed` |
| `timeout` config unused | `MCPServerConfig` in copilot-agent-runtime | Field exists but is never wired to the HTTP transport layer |
| Silent success | Workflow conclusion job | `outputs.jsonl` missing → no actions taken, but workflow reports success |
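
The "Silent success" gap could be narrowed with a guard that runs after the agent job. The sketch below assumes the safe outputs are written as one JSON object per line in `outputs.jsonl`, and that a GitHub Actions `::warning` workflow command is an acceptable way to surface the problem. The function names are hypothetical, and some workflows may legitimately produce zero outputs, so treat this as a warning rather than a hard failure.

```python
import os


def count_safe_outputs(path):
    """Count non-blank lines in a JSONL file; a missing file counts as 0."""
    if not os.path.isfile(path):
        return 0
    with open(path, encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def check_safe_outputs(path, agent_succeeded):
    """Return a GitHub Actions warning command, or None if nothing looks wrong.

    A warning is produced only when the agent job reported success but the
    outputs file is missing or contains no entries.
    """
    if agent_succeeded and count_safe_outputs(path) == 0:
        return (
            "::warning title=No safe outputs::"
            f"Agent job succeeded but {path} is missing or empty"
        )
    return None


if __name__ == "__main__":
    # Hypothetical path; the real artifact location depends on the workflow.
    message = check_safe_outputs("outputs.jsonl", agent_succeeded=True)
    if message:
        print(message)
```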

### Suggested Fixes
1. **Add keepalive/heartbeat to MCP HTTP transport** — periodic pings to prevent idle connection closure
2. **Implement auto-reconnect** — detect `fetch failed` and re-establish the transport before retrying the tool call
3. **Increase visibility** — emit a warning annotation when the safe-outputs artifact is missing despite the agent job completing
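
Fixes 1 and 2 can be sketched language-agnostically as follows. This is an illustration in Python, not the copilot-agent-runtime implementation: `TransportClosed` stands in for `TypeError: fetch failed`, and `connect`, `call_tool`, and `ping` are hypothetical callables supplied by the caller. The MCP specification defines a `ping` request that could serve as the heartbeat, but whether the gateway treats it as activity on the connection is an assumption that would need to be verified.

```python
import asyncio


class TransportClosed(Exception):
    """Hypothetical error standing in for `TypeError: fetch failed`."""


class ReconnectingClient:
    """Wraps a session factory and re-establishes the session on failure."""

    def __init__(self, connect):
        self._connect = connect  # async factory returning a fresh session
        self._session = None

    async def call(self, tool, args, retries=1):
        """Call a tool, reconnecting and retrying up to `retries` times."""
        for attempt in range(retries + 1):
            if self._session is None:
                self._session = await self._connect()
            try:
                return await self._session.call_tool(tool, args)
            except TransportClosed:
                self._session = None  # force a reconnect on the next attempt
                if attempt == retries:
                    raise


async def keepalive(ping, interval_s, stop):
    """Invoke `ping()` every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            # Wake early if stop is set; otherwise time out and send a ping.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            await ping()
```

In this sketch the keepalive interval would need to be comfortably shorter than the observed ~5 minute idle window, and the retry should only be used for tool calls that are safe to repeat.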
### Environment
- gh-aw compiler: v0.50.0
- Agent runtime: 0.0.415
- AWF: v0.20.2
- MCP Gateway: v0.1.5
- GitHub MCP Server: v0.31.0

Safe-outputs MCP transport silently closes on idle during long agent runs — no outputs produced, workflow reports success #20885
