Summary
The MCP gateway drops safeoutputs sessions after approximately 30 minutes of inactivity. When an agent runs a long CPU-bound task (e.g., ML training, large builds) inside a shell tool call, no MCP requests are made during that period. The session expires server-side and becomes unrecoverable — all subsequent safeoutputs calls fail with session not found, and the agent has no way to deliver its results.
This is a correctness issue: safeoutputs is the only channel for the agent to report results, and it silently expires during the work the agent was asked to do.
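The failure mode can be modeled with a minimal sketch. It assumes the gateway keeps a last-activity timestamp per session and rejects any request that arrives more than a fixed idle TTL after the previous one. This is an assumption about how the expiry works, not the gateway's actual code. `IdleSessionStore`, `SessionNotFound`, and `survives_blocking_task` are hypothetical names. Times are plain seconds passed in explicitly, so the scenario is deterministic.

```python
from dataclasses import dataclass, field

MINUTE = 60.0  # seconds


class SessionNotFound(Exception):
    """Raised when a session ID is unknown or has been idle longer than the TTL."""


@dataclass
class IdleSessionStore:
    """Hypothetical gateway session table with a fixed idle timeout (seconds)."""

    idle_ttl: float
    _last_seen: dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def open(self, session_id: str, now: float) -> None:
        self._last_seen[session_id] = now

    def touch(self, session_id: str, now: float) -> None:
        """Handle one request: reject it if the session idled out, else refresh it."""
        last = self._last_seen.get(session_id)
        if last is None or now - last > self.idle_ttl:
            # The expired entry is dropped, so every later request fails as well.
            self._last_seen.pop(session_id, None)
            raise SessionNotFound(session_id)
        self._last_seen[session_id] = now


def survives_blocking_task(idle_ttl: float, task_minutes: float) -> bool:
    """Make one call, stay silent for task_minutes (e.g. a shell tool call), call again."""
    store = IdleSessionStore(idle_ttl=idle_ttl)
    store.open("s1", now=0.0)
    store.touch("s1", now=1.0)  # initialize + tools/list
    try:
        store.touch("s1", now=1.0 + task_minutes * MINUTE)  # first call after the task
    except SessionNotFound:
        return False
    return True


print(survives_blocking_task(30 * MINUTE, 45))      # False
print(survives_blocking_task(6 * 60 * MINUTE, 45))  # True
```

Under this model, a session that stays quiet through a 45-minute task is rejected on the next call. Because the entry is dropped, every later call fails too, which matches the behaviour reported here. Raising `idle_ttl` to the workflow's 6-hour limit lets the same call succeed.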
Upstream issue
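github/gh-aw#23153 collects two independent reports:
- dsyme/fv-squad: 45-minute job, session expired
- githubnext/autoresearch_local: 30-minute training run, session expired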
Reproduction timeline (autoresearch_local)
| Time (UTC) | Event |
|---|---|
| 22:29:07 | safeoutputs MCP server started (v0.2.9), session established |
| 22:29:32 | First initialize + tools/list calls succeed |
| 22:50–23:00 | Agent runs ML training (~30 min, no MCP calls) |
| 23:00:34 | Training completes successfully |
| 23:02:21 | First safeoutputs failure: session not found |
| 23:02–23:13 | All subsequent calls fail: noop, create_pull_request, push_repo_memory, add_comment, missing_tool |
| 23:13:48 | Agent gives up: "safeoutputs MCP session is permanently expired" |
Error
All calls return the same error:
✗ noop (MCP: safeoutputs)
└ MCP server 'safeoutputs': Error: Streamable HTTP error: Error POSTing to endpoint: session not found
✗ create_pull_request (MCP: safeoutputs)
└ MCP server 'safeoutputs': Error: Streamable HTTP error: Error POSTing to endpoint: session not found
Impact
- Zero safe outputs completed — no PR, no comments, no repo-memory
- Training succeeded (val_bpb 2.236 → 2.107) but results were lost
- Agent wasted ~11 minutes retrying with sleep waits before giving up
- Client-side workarounds (keepalive prompts) don't help because the agent can't send MCP calls while blocked on a long shell execution
Proposed fixes
Any of these would resolve the issue:
- Remove or significantly extend the session timeout for safeoutputs: these sessions should live for the duration of the workflow (up to 6 hours for autoloop). A 30-minute idle timeout is incompatible with long-running tasks.
- Automatic keepalive from the gateway side: the gateway could ping or refresh sessions internally rather than relying on client activity.
- Transparent session reconnect: allow the client to re-establish a session when it receives session not found, without requiring manual intervention from the agent.
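Transparent reconnect can be sketched as a client wrapper that treats `session not found` as a signal to open a new session and retry the call once. The MCP Streamable HTTP transport specification describes a comparable rule: a client that receives HTTP 404 for a request carrying an `Mcp-Session-Id` must start a new session by sending a fresh `InitializeRequest`. `ReconnectingClient`, `FakeGateway`, and the injected `connect`/`send` callables are hypothetical. The sketch also assumes the gateway holds no per-session state that a replacement session would need to inherit.

```python
from typing import Any, Callable


class SessionNotFound(Exception):
    """Stand-in for the gateway's 'session not found' error."""


class ReconnectingClient:
    """Hypothetical wrapper that re-establishes a session once on 'session not found'.

    `connect` opens a new session and returns its ID; `send` performs one tool call
    on a session and raises SessionNotFound if the gateway has dropped it. Both are
    injected so the sketch does not depend on any particular MCP SDK.
    """

    def __init__(self, connect: Callable[[], str],
                 send: Callable[[str, str, dict], Any]) -> None:
        self._connect = connect
        self._send = send
        self._session_id = connect()

    def call_tool(self, name: str, args: dict) -> Any:
        try:
            return self._send(self._session_id, name, args)
        except SessionNotFound:
            # Session expired server-side: open a fresh one and retry exactly once.
            # A second failure propagates to the caller instead of looping.
            self._session_id = self._connect()
            return self._send(self._session_id, name, args)


class FakeGateway:
    """In-memory stand-in for the gateway, used only to exercise the wrapper."""

    def __init__(self) -> None:
        self.live: set[str] = set()
        self.opened = 0

    def connect(self) -> str:
        self.opened += 1
        session_id = f"s{self.opened}"
        self.live.add(session_id)
        return session_id

    def send(self, session_id: str, name: str, args: dict) -> str:
        if session_id not in self.live:
            raise SessionNotFound(session_id)
        return f"{name} ok via {session_id}"

    def expire_all(self) -> None:
        # Simulates the idle timeout firing during a long shell tool call.
        self.live.clear()


gw = FakeGateway()
client = ReconnectingClient(gw.connect, gw.send)
gw.expire_all()
print(client.call_tool("create_pull_request", {}))  # create_pull_request ok via s2
```

The wrapper reconnects before retrying instead of sleeping and resending on the dead session ID. That avoids the kind of futile retry loop described under Impact, while a second failure still surfaces, so a genuinely broken gateway is not masked.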