Skip to content

[aw-failures] Go Logger Enhancement: MCP connection timeout kills build verification step #28653

@github-actions

Description

@github-actions

Problem Statement

The Go Logger Enhancement workflow has failed twice with MCP-related root causes. Today's failure (§24967310561) is now confirmed: all MCP server connections timed out mid-session, causing the final mcpscripts.make build-verification step to fail with MCP error -32003: context canceled.

Evidence

Timeline from agent-stdio.log:

21:25:55Z [DEBUG] MCP server "github": Connection error: The operation timed out.
21:25:55Z [DEBUG] MCP server "mcpscripts": Connection error: The operation timed out.
21:25:55Z [DEBUG] MCP server "safeoutputs": Connection error: The operation timed out.
  ... ~10 min of native tool use (Read/Grep/Edit) ...
21:35:52Z [DEBUG] MCP server "mcpscripts": Tool 'make' failed after 6s:
          MCP error -32003: context canceled / client is closing
21:35:54Z [DEBUG] MCP server "mcpscripts": Tool 'make' failed after 0s:
          Unable to connect. Is the computer able to access the url?

Session metrics (audit run 24967310561):

  • Duration: 18.5 min total, avg 9.2 min between turns
  • Cache warning: avg TBT (9.2m) exceeds Anthropic 5-min cache TTL — prompt cache expired between turns
  • MCP servers healthy at startup: 2/2; dropped at ~5 min idle
  • Turns: 3 (exploratory, read-only — agent edited code but could not verify build)

Recurrence: Prior run §24912564019 (2026-04-24/25) also failed with MCP-related issues (413 turns, 9 anomalies — root cause unresolved then).

Root Cause

MCP server connections (HTTP/WebSocket) time out after ~5 minutes of inactivity between turns. The Go Logger workflow's long exploration turns (reading 80+ files over ~6–9 min each) exceed this idle threshold. When the agent finally calls mcpscripts.make to verify the build, the MCP transport has been closed and cannot be re-established without a restart.

The cache_memory_miss (no prior session in /tmp/gh-aw/cache-memory/go-logger/) is a secondary issue: a cold-start forces maximum exploration, amplifying turn length.

Proposed Remediation

  1. Move build verification to a post-step (highest impact): Replace mcpscripts.make / mcpscripts.go test calls in the agent turn with a native Bash tool call (bash_make build or bash_go build ./...). Native tools don't depend on MCP connectivity and survive idle-induced MCP disconnects.

  2. Reduce exploration scope: Configure a cache-memory warm-start so the agent can skip file discovery on subsequent runs and keep turns under 5 minutes.

  3. Reduce turn duration or add intermediate checkpoints: If the workflow must use mcpscripts.make, restructure the prompt to call it earlier (immediately after edits) rather than at the end of a long exploration pass.

Success Criteria

  • Go Logger Enhancement completes with conclusion: success on the next scheduled run
  • mcpscripts.make or equivalent build verification returns a valid result
  • No MCP error -32003 or Unable to connect errors in agent-stdio.log

Affected Run IDs

Parent: #28268

Generated by [aw] Failure Investigator (6h) · ● 399K ·

  • expires on May 4, 2026, 1:24 AM UTC

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions