
fix: increase HTTP backend connect timeout from 5s to 30s and make configurable #3782

Merged: lpcox merged 4 commits into main from fix/http-connect-timeout-3718 on Apr 14, 2026


lpcox commented Apr 14, 2026

Problem

When an HTTP MCP backend is slow to initialize (e.g. a cold-starting LiteLLM proxy that takes 21s), the gateway's hardcoded 5-second per-transport connect timeout causes all three SDK transport attempts to fail. Tool registration then silently drops the backend's tools, and the agent starts without them.

From the issue logs:

[2026-04-13T16:37:36Z] [DEBUG] GetOrLaunch called for server: opslevel
[2026-04-13T16:37:57Z] [INFO]  Successfully registered tools from opslevel (took 21.213743169s)

The 21s is consumed by the transport fallback chain: streamable HTTP (5s timeout) → SSE (5s timeout) → plain JSON (~11s). If the backend only supports streamable HTTP, all three transports fail and tools are silently dropped.

Fix

1. Configurable per-transport connect timeout (default 30s, previously 5s)

New connect_timeout field on server config:

TOML:

[servers.opslevel]
type = "http"
url = "http://opslevel-proxy:8080/mcp"
connect_timeout = 60  # seconds per transport attempt

JSON stdin:

{
  "mcpServers": {
    "opslevel": {
      "type": "http",
      "url": "http://opslevel-proxy:8080/mcp",
      "connect_timeout": 60
    }
  }
}

2. Better error visibility

  • Failed backend tool registration promoted from WARN → ERROR
  • Summary error log lists which backends failed and impact on agents:
    Tool registration incomplete: 1 of 3 backends failed: [opslevel] — agents will not see tools from these servers

3. Reconnect uses stored timeout

The connect timeout is stored on the Connection struct so reconnectSDKTransport reuses the same value instead of a hardcoded 10s.

Changes

| File | Change |
| --- | --- |
| config_core.go | Add ConnectTimeout field, HTTPConnectTimeout() helper, DefaultConnectTimeout constant |
| config_stdin.go | Add ConnectTimeout to StdinServerConfig, map it during conversion |
| connection.go | Add connectTimeout to Connection, apply default of 30s when ≤ 0 |
| http_transport.go | Pass configurable timeout through trySDKTransport and newHTTPConnection |
| launcher.go | Pass serverCfg.HTTPConnectTimeout() to NewHTTPConnection |
| tool_registry.go | Promote failed backends to ERROR, add summary log |
| Tests | Updated all NewHTTPConnection call sites with new parameter |

Fixes #3718

… configurable

The hardcoded 5-second per-transport connect timeout caused slow HTTP
backends to silently lose their tools during startup. When a backend
takes longer than 5s to respond to initialize (e.g. cold-starting
LiteLLM proxy at 21s), all three SDK transports time out, tool
registration fails silently, and the agent starts without those tools.

Changes:
- Add connect_timeout per-server config field (TOML and JSON stdin)
- Add ServerConfig.HTTPConnectTimeout() helper (default 30s)
- Pass configurable timeout through NewHTTPConnection → trySDKTransport
- Store timeout on Connection for use during reconnect
- Apply default of 30s when connectTimeout ≤ 0 in NewHTTPConnection
- Promote failed backend tool registration from WARN to ERROR
- Add summary error log listing which backends failed and their impact

Fixes #3718

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

This PR addresses missing HTTP MCP backend tools when backends are slow to initialize by increasing the per-transport HTTP connect timeout (now configurable per server, defaulting to 30s) and improving logging when tool registration fails.

Changes:

  • Add per-server connect_timeout (seconds) to config (TOML + stdin JSON) and plumb it through HTTP connection creation / reconnect.
  • Replace hardcoded SDK transport connect timeouts with a configurable timeout and persist it on the Connection.
  • Improve tool registration failure visibility by logging per-backend failures as ERROR and emitting an ERROR summary.

| File | Description |
| --- | --- |
| internal/config/config_core.go | Adds DefaultConnectTimeout, ServerConfig.ConnectTimeout, and HTTPConnectTimeout() helper. |
| internal/config/config_core_test.go | Adds tests for parsing and HTTPConnectTimeout() behavior. |
| internal/config/config_stdin.go | Adds connect_timeout to stdin JSON server config and maps it into ServerConfig. |
| internal/launcher/launcher.go | Passes per-server connect timeout into mcp.NewHTTPConnection. |
| internal/mcp/connection.go | Adds connectTimeout to Connection, applies default, and reuses it on SDK reconnect. |
| internal/mcp/http_transport.go | Threads connect timeout through SDK transport connection attempts and stores it on the connection. |
| internal/server/tool_registry.go | Promotes backend tool registration failures to ERROR and adds an ERROR summary log. |
| internal/mcp/http_transport_test.go | Updates NewHTTPConnection call sites for the new signature. |
| internal/mcp/http_connection_test.go | Updates NewHTTPConnection call sites for the new signature. |
| internal/mcp/http_error_propagation_test.go | Updates NewHTTPConnection call sites for the new signature. |
| internal/mcp/connection_test.go | Updates NewHTTPConnection/newHTTPConnection call sites for the new signature. |
| internal/mcp/connection_stderr_test.go | Updates NewHTTPConnection call sites for the new signature. |
| internal/mcp/connection_arguments_test.go | Updates NewHTTPConnection call sites for the new signature. |
| test/integration/http_error_test.go | Updates integration tests for new NewHTTPConnection signature and adds a short connect timeout in the intermittent-failure test. |

Copilot's findings


Comments suppressed due to low confidence (1)

internal/config/config_stdin.go:129

  • UnmarshalJSON tracks unknown per-server fields in AdditionalProperties via a hardcoded knownFields list. Since connect_timeout isn’t in that list, it will be treated as an additional/custom property, which can interfere with custom schema validation and makes debugging harder. Add connect_timeout to knownFields so it’s recognized as a first-class field.
	// ConnectTimeout is the per-transport timeout (in seconds) for connecting to HTTP backends.
	// Only applies to HTTP server types. Default: 30 seconds.
	ConnectTimeout *int `json:"connect_timeout,omitempty"`

  • Files reviewed: 14/14 changed files
  • Comments generated: 3

Comment thread internal/mcp/connection.go
Comment thread internal/config/config_core.go Outdated
Comment thread internal/config/config_stdin.go
lpcox and others added 3 commits April 14, 2026 10:43
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Linked issue: Bug: HTTP MCP backend tools missing from agent when tools/list response is slow