fix: increase HTTP backend connect timeout from 5s to 30s and make configurable#3782
Merged
Conversation
The hardcoded 5-second per-transport connect timeout caused slow HTTP backends to silently lose their tools during startup. When a backend takes longer than 5s to respond to initialize (e.g. a cold-starting LiteLLM proxy at 21s), all three SDK transports time out, tool registration fails silently, and the agent starts without those tools.

Changes:
- Add `connect_timeout` per-server config field (TOML and JSON stdin)
- Add `ServerConfig.HTTPConnectTimeout()` helper (default 30s)
- Pass the configurable timeout through `NewHTTPConnection` → `trySDKTransport`
- Store the timeout on `Connection` for use during reconnect
- Apply the default of 30s when `connectTimeout` ≤ 0 in `NewHTTPConnection`
- Promote failed backend tool registration from WARN to ERROR
- Add a summary error log listing which backends failed and their impact

Fixes #3718

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR addresses missing HTTP MCP backend tools when backends are slow to initialize by increasing the per-transport HTTP connect timeout (now configurable per server, defaulting to 30s) and improving logging when tool registration fails.
Changes:
- Add per-server `connect_timeout` (seconds) to config (TOML + stdin JSON) and plumb it through HTTP connection creation / reconnect.
- Replace the hardcoded SDK transport connect timeouts with a configurable timeout and persist it on the `Connection`.
- Improve tool registration failure visibility by logging per-backend failures as ERROR and emitting an ERROR summary.
Summary per file:
| File | Description |
|---|---|
| `internal/config/config_core.go` | Adds `DefaultConnectTimeout`, `ServerConfig.ConnectTimeout`, and the `HTTPConnectTimeout()` helper. |
| `internal/config/config_core_test.go` | Adds tests for parsing and `HTTPConnectTimeout()` behavior. |
| `internal/config/config_stdin.go` | Adds `connect_timeout` to the stdin JSON server config and maps it into `ServerConfig`. |
| `internal/launcher/launcher.go` | Passes the per-server connect timeout into `mcp.NewHTTPConnection`. |
| `internal/mcp/connection.go` | Adds `connectTimeout` to `Connection`, applies the default, and reuses it on SDK reconnect. |
| `internal/mcp/http_transport.go` | Threads the connect timeout through SDK transport connection attempts and stores it on the connection. |
| `internal/server/tool_registry.go` | Promotes backend tool registration failures to ERROR and adds an ERROR summary log. |
| `internal/mcp/http_transport_test.go` | Updates `NewHTTPConnection` call sites for the new signature. |
| `internal/mcp/http_connection_test.go` | Updates `NewHTTPConnection` call sites for the new signature. |
| `internal/mcp/http_error_propagation_test.go` | Updates `NewHTTPConnection` call sites for the new signature. |
| `internal/mcp/connection_test.go` | Updates `NewHTTPConnection`/`newHTTPConnection` call sites for the new signature. |
| `internal/mcp/connection_stderr_test.go` | Updates `NewHTTPConnection` call sites for the new signature. |
| `internal/mcp/connection_arguments_test.go` | Updates `NewHTTPConnection` call sites for the new signature. |
| `test/integration/http_error_test.go` | Updates integration tests for the new `NewHTTPConnection` signature and adds a short connect timeout in the intermittent-failure test. |
Copilot's findings
Comments suppressed due to low confidence (1)
internal/config/config_stdin.go:129
`UnmarshalJSON` tracks unknown per-server fields in `AdditionalProperties` via a hardcoded `knownFields` list. Since `connect_timeout` isn't in that list, it will be treated as an additional/custom property, which can interfere with custom schema validation and makes debugging harder. Add `connect_timeout` to `knownFields` so it's recognized as a first-class field.
```go
// ConnectTimeout is the per-transport timeout (in seconds) for connecting to HTTP backends.
// Only applies to HTTP server types. Default: 30 seconds.
ConnectTimeout *int `json:"connect_timeout,omitempty"`
```
- Files reviewed: 14/14 changed files
- Comments generated: 3
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This was referenced Apr 14, 2026
Problem
When an HTTP MCP backend is slow to initialize (e.g. cold-starting LiteLLM proxy taking 21s), the gateway's hardcoded 5-second per-transport connect timeout causes all three SDK transport attempts to fail. The tool registration silently drops the backend's tools and the agent starts without them.
From the issue logs, the 21s is consumed by the transport fallback chain: streamable HTTP (5s timeout) → SSE (5s timeout) → plain JSON (~11s). If the backend only supports streamable HTTP, all three transports fail and its tools are silently dropped.
Fix
1. Configurable per-transport connect timeout (default 30s → was 5s)
New `connect_timeout` field on server config.

TOML:
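The TOML example did not survive the page capture; the following is a hypothetical equivalent of the JSON stdin form, assuming the TOML keys mirror the JSON field names (the actual table layout may differ):

```toml
# Hypothetical sketch; key and table names are assumptions.
[mcpServers.opslevel]
type = "http"
url = "http://opslevel-proxy:8080/mcp"
connect_timeout = 60
```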
JSON stdin:

```json
{
  "mcpServers": {
    "opslevel": {
      "type": "http",
      "url": "http://opslevel-proxy:8080/mcp",
      "connect_timeout": 60
    }
  }
}
```

2. Better error visibility
```
Tool registration incomplete: 1 of 3 backends failed: [opslevel] — agents will not see tools from these servers
```

3. Reconnect uses stored timeout
The connect timeout is stored on the `Connection` struct so `reconnectSDKTransport` reuses the same value instead of a hardcoded 10s.

Changes

- `config_core.go`: `ConnectTimeout` field, `HTTPConnectTimeout()` helper, `DefaultConnectTimeout` constant
- `config_stdin.go`: `ConnectTimeout` on `StdinServerConfig`, mapped during conversion
- `connection.go`: `connectTimeout` on `Connection`, default of 30s applied when ≤ 0
- `http_transport.go`: timeout threaded through `trySDKTransport` and `newHTTPConnection`
- `launcher.go`: `serverCfg.HTTPConnectTimeout()` passed to `NewHTTPConnection`
- `tool_registry.go`: registration failures promoted to ERROR with a summary log
- test files: `NewHTTPConnection` call sites updated for the new parameter

Fixes #3718
config_core.goConnectTimeoutfield,HTTPConnectTimeout()helper,DefaultConnectTimeoutconstantconfig_stdin.goConnectTimeouttoStdinServerConfig, map it during conversionconnection.goconnectTimeouttoConnection, apply default of 30s when ≤ 0http_transport.gotrySDKTransportandnewHTTPConnectionlauncher.goserverCfg.HTTPConnectTimeout()toNewHTTPConnectiontool_registry.goNewHTTPConnectioncall sites with new parameterFixes #3718