fix(system): add configurable timeout to HTTP client #205

Open
efe-arv wants to merge 1 commit into vxcontrol:master from efe-arv:fix/http-client-timeout
efe-arv commented Mar 15, 2026

Summary

GetHTTPClient in backend/pkg/system/utils.go creates http.Client instances without setting the Timeout field, so requests have no client-side time limit. When an external API (LLM provider, search tool, scraper, etc.) stops responding, the calling goroutine blocks indefinitely, which leaks goroutines and eventually exhausts resources.

This affects all 17 call sites across the codebase:

  • Every LLM provider: OpenAI, Anthropic, Gemini, DeepSeek, Kimi, Qwen, GLM, Ollama, custom
  • All search tools: Tavily, DuckDuckGo, Sploitus, Perplexity, Google, Traversaal, SearxNG
  • Embeddings provider

Changes

| File | Change |
|---|---|
| backend/pkg/config/config.go | Add HTTPClientTimeout config field (HTTP_CLIENT_TIMEOUT env var, default 600s) |
| backend/pkg/system/utils.go | Set Timeout on all http.Client instances; use default timeout for nil config |
| backend/pkg/system/utils_test.go | Add 5 tests: default, custom, zero-fallback, nil-config, and proxy scenarios |
| .env.example | Document the new HTTP_CLIENT_TIMEOUT env var |
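
As an illustration of how the env var could map to a value, here is a rough sketch. It assumes `HTTP_CLIENT_TIMEOUT` holds an integer number of seconds, and that empty, unparseable, or negative values fall back to the default; the PR only specifies the 600s default and the zero case. `parseTimeoutSeconds` is a hypothetical helper, and the project's actual config loading may work differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultHTTPClientTimeoutSeconds mirrors the 600s default described in the PR.
const defaultHTTPClientTimeoutSeconds = 600

// parseTimeoutSeconds converts a raw HTTP_CLIENT_TIMEOUT value into seconds.
// Assumption: empty, unparseable, or negative input yields the default.
// A literal 0 is passed through unchanged; this sketch leaves the
// zero-means-default rule to the code that builds the client.
func parseTimeoutSeconds(raw string) int {
	if raw == "" {
		return defaultHTTPClientTimeoutSeconds
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return defaultHTTPClientTimeoutSeconds
	}
	return n
}

func main() {
	secs := parseTimeoutSeconds(os.Getenv("HTTP_CLIENT_TIMEOUT"))
	fmt.Println("HTTP client timeout (seconds):", secs)
}
```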

Design Decisions

  • Default: 10 minutes (600s) — long enough for slow LLM responses (large context, complex reasoning), short enough to catch genuinely hung connections.
  • Configurable via env var — operators can tune per deployment. Users behind slow proxies or running local Ollama with large models may need longer.
  • Zero = use default — setting HTTP_CLIENT_TIMEOUT=0 falls back to the 10-minute default rather than disabling timeouts entirely, preventing accidental infinite hangs.
  • Nil config gets a timeout too — previously returned http.DefaultClient (no timeout); now returns a client with the default timeout.

Relation to Existing Issues

The missing timeout is a contributing factor to #176 (agent delegation context canceled). #195 addresses the detached command context issue, but the underlying HTTP client still has no timeout, so a hanging LLM API call will block the goroutine forever even after the context fix.

Testing

Added 5 unit tests:

  • TestGetHTTPClient_NilConfig — nil config returns client with default timeout
  • TestGetHTTPClient_DefaultTimeout — config with default value (600s)
  • TestGetHTTPClient_CustomTimeout — config with custom value (120s)
  • TestGetHTTPClient_ZeroTimeoutUsesDefault — zero falls back to default
  • TestGetHTTPClient_TimeoutWithProxy — timeout applies with proxy configured

Verification

  • Changes are backward-compatible for requests that complete within the timeout (600s by default); only requests that previously hung or ran longer are now cut off
  • No breaking changes to existing callers (all use GetHTTPClient → automatically get timeout)
  • .env.example updated with documentation

GetHTTPClient creates http.Client without setting Timeout, causing
goroutines to hang indefinitely when an external API (LLM provider,
search tool, etc.) stops responding. This affects all 17 call sites
across every provider (OpenAI, Anthropic, Gemini, DeepSeek, Kimi,
Qwen, GLM, Ollama, custom) and all search tools (Tavily, DuckDuckGo,
Sploitus, Perplexity, Google, Traversaal, SearxNG).

Changes:
- Add HTTP_CLIENT_TIMEOUT env var (default: 600s / 10 minutes)
- Set Timeout on all http.Client instances returned by GetHTTPClient
- When cfg is nil, return a client with the default timeout instead
  of http.DefaultClient (which has no timeout)
- Add 5 unit tests covering default, custom, zero, nil, and proxy
  timeout scenarios
- Document the new env var in .env.example

Relates to vxcontrol#176 (context canceled on agent delegation), which
identified the missing HTTP client timeout as a contributing factor.