Problem
The MCP Structural Analysis report (Apr 14, 2026) detected the first-ever GitHub MCP installation rate-limit event: 4 tools simultaneously hit the limit of 15,000 requests per reset window. Without a circuit breaker or throttling strategy, high-frequency workflows trigger cascading rate limits across multiple agents sharing the same GitHub App installation token.
Source: github/gh-aw#26239
Current behavior
The gateway has no rate-limit awareness for tool calls:
| Component | Status |
| --- | --- |
| isTransientHTTPError() detects 429 | ✅ But only used for config schema fetch, not tool calls |
| X-RateLimit-* headers propagated in proxy mode | ✅ But not inspected or acted upon |
| Circuit breaker | ❌ Not implemented |
| Retry with backoff for tool calls | ❌ Not implemented |
| Throttling / request budget | ❌ Not implemented |
When the GitHub MCP server returns a rate-limited response, callBackendTool (unified.go) and the proxy handler propagate the error directly to the agent. The agent retries immediately, worsening the rate-limit storm.
Affected code paths
- Gateway mode: internal/server/unified.go → callBackendTool() → executeBackendToolCall() (no 429 handling)
- Proxy mode: internal/proxy/handler.go → copyResponseHeaders() propagates X-RateLimit-* headers but does not inspect them for backoff decisions
Proposed solution
Phase 1: Rate-limit aware backoff (both modes)
Gateway mode (internal/server/unified.go):
- In executeBackendToolCall, inspect the backend MCP response for rate-limit indicators
- When the GitHub MCP server returns a tool result indicating rate limiting (error code or X-RateLimit-Remaining: 0), apply exponential backoff before retrying (up to 3 attempts)
- Log rate-limit events at ERROR level with the X-RateLimit-Reset timestamp so operators can see when the limit resets
Proxy mode (internal/proxy/handler.go):
- After copyResponseHeaders, inspect X-RateLimit-Remaining from the upstream response
- When remaining is 0 (or the response is HTTP 429), inject a Retry-After header into the response to the agent
- Log rate-limit events at ERROR level with reset time and the tool that triggered it
Phase 2: Per-backend circuit breaker
Add a circuit breaker per backend server ID in the gateway:
States: CLOSED → OPEN → HALF-OPEN → CLOSED
- CLOSED (normal): requests pass through
- OPEN (tripped): after N consecutive rate-limit errors, reject requests immediately with a descriptive error and the X-RateLimit-Reset time; no upstream call is made
- HALF-OPEN (probe): after the reset time elapses, allow one probe request. If it succeeds, transition to CLOSED; if it is rate-limited again, return to OPEN
Configuration (per-server in TOML/JSON):
[servers.github]
type = "http"
url = "..."
# Circuit breaker settings
rate_limit_threshold = 3 # consecutive 429s before opening circuit
rate_limit_cooldown = 60 # seconds to stay OPEN before probing
Phase 3: Request budget / throttling (optional)
Per-session or per-workflow request budget:
- Track request count per (sessionID, serverID) pair
- When approaching the rate limit (e.g., X-RateLimit-Remaining < 100), throttle by adding artificial delay between requests
- Surface budget usage in the gateway health endpoint
Implementation notes
- The isTransientHTTPError() function in internal/config/validation_schema.go already correctly classifies 429 as transient; this logic should be reused
- The X-RateLimit-* headers are already captured in httpRequestResult.Header (http_transport.go:391); they just need to be inspected
- The copyResponseHeaders() in proxy mode already forwards rate-limit headers; adding inspection is a small change
- Circuit breaker state should live on the UnifiedServer struct, keyed by server ID
- The existing lockable pattern from the logger package could be used for the circuit breaker mutex