Increase tool call read timeout 30s -> 120s and add exponential backo… #5
I ran the MCP atlas eval like so:
and experienced many MCP read timeouts (https://pastebin.com/z6MH1K1E).
Currently, each retry waits a fixed 10s, so calls that time out together also retry together, which can cause thundering-herd behavior. Exponential backoff with jitter partially mitigates this. I also increased the read timeout from 30s to 120s, since some MCP tools (e.g. OSM) still timed out consistently, most likely because they simply have longer response times.
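For illustration, here is a minimal sketch of exponential backoff with "full jitter". It is not the code in this PR: `backoff_delay`, `call_with_retries`, and the `base`/`cap` defaults are hypothetical names and values chosen for the example, and it assumes a timed-out tool call surfaces as `TimeoutError`.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Return a full-jitter delay in seconds for a 0-indexed retry attempt.

    The ceiling doubles with each attempt (base * 2**attempt) up to `cap`,
    and the actual delay is drawn uniformly from [0, ceiling] so that
    clients that failed together do not retry at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def call_with_retries(fn, max_attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fn(); on TimeoutError, wait a jittered backoff and try again.

    Assumption: the tool call raises TimeoutError on a read timeout.
    The final failed attempt re-raises instead of sleeping.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap))
```

Compared with a fixed 10s wait, the randomized delay spreads retries over time, and the growing ceiling backs off further when a tool keeps timing out.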
After this change, nearly all read timeouts were mitigated; fewer than a handful occurred afterwards. I still think there is room to improve with a granular, tool-dependent timeout (some tools take longer than others and should therefore have larger timeouts).
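One possible shape for a tool-dependent timeout is a lookup table with a default fallback. This is only a sketch of the idea, not part of this PR: the names `TOOL_READ_TIMEOUTS_S` and `read_timeout_for` are hypothetical, and the 300s value for OSM is a placeholder, not a measured number.

```python
# Default read timeout in seconds (the value this PR moves to).
DEFAULT_READ_TIMEOUT_S = 120.0

# Hypothetical per-tool overrides, keyed by lowercase tool name.
# The 300.0 entry is a placeholder for illustration only.
TOOL_READ_TIMEOUTS_S = {
    "osm": 300.0,
}


def read_timeout_for(tool_name, overrides=None, default=DEFAULT_READ_TIMEOUT_S):
    """Return the read timeout for a tool, falling back to the default."""
    if overrides is None:
        overrides = TOOL_READ_TIMEOUTS_S
    return overrides.get(tool_name.lower(), default)
```

With this table, `read_timeout_for("OSM")` would return the override, while any tool without an entry keeps the 120s default.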