fix: retry transient HTTP errors (500/502/503/504) for all providers #21418
Closed
rich-jojo wants to merge 2 commits into anomalyco:dev from
Add a RETRYABLE_CODES set (429, 500, 502, 503, 504, 529) and use it alongside the AI SDK isRetryable flag for all providers. Previously, non-OpenAI providers (Anthropic, Google, etc.) blindly trusted input.error.isRetryable, but the Vercel AI SDK only marks 429/529 as retryable for Anthropic, so generic 500/502/503/504 responses got isRetryable: false and were silently dropped instead of retried. Now transient status codes force a retry regardless of the SDK flag, matching the existing 404 override already in place for OpenAI.
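The behaviour change described in this commit message can be illustrated with a minimal sketch. The ApiErrorLike shape and the two function names are hypothetical and exist only for illustration; the actual error object and the code in parseAPICallError() are not shown on this page and may differ.

```typescript
// Hypothetical, simplified error shape; the real AI SDK error carries more fields.
interface ApiErrorLike {
  statusCode?: number
  isRetryable: boolean
}

// Status codes listed in the commit message.
const RETRYABLE_CODES = new Set<number>([429, 500, 502, 503, 504, 529])

// Behaviour described as "before" for non-OpenAI providers: trust the SDK flag alone.
function retryableBefore(error: ApiErrorLike): boolean {
  return error.isRetryable
}

// Behaviour described as "after": a transient status code forces a retry,
// otherwise the SDK flag still decides.
function retryableAfter(error: ApiErrorLike): boolean {
  const transient =
    error.statusCode !== undefined && RETRYABLE_CODES.has(error.statusCode)
  return transient || error.isRetryable
}

// An Anthropic-style 503 that the SDK flagged as non-retryable.
const anthropic503: ApiErrorLike = { statusCode: 503, isRetryable: false }
console.log(retryableBefore(anthropic503), retryableAfter(anthropic503)) // false true
```

The set is combined with the flag using a logical OR, so an error the SDK already considers retryable stays retryable even when its status code is not in the set.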
Contributor
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details.
Contributor
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
Author
Also closes #19394 (direct duplicate: same root cause, same fix).
Remove the separate helper function and express all retry logic in one place. RETRYABLE_CODES covers universally transient codes for every provider; the OpenAI-only 404 override is now an explicit inline guard, making provider-specific exceptions immediately visible.
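A rough sketch of what "all retry logic in one place" might look like follows. The providerID field, the way OpenAI is detected, and the function name are assumptions made for illustration; the PR's actual diff is not reproduced on this page.

```typescript
// Hypothetical, simplified error shape; providerID is an assumed field.
interface ProviderErrorLike {
  providerID: string
  statusCode?: number
  isRetryable: boolean
}

// Codes treated as transient for every provider, per the commit message.
const RETRYABLE_CODES = new Set<number>([429, 500, 502, 503, 504, 529])

function shouldRetry(error: ProviderErrorLike): boolean {
  const status = error.statusCode
  // Universal rule: transient codes are retried for every provider.
  if (status !== undefined && RETRYABLE_CODES.has(status)) return true
  // Provider-specific exception kept inline so it is easy to spot:
  // a 404 is retried only for OpenAI (detection by ID is an assumption here).
  if (error.providerID === "openai" && status === 404) return true
  // Otherwise defer to the SDK flag.
  return error.isRetryable
}

console.log(shouldRetry({ providerID: "anthropic", statusCode: 502, isRetryable: false })) // true
console.log(shouldRetry({ providerID: "anthropic", statusCode: 404, isRetryable: false })) // false
console.log(shouldRetry({ providerID: "openai", statusCode: 404, isRetryable: false })) // true
```

Keeping the 404 guard as a separate line, instead of adding 404 to the shared set, keeps the universal rule and the OpenAI-only exception visibly distinct.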
Contributor
This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines.
Closes #21420
Closes #19394
Type of change
What does this PR do?
When Anthropic's API returns a transient server error (HTTP 500, 502, 503, or 504), OpenCode immediately fails instead of retrying. This happens because parseAPICallError() in packages/opencode/src/provider/error.ts delegates isRetryable entirely to the Vercel AI SDK for non-OpenAI providers. The @ai-sdk/anthropic package only sets isRetryable: true for status 429 and 529; all other 5xx codes get isRetryable: false, so SessionRetry.retryable() returns undefined and the retry loop exits immediately.
The fix adds a module-level RETRYABLE_CODES constant (Set([429, 500, 502, 503, 504, 529])) and checks it explicitly in both the OpenAI and non-OpenAI paths of parseAPICallError(). These codes are universally transient across all providers (standard HTTP semantics), so there are no false positives; non-transient codes like 401, 403, 422, and 501 remain unaffected.
OpenAI already had a partial override for 404 via isOpenAiErrorRetryable(); this PR generalises that pattern properly across all providers.
How did you verify your code works?
LLM.stream() throw through MessageV2.fromError() → parseAPICallError() → SessionRetry.retryable() → processor.ts retry loop.
@ai-sdk/anthropic source: 429 and 529 are the only codes it marks isRetryable: true.
RETRYABLE_CODES covers the exact RFC 7231 / Anthropic-documented transient codes and excludes all non-transient codes.
Screenshots / recordings
N/A — backend-only change, no UI impact.
Checklist