fix: retry transient HTTP errors (500/502/503/504) for all providers #21418

Closed
rich-jojo wants to merge 2 commits into anomalyco:dev from rich-jojo:fix-claude-retry

Conversation

rich-jojo commented Apr 8, 2026

Closes #21420
Closes #19394

Type of change

  • Bug fix

What does this PR do?

When Anthropic's API returns a transient server error (HTTP 500, 502, 503, or 504), OpenCode immediately fails instead of retrying. This happens because parseAPICallError() in packages/opencode/src/provider/error.ts delegates isRetryable entirely to the Vercel AI SDK for non-OpenAI providers. The @ai-sdk/anthropic package only sets isRetryable: true for status 429 and 529 — all other 5xx codes get isRetryable: false, so SessionRetry.retryable() returns undefined and the retry loop exits immediately.

The fix adds a module-level RETRYABLE_CODES constant (new Set([429, 500, 502, 503, 504, 529])) and checks it explicitly in both the OpenAI and non-OpenAI paths of parseAPICallError(). These codes are conventionally treated as transient by every provider (429 and the listed 5xx codes under standard HTTP semantics, 529 as Anthropic's overloaded status), so false positives should be rare. Non-transient codes such as 401, 403, 422, and 501 remain unaffected.
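
A minimal sketch of the check described above, assuming a simplified error shape. Only `RETRYABLE_CODES` and `isRetryable` come from this PR; `ApiCallErrorLike` and `shouldRetry` are hypothetical names used for illustration, and the real `parseAPICallError()` may be structured differently.

```typescript
// Status codes forced to be retryable, as listed in this PR.
const RETRYABLE_CODES: ReadonlySet<number> = new Set([429, 500, 502, 503, 504, 529]);

// Hypothetical, simplified stand-in for the SDK's API call error.
interface ApiCallErrorLike {
  statusCode?: number;
  isRetryable: boolean; // flag set by the provider SDK
}

// Retry when the status code is in the set, otherwise fall back to the SDK flag.
function shouldRetry(error: ApiCallErrorLike): boolean {
  const forced = error.statusCode !== undefined && RETRYABLE_CODES.has(error.statusCode);
  return forced || error.isRetryable;
}
```

With this shape, a 503 that the SDK marked `isRetryable: false` is still retried, while a 401 keeps whatever the SDK decided.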

OpenAI already had a partial override for 404 via isOpenAiErrorRetryable(); this PR generalises that pattern properly across all providers.

How did you verify your code works?

  • Traced the full error → retry flow from LLM.stream() throw through MessageV2.fromError() → parseAPICallError() → SessionRetry.retryable() → processor.ts retry loop.
  • Confirmed @ai-sdk/anthropic source: 429 and 529 are the only codes it marks isRetryable: true.
  • Verified that RETRYABLE_CODES covers the exact RFC 7231 / Anthropic-documented transient codes and excludes all non-transient codes.
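
The traced flow can be approximated with the sketch below, which shows why a non-retryable classification ends the loop after a single attempt. It is an illustration only: `ParsedError`, `retryable`, and `runWithRetry` are hypothetical stand-ins for the real OpenCode code, and the loop is synchronous with no backoff to keep it short.

```typescript
// Hypothetical shape of the parsed error that reaches the retry logic.
interface ParsedError {
  message: string;
  isRetryable: boolean;
}

// Stand-in for SessionRetry.retryable(): a reason string means "retry",
// undefined means "stop".
function retryable(error: ParsedError): string | undefined {
  return error.isRetryable ? error.message : undefined;
}

// Stand-in for the retry loop: rethrows as soon as the error is classified
// as non-retryable or the attempt budget is used up.
function runWithRetry<T>(attempt: () => T, maxAttempts: number): T {
  for (let tries = 1; ; tries++) {
    try {
      return attempt();
    } catch (caught) {
      const reason = retryable(caught as ParsedError);
      if (reason === undefined || tries >= maxAttempts) throw caught;
    }
  }
}
```

An error carrying `isRetryable: false` is rethrown on the first attempt, which matches the immediate failure described in this PR.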

Screenshots / recordings

N/A — backend-only change, no UI impact.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Add RETRYABLE_CODES set (429, 500, 502, 503, 504, 529) and use it
alongside the AI SDK isRetryable flag for all providers. Previously
non-OpenAI providers (Anthropic, Google, etc.) blindly trusted
input.error.isRetryable, but the Vercel AI SDK only marks 429/529 as
retryable for Anthropic — generic 500/502/503/504 responses got
isRetryable:false and were silently dropped instead of retried.

Now transient status codes force a retry regardless of SDK flag,
matching the existing 404 override already in place for OpenAI.

github-actions bot commented Apr 8, 2026

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

github-actions bot commented Apr 8, 2026

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

github-actions bot added the needs:compliance label and removed the needs:issue label Apr 8, 2026
rich-jojo (Author) commented

Also closes #19394 (direct duplicate — same root cause, same fix).

Remove the separate helper function and express all retry logic in one
place. RETRYABLE_CODES covers universally transient codes for every
provider; the OpenAI-only 404 override is now an explicit inline guard,
making provider-specific exceptions immediately visible.
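
A rough sketch of what "all retry logic in one place" could look like. The assumptions are loud here: `RetryInput`, `resolveRetryable`, and the `providerID === "openai"` comparison are hypothetical, and the sketch assumes the OpenAI override treats 404 as retryable; the actual code in `packages/opencode/src/provider/error.ts` may differ.

```typescript
// Status codes retried for every provider, as listed in this PR.
const RETRYABLE_CODES: ReadonlySet<number> = new Set([429, 500, 502, 503, 504, 529]);

// Hypothetical input shape for illustration.
interface RetryInput {
  providerID: string;
  statusCode?: number;
  sdkRetryable: boolean; // isRetryable as reported by the provider SDK
}

function resolveRetryable(input: RetryInput): boolean {
  const status = input.statusCode;
  // Codes retried for every provider, regardless of the SDK flag.
  if (status !== undefined && RETRYABLE_CODES.has(status)) return true;
  // Provider-specific exception kept inline (assumed: OpenAI 404 is retryable).
  if (input.providerID === "openai" && status === 404) return true;
  // Everything else defers to the SDK.
  return input.sdkRetryable;
}
```

Keeping the 404 guard inline means a reader sees every provider-specific exception in the same function that applies the shared set.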

github-actions bot commented Apr 8, 2026

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

github-actions bot removed the needs:compliance label Apr 8, 2026
github-actions bot closed this Apr 8, 2026

Development

Successfully merging this pull request may close these issues.

  • Claude/Anthropic 500/502/503/504 server errors are not retried
  • Anthropic 5xx errors incorrectly marked non-retryable, stopping agent loop