Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the .gitignore file, modifies proxy tests to support decompression on non-2xx statuses, and adjusts Qwen executor test assertions. It also simplifies the Claude budget normalization logic and enhances stale state tests. Feedback focuses on the removal of the Claude thinking provider test suite, which reduces maintainability, and the potential for 400 errors from the Anthropic API due to the simplified budget normalization logic; it is recommended to disable thinking rather than forwarding invalid requests.
| @@ -1,99 +0,0 @@ | |||
| package claude | |||
There was a problem hiding this comment.
Deleting the entire test suite for the Claude thinking provider significantly reduces the maintainability and reliability of this module. Even if the underlying logic has been simplified, the tests should be updated to verify the new behavior rather than being removed entirely. This is especially important given the changes made to normalizeClaudeBudget in apply.go.
| // If enforcing the max_tokens constraint would push the budget below the model minimum, | ||
| // leave the request unchanged. | ||
| return body |
There was a problem hiding this comment.
The removal of the logic to handle unsatisfiable constraints (where budget_tokens < max_tokens cannot be met without falling below min_budget) means the proxy will now forward requests that are guaranteed to be rejected with a 400 error by the Anthropic API. It is generally better to disable thinking entirely rather than allowing a known-invalid request to proceed.
// If enforcing the max_tokens constraint would push the budget below the model minimum,
// disable thinking to avoid a 400 error from the upstream API.
body, _ = sjson.SetBytes(body, "thinking.type", "disabled")
body, _ = sjson.DeleteBytes(body, "thinking.budget_tokens")
return bodyThere was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d390b95b76
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if minBudget > 0 && adjustedBudget > 0 && adjustedBudget < minBudget { | ||
| // Enforcing budget_tokens < max_tokens pushed the budget below the model minimum. | ||
| // Try raising max_tokens to fit the original budget. | ||
| needed := budgetTokens + 1 | ||
| maxAllowed := 0 | ||
| if modelInfo != nil { | ||
| maxAllowed = modelInfo.MaxCompletionTokens | ||
| } | ||
| if maxAllowed > 0 && needed > maxAllowed { | ||
| // Cannot use original budget; cap max_tokens at model limit. | ||
| needed = maxAllowed | ||
| } | ||
| cappedBudget := needed - 1 | ||
| if cappedBudget < minBudget { | ||
| // Impossible to satisfy both budget >= minBudget and budget < max_tokens | ||
| // within the model's completion limit. Disable thinking entirely. | ||
| body, _ = sjson.DeleteBytes(body, "thinking") | ||
| return body | ||
| } | ||
| body, _ = sjson.SetBytes(body, "max_tokens", needed) | ||
| if cappedBudget != budgetTokens { | ||
| body, _ = sjson.SetBytes(body, "thinking.budget_tokens", cappedBudget) | ||
| } | ||
| // If enforcing the max_tokens constraint would push the budget below the model minimum, | ||
| // leave the request unchanged. | ||
| return body |
There was a problem hiding this comment.
Preserve Claude budget/max_tokens invariant in min-budget branch
Returning early here can leave thinking.budget_tokens greater than max_tokens, which violates the Anthropic constraint this function is supposed to enforce and causes 400 responses for otherwise valid thinking configs. In the common case where a request provides a small max_tokens (for example 1000) and a valid Claude budget (for example 5000 with model min 1024), adjustedBudget drops below the minimum and this branch now returns without repairing either field, whereas the previous logic raised/capped max_tokens or disabled thinking to keep requests valid.
Useful? React with 👍 / 👎.
No description provided.