fix(auth): honor disable-cooling and enrich no-auth errors #2576
luispater merged 1 commit into router-for-me:dev from
Code Review
This pull request introduces error enrichment for authentication failures and adds support for disabling the cooldown mechanism via metadata. Specifically, the enrichAuthSelectionError function was implemented to provide detailed context and troubleshooting hints for authentication errors. Additionally, the authentication manager was updated to respect a disable_cooling flag in the auth metadata, preventing automatic suspension or retry delays for specific error codes. Unit tests were included to verify both the error enrichment logic and the cooldown override behavior. I have no feedback to provide.
4c3c643 to 56acff1
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 56acff1049
56acff1 to 1a3b64c
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a3b64cf34
1a3b64c to 0ea7680
luispater
left a comment
Summary:
The disable-cooling handling is now applied consistently across both model- and auth-level failure paths, and the auth selection error enrichment makes the failure mode materially more diagnosable.
Key findings:
- No blocking findings.
Test plan:
- Reviewed the new handler/auth conductor regression tests.
This is an automated Codex review result and still requires manual verification by a human reviewer.
luispater
left a comment
Summary:
This makes disable-cooling behavior more consistent across model-level and auth-level failure paths, and the error enrichment is backed by focused handler/auth regression tests.
Key findings:
- No blocking issues found in the diff I reviewed.
Test plan:
- Reviewed the sdk/cliproxy/auth/conductor.go changes.
- Reviewed the added tests in sdk/api/handlers and sdk/cliproxy/auth.
This is an automated Codex review result and still requires manual verification by a human reviewer.
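Summary
This PR fixes two classes of problems visible in production:
- auth_not_found / auth_unavailable error messages are too generic, which makes troubleshooting costly.
- disable-cooling: true does not fully take effect on some failure paths, so the auth/model can still be pushed into an "unavailable window".
Related: #1706
Production symptoms (before the fix)
Problem A: the no auth available response is not diagnosable
- /v1/messages (Anthropic Messages format) requests fail.
- auth unavailable: no auth available, which in many cases surfaces as a 500.
Problem B: with disable-cooling: true, an auth still "suddenly becomes unavailable after working for a while"
- 403 or 429.
- With disable-cooling, the auth should not enter cooldown/blackout.
- Statuses such as 403 may still write an unavailable-until time or trigger suspension;
- 429 may still trigger the quota-suspension side effect even when it produces no backoff duration;
- 401/402/403/404/408/5xx still write NextRetryAfter.
- no auth available, which looks as if the service "suddenly broke".
Root cause analysis
Root cause A (insufficient error observability)
- 500.
Root cause B (incomplete disable-cooling coverage)
- The disable-cooling check is scattered across multiple branches, with inconsistent semantics.
- The 429 branch has a side-effect path that suspends even when there is no cooldown.
Fix
1) Richer auth error messages (API Handler)
Files involved:
- sdk/api/handlers/handlers.go
- sdk/api/handlers/handlers_error_response_test.go
Changes:
- enrichAuthSelectionError(err, providers, model).
- ExecuteWithAuthManager, ExecuteCountWithAuthManager, and ExecuteStreamWithAuthManager call it uniformly.
- auth_not_found / auth_unavailable: providers=..., model=...;
- for claude, adds a /v0/management/auth-files troubleshooting hint;
- 503 Service Unavailable;
2) disable-cooling semantics fix (Auth Conductor)
Files involved:
- sdk/cliproxy/auth/conductor.go
- sdk/cliproxy/auth/conductor_overrides_test.go
Changes:
- disableCooling check.
- When disable-cooling=true:
  - 401/402/403/404/408/5xx no longer set NextRetryAfter;
  - 429 no longer triggers the model-suspension/quota-suspension side effects (so the auth does not enter a blackout state).
- applyAuthFailureState follows the same rules, eliminating differences between paths.
Behavior comparison (Before / After)
- Before: 500, only no auth available. After: 503, /v0/management/auth-files hint.
- disable-cooling=true + 403
- disable-cooling=true + 429
Tests
Added/updated:
- TestEnrichAuthSelectionError_DefaultsTo503WithContext
- TestEnrichAuthSelectionError_PreservesExplicitStatus
- TestEnrichAuthSelectionError_IgnoresOtherErrors
- TestManager_MarkResult_RespectsAuthDisableCoolingOverride_On403
- TestManager_Execute_DisableCooling_DoesNotBlackoutAfter403
Execution result: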
go test -count=1 ./sdk/cliproxy/auth ./sdk/api/handlers