feat(core): support reasoning effort 'max' tier (DeepSeek extension)#3800
Conversation
DeepSeek's chat-completions endpoint added an extra-strong `max` tier to `reasoning_effort` (per https://api-docs.deepseek.com/zh-cn/api/create-chat-completion ; valid values are now `high` and `max`, with `low`/`medium` mapping to `high` for backward compat). Plumb it end-to-end:

- `ContentGeneratorConfig.reasoning.effort` union now includes `'max'`.
- DeepSeek OpenAI-compat provider: translate the standard nested `reasoning: { effort }` shape into DeepSeek's flat `reasoning_effort` body parameter so user-configured effort actually takes effect (the nested shape was previously sent verbatim and silently ignored, defaulting to `high`). `low`/`medium` → `high` mirrors the documented server-side behavior so dashboards and logs match wire reality. An explicit top-level `reasoning_effort` (via `samplingParams` or `extra_body`) wins over the nested form.
- Anthropic converter: pass `'max'` through to `output_config.effort` unchanged and bump the `thinking.budget_tokens` budget for the new tier (low 16K / medium 32K / high 64K / max 128K).
- Gemini converter: clamp `'max'` to HIGH since Gemini has no higher thinking level. Without this, `'max'` would silently fall through to `THINKING_LEVEL_UNSPECIFIED`.

Live verification against api.deepseek.com:

- `reasoning_effort: high` → 200
- `reasoning_effort: max` → 200 (the new tier)
- `reasoning_effort: bogus` → 400 with a valid-set list confirming [high, low, medium, max, xhigh]

108 anthropic/openai-deepseek/gemini tests pass; full core suite (6601 tests) green; lint + typecheck clean.
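The nested→flat translation described above can be sketched roughly as follows. The helper name `translateReasoningEffort` comes from the diff, but the request-body types and exact control flow here are illustrative assumptions, not the PR's literal code:

```typescript
// Illustrative sketch — types and shapes are assumptions, not the PR's code.
type Effort = 'low' | 'medium' | 'high' | 'max';

interface RequestBody {
  reasoning?: { effort?: Effort; [key: string]: unknown };
  reasoning_effort?: Effort;
  [key: string]: unknown;
}

function translateReasoningEffort(body: RequestBody): RequestBody {
  const effort = body.reasoning?.effort;
  // No nested effort, or an explicit top-level value already set
  // (samplingParams / extra_body): leave the body untouched.
  if (effort === undefined || body.reasoning_effort !== undefined) {
    return body;
  }
  // Mirror DeepSeek's documented server-side mapping client-side,
  // so dashboards and logs match wire reality.
  const mapped: Effort =
    effort === 'low' || effort === 'medium' ? 'high' : effort;
  const { effort: _dup, ...rest } = body.reasoning!;
  const result: RequestBody = { ...body, reasoning_effort: mapped };
  if (Object.keys(rest).length > 0) {
    result.reasoning = rest; // keep orthogonal nested keys
  } else {
    delete result.reasoning; // nothing left — drop the nested form entirely
  }
  return result;
}
```

The no-clobber guard is the key design choice: a user-supplied top-level `reasoning_effort` always wins over the nested shape.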
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
Pull request overview
This PR adds end-to-end support for DeepSeek's extended `reasoning_effort: 'max'` tier, ensuring user-configured `reasoning.effort` reaches DeepSeek correctly and behaves sensibly on other providers.
Changes:
- Extend `ContentGeneratorConfig.reasoning.effort` to include `'max'`.
- DeepSeek OpenAI-compat provider: translate nested `reasoning.effort` into top-level `reasoning_effort` (with low/medium → high mapping and no-clobber behavior).
- Clamp `'max'` for the Gemini thinking level, and expand Anthropic generator typing + budget ladder to account for `'max'`, with tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/core/openaiContentGenerator/provider/deepseek.ts | Adds translateReasoningEffort() and wires it into buildRequest() before message reshaping. |
| packages/core/src/core/openaiContentGenerator/provider/deepseek.test.ts | Adds tests covering nested→flat translation, mapping, and no-clobber semantics. |
| packages/core/src/core/geminiContentGenerator/geminiContentGenerator.ts | Maps reasoning.effort: 'max' to Gemini’s thinkingLevel: 'HIGH'. |
| packages/core/src/core/geminiContentGenerator/geminiContentGenerator.test.ts | Adds a test asserting 'max' clamps to HIGH. |
| packages/core/src/core/contentGenerator.ts | Expands the config union type for reasoning.effort to include 'max' with provider caveats in comments. |
| packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts | Expands output_config.effort type and increases the thinking budget ladder for 'max'. |
| packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.test.ts | Adds a test asserting 'max' passes through and bumps the thinking budget. |
Address PR review (copilot × 2) and add missing user docs:
1. (J698) The PR description claimed `translateReasoningEffort` surfaces
   the DeepSeek backward-compat mapping client-side, but it only handled
   `low`/`medium` → `high`. Add `xhigh` → `max` to match the docs and
   stay symmetric with the low/medium branch.
2. (J6-A) `output_config.effort: 'max'` would have been emitted on
any anthropic-protocol provider whenever a user configured it, even
when the baseURL points at real `api.anthropic.com` (which only
accepts low/medium/high and would 400). Reuse the existing
`isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'`
on non-DeepSeek anthropic backends, with a debugLogger.warn so the
downgrade is visible. DeepSeek anthropic-compatible endpoints still
pass through unchanged.
3. New docs:
- `docs/users/configuration/model-providers.md`: a "Reasoning /
thinking configuration" section under generationConfig — single
example targeting DeepSeek + a per-provider behavior table
(OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for
other servers, real Anthropic clamp, Anthropic-compatible
DeepSeek passthrough, Gemini thinkingLevel mapping).
- `docs/users/configuration/settings.md`: extend the
`model.generationConfig` description to mention `reasoning`
(the field was undocumented before this PR even though it
already existed as a typed field) and link to the new section.
96 anthropic + deepseek tests pass; lint + typecheck clean.
Review feedback addressed (commit b5b05ae) + docs
Also filled in the missing user documentation (
96 directly related unit tests pass; lint + typecheck clean.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
wenshao
left a comment
Code Review — deepseek-v4-pro via Qwen Code /review
Verdict: Approve ✅ (downgraded to Comment: GitHub does not permit approving or rejecting one's own PR)
Review Summary
14 files, +541 −38. ESLint clean, tsc no new errors, all CI checks passing.
The reasoning effort 'max' tier plumbing is sound:
- DeepSeek: nested → flat translation correct, no-clobber guard works
- Anthropic: budget ladder 16K/32K/64K/128K correct, type union expanded
- Gemini: 'max' properly clamped to HIGH
- Error handling: + idempotency guard correctly prevent double-wrapping
Additional notes (beyond Copilot's review)
- **Format coupling**: The guard duplicates knowledge of the output format it matches against. If the prefix/suffix changes, the guard silently fails. Consider adding a format-consistency unit test.
- **Anchoring**: Suffix matching uses an unanchored check — an anchored match would be more precise (very low risk in practice).
— deepseek-v4-pro via Qwen Code /review
…c fix

Address PR review round 2 (copilot × 2):

1. (J8aG) The `contentGenerator.ts` comment claimed passing `reasoning.effort: 'max'` to real Anthropic was "up to the user", but commit b5b05ae actively clamps `'max'` → `'high'` (with a debug log) on non-DeepSeek anthropic backends. Update the comment to describe current runtime behavior.
2. (J8aL) The clamp ran inside `buildOutputConfig()` only — the effort label was downgraded, but `buildThinkingConfig()` still sized the budget from the raw user value, so a non-DeepSeek anthropic request could end up with `output_config.effort: 'high'` paired with a 'max'-sized 128K thinking budget: an inconsistent label vs. budget on the wire. Refactor: hoist the normalization into a single `resolveEffectiveEffort()` helper that runs once per request in `buildRequest()`. Both `buildThinkingConfig` and `buildOutputConfig` now consume the same clamped value, so the budget ladder and the effort label stay aligned, and the debug log fires once per request. Add a regression test asserting that on a non-DeepSeek anthropic provider with `effort: 'max'` configured, the wire request carries both `output_config.effort: 'high'` AND `thinking.budget_tokens: 64_000` (the 'high' tier), not the 128K 'max' budget.

96 tests pass; lint + typecheck clean.
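The single-resolution refactor can be sketched as follows. `resolveEffectiveEffort` and the budget ladder come from the commit message; the surrounding request shapes are illustrative assumptions:

```typescript
// Illustrative sketch — request shapes are assumptions, not the PR's code.
type Effort = 'low' | 'medium' | 'high' | 'max';

// Budget ladder from the PR: low 16K / medium 32K / high 64K / max 128K.
const BUDGET_TOKENS: Record<Effort, number> = {
  low: 16_000,
  medium: 32_000,
  high: 64_000,
  max: 128_000,
};

function resolveEffectiveEffort(
  effort: Effort,
  isDeepSeekBackend: boolean,
): Effort {
  // Real api.anthropic.com only accepts low/medium/high; clamp the
  // DeepSeek-only 'max' tier so the request does not 400.
  return effort === 'max' && !isDeepSeekBackend ? 'high' : effort;
}

function buildRequest(effort: Effort, isDeepSeekBackend: boolean) {
  // Resolve once per request, then feed both consumers so the wire
  // effort label and the thinking budget always agree.
  const effective = resolveEffectiveEffort(effort, isDeepSeekBackend);
  return {
    output_config: { effort: effective },
    thinking: { budget_tokens: BUDGET_TOKENS[effective] },
  };
}
```

Because both fields derive from the one resolved value, the clamped label can never be paired with an unclamped budget.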
Review feedback round 2 addressed (commit 02bf7d7)
CI note: the one failing Ubuntu-24.x job was
The new commit will trigger a fresh round of CI, which should pass reliably.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
…n side queries

Address PR review round 3 (copilot × 3):

1. (J-2v) When `request.config.thinkingConfig.includeThoughts` is false, `pipeline.buildRequest`'s post-processing only deleted the nested `reasoning` key. The DeepSeek provider's `translateReasoningEffort` may have already flattened an extra_body-injected `reasoning` into top-level `reasoning_effort` by that point, so a side query (e.g. suggestionGenerator) could still ship `reasoning_effort` on the wire. Extend the post-processing to also delete `reasoning_effort`.
2. (J-2z) The warn for clamping 'max' on non-DeepSeek anthropic ran on every request needing the downgrade — the docstring claimed "first time only" but the implementation didn't latch. Add a private `effortClampWarned` boolean on the generator so the warning fires once per generator lifetime.
3. (J-23) `resolveEffectiveEffort` used the broad `isDeepSeekAnthropicProvider` detector for the clamp decision, but that helper falls back to model-name matching to cover sglang/vllm self-hosted DeepSeek deployments. A model configured as e.g. "deepseek-distill" but routed to real api.anthropic.com would bypass the clamp and trigger HTTP 400. Split the detector: keep `isDeepSeekAnthropicProvider` (broad) for the thinking-block injection workaround, where false positives are harmless, and add `isDeepSeekAnthropicHostname` (hostname-only) for decisions where a model-name false positive would route DeepSeek-only behavior to a stricter backend. The clamp now uses the hostname-only check.

New regression test: a config with a model name containing "deepseek" but a baseURL pointing at api.anthropic.com still clamps `'max'` to `'high'`. The existing "passes max through" test is updated to set a DeepSeek baseURL, since model name alone no longer suffices for the clamp bypass.

385 tests pass; lint + typecheck clean.
Review feedback round 3 addressed (commit b7740ec)
Added a regression test: a model name containing 'deepseek' with a baseURL of api.anthropic.com still clamps.
385 unit tests pass; lint + typecheck clean.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Address PR review round 4 (copilot × 3) — three documentation accuracy
fixes, no behavior change:
1. (KBcw) The post-processing comment in pipeline.ts misdescribed the
call order ("after this branch already ran during the same
buildRequest pass") — provider.buildRequest actually runs BEFORE
the includeThoughts=false post-processing in the same pass.
   Reword to match the actual order: the provider hook flattens nested
   `reasoning` to `reasoning_effort` first; this cleanup runs afterwards
   and strips both shapes.
2. (KBdC, KBdE) The "Reasoning / thinking configuration" section in
model-providers.md and the model.generationConfig description in
settings.md both implied `reasoning` is honored on every provider.
For OpenAI-compatible providers, when `generationConfig.samplingParams`
is set, `ContentGenerationPipeline.buildGenerateContentConfig()`
ships samplingParams verbatim and skips the separate `reasoning`
injection entirely. Configs like
`{ samplingParams: { temperature: 0.5 }, reasoning: { effort: 'max' } }`
would silently drop the reasoning field on OpenAI/DeepSeek
requests.
Add an explicit "Interaction with samplingParams" warning section
in model-providers.md and a parenthetical note in settings.md
directing users to put `reasoning_effort` inside `samplingParams`
(or `extra_body`) when both are configured.
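A hedged illustration of the documented workaround — the exact settings.json schema here is assumed from the field names above. When `samplingParams` is present, the flat `reasoning_effort` goes inside it rather than in a sibling `reasoning` block (which would be silently dropped):

```json
{
  "model": {
    "generationConfig": {
      "samplingParams": {
        "temperature": 0.5,
        "reasoning_effort": "max"
      }
    }
  }
}
```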
385 tests pass; lint + typecheck clean.
Review feedback round 4 addressed (commit 9c56ec1)
385 unit tests pass; lint + typecheck clean. Docs-only change, no behavior change.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
wenshao
left a comment
When the user sets `{ effort: 'max', budget_tokens: N }` on a non-DeepSeek
anthropic backend, the effort label gets clamped to 'high' (otherwise
the server 400s on the unknown enum) but the explicit budget_tokens is
preserved verbatim. The wire-shape mismatch is intentional, not a bug:
the clamp only protects the enum field, while budget is a free integer
the server accepts within the context window, so an explicit override
stays explicit. Document the contract on the early-return and add a
regression test that locks it in.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Two doc-only nits called out in review:

1. The `buildRequest` JSDoc said non-text parts are "rejected", but `flattenContentParts` actually substitutes a textual placeholder (`[Unsupported content type: <type>]`), so the request still goes through with a breadcrumb. Reword the JSDoc accordingly.
2. `translateReasoningEffort`'s strip comment claimed it strips the nested form to avoid shipping both shapes, but it only drops the duplicated `effort` key when other keys (e.g. `budget_tokens`) are present. Reword to describe the actual selective behavior and why keeping orthogonal keys is intentional.

Behavior unchanged.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
…ostname

The provider class is selected via the broader `isDeepSeekProvider` check, which falls back to model-name matching to cover self-hosted DeepSeek deployments (sglang/vllm/ollama, see #3613). That fallback is the right call for content-part flattening — it's a model-format constraint baked into the model itself, not the API surface.

But the same broad detection was also gating `translateReasoningEffort`, which rewrites the standard `reasoning: { effort }` config into DeepSeek's flat `reasoning_effort` body parameter. That's a wire-shape decision, not a model-format one: strict OpenAI-compat backends in self-hosted setups may not accept the DeepSeek extension and would have happily handled the original shape.

Split the two decisions: keep `isDeepSeekProvider` (broad) for flattening, add a hostname-only `isDeepSeekHostname`, and gate the body rewrite on it. Self-hosted DeepSeek users who actually want the translation can either use a baseUrl containing api.deepseek.com or inject `reasoning_effort` directly via `samplingParams`/`extra_body`.

Regression tests:
- self-hosted (sglang) with a deepseek-named model + nested `reasoning.effort` → flattening still runs, body shape preserved
- `isDeepSeekHostname` matches api.deepseek.com but not custom hosts
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
CodeQL flagged a high-severity URL substring sanitization issue on the
new `isDeepSeekHostname` helper. The naive
`baseUrl.includes('api.deepseek.com')` check would false-positive on
hostile hosts like `https://api.deepseek.com.evil.com/v1` and
incorrectly inject the DeepSeek-only `reasoning_effort` body parameter
into requests routed elsewhere. Switch to `new URL(...).hostname` with
exact match against `api.deepseek.com` (and `.api.deepseek.com`
subdomains), mirroring `isDeepSeekAnthropicHostname` on the Anthropic
side. Invalid URLs treated as non-DeepSeek.
`isDeepSeekProvider` already routes through `isDeepSeekHostname`, so
the hardening applies to both decision paths.
Regression tests cover:
- subdomain match (us.api.deepseek.com)
- hostile substrings (api.deepseek.com.evil.com,
evil.com/api.deepseek.com/v1, api.deepseek.comevil.com,
api-deepseek-com.example.com)
- invalid / empty baseUrl
Also fix two doc-level mismatches: the `'max'` clamp on Anthropic logs
via `debugLogger.warn` (warning level, once per generator), not "with
a debug log". Update both `ContentGeneratorConfig.reasoning` JSDoc and
the per-provider behavior table in model-providers.md.
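The hardened check might look roughly like this; the helper name is from the commit message, and the body is an illustrative sketch of the `new URL(...).hostname` exact-match approach:

```typescript
// Illustrative sketch of the hardened hostname check, not the PR's literal code.
function isDeepSeekHostname(baseUrl: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(baseUrl).hostname;
  } catch {
    return false; // invalid URLs are treated as non-DeepSeek
  }
  // Exact match or a true subdomain — a raw substring check would also
  // accept hostile hosts like api.deepseek.com.evil.com.
  return (
    hostname === 'api.deepseek.com' ||
    hostname.endsWith('.api.deepseek.com')
  );
}
```

The `.api.deepseek.com` suffix requires a leading dot, so only genuine subdomains (e.g. us.api.deepseek.com) match.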
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
Summary
DeepSeek extended `reasoning_effort` with a `max` tier — the strongest reasoning level above `high`. Plumb it end-to-end so users can opt in via `reasoning.effort: 'max'` in settings. Per the DeepSeek docs, valid values are now `high` and `max`, with `low`/`medium` mapping to `high` for backward compatibility.

Two issues this fixes:

- `max` was not accepted at all — `ContentGeneratorConfig.reasoning.effort` was typed as `'low' | 'medium' | 'high'` only.
- `reasoning.effort` was being silently ignored on DeepSeek. The OpenAI pipeline ships the standard `{ reasoning: { effort } }` shape verbatim, but DeepSeek expects a flat `reasoning_effort` body parameter — so user-configured effort never reached the server, which silently defaulted to `high`.

Changes

- `ContentGeneratorConfig.reasoning.effort` union now includes `'max'`.
- DeepSeek provider (`deepseek.ts`): a new `translateReasoningEffort()` step in `buildRequest` flattens `reasoning.effort` → top-level `reasoning_effort`, mirrors the documented `low`/`medium` → `high` mapping, and refuses to clobber an already-set top-level value (user `samplingParams`/`extra_body` still wins).
- Anthropic: `output_config.effort` type expanded to include `'max'`; the `thinking.budget_tokens` ladder bumped to 128K for the new tier (low 16K / medium 32K / high 64K / max 128K).
- Gemini: `'max'` clamps to `HIGH` (no higher Gemini tier exists), preventing a silent fall-through to `THINKING_LEVEL_UNSPECIFIED`.

Verification

Live API check against `api.deepseek.com/v1/chat/completions`:

- `reasoning_effort: 'high'` → 200
- `reasoning_effort: 'max'` → 200
- `reasoning_effort: 'bogus'` (sanity) → 400 listing the valid set `[high, low, medium, max, xhigh]`

Test plan

- `vitest run packages/core/` — 6601 pass / 4 skipped (108 directly affected: 18 deepseek + 33 anthropic generator + 43 anthropic converter + 11 gemini + 3 gemini index)
- Tests cover the `low`/`medium` → `high` mapping, no-clobber on explicit top-level, partial nested-object preservation, no-op on missing effort, the Anthropic `max` budget bump, and the Gemini `max` → HIGH clamp
- `tsc --noEmit` clean
- `eslint` clean