
fix: add mapping of deepseek_v32 model type #42767

Open
mpashkovskii wants to merge 3 commits into huggingface:main from mpashkovskii:fix/add-deepseek_v32

Conversation

mpashkovskii commented Dec 10, 2025

What does this PR do?

Adds the missing mapping for model type deepseek_v32 to deepseek_v3 model and DeepseekV3Config

Fixes #42590
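For context, AutoConfig dispatches on a checkpoint's model_type via the auto-mapping tables, so the effect of a change like this can be sanity-checked with something like the following (a minimal sketch; it assumes the PR simply registers deepseek_v32 against the existing DeepseekV3 classes):

from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# With this PR applied, the new model type should resolve to the v3 config class.
config_cls = CONFIG_MAPPING["deepseek_v32"]
print(config_cls.__name__)  # expected: DeepseekV3Config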

Before submitting

  • (almost) This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Cyrilvallez could you please review the changes?

mpashkovskii (Author) commented:

I think the tests are failing because of an unrelated ResNet precision error.

huzama commented Dec 11, 2025

I noticed that deepseek-ai/DeepSeek-V3.2 uses DeepSeek's native sparse attention. Does the current deepseek_v3 architecture support this? I don't see the Indexer or selector in the code here, so I wonder if this mapping is safe:

if self.q_lora_rank is None:
    q_states = self.q_proj(hidden_states)
else:
    q_states = self.q_b_proj(self.q_a_layernorm(self.q_a_proj(hidden_states)))
q_states = q_states.view(query_shape).transpose(1, 2)
q_pass, q_rot = torch.split(q_states, [self.qk_nope_head_dim, self.qk_rope_head_dim], dim=-1)

compressed_kv = self.kv_a_proj_with_mqa(hidden_states)
k_pass, k_rot = torch.split(compressed_kv, [self.kv_lora_rank, self.qk_rope_head_dim], dim=-1)
k_pass = self.kv_b_proj(self.kv_a_layernorm(k_pass)).view(key_shape).transpose(1, 2)
k_pass, value_states = torch.split(k_pass, [self.qk_nope_head_dim, self.v_head_dim], dim=-1)

k_rot = k_rot.view(batch_size, 1, seq_length, self.qk_rope_head_dim)

cos, sin = position_embeddings
if self.config.rope_interleave:  # support using interleaved weights for efficiency
    q_rot, k_rot = apply_rotary_pos_emb_interleave(q_rot, k_rot, cos, sin)
else:
    q_rot, k_rot = apply_rotary_pos_emb(q_rot, k_rot, cos, sin)
k_rot = k_rot.expand(*k_pass.shape[:-1], -1)

query_states = torch.cat((q_pass, q_rot), dim=-1)
key_states = torch.cat((k_pass, k_rot), dim=-1)

if past_key_values is not None:
    # sin and cos are specific to RoPE models; cache_position needed for the static cache
    cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
    key_states, value_states = past_key_values.update(key_states, value_states, self.layer_idx, cache_kwargs)

if self.config._attn_implementation == "flash_attention_2" and self.qk_head_dim != self.v_head_dim:
    value_states = F.pad(value_states, [0, self.qk_head_dim - self.v_head_dim])
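For reference, DeepSeek-V3.2's sparse attention (DSA) first scores past tokens with a lightweight indexer and then restricts attention to the top-k keys per query; none of that appears in the v3 code above. A minimal illustrative sketch of the selection step (hypothetical names, not DeepSeek's reference implementation):

import torch

def topk_attention_mask(index_scores: torch.Tensor, k: int) -> torch.Tensor:
    # index_scores: (batch, seq_q, seq_k) relevance scores produced by an indexer.
    # Returns a boolean mask that keeps only the top-k past tokens per query.
    k = min(k, index_scores.size(-1))
    topk_idx = index_scores.topk(k, dim=-1).indices          # (batch, seq_q, k)
    mask = torch.zeros_like(index_scores, dtype=torch.bool)
    mask.scatter_(-1, topk_idx, True)                        # True = token is attended
    return mask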

Rocketknight1 (Member) commented:

Yes, I don't think we can just map the new model to the old architecture!

github-actions (Contributor) commented:

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_v32

mpashkovskii (Author) commented:

Hi @huzama and @Rocketknight1, thanks for pointing that out. I’ve added the initial DeepSeek v3.2 implementation, but it still needs more testing and validation. I’d appreciate any feedback you have.

Do you know if anyone else is actively working on this? If so, does it make sense to complete the implementation in this PR?

github-actions (Contributor) commented:

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42767&sha=f17882

huzama commented Dec 15, 2025

@mpashkovskii, I’m working on implementing an indexer and top-k feature for a personal project, but some minor changes are needed before it can become a pull request.

You can try writing the code for the DSA Indexer yourself. Alternatively, once I have a well-drafted version, I can push the changes.

freedom-cui commented:

Hello @mpashkovskii @huzama, does this PR already support DeepSeek-V3.2 in its current state?

freedom-cui commented:

Hello @mpashkovskii, when I used your PR, I noticed that the loaded model uses the DeepSeek-V3 model structure instead of DeepSeek-V3.2:

import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "DeepSeek-v3.2"
dtype = torch.bfloat16  # dtype was not defined in the original snippet; bf16 assumed

config = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_config(
    config=config, trust_remote_code=True, attn_implementation="flash_attention_2", torch_dtype=dtype
)
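One way to see which architecture was actually instantiated (illustrative; the printed class name assumes the mapping reuses the v3 modeling code):

print(type(model).__name__)  # prints DeepseekV3ForCausalLM under the current mapping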

huzama commented Dec 23, 2025

@freedom-cui The model is not implemented yet as of the last commit. If you only need inference, please check out the vLLM library!

freedom-cui commented:

> @freedom-cui The model is not implemented yet as of the last commit. If you only need inference, please check out the vLLM library!

Thank you very much for your reply. Is there a timeline for supporting DeepSeek-V3.2 at this time?

vasqu (Contributor) commented Jan 12, 2026

Please see #41251 (comment) cc @ArthurZucker
