
Add deepseek 3.2 exp #41251

Open
ArthurZucker wants to merge 54 commits into main from add-deepseek-exp

Conversation

@ArthurZucker
Collaborator

@ArthurZucker ArthurZucker commented Oct 1, 2025

What does this PR do?

from transformers import FineGrainedFP8Config, AutoModelForCausalLM, AutoTokenizer
import torch 


model_name = "deepseek-ai/DeepSeek-V3.2"
quantization_config = FineGrainedFP8Config(
    modules_to_not_convert=["model.layers.*.mlp.gate.*", "*.self_attn.indexer.weights_proj.*"],
    weight_block_size=(128, 128),
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto", quantization_config=quantization_config
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
input_text = "What are we having for dinner?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda:0")

output = quantized_model.generate(**input_ids, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
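A side note on the `modules_to_not_convert` entries above: they are glob-style module-name patterns. As a minimal illustration only (transformers has its own matching logic; this sketch uses Python's `fnmatch` to show the idea):

```python
from fnmatch import fnmatch

# Patterns copied from the example above; "*" matches any run of characters,
# including dots, so "model.layers.*" covers every layer index.
patterns = ["model.layers.*.mlp.gate.*", "*.self_attn.indexer.weights_proj.*"]

def should_skip(module_name: str) -> bool:
    """Return True if a module name matches any skip pattern."""
    return any(fnmatch(module_name, p) for p in patterns)

print(should_skip("model.layers.3.mlp.gate.weight"))          # True: gate kept in high precision
print(should_skip("model.layers.3.self_attn.q_proj.weight"))  # False: q_proj gets quantized
```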

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@HandH1998

Hello, thanks for your support for deepseek v3.2! I wonder when this PR will be ready?

@ArthurZucker
Collaborator Author

Working on it! Hoping by next week 🤗

@ArthurZucker
Collaborator Author

wow this got old!

@ArthurZucker ArthurZucker marked this pull request as ready for review November 17, 2025 12:19
@nfywsh

nfywsh commented Dec 3, 2025

The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated?

@yunkchen

yunkchen commented Dec 3, 2025

The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated?

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

@nfywsh

nfywsh commented Dec 3, 2025

The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated?

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

There is still a bug in DeepseekV32Attention when using LLMC to quantize the model:

[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/main.py", line 248, in <module>
[rank0]: main(config)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/main.py", line 27, in main
[rank0]: model = MODEL_REGISTRY[config.model.type](config, device_map, use_cache)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/models/deepseekv3.py", line 20, in __init__
[rank0]: super().__init__(config, device_map, use_cache)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/models/base_model.py", line 40, in __init__
[rank0]: self.build_model()
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/models/deepseekv3.py", line 40, in build_model
[rank0]: self.model = AutoModelForCausalLM.from_pretrained(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
[rank0]: return model_class.from_pretrained(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4971, in from_pretrained
[rank0]: model = cls(config, *model_args, **model_kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 555, in __init__
[rank0]: self.model = DeepseekV32Model(config)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 477, in __init__
[rank0]: [DeepseekV32DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 477, in <listcomp>
[rank0]: [DeepseekV32DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 404, in __init__
[rank0]: self.self_attn = DeepseekV32Attention(config=config, layer_idx=layer_idx)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 316, in __init__
[rank0]: self.softmax_scale = self.qk_head_dim**-0.5
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: 'DeepseekV32Attention' object has no attribute 'qk_head_dim'
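The AttributeError above is an ordering bug: `softmax_scale` reads `self.qk_head_dim` before that attribute has been assigned. A minimal sketch of the fix, assuming the DeepSeek-V3 naming convention where `qk_head_dim = qk_nope_head_dim + qk_rope_head_dim` (this is not the actual transformers source, just the shape of the repair):

```python
class Attention:
    def __init__(self, config: dict):
        # Assign qk_head_dim first, then derive the softmax scale from it;
        # reversing these two lines reproduces the AttributeError.
        self.qk_head_dim = config["qk_nope_head_dim"] + config["qk_rope_head_dim"]
        self.softmax_scale = self.qk_head_dim ** -0.5

# Head dims here are illustrative values, not the DeepSeek-V3.2 config.
attn = Attention({"qk_nope_head_dim": 128, "qk_rope_head_dim": 64})
print(attn.softmax_scale)  # 192 ** -0.5
```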

@yunkchen

yunkchen commented Dec 3, 2025

@nfywsh

nfywsh commented Dec 3, 2025

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

New commit pushed, sorry.

A bug still occurred when running the LLMC quantization model:

[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/main.py", line 248, in <module>
[rank0]: main(config)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/main.py", line 55, in main
[rank0]: model.collect_first_block_input(calib_data, padding_mask)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/models/base_model.py", line 246, in collect_first_block_input
[rank0]: self.model(**data)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/utils/generic.py", line 918, in wrapper
[rank0]: output = func(self, *args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 526, in forward
[rank0]: outputs: BaseModelOutputWithPast = self.model(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: TypeError: check_model_inputs.<locals>.wrapped_fn() got an unexpected keyword argument 'input_ids'
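The TypeError above is the classic shape of a decorator whose wrapper does not forward keyword arguments. A hypothetical minimal reproduction (this is not the actual transformers `check_model_inputs` source, only the general failure mode and its fix):

```python
def broken_decorator(func):
    def wrapped_fn(self):  # no **kwargs: forward(input_ids=...) raises TypeError
        return func(self)
    return wrapped_fn

def fixed_decorator(func):
    def wrapped_fn(self, *args, **kwargs):  # forwards every argument through
        return func(self, *args, **kwargs)
    return wrapped_fn

class BrokenModel:
    @broken_decorator
    def forward(self, input_ids=None):
        return input_ids

class Model:
    @fixed_decorator
    def forward(self, input_ids=None):
        return input_ids

try:
    BrokenModel().forward(input_ids=[1, 2, 3])
except TypeError as e:
    print(f"TypeError: {e}")           # unexpected keyword argument 'input_ids'

print(Model().forward(input_ids=[1, 2, 3]))  # [1, 2, 3]
```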

@bmtwl

bmtwl commented Dec 3, 2025

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

New commit pushed, sorry.

I'm seeing problems in this branch with rope factor/beta_fast/beta_slow values not being floats. Is this an oversight?
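One defensive pattern for the int-vs-float issue raised above is to coerce the numeric rope-scaling fields when reading the config. A hedged sketch, assuming the YaRN-style `rope_scaling` dict shape used by DeepSeek configs (field names taken from the comment; whether the branch should fix this at config load or at use site is a design choice for the PR):

```python
# Example rope_scaling dict with int values, as a checkpoint might ship them.
rope_scaling = {"factor": 40, "beta_fast": 32, "beta_slow": 1}

# Coerce the known numeric fields to float so downstream math behaves
# identically regardless of how the JSON serialized them.
for key in ("factor", "beta_fast", "beta_slow"):
    if key in rope_scaling:
        rope_scaling[key] = float(rope_scaling[key])

print(rope_scaling)  # {'factor': 40.0, 'beta_fast': 32.0, 'beta_slow': 1.0}
```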

@jyliu24

jyliu24 commented Dec 3, 2025

The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated?

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

Isn't this implementation still O(L^2) since it just masks full attention to the indexer's topk?
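The distinction the comment draws can be made concrete in a toy single-head setting: masking full attention to the indexer's top-k still materializes all L×L scores, while gathering only the k selected keys per query costs O(L·k). A sketch (illustrative shapes only, unrelated to the PR's actual kernels):

```python
import torch

L, d, k = 8, 4, 3
q, kmat, v = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)

# Dense path: compute the full L x L score matrix, then mask everything
# outside each query's top-k. The O(L^2) work has already happened.
scores = q @ kmat.T
topk = scores.topk(k, dim=-1).indices
mask = torch.full_like(scores, float("-inf")).scatter(-1, topk, 0.0)
dense_out = torch.softmax(scores + mask, dim=-1) @ v

# Sparse path: gather only the k selected keys/values per query, O(L*k).
k_sel = kmat[topk]                                   # (L, k, d)
v_sel = v[topk]                                      # (L, k, d)
sp_scores = torch.einsum("ld,lkd->lk", q, k_sel)
sparse_out = torch.einsum("lk,lkd->ld", torch.softmax(sp_scores, dim=-1), v_sel)

# Same result, very different asymptotic cost.
print(torch.allclose(dense_out, sparse_out, atol=1e-5))
```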

@michaelroyzen

@ArthurZucker Is this ready to merge? I'd really love to experiment with some DeepSeek 3.2 Speciale fine tunes.

@freedom-cui

@ArthurZucker @yunkchen
Hello, has there been any progress?

@michaelroyzen

@ArthurZucker Happy holidays, checking in again :)

Can we get this merged please?

@ArthurZucker
Collaborator Author

Hey! Thanks, just got back from holidays. We shipped https://github.com/huggingface/transformers/blob/57278c904c5158999d31a0db8bfcd63360c37b48, but now I should be able to get back to this! Sorry for the delay, everyone: v5 needed a slowdown in model additions to support all the new features, especially default FP8 weight support!

@michaelroyzen

michaelroyzen commented Jan 14, 2026

Thanks @ArthurZucker, do you have an ETA? Getting this in would be massively helpful to me and the community. Happy to help however I can.

@RissyRan
Contributor

Hi @ArthurZucker, just checking in to see if there are any updates on this? There is a lot of interest for this change, so we’re excited to see it move forward! :) Thanks!

@ArthurZucker
Collaborator Author

glm <=> deepseek

@leideng

leideng commented Apr 13, 2026

Looking forward to the merge asap

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_v32, glm_moe_dsa

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41251&sha=129ad2

dagil-nvidia added a commit to ai-dynamo/dynamo that referenced this pull request Apr 30, 2026
Root cause of DYN-2878:

- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 pins transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades
  transformers to satisfy aiperf's `transformers>=4.56.0` floor; with
  default pip resolution this picks the latest release, currently 5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32
  (still pending in huggingface/transformers#41251 and #42767), so
  AutoTokenizer.from_pretrained() raises
  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as
  TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Fix: add transformers<5 to the pip install in both perf.yaml files.
The pin keeps aiperf's floor satisfied (resolves to 4.57.6 today) and
prevents the silent upgrade past native deepseek_v32 support.

Verified end-to-end:
- transformers 4.55.0 base + `pip install aiperf==0.6.0` -> 5.7.0 -> fail
- transformers 4.55.0 base + `pip install "aiperf==0.6.0" "transformers<5"`
  -> 4.57.6 -> aiperf Tokenizer.from_pretrained('nvidia/DeepSeek-V3.2-NVFP4')
  loads LlamaTokenizerFast, vocab=128000

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
dagil-nvidia added a commit to ai-dynamo/dynamo that referenced this pull request Apr 30, 2026
Mirror dynamo's pyproject `transformers>=4.56.0` floor and add an upper
bound of <5 to fix DYN-2878.

Root cause:

- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades
  transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor.
  Without an upper bound, default pip resolution picks the latest
  release, currently 5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32
  (still pending in huggingface/transformers#41251 and #42767), so
  AutoTokenizer.from_pretrained() raises
  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as
  TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Fix: add `transformers>=4.56.0,<5` to the pip install in both perf.yaml
files. The lower bound matches dynamo/pyproject.toml so the perf job
runs against the same transformers contract as the rest of dynamo; the
upper bound prevents the silent upgrade past native deepseek_v32
support.

Verified end-to-end:
- transformers 4.55.0 base + `pip install aiperf==0.6.0` -> 5.7.0 -> fail
- transformers 4.55.0 base + `pip install "aiperf==0.6.0" "transformers>=4.56.0,<5"`
  -> 4.57.6 -> aiperf Tokenizer.from_pretrained('nvidia/DeepSeek-V3.2-NVFP4')
  loads LlamaTokenizerFast, vocab=128000

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
dagil-nvidia added a commit to ai-dynamo/dynamo that referenced this pull request Apr 30, 2026
Match dynamo/pyproject.toml's declared transformers floor (>=4.56.0) by
exact-pinning transformers==4.56.0 in the perf-job pip install. Fixes
DYN-2878.

Root cause:

- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades
  transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor;
  with default pip resolution this picks the latest release, currently
  5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32
  (still pending in huggingface/transformers#41251 and #42767), so
  AutoTokenizer.from_pretrained() raises
  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as
  TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Both nvidia/DeepSeek-V3.2-NVFP4 and deepseek-ai/DeepSeek-V3.2 carry
model_type=deepseek_v32 and fail identically on transformers >= 5.x;
this regressed silently when transformers 5.0 shipped, with no change
in this repo.

Fix: add `transformers==4.56.0` to the pip install in both perf.yaml
files. The version matches dynamo/pyproject.toml's stated floor so the
perf job runs against the same transformers contract as the rest of
dynamo, and the exact pin is deterministic across job re-runs.

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
dagil-nvidia added a commit to ai-dynamo/dynamo that referenced this pull request Apr 30, 2026
Exact-pin transformers to the version verified to load the
model_type=deepseek_v32 tokenizer (per @nealvaidya's review). Fixes
DYN-2878.

Root cause:

- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades
  transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor;
  with default pip resolution this picks the latest release, currently
  5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32
  (still pending in huggingface/transformers#41251 and #42767), so
  AutoTokenizer.from_pretrained() raises
  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as
  TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Both nvidia/DeepSeek-V3.2-NVFP4 and deepseek-ai/DeepSeek-V3.2 carry
model_type=deepseek_v32 and fail identically on transformers >= 5.x;
this regressed silently when transformers 5.0 shipped, with no change
in this repo.

Fix: pin `transformers==4.57.6` in the pip install in both perf.yaml
files. 4.57.6 is the latest 4.x release and is verified to load the
deepseek_v32 tokenizer end-to-end via aiperf's Tokenizer wrapper.

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>