
Add deepseek 3.2 exp #41251

Open
ArthurZucker wants to merge 54 commits into main from add-deepseek-exp

Conversation

@ArthurZucker
Collaborator

@ArthurZucker ArthurZucker commented Oct 1, 2025

What does this PR do?

from transformers import FineGrainedFP8Config, AutoModelForCausalLM, AutoTokenizer
import torch 


model_name = "deepseek-ai/DeepSeek-V3.2"
quantization_config = FineGrainedFP8Config(
    modules_to_not_convert=["model.layers.*.mlp.gate.*", "*.self_attn.indexer.weights_proj.*"],
    weight_block_size=(128, 128),
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto", quantization_config=quantization_config
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
input_text = "What are we having for dinner?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda:0")

output = quantized_model.generate(**input_ids, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
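A side note on the `modules_to_not_convert` entries above: they are glob-style module-name patterns. As a minimal illustration only (transformers has its own matching logic; this sketch uses Python's `fnmatch` to show the idea):

```python
from fnmatch import fnmatch

# Patterns copied from the example above; "*" matches any run of characters,
# including dots, so "model.layers.*" covers every layer index.
patterns = ["model.layers.*.mlp.gate.*", "*.self_attn.indexer.weights_proj.*"]

def should_skip(module_name: str) -> bool:
    """Return True if a module name matches any skip pattern."""
    return any(fnmatch(module_name, p) for p in patterns)

print(should_skip("model.layers.3.mlp.gate.weight"))          # True: gate kept in high precision
print(should_skip("model.layers.3.self_attn.q_proj.weight"))  # False: q_proj gets quantized
```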

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@HandH1998

Hello, thanks for your support for deepseek v3.2! I wonder when this PR will be ready?

@ArthurZucker
Collaborator Author

Working on it! Hoping by next week 🤗

@ArthurZucker
Collaborator Author

wow this got old!

@ArthurZucker ArthurZucker marked this pull request as ready for review November 17, 2025 12:19
@nfywsh

nfywsh commented Dec 3, 2025

The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated?

@yunkchen

yunkchen commented Dec 3, 2025

The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated?

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

@nfywsh

nfywsh commented Dec 3, 2025

The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated?

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

There is still a bug in DeepseekV32Attention when using LLMC to quantize the model:

[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/main.py", line 248, in <module>
[rank0]: main(config)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/main.py", line 27, in main
[rank0]: model = MODEL_REGISTRY[config.model.type](config, device_map, use_cache)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/models/deepseekv3.py", line 20, in __init__
[rank0]: super().__init__(config, device_map, use_cache)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/models/base_model.py", line 40, in __init__
[rank0]: self.build_model()
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/models/deepseekv3.py", line 40, in build_model
[rank0]: self.model = AutoModelForCausalLM.from_pretrained(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
[rank0]: return model_class.from_pretrained(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4971, in from_pretrained
[rank0]: model = cls(config, *model_args, **model_kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 555, in __init__
[rank0]: self.model = DeepseekV32Model(config)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 477, in __init__
[rank0]: [DeepseekV32DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 477, in <listcomp>
[rank0]: [DeepseekV32DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 404, in __init__
[rank0]: self.self_attn = DeepseekV32Attention(config=config, layer_idx=layer_idx)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 316, in __init__
[rank0]: self.softmax_scale = self.qk_head_dim**-0.5
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: 'DeepseekV32Attention' object has no attribute 'qk_head_dim'
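The AttributeError above is an ordering bug: `softmax_scale` reads `self.qk_head_dim` before that attribute has been assigned. A minimal sketch of the fix, assuming the DeepSeek-V3 naming convention where `qk_head_dim = qk_nope_head_dim + qk_rope_head_dim` (this is not the actual transformers source, just the shape of the repair):

```python
class Attention:
    def __init__(self, config: dict):
        # Assign qk_head_dim first, then derive the softmax scale from it;
        # reversing these two lines reproduces the AttributeError.
        self.qk_head_dim = config["qk_nope_head_dim"] + config["qk_rope_head_dim"]
        self.softmax_scale = self.qk_head_dim ** -0.5

# Head dims here are illustrative values, not the DeepSeek-V3.2 config.
attn = Attention({"qk_nope_head_dim": 128, "qk_rope_head_dim": 64})
print(attn.softmax_scale)  # 192 ** -0.5
```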

@yunkchen

yunkchen commented Dec 3, 2025

@nfywsh

nfywsh commented Dec 3, 2025

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

New commit pushed, sorry.

A bug still occurred when running the LLMC quantization model:

[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/main.py", line 248, in <module>
[rank0]: main(config)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/main.py", line 55, in main
[rank0]: model.collect_first_block_input(calib_data, padding_mask)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/mnt/hcufs/env_scripts/xyf_test/llmc/llmc/models/base_model.py", line 246, in collect_first_block_input
[rank0]: self.model(**data)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/utils/generic.py", line 918, in wrapper
[rank0]: output = func(self, *args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/deepseek_v32/modeling_deepseek_v32.py", line 526, in forward
[rank0]: outputs: BaseModelOutputWithPast = self.model(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: TypeError: check_model_inputs.<locals>.wrapped_fn() got an unexpected keyword argument 'input_ids'
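The TypeError above is the classic shape of a decorator whose wrapper does not forward keyword arguments. A hypothetical minimal reproduction (this is not the actual transformers `check_model_inputs` source, only the general failure mode and its fix):

```python
def broken_decorator(func):
    def wrapped_fn(self):  # no **kwargs: forward(input_ids=...) raises TypeError
        return func(self)
    return wrapped_fn

def fixed_decorator(func):
    def wrapped_fn(self, *args, **kwargs):  # forwards every argument through
        return func(self, *args, **kwargs)
    return wrapped_fn

class BrokenModel:
    @broken_decorator
    def forward(self, input_ids=None):
        return input_ids

class Model:
    @fixed_decorator
    def forward(self, input_ids=None):
        return input_ids

try:
    BrokenModel().forward(input_ids=[1, 2, 3])
except TypeError as e:
    print(f"TypeError: {e}")           # unexpected keyword argument 'input_ids'

print(Model().forward(input_ids=[1, 2, 3]))  # [1, 2, 3]
```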

@bmtwl

bmtwl commented Dec 3, 2025

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

New commit pushed, sorry.

I'm seeing problems in this branch with rope factor/beta_fast/beta_slow values not being floats. Is this an oversight?
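One defensive pattern for the int-vs-float issue raised above is to coerce the numeric rope-scaling fields when reading the config. A hedged sketch, assuming the YaRN-style `rope_scaling` dict shape used by DeepSeek configs (field names taken from the comment; whether the branch should fix this at config load or at use site is a design choice for the PR):

```python
# Example rope_scaling dict with int values, as a checkpoint might ship them.
rope_scaling = {"factor": 40, "beta_fast": 32, "beta_slow": 1}

# Coerce the known numeric fields to float so downstream math behaves
# identically regardless of how the JSON serialized them.
for key in ("factor", "beta_fast", "beta_slow"):
    if key in rope_scaling:
        rope_scaling[key] = float(rope_scaling[key])

print(rope_scaling)  # {'factor': 40.0, 'beta_fast': 32.0, 'beta_slow': 1.0}
```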

@jyliu24

jyliu24 commented Dec 3, 2025

The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated?

https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32

Isn't this implementation still O(L^2) since it just masks full attention to the indexer's topk?
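The distinction the comment draws can be made concrete in a toy single-head setting: masking full attention to the indexer's top-k still materializes all L×L scores, while gathering only the k selected keys per query costs O(L·k). A sketch (illustrative shapes only, unrelated to the PR's actual kernels):

```python
import torch

L, d, k = 8, 4, 3
q, kmat, v = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)

# Dense path: compute the full L x L score matrix, then mask everything
# outside each query's top-k. The O(L^2) work has already happened.
scores = q @ kmat.T
topk = scores.topk(k, dim=-1).indices
mask = torch.full_like(scores, float("-inf")).scatter(-1, topk, 0.0)
dense_out = torch.softmax(scores + mask, dim=-1) @ v

# Sparse path: gather only the k selected keys/values per query, O(L*k).
k_sel = kmat[topk]                                   # (L, k, d)
v_sel = v[topk]                                      # (L, k, d)
sp_scores = torch.einsum("ld,lkd->lk", q, k_sel)
sparse_out = torch.einsum("lk,lkd->ld", torch.softmax(sp_scores, dim=-1), v_sel)

# Same result, very different asymptotic cost.
print(torch.allclose(dense_out, sparse_out, atol=1e-5))
```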

@michaelroyzen

@ArthurZucker Is this ready to merge? I'd really love to experiment with some DeepSeek 3.2 Speciale fine tunes.

@freedom-cui

@ArthurZucker @yunkchen
Hello, has there been any progress?

@michaelroyzen

@ArthurZucker Happy holidays, checking in again :)

Can we get this merged please?

@ArthurZucker
Collaborator Author

Hey! Thanks, just got back from holidays. We shipped https://github.com/huggingface/transformers/blob/57278c904c5158999d31a0db8bfcd63360c37b48, but now I should be able to get back to this! Sorry for the delay, everyone: v5 needed a slowdown in model additions to support all the new features, especially default FP8 weight support!

@michaelroyzen

michaelroyzen commented Jan 14, 2026

Thanks @ArthurZucker, do you have an ETA? Getting this in would be massively helpful to me and the community. Happy to help however I can.

@RissyRan
Contributor

Hi @ArthurZucker, just checking in to see if there are any updates on this? There is a lot of interest for this change, so we’re excited to see it move forward! :) Thanks!

@ArthurZucker
Collaborator Author

glm <=> deepseek

@leideng

leideng commented Apr 13, 2026

Looking forward to the merge asap

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_v32, glm_moe_dsa

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41251&sha=129ad2

dagil-nvidia added a commit to ai-dynamo/dynamo that referenced this pull request Apr 30, 2026
Root cause of DYN-2878:

- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 pins transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades
  transformers to satisfy aiperf's `transformers>=4.56.0` floor; with
  default pip resolution this picks the latest release, currently 5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32
  (still pending in huggingface/transformers#41251 and #42767), so
  AutoTokenizer.from_pretrained() raises
  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as
  TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Fix: add transformers<5 to the pip install in both perf.yaml files.
The pin keeps aiperf's floor satisfied (resolves to 4.57.6 today) and
prevents the silent upgrade past native deepseek_v32 support.

Verified end-to-end:
- transformers 4.55.0 base + `pip install aiperf==0.6.0` -> 5.7.0 -> fail
- transformers 4.55.0 base + `pip install "aiperf==0.6.0" "transformers<5"`
  -> 4.57.6 -> aiperf Tokenizer.from_pretrained('nvidia/DeepSeek-V3.2-NVFP4')
  loads LlamaTokenizerFast, vocab=128000

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
dagil-nvidia added a commit to ai-dynamo/dynamo that referenced this pull request Apr 30, 2026
Mirror dynamo's pyproject `transformers>=4.56.0` floor and add an upper
bound of <5 to fix DYN-2878.

Root cause:

- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades
  transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor.
  Without an upper bound, default pip resolution picks the latest
  release, currently 5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32
  (still pending in huggingface/transformers#41251 and #42767), so
  AutoTokenizer.from_pretrained() raises
  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as
  TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Fix: add `transformers>=4.56.0,<5` to the pip install in both perf.yaml
files. The lower bound matches dynamo/pyproject.toml so the perf job
runs against the same transformers contract as the rest of dynamo; the
upper bound prevents the silent upgrade past native deepseek_v32
support.

Verified end-to-end:
- transformers 4.55.0 base + `pip install aiperf==0.6.0` -> 5.7.0 -> fail
- transformers 4.55.0 base + `pip install "aiperf==0.6.0" "transformers>=4.56.0,<5"`
  -> 4.57.6 -> aiperf Tokenizer.from_pretrained('nvidia/DeepSeek-V3.2-NVFP4')
  loads LlamaTokenizerFast, vocab=128000

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
dagil-nvidia added a commit to ai-dynamo/dynamo that referenced this pull request Apr 30, 2026
Match dynamo/pyproject.toml's declared transformers floor (>=4.56.0) by
exact-pinning transformers==4.56.0 in the perf-job pip install. Fixes
DYN-2878.

Root cause:

- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades
  transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor;
  with default pip resolution this picks the latest release, currently
  5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32
  (still pending in huggingface/transformers#41251 and #42767), so
  AutoTokenizer.from_pretrained() raises
  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as
  TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Both nvidia/DeepSeek-V3.2-NVFP4 and deepseek-ai/DeepSeek-V3.2 carry
model_type=deepseek_v32 and fail identically on transformers >= 5.x;
this regressed silently when transformers 5.0 shipped, with no change
in this repo.

Fix: add `transformers==4.56.0` to the pip install in both perf.yaml
files. The version matches dynamo/pyproject.toml's stated floor so the
perf job runs against the same transformers contract as the rest of
dynamo, and the exact pin is deterministic across job re-runs.

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
dagil-nvidia added a commit to ai-dynamo/dynamo that referenced this pull request Apr 30, 2026
Exact-pin transformers to the version verified to load the
model_type=deepseek_v32 tokenizer (per @nealvaidya's review). Fixes
DYN-2878.

Root cause:

- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades
  transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor;
  with default pip resolution this picks the latest release, currently
  5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32
  (still pending in huggingface/transformers#41251 and #42767), so
  AutoTokenizer.from_pretrained() raises
  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as
  TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Both nvidia/DeepSeek-V3.2-NVFP4 and deepseek-ai/DeepSeek-V3.2 carry
model_type=deepseek_v32 and fail identically on transformers >= 5.x;
this regressed silently when transformers 5.0 shipped, with no change
in this repo.

Fix: pin `transformers==4.57.6` in the pip install in both perf.yaml
files. 4.57.6 is the latest 4.x release and is verified to load the
deepseek_v32 tokenizer end-to-end via aiperf's Tokenizer wrapper.

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>