Add deepseek 3.2 exp #41251
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hello, thanks for your support for deepseek v3.2! I wonder when this PR will be ready? |
Working on it! Hoping by next week 🤗 |
wow this got old! |
The submitted code is currently unusable and does not support the Deepseek-v3.2 official version. Is this PR still being updated? |
https://github.com/yunkchen/transformers/tree/v4.57.3_add_dpskv32 |
There is still a bug in DeepseekV32Attention when using LLMC to quantize the model: [rank0]: Traceback (most recent call last):
New commit pushed, sorry. |
A bug still occurred when running LLMC quantization on the model: [rank0]: Traceback (most recent call last):
I'm seeing problems in this branch with rope factor/beta_fast/beta_slow values not being floats. Is this an oversight? |
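A minimal sketch of the concern, assuming the values come in as ints from the checkpoint's rope_scaling block in config.json (the field names and example values below are illustrative, not this branch's actual config handling):

```python
# Hedged sketch: YaRN-style rope parameters are typically consumed as floats.
# If config.json stores them as ints, a defensive cast avoids type issues
# downstream. The example values are illustrative, not taken from this PR.
rope_scaling = {"factor": 40, "beta_fast": 32, "beta_slow": 1}  # as parsed from JSON
rope_scaling = {k: float(v) for k, v in rope_scaling.items()}   # normalize to floats
```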
Isn't this implementation still O(L^2) since it just masks full attention to the indexer's topk? |
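A minimal sketch of what that question describes, assuming a mask-based selection over the indexer's scores (shapes and names here are illustrative, not the PR's actual implementation):

```python
import torch

def topk_masked_attention(q, k, v, index_scores, topk):
    """Select the indexer's top-k keys by masking a *full* attention matrix.

    Because the (L, L) score matrix is materialized before masking, compute and
    memory stay O(L^2) even though only topk keys per query contribute.
    """
    L, d = q.shape
    scores = (q @ k.T) / d**0.5                      # full (L, L) matrix -> O(L^2)
    idx = index_scores.topk(topk, dim=-1).indices    # (L, topk) picks from the indexer
    keep = torch.zeros(L, L, dtype=torch.bool)
    keep.scatter_(1, idx, torch.ones_like(idx, dtype=torch.bool))
    scores = scores.masked_fill(~keep, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# A sub-quadratic variant would gather only the topk keys/values per query
# (O(L * topk) compute) instead of computing and then masking the full matrix.
```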
@ArthurZucker Is this ready to merge? I'd really love to experiment with some DeepSeek 3.2 Speciale fine-tunes.
@ArthurZucker @yunkchen |
@ArthurZucker Happy holidays, checking in again :) Can we get this merged please? |
Hey! Thanks, just got back from holidays. We shipped https://github.com/huggingface/transformers/blob/57278c904c5158999d31a0db8bfcd63360c37b48 but now I should be able to get back to this! Sorry for the delay, everyone; v5 needed a slowdown in model additions to support all the new features, especially default FP8 weight support!
Thanks @ArthurZucker, do you have an ETA? Getting this in would be massively helpful to me and the community. Happy to help however I can. |
Hi @ArthurZucker, just checking in to see if there are any updates on this? There is a lot of interest in this change, so we're excited to see it move forward! :) Thanks!
glm <=> deepseek |
Looking forward to the merge asap |
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, deepseek_v32, glm_moe_dsa |
|
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41251&sha=129ad2 |
Root cause of DYN-2878:
- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 pins transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades transformers to satisfy aiperf's `transformers>=4.56.0` floor; with default pip resolution this picks the latest release, currently 5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32 (still pending in huggingface/transformers#41251 and #42767), so AutoTokenizer.from_pretrained() raises AttributeError: 'PreTrainedConfig' object has no attribute 'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Fix: add transformers<5 to the pip install in both perf.yaml files. The pin keeps aiperf's floor satisfied (resolves to 4.57.6 today) and prevents the silent upgrade past native deepseek_v32 support.

Verified end-to-end:
- transformers 4.55.0 base + `pip install aiperf==0.6.0` -> 5.7.0 -> fail
- transformers 4.55.0 base + `pip install "aiperf==0.6.0" "transformers<5"` -> 4.57.6 -> aiperf Tokenizer.from_pretrained('nvidia/DeepSeek-V3.2-NVFP4') loads LlamaTokenizerFast, vocab=128000

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
Mirror dynamo's pyproject `transformers>=4.56.0` floor and add an upper bound of <5 to fix DYN-2878.

Root cause:
- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor. Without an upper bound, default pip resolution picks the latest release, currently 5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32 (still pending in huggingface/transformers#41251 and #42767), so AutoTokenizer.from_pretrained() raises AttributeError: 'PreTrainedConfig' object has no attribute 'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Fix: add `transformers>=4.56.0,<5` to the pip install in both perf.yaml files. The lower bound matches dynamo/pyproject.toml so the perf job runs against the same transformers contract as the rest of dynamo; the upper bound prevents the silent upgrade past native deepseek_v32 support.

Verified end-to-end:
- transformers 4.55.0 base + `pip install aiperf==0.6.0` -> 5.7.0 -> fail
- transformers 4.55.0 base + `pip install "aiperf==0.6.0" "transformers>=4.56.0,<5"` -> 4.57.6 -> aiperf Tokenizer.from_pretrained('nvidia/DeepSeek-V3.2-NVFP4') loads LlamaTokenizerFast, vocab=128000

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
Match dynamo/pyproject.toml's declared transformers floor (>=4.56.0) by exact-pinning transformers==4.56.0 in the perf-job pip install. Fixes DYN-2878.

Root cause:
- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor; with default pip resolution this picks the latest release, currently 5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32 (still pending in huggingface/transformers#41251 and #42767), so AutoTokenizer.from_pretrained() raises AttributeError: 'PreTrainedConfig' object has no attribute 'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Both nvidia/DeepSeek-V3.2-NVFP4 and deepseek-ai/DeepSeek-V3.2 carry model_type=deepseek_v32 and fail identically on transformers >= 5.x; this regressed silently when transformers 5.0 shipped, with no change in this repo.

Fix: add `transformers==4.56.0` to the pip install in both perf.yaml files. The version matches dynamo/pyproject.toml's stated floor so the perf job runs against the same transformers contract as the rest of dynamo, and the exact pin is deterministic across job re-runs.

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
Exact-pin transformers to the version verified to load the model_type=deepseek_v32 tokenizer (per @nealvaidya's review). Fixes DYN-2878.

Root cause:
- nvcr.io/.../tensorrtllm-runtime:1.1.0-rc4 ships transformers==4.55.0.
- The perf.yaml install line `pip install aiperf==0.6.0` upgrades transformers to satisfy aiperf 0.6.0's `transformers>=4.56.0` floor; with default pip resolution this picks the latest release, currently 5.7.0.
- transformers 5.x has no native support for model_type=deepseek_v32 (still pending in huggingface/transformers#41251 and #42767), so AutoTokenizer.from_pretrained() raises AttributeError: 'PreTrainedConfig' object has no attribute 'max_position_embeddings' before reading tokenizer.json.
- aiperf wraps the exception as TokenizerError: Failed to load tokenizer 'nvidia/DeepSeek-V3.2-NVFP4'.

Both nvidia/DeepSeek-V3.2-NVFP4 and deepseek-ai/DeepSeek-V3.2 carry model_type=deepseek_v32 and fail identically on transformers >= 5.x; this regressed silently when transformers 5.0 shipped, with no change in this repo.

Fix: pin `transformers==4.57.6` in the pip install in both perf.yaml files. 4.57.6 is the latest 4.x release and is verified to load the deepseek_v32 tokenizer end-to-end via aiperf's Tokenizer wrapper.

Files:
- recipes/deepseek-v32-fp4/trtllm/disagg-kv-router/perf.yaml
- recipes/deepseek-v32-fp4/trtllm/agg-round-robin/perf.yaml

Signed-off-by: Dan Gil <dagil@nvidia.com>
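A minimal sketch of the failure mode those commit messages describe, using transformers' AutoTokenizer directly rather than aiperf's wrapper; the expected tokenizer class and vocab size are taken from the "Verified end-to-end" notes above, and the snippet assumes Hub access:

```python
from transformers import AutoTokenizer

# With transformers in the 4.56-4.57 range (per the pins above), this is
# reported to load LlamaTokenizerFast with a 128000-token vocab.
# On transformers >= 5.x, the same call is reported to raise:
#   AttributeError: 'PreTrainedConfig' object has no attribute 'max_position_embeddings'
tok = AutoTokenizer.from_pretrained("nvidia/DeepSeek-V3.2-NVFP4")
print(type(tok).__name__, tok.vocab_size)
```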
What does this PR do?