
[Fix] Deepseek V3 expert bias routing #41647

Merged
ArthurZucker merged 3 commits into huggingface:main from fjosw:fix/deepseek_v3_routing
Oct 16, 2025


@fjosw
Contributor

@fjosw fjosw commented Oct 16, 2025

What does this PR do?

By chance, we noticed that #40132 seems to have introduced a bug in the Deepseek V3 routing implementation. The Deepseek-V3 technical report explicitly states:

Note that the bias term is only used for routing. The gating value, which will be multiplied with
the FFN output, is still derived from the original affinity score

This was the case in transformers until #40132, which changed the routing code so that the gating values are now derived from the tensor with the added bias term. I wrote a quick fix for the Deepseek-V3 model in this PR, but I am not sure whether other models are also affected. @ArthurZucker, can you please have a look and confirm that this is indeed a bug?
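To make the intended behavior concrete, here is a minimal, framework-free sketch of the rule quoted above. It is not the transformers implementation: the function name `route_token` and the toy numbers are hypothetical, and group-limited routing and the routed scaling factor are left out. It assumes sigmoid affinity scores and normalization over the selected experts, as described in the technical report.

```python
import math


def route_token(logits, bias, top_k):
    """Select experts with biased scores, but weight them with unbiased scores.

    Hypothetical helper for illustration only; not the transformers API.
    """
    # Original affinity scores (assumed here: sigmoid of the router logits).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # The bias term is added only to decide WHICH experts get selected.
    scores_for_choice = [s + b for s, b in zip(scores, bias)]
    top_idx = sorted(
        range(len(scores)), key=lambda i: scores_for_choice[i], reverse=True
    )[:top_k]
    # Gating values are gathered from the original scores, not the biased ones.
    gates = [scores[i] for i in top_idx]
    total = sum(gates)
    weights = [g / total for g in gates]
    return top_idx, weights


# Toy example: expert 2 has a low affinity but a large bias.
logits = [2.0, 0.0, -1.0, 1.0]
bias = [0.0, 0.0, 5.0, 0.0]
top_idx, weights = route_token(logits, bias, top_k=2)
print(top_idx, weights)
```

In this toy case the bias makes expert 2 the first pick, yet its gating weight stays smaller than that of expert 0 because the weights come from the unbiased scores. Gathering from `scores_for_choice` instead would let the bias dominate the gating weight, which is the behavior this PR reports as a bug.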

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks for catching this! Can confirm: we previously gathered on the scores with the index.

Collaborator

@ArthurZucker ArthurZucker left a comment


Can you just run make fix-copies? That will fix the dependent models.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: deepseek_v3, glm4_moe, glm4v_moe

@fjosw
Contributor Author

fjosw commented Oct 16, 2025

Thanks @ArthurZucker, I just pushed the fix-copies changes. Are you aware of any models that do not directly depend on Deepseek V3 but use Deepseek-style expert bias routing and could have been affected by the refactor from a few weeks ago?

@ArthurZucker
Collaborator

They are the ones that inherit from deepseek!
Can you just run make style? Then you should be good to go!

@ArthurZucker ArthurZucker enabled auto-merge (squash) October 16, 2025 13:54
@ArthurZucker ArthurZucker merged commit 8725ce1 into huggingface:main Oct 16, 2025
16 checks passed
@fjosw fjosw deleted the fix/deepseek_v3_routing branch October 16, 2025 14:16
@fjosw fjosw mentioned this pull request Oct 16, 2025
5 tasks
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
* [Fix] Deepseek V3 expert bias routing

* [Fix] fix-copies

* [Fix] Run make style
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* [Fix] Deepseek V3 expert bias routing

* [Fix] fix-copies

* [Fix] Run make style

3 participants