
SonicMoe #45433

Merged

vasqu merged 17 commits into main from sonic-moe on Apr 23, 2026

Conversation

@IlyasMoutawwakil (Member) commented Apr 14, 2026:

What does this PR do?

Adds support for the insanely optimized sonic-moe kernels from https://github.com/Dao-AILab/sonic-moe.
Needs huggingface/kernels-community#546 and/or Dao-AILab/sonic-moe#46.

[Figure: bf16 experts TFLOPS benchmark]
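For context, a rough sketch of what opting into the new path could look like once this lands. This is an illustration only: the checkpoint is a placeholder, and passing experts_implementation at load time (analogous to attn_implementation) is an assumption based on the validation message touched in this PR.

import torch
from transformers import AutoModelForCausalLM

# Hypothetical usage sketch (not taken from this PR's docs):
# select the sonic-moe experts implementation for a MoE model.
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",               # placeholder MoE checkpoint
    dtype=torch.bfloat16,               # the benchmark above targets bf16 experts
    device_map="cuda",
    experts_implementation="sonicmoe",  # route expert matmuls through sonic-moe kernels
)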

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil marked this pull request as ready for review on April 17, 2026 at 11:20
@vasqu (Contributor) left a comment:

Awesome!! Can't wait to land this on main

Just a few nits, but nothing major

Contributor commented:

Imo, this is so small that we can also move this under moe directly - it's essentially the one forward wrapper :D

Collaborator commented:

would be nice to also not even have this here but on kernels!

@IlyasMoutawwakil (Member, Author) commented Apr 20, 2026:

Yes, it's simple now, but my guess is that it might get more complicated (fp8/fp4 support, for example).
I made it into a separate file to avoid what we have in finegrained_fp8 with deepgemm, for example 🤔 My understanding is that we want to move the deepgemm integration to a separate file as well.

> would be nice to also not even have this here but on kernels!

It's very transformers- and experts-implementation-specific (which changes from time to time). For example, if we were to move it into kernels, we would have to add many more inputs to the function (has_gate, has_bias, is_transposed, is_concatenated, ...), which would ultimately just be moe_general_routing with the weight axis permutation baked into it.
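To make the discussion concrete, here is a rough sketch of the kind of thin, transformers-side wrapper being described. The argument names and order of moe_general_routing below are placeholders (the real sonic-moe signature is not shown in this thread), the weight attribute names are assumed from the usual transformers experts module, and the kernel itself is assumed to be loaded elsewhere.

import torch

def sonicmoe_experts_forward(self, hidden_states, top_k_index, top_k_weights):
    # Illustrative only: normalize the transformers experts layout
    # (concatenated gate_up projection, optional biases) and hand off
    # to the kernel. moe_general_routing is assumed to be loaded from the
    # sonic-moe kernel package; its signature here is a placeholder.
    b1 = self.gate_up_proj_bias if self.has_bias else None
    b2 = self.down_proj_bias if self.has_bias else None

    output, _ = moe_general_routing(
        hidden_states,
        self.gate_up_proj,   # assumed [num_experts, hidden, 2 * intermediate]
        self.down_proj,      # assumed [num_experts, intermediate, hidden]
        b1,
        b2,
        top_k_index,         # which experts each token was routed to
        top_k_weights,       # routing weights applied to each expert's output
    )
    return output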

Contributor commented:

Gotcha, fair assessment; I only thought about the current version, which is shortsighted.

kernels should really only act as a thin wrapper and we handle the specifics here imo - otherwise the kernels maintainers will always have to do additional work from version to version (if things break)

Comment thread on src/transformers/integrations/sonicmoe.py (outdated):
b1 = self.gate_up_proj_bias if self.has_bias else None  # gate/up projection bias, if the experts carry one
b2 = self.down_proj_bias if self.has_bias else None      # down projection bias, if the experts carry one

output, _ = moe_general_routing(  # call into the sonic-moe routed-experts kernel (excerpt truncated here)
Contributor commented:

Just curious: is this compatible with torch.compile? My hunch says no, because they seldom register the fake types upstream without external contributions lol

@IlyasMoutawwakil (Member, Author) commented:

It's actually not 😭 It supposedly works on torch 2.9 or so, but my tests failed because of torch dynamo issues with CUDA streams: Dao-AILab/sonic-moe#21

Contributor commented:

I guess torch 2.12 will fix it, if I followed pytorch/pytorch#177610 correctly.
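For readers not familiar with the "fake types" mentioned above, here is a minimal, self-contained sketch, unrelated to the sonic-moe code itself, of how a custom kernel is typically made traceable by torch.compile: wrap it as a custom op and register a fake (meta) implementation that only computes output shapes. All names are hypothetical; the CUDA-stream dynamo issue linked above is a separate problem that fake registration alone does not solve.

import torch
from torch.library import custom_op

# Hypothetical example: expose a kernel-like function as a custom op.
@custom_op("demo::moe_matmul", mutates_args=())
def moe_matmul(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    return x @ w  # stand-in for a real fused CUDA kernel

# Fake (meta) implementation: describes the output shape/dtype so dynamo
# can trace through the op without ever running the kernel.
@moe_matmul.register_fake
def _(x, w):
    return x.new_empty(x.shape[0], w.shape[1])

@torch.compile(fullgraph=True)
def f(x, w):
    return moe_matmul(x, w)

out = f(torch.randn(4, 8), torch.randn(8, 16))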

@IlyasMoutawwakil (Member, Author) commented:

Should I link it in a comment?

@ArthurZucker (Collaborator) left a comment:

NICE! 🚀

Collaborator commented:

would be nice to also not even have this here but on kernels!

@vasqu (Contributor) commented Apr 20, 2026:

SonicMoe merged their quack update; I guess we can sync with Dao-AILab/sonic-moe#46 (comment) and have upstream ready in the next few days.

@IlyasMoutawwakil (Member, Author) commented:

They will add concat layout support in this PR: Dao-AILab/sonic-moe#47

@vasqu (Contributor) commented Apr 20, 2026:

Ah sorry missed it, looking forward to it 🫡

@GarlGuo commented Apr 20, 2026:

Please let me know if any help is needed! I am familiar with the HF source code.

@vasqu (Contributor) commented Apr 20, 2026:

Thank you @GarlGuo! Will keep you in the loop

Comment thread on src/transformers/modeling_utils.py (outdated), on lines +1928 to +1932:
-if applicable_experts not in ["eager", "grouped_mm", "batched_mm", "deepgemm"]:
+if applicable_experts not in ["eager", "grouped_mm", "batched_mm", "deepgemm", "sonicmoe"]:
     message = (
         f'Specified `experts_implementation="{applicable_experts}"` is not supported. The only possible arguments are '
-        '`experts_implementation="eager"`, `"experts_implementation=grouped_mm"`, `"experts_implementation=batched_mm"` '
-        'and `"experts_implementation=deepgemm"`.'
+        '`experts_implementation="eager"`, `"experts_implementation=grouped_mm"`, `"experts_implementation=batched_mm"`, '
+        '`experts_implementation=deepgemm`, `"experts_implementation=sonicmoe"`'
@IlyasMoutawwakil (Member, Author) commented:

Shouldn't be necessary with #45577.

@vasqu (Contributor) left a comment:

Can we add a small equivalence test? Similar to how it was done for grouped mm and batched mm?

Other than that, only small nits 🫡
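For illustration, an equivalence test along those lines usually loads the same MoE checkpoint twice and compares logits between the eager experts path and the kernel path. The sketch below is not the test added in this PR: the model id, tolerances, and the way experts_implementation is passed are assumptions.

import torch
from transformers import AutoModelForCausalLM

def test_sonicmoe_matches_eager():
    # Hypothetical equivalence check (requires a CUDA GPU and the sonic-moe kernel).
    model_id = "some-org/tiny-moe-model"  # placeholder checkpoint
    input_ids = torch.randint(0, 100, (1, 16), device="cuda")

    eager = AutoModelForCausalLM.from_pretrained(
        model_id, dtype=torch.bfloat16, experts_implementation="eager"
    ).to("cuda")
    sonic = AutoModelForCausalLM.from_pretrained(
        model_id, dtype=torch.bfloat16, experts_implementation="sonicmoe"
    ).to("cuda")

    with torch.no_grad():
        out_eager = eager(input_ids).logits
        out_sonic = sonic(input_ids).logits

    # bf16 kernels will not match bit-for-bit; use a loose tolerance.
    torch.testing.assert_close(out_eager, out_sonic, atol=1e-2, rtol=1e-2)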

Comment thread on src/transformers/models/gpt_oss/modular_gpt_oss.py
Comment thread on src/transformers/integrations/sonicmoe.py
@github-actions (Contributor) commented:

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss, openai_privacy_filter

Comment thread on src/transformers/models/openai_privacy_filter/modeling_openai_privacy_filter.py (outdated)
@vasqu enabled auto-merge on April 23, 2026 at 13:17
@vasqu disabled auto-merge on April 23, 2026 at 13:20
@vasqu merged commit 533c4e1 into main on April 23, 2026; 27 of 29 checks passed
@vasqu deleted the sonic-moe branch on April 23, 2026 at 13:24
tarekziade pushed a commit that referenced this pull request on Apr 23, 2026:
* added sonic moe

* use lazy_load_kernel

* style

* use concatenated revision

* final touches

* fix

* merge conflict

* simpler naming

* style

* add sonicmoe test

* skip fp32 on sonic

* add transposed support

* fix

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>

Labels: none yet
Projects: none yet
5 participants