
SonicMoe #45433

Merged

vasqu merged 17 commits into main from sonic-moe on Apr 23, 2026

Conversation

@IlyasMoutawwakil (Member) commented Apr 14, 2026:

What does this PR do?

Adds support for the insanely optimized sonic-moe kernels from https://github.com/Dao-AILab/sonic-moe.
Needs huggingface/kernels-community#546 and/or Dao-AILab/sonic-moe#46.

[Figure: bf16 experts TFLOPS benchmark]
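For context, a rough sketch of what opting into the new path could look like once this lands. This is an illustration only: the checkpoint is a placeholder, and passing experts_implementation at load time (analogous to attn_implementation) is an assumption based on the validation message touched in this PR.

import torch
from transformers import AutoModelForCausalLM

# Hypothetical usage sketch (not taken from this PR's docs):
# select the sonic-moe experts implementation for a MoE model.
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",               # placeholder MoE checkpoint
    dtype=torch.bfloat16,               # the benchmark above targets bf16 experts
    device_map="cuda",
    experts_implementation="sonicmoe",  # route expert matmuls through sonic-moe kernels
)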

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil marked this pull request as ready for review on April 17, 2026 at 11:20
@vasqu (Contributor) left a comment:

Awesome!! Can't wait to land this on main

Just a few nits, but nothing major

Contributor commented:

Imo, this is so small that we can also move this under moe directly - it's essentially the one forward wrapper :D

Collaborator commented:

would be nice to also not even have this here but on kernels!

@IlyasMoutawwakil (Member, Author) commented Apr 20, 2026:

Yes, it's simple now, but my guess is that it might get more complicated (fp8/fp4 support, for example).
I made it into a separate file to avoid what we have in finegrained_fp8 with deepgemm, for example 🤔 My understanding is that we want to move the deepgemm integration to a separate file as well.

> would be nice to also not even have this here but on kernels!

It's very transformers- and experts-implementation-specific (which changes from time to time). For example, if we were to move it into kernels, we would have to add many more inputs to the function (has_gate, has_bias, is_transposed, is_concatenated, ...), which would ultimately just be moe_general_routing with the weight axis permutation baked into it.
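To make the discussion concrete, here is a rough sketch of the kind of thin, transformers-side wrapper being described. The argument names and order of moe_general_routing below are placeholders (the real sonic-moe signature is not shown in this thread), the weight attribute names are assumed from the usual transformers experts module, and the kernel itself is assumed to be loaded elsewhere.

import torch

def sonicmoe_experts_forward(self, hidden_states, top_k_index, top_k_weights):
    # Illustrative only: normalize the transformers experts layout
    # (concatenated gate_up projection, optional biases) and hand off
    # to the kernel. moe_general_routing is assumed to be loaded from the
    # sonic-moe kernel package; its signature here is a placeholder.
    b1 = self.gate_up_proj_bias if self.has_bias else None
    b2 = self.down_proj_bias if self.has_bias else None

    output, _ = moe_general_routing(
        hidden_states,
        self.gate_up_proj,   # assumed [num_experts, hidden, 2 * intermediate]
        self.down_proj,      # assumed [num_experts, intermediate, hidden]
        b1,
        b2,
        top_k_index,         # which experts each token was routed to
        top_k_weights,       # routing weights applied to each expert's output
    )
    return output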

Contributor commented:

Gotcha, fair assessment; I only thought about the current version, which is shortsighted.

kernels should really only act as a thin wrapper and we handle the specifics here imo - otherwise the kernels maintainers will always have to do additional work from version to version (if things break)

Comment thread on src/transformers/integrations/sonicmoe.py (outdated):
b1 = self.gate_up_proj_bias if self.has_bias else None  # gate/up projection bias, if the experts carry one
b2 = self.down_proj_bias if self.has_bias else None      # down projection bias, if the experts carry one

output, _ = moe_general_routing(  # call into the sonic-moe routed-experts kernel (excerpt truncated here)
Contributor commented:

Just curious: is this compatible with torch.compile? My hunch says no, because they seldom register the fake types upstream without external contributions lol

@IlyasMoutawwakil (Member, Author) commented:

It's actually not 😭 It supposedly works on torch 2.9 or so, but my tests failed because of torch dynamo issues with CUDA streams: Dao-AILab/sonic-moe#21

Contributor commented:

I guess torch 2.12 will fix it, if I followed pytorch/pytorch#177610 correctly.
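For readers not familiar with the "fake types" mentioned above, here is a minimal, self-contained sketch, unrelated to the sonic-moe code itself, of how a custom kernel is typically made traceable by torch.compile: wrap it as a custom op and register a fake (meta) implementation that only computes output shapes. All names are hypothetical; the CUDA-stream dynamo issue linked above is a separate problem that fake registration alone does not solve.

import torch
from torch.library import custom_op

# Hypothetical example: expose a kernel-like function as a custom op.
@custom_op("demo::moe_matmul", mutates_args=())
def moe_matmul(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    return x @ w  # stand-in for a real fused CUDA kernel

# Fake (meta) implementation: describes the output shape/dtype so dynamo
# can trace through the op without ever running the kernel.
@moe_matmul.register_fake
def _(x, w):
    return x.new_empty(x.shape[0], w.shape[1])

@torch.compile(fullgraph=True)
def f(x, w):
    return moe_matmul(x, w)

out = f(torch.randn(4, 8), torch.randn(8, 16))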

@IlyasMoutawwakil (Member, Author) commented:

Should I link it in a comment?

@ArthurZucker (Collaborator) left a comment:

NICE! 🚀

Collaborator commented:

would be nice to also not even have this here but on kernels!

@vasqu (Contributor) commented Apr 20, 2026:

SonicMoe merged their quack update; I guess we can sync with Dao-AILab/sonic-moe#46 (comment) and have upstream ready in the next few days.

@IlyasMoutawwakil (Member, Author) commented:

They will add concat layout support in this PR: Dao-AILab/sonic-moe#47

@vasqu (Contributor) commented Apr 20, 2026:

Ah sorry missed it, looking forward to it 🫡

@GarlGuo commented Apr 20, 2026:

Please let me know if any help is needed! I am familiar with the HF source code.

@vasqu (Contributor) commented Apr 20, 2026:

Thank you @GarlGuo! Will keep you in the loop

Comment thread on src/transformers/modeling_utils.py (outdated), on lines +1928 to +1932:
-if applicable_experts not in ["eager", "grouped_mm", "batched_mm", "deepgemm"]:
+if applicable_experts not in ["eager", "grouped_mm", "batched_mm", "deepgemm", "sonicmoe"]:
     message = (
         f'Specified `experts_implementation="{applicable_experts}"` is not supported. The only possible arguments are '
-        '`experts_implementation="eager"`, `"experts_implementation=grouped_mm"`, `"experts_implementation=batched_mm"` '
-        'and `"experts_implementation=deepgemm"`.'
+        '`experts_implementation="eager"`, `"experts_implementation=grouped_mm"`, `"experts_implementation=batched_mm"`, '
+        '`experts_implementation=deepgemm`, `"experts_implementation=sonicmoe"`'
@IlyasMoutawwakil (Member, Author) commented:

Shouldn't be necessary with #45577.

@vasqu (Contributor) left a comment:

Can we add a small equivalence test? Similar to how it was done for grouped mm and batched mm?

Other than that, only small nits 🫡
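For illustration, an equivalence test along those lines usually loads the same MoE checkpoint twice and compares logits between the eager experts path and the kernel path. The sketch below is not the test added in this PR: the model id, tolerances, and the way experts_implementation is passed are assumptions.

import torch
from transformers import AutoModelForCausalLM

def test_sonicmoe_matches_eager():
    # Hypothetical equivalence check (requires a CUDA GPU and the sonic-moe kernel).
    model_id = "some-org/tiny-moe-model"  # placeholder checkpoint
    input_ids = torch.randint(0, 100, (1, 16), device="cuda")

    eager = AutoModelForCausalLM.from_pretrained(
        model_id, dtype=torch.bfloat16, experts_implementation="eager"
    ).to("cuda")
    sonic = AutoModelForCausalLM.from_pretrained(
        model_id, dtype=torch.bfloat16, experts_implementation="sonicmoe"
    ).to("cuda")

    with torch.no_grad():
        out_eager = eager(input_ids).logits
        out_sonic = sonic(input_ids).logits

    # bf16 kernels will not match bit-for-bit; use a loose tolerance.
    torch.testing.assert_close(out_eager, out_sonic, atol=1e-2, rtol=1e-2)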

Comment thread on src/transformers/models/gpt_oss/modular_gpt_oss.py
Comment thread on src/transformers/integrations/sonicmoe.py
@github-actions (Contributor) commented:

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss, openai_privacy_filter

Comment thread on src/transformers/models/openai_privacy_filter/modeling_openai_privacy_filter.py (outdated)
@vasqu enabled auto-merge on April 23, 2026 at 13:17
@vasqu disabled auto-merge on April 23, 2026 at 13:20
@vasqu merged commit 533c4e1 into main on April 23, 2026; 27 of 29 checks passed
@vasqu deleted the sonic-moe branch on April 23, 2026 at 13:24
tarekziade pushed a commit that referenced this pull request on Apr 23, 2026:
* added sonic moe

* use lazy_load_kernel

* style

* use concatenated revision

* final touches

* fix

* merge conflict

* simpler naming

* style

* add sonicmoe test

* skip fp32 on sonic

* add transposed support

* fix

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>

Labels: none yet
Projects: none yet
5 participants