Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
vasqu
left a comment
Awesome!! Can't wait to land this on main
Just a few nits, but nothing major
Imo, this is so small that we can also move this under moe directly - it's essentially the one forward wrapper :D
would be nice to also not even have this here but on kernels!
Yes, it's simple now, but my guess is that it might get more complicated (fp8/fp4 support, for example).
I made it into a separate file to avoid what we have in finegrained_fp8 with deepgemm, for example 🤔 my understanding is that we want to move the deepgemm integration to a separate file as well
would be nice to also not even have this here but on kernels!
It's very transformers- and experts-impl-specific (which changes from time to time). For example, if we were to move it into kernels, we would have to add many more inputs to the function (has_gate, has_bias, is_transposed, is_concatenated, ...), which would ultimately just be moe_general_routing with the weight-axis permutation baked into it.
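To make that concrete, here is a rough sketch of what a "generic" kernels-side entry point would have to accept. It is purely illustrative, not the sonic-moe or transformers API: the function name and the eager body are made up, and only the layout flags come from the comment above.

```python
import torch
import torch.nn.functional as F


# Illustrative only: every transformers-specific experts-layout detail leaks into
# the signature, at which point this is just a general MoE routing function.
def generic_experts_forward(
    hidden_states,    # (num_tokens, hidden_dim)
    gate_up_proj,     # (num_experts, hidden_dim, 2 * intermediate_dim), gate/up concatenated
    down_proj,        # (num_experts, intermediate_dim, hidden_dim)
    top_k_index,      # (num_tokens, top_k) expert indices
    top_k_weights,    # (num_tokens, top_k) routing weights
    *,
    has_bias=False,
    is_transposed=False,
    gate_up_proj_bias=None,
    down_proj_bias=None,
):
    if is_transposed:  # some models store expert weights as (num_experts, out, in)
        gate_up_proj = gate_up_proj.transpose(-1, -2)
        down_proj = down_proj.transpose(-1, -2)
    out = torch.zeros_like(hidden_states)
    for e in range(gate_up_proj.shape[0]):
        token_idx, k_idx = torch.where(top_k_index == e)
        if token_idx.numel() == 0:
            continue
        gate_up = hidden_states[token_idx] @ gate_up_proj[e]
        if has_bias:
            gate_up = gate_up + gate_up_proj_bias[e]
        gate, up = gate_up.chunk(2, dim=-1)  # concatenated gate/up layout
        h = (F.silu(gate) * up) @ down_proj[e]
        if has_bias:
            h = h + down_proj_bias[e]
        weighted = h * top_k_weights[token_idx, k_idx].unsqueeze(-1)
        out.index_add_(0, token_idx, weighted.to(out.dtype))
    return out
```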
Gotcha, fair assessment, I only thought about the current version, which is shortsighted
kernels should really only act as a thin wrapper and we handle the specifics here imo - otherwise the kernels maintainers will always have to do additional work from version to version (if things break)
```python
b1 = self.gate_up_proj_bias if self.has_bias else None
b2 = self.down_proj_bias if self.has_bias else None

output, _ = moe_general_routing(
```
Just curious: is this compatible with torch compile? My hunch says no, because they seldom register the fake types upstream without external contributions lol
it's actually not 😭 it supposedly works on torch 2.9 or so, but my tests failed because of torch dynamo issues with CUDA streams Dao-AILab/sonic-moe#21
I guess torch 2.12 will fix it, if I followed pytorch/pytorch#177610 correctly
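For readers unfamiliar with the issue: torch.compile needs a shape/dtype-only ("fake") implementation registered for each custom op so dynamo can trace through it without launching the CUDA kernel. Below is a minimal, hypothetical sketch of such a registration; the op name and signature are invented for illustration and are not the actual sonic-moe ops.

```python
import torch


# Hypothetical op, only to illustrate the fake-tensor registration torch.compile
# relies on; the real sonic-moe kernels define their own ops upstream.
@torch.library.custom_op("sonicmoe_demo::fused_experts", mutates_args=())
def fused_experts(hidden_states: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # the real CUDA kernel would run here; a plain matmul stands in for it
    return hidden_states @ weight


@fused_experts.register_fake
def _(hidden_states, weight):
    # shape/dtype-only implementation so dynamo/FakeTensor can trace the op
    return hidden_states.new_empty(hidden_states.shape[0], weight.shape[1])
```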
should I link it in a comment?
SonicMoe merged their quack update, guess we can sync with Dao-AILab/sonic-moe#46 (comment) and have upstream be ready in the next few days
They will add concat layout support in this PR: Dao-AILab/sonic-moe#47
Ah sorry, missed it, looking forward to it 🫡
Please let me know if any help is needed! I am familiar with the HF source code.
Thank you @GarlGuo! Will keep you in the loop.
```diff
-if applicable_experts not in ["eager", "grouped_mm", "batched_mm", "deepgemm"]:
+if applicable_experts not in ["eager", "grouped_mm", "batched_mm", "deepgemm", "sonicmoe"]:
     message = (
         f'Specified `experts_implementation="{applicable_experts}"` is not supported. The only possible arguments are '
-        '`experts_implementation="eager"`, `experts_implementation="grouped_mm"`, `experts_implementation="batched_mm"` '
-        'and `experts_implementation="deepgemm"`.'
+        '`experts_implementation="eager"`, `experts_implementation="grouped_mm"`, `experts_implementation="batched_mm"`, '
+        '`experts_implementation="deepgemm"` and `experts_implementation="sonicmoe"`.'
```
vasqu
left a comment
Can we add a small equivalence test? Similar to how it was done for grouped mm and batched mm?
Other than that, only small nits 🫡
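Something like the sketch below, perhaps. This is not the final test from the PR, just a rough outline; it assumes `experts_implementation` can be passed to `from_pretrained` the same way `attn_implementation` is, and the model id, dtype, and tolerances are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


# Rough sketch only: compare logits between the eager and sonicmoe experts
# implementations on identical inputs. Model id, dtype, and tolerances are guesses,
# and `experts_implementation` as a from_pretrained kwarg is an assumption.
def check_sonicmoe_matches_eager(model_id="openai/gpt-oss-20b"):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("Hello, MoE kernels!", return_tensors="pt").to("cuda")

    eager = AutoModelForCausalLM.from_pretrained(
        model_id, dtype=torch.bfloat16, experts_implementation="eager"
    ).to("cuda").eval()
    sonic = AutoModelForCausalLM.from_pretrained(
        model_id, dtype=torch.bfloat16, experts_implementation="sonicmoe"
    ).to("cuda").eval()

    with torch.no_grad():
        ref = eager(**inputs).logits
        out = sonic(**inputs).logits

    torch.testing.assert_close(out, ref, atol=1e-2, rtol=1e-2)
```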
[For maintainers] Suggested jobs to run (before merge): run-slow: gpt_oss, openai_privacy_filter
* added sonic moe
* use lazy_load_kernel
* style
* use concatenated revision
* final touches
* fix
* merge conflict
* simpler naming
* style
* add sonicmoe test
* skip fp32 on sonic
* add transposed support
* fix

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
What does this PR do?
Adds support for the insanely optimized sonic-moe kernels from https://github.com/Dao-AILab/sonic-moe
Needs huggingface/kernels-community#546 and/or Dao-AILab/sonic-moe#46
Code Agent Policy
The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.
PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.
This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read
CONTRIBUTING.md.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.