add expert parallelism for gemma-4-26B-A4B-it #45279
ArthurZucker merged 2 commits into huggingface:main
You could enable EP in the example, following the gpt-oss approach, like
@ArthurZucker @Cyrilvallez please help review it.
@Rocketknight1 please help review it.
ArthurZucker left a comment
I don't think this works well with the expert implementation, though! But yeah, it's the only way to have EP!
@ArthurZucker thanks for the review. I rebased the code onto main and verified that ep=4 works well, with memory per rank dropping to 15G.
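The per-rank memory drop reported above is what one would expect from expert parallelism, where each rank holds only a slice of the MoE experts. A minimal sketch of that partitioning, assuming an even, contiguous split; the function name `experts_for_rank` and the expert count of 128 are hypothetical placeholders, not taken from this PR or from the Gemma 4 configuration:

```python
def experts_for_rank(num_experts: int, ep_size: int, rank: int) -> range:
    """Return the expert indices owned by `rank` under an even, contiguous split.

    Illustrative helper only; it is not part of the transformers API.
    """
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    if not 0 <= rank < ep_size:
        raise ValueError("rank must be in [0, ep_size)")
    per_rank = num_experts // ep_size
    start = rank * per_rank
    return range(start, start + per_rank)


# Hypothetical numbers: 128 experts sharded over ep=4 ranks.
for rank in range(4):
    owned = experts_for_rank(128, 4, rank)
    print(f"rank {rank}: experts {owned.start}..{owned.stop - 1}")
```

Only the expert weights are modeled here; attention and embedding weights, and the token routing between ranks, are left out of the sketch.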
@ArthurZucker I have updated the PR according to your suggestion. Could you help review it?
ArthurZucker left a comment
#45473 should fix this, no?
Yes, it will partly fix the gemma4 MoE issue, but I need to rebase the PR after #45473 is merged to include some gemma4-specific fixes.
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@ArthurZucker could you please review the updated PR?
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma4