
add expert parallelism for gemma-4-26B-A4B-it #45279

Merged
ArthurZucker merged 2 commits into huggingface:main from sywangyi:gemma4-moe
Apr 22, 2026

Conversation

@sywangyi
Contributor

sywangyi commented Apr 7, 2026

What does this PR do?

Fixes # (issue)

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are bottlenecked by our ability to review and respond to them. As a result,
we ask that new users not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sywangyi
Contributor (Author)

sywangyi commented Apr 7, 2026

You can enable expert parallelism (EP) in an example following the GPT-OSS approach, like:

from transformers import AutoModelForCausalLM
from transformers.distributed.configuration_utils import DistributedConfig

# Shard the MoE experts across ranks instead of replicating them on every device.
distributed_config = DistributedConfig(enable_expert_parallel=True)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-26B-A4B-it",
    dtype="auto",
    distributed_config=distributed_config,
)
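
As a minimal follow-on sketch (the prompt and generation settings below are illustrative, not from the PR), assuming the script runs under a multi-process launcher such as torchrun so the EP ranks exist:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-26B-A4B-it")
# Each rank holds only its shard of the experts; generation itself is unchanged.
inputs = tokenizer("Explain expert parallelism in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))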

@sywangyi
Contributor (Author)

sywangyi commented Apr 8, 2026

@ArthurZucker @Cyrilvallez please help review it.

@sywangyi
Contributor (Author)

sywangyi commented Apr 9, 2026

@Rocketknight1 please help review it.

@ArthurZucker
Collaborator

ArthurZucker left a comment

I don't think this works well with the expert implementation though! But yeah, it's the only way to have EP!

Comment thread: src/transformers/integrations/tensor_parallel.py (Outdated)
@sywangyi
Contributor (Author)

@ArthurZucker thanks for the review. I rebased the code onto main and verified that ep=4 works well, with per-rank memory dropping to 15 GB.
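
For reference, a hedged sketch of how the per-rank figure could be checked (illustrative, not part of the PR; assumes CUDA devices and that the launcher initialized torch.distributed):

import torch
import torch.distributed as dist

# Report this rank's peak GPU allocation after the model has loaded and run.
rank = dist.get_rank() if dist.is_initialized() else 0
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"rank {rank}: peak GPU memory {peak_gb:.1f} GB")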

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi
Contributor (Author)

@ArthurZucker I have updated the PR according to your suggestion. Could you help review it?

@ArthurZucker
Collaborator

ArthurZucker left a comment

#45473 should fix this, no?

@sywangyi
Contributor (Author)

> #45473 should fix this, no?

Yes, it will partly fix the Gemma4 MoE issue, but I need to rebase this PR after #45473 is merged to include some Gemma4-specific fixes.

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@sywangyi
Contributor (Author)

@ArthurZucker could you please review the updated PR?

@ArthurZucker
Collaborator

ArthurZucker left a comment

Much better, ty!

Comment thread: src/transformers/integrations/tensor_parallel.py
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma4

@ArthurZucker ArthurZucker enabled auto-merge April 22, 2026 06:30
@ArthurZucker ArthurZucker added this pull request to the merge queue Apr 22, 2026
Merged via the queue into huggingface:main with commit e737fb8 Apr 22, 2026
28 checks passed
