
add expert parallelism for gemma-4-26B-A4B-it #45279

Merged
ArthurZucker merged 2 commits into huggingface:main from sywangyi:gemma4-moe
Apr 22, 2026

Conversation

@sywangyi
Contributor

sywangyi commented Apr 7, 2026

What does this PR do?

Fixes # (issue)

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are bottlenecked by our ability to review and respond to them. As a result,
we ask that new users not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sywangyi
Contributor (Author)

sywangyi commented Apr 7, 2026

You can enable expert parallelism (EP) in an example following the GPT-OSS approach, like:

from transformers import AutoModelForCausalLM
from transformers.distributed.configuration_utils import DistributedConfig

# Shard the MoE experts across ranks instead of replicating them on every device.
distributed_config = DistributedConfig(enable_expert_parallel=True)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-26B-A4B-it",
    dtype="auto",
    distributed_config=distributed_config,
)
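
As a minimal follow-on sketch (the prompt and generation settings below are illustrative, not from the PR), assuming the script runs under a multi-process launcher such as torchrun so the EP ranks exist:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-26B-A4B-it")
# Each rank holds only its shard of the experts; generation itself is unchanged.
inputs = tokenizer("Explain expert parallelism in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))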

@sywangyi
Contributor (Author)

sywangyi commented Apr 8, 2026

@ArthurZucker @Cyrilvallez please help review it.

@sywangyi
Contributor (Author)

sywangyi commented Apr 9, 2026

@Rocketknight1 please help review it.

@ArthurZucker
Collaborator

ArthurZucker left a comment

I don't think this works well with the expert implementation though! But yeah, it's the only way to have EP!

Comment thread: src/transformers/integrations/tensor_parallel.py (Outdated)
@sywangyi
Contributor (Author)

@ArthurZucker thanks for the review. I rebased the code onto main and verified that ep=4 works well, with per-rank memory dropping to 15 GB.
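
For reference, a hedged sketch of how the per-rank figure could be checked (illustrative, not part of the PR; assumes CUDA devices and that the launcher initialized torch.distributed):

import torch
import torch.distributed as dist

# Report this rank's peak GPU allocation after the model has loaded and run.
rank = dist.get_rank() if dist.is_initialized() else 0
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"rank {rank}: peak GPU memory {peak_gb:.1f} GB")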

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi
Contributor (Author)

@ArthurZucker I have updated the PR according to your suggestion. Could you help review it?

@ArthurZucker
Collaborator

ArthurZucker left a comment

#45473 should fix this, no?

@sywangyi
Contributor (Author)

> #45473 should fix this, no?

Yes, it will partly fix the Gemma4 MoE issue, but I need to rebase this PR after #45473 is merged to include some Gemma4-specific fixes.

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@sywangyi
Contributor (Author)

@ArthurZucker could you please review the updated PR?

@ArthurZucker
Collaborator

ArthurZucker left a comment

Much better, ty!

Comment thread: src/transformers/integrations/tensor_parallel.py
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma4

@ArthurZucker ArthurZucker enabled auto-merge April 22, 2026 06:30
@ArthurZucker ArthurZucker added this pull request to the merge queue Apr 22, 2026
Merged via the queue into huggingface:main with commit e737fb8 Apr 22, 2026
28 checks passed
