Skip to content

Add base_model_tp_plan to OlmoeConfig#44668

Merged
ArthurZucker merged 4 commits intomainfrom
olmoe-tp-plan
Mar 26, 2026
Merged

Add base_model_tp_plan to OlmoeConfig#44668
ArthurZucker merged 4 commits intomainfrom
olmoe-tp-plan

Conversation

@dacorvo
Copy link
Copy Markdown
Contributor

@dacorvo dacorvo commented Mar 13, 2026

Fixes #44677

Summary

  • Add base_model_tp_plan to OlmoeConfig, enabling from_pretrained(tp_plan="auto") for OLMoE models
  • Add TensorParallelTesterMixin to OLMoE tests for TP validation coverage
  • Uses "colwise" for q_norm and k_norm because OLMoE applies these norms after the q/k projections — norm weight dimensions must match the sharded projection output

Design note

Qwen3-MoE uses "replicated_with_grad_allreduce" for its q/k norms, but that only works because Qwen3 applies norms before the projections (on the full hidden state). OLMoE's architecture applies norms after projections, so the norm weights must be sharded the same way as the projection output — hence "colwise".

Test plan

  • python -m pytest tests/models/olmoe/test_modeling_olmoe.py::OlmoeModelTest::test_tp_plan_matches_params -xvs — passes

AI disclosure

This PR was developed with AI assistance (Claude). All changes reviewed and validated by a human contributor.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 13, 2026 14:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a default tensor-parallel (TP) sharding plan to OlmoeConfig so OLMoE models can be loaded with from_pretrained(tp_plan="auto"), and wires OLMoE’s modeling tests into the shared TP test mixin.

Changes:

  • Define base_model_tp_plan on OlmoeConfig.
  • Add TensorParallelTesterMixin to OLMoE’s model test class.
  • Document why q/k norms need special handling for TP.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/transformers/models/olmoe/configuration_olmoe.py Introduces a default TP plan mapping for OLMoE modules.
tests/models/olmoe/test_modeling_olmoe.py Adds TP test mixin to validate TP behavior for OLMoE.

Comment thread src/transformers/models/olmoe/configuration_olmoe.py Outdated
Comment thread tests/models/olmoe/test_modeling_olmoe.py
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez
Copy link
Copy Markdown
Member

Hey @dacorvo! YOu'll need a small rebase as we recently migrated configs to dataclasses! Apart from that, did you test the tp plan on real checkpoints to ensure correctness by any chance?

Rebased onto main after configs were migrated to dataclasses.
Adds base_model_tp_plan as a class attribute and TensorParallelTesterMixin
to the OLMoE test suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dacorvo
Copy link
Copy Markdown
Contributor Author

dacorvo commented Mar 18, 2026

@Cyrilvallez I rebased the branch.
I tested two models with this TP Plan against device_map='auto', with three different prompts.

Model TP=2 TP=4 TP=8
OLMoE-1B-7B-0924 MATCH MATCH MATCH
OLMoE-1B-7B-0125 MATCH MATCH MATCH

As a sanity, I also tested the wrong initial plan I submitted, and verified it failed for every prompt 😅 (divergence after a few tokens).

@dacorvo dacorvo requested a review from Cyrilvallez March 18, 2026 17:01
@Cyrilvallez
Copy link
Copy Markdown
Member

@bot /style

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 18, 2026

Style fix bot fixed some files and pushed the changes.

Copy link
Copy Markdown
Member

@Cyrilvallez Cyrilvallez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why I remembered from olmo haha, the norm on the full last dim prevents nice TP 🥲
LGTM now!

Comment thread src/transformers/models/olmoe/configuration_olmoe.py
Comment on lines +175 to +176
if is_torch_available():
OlmoeModelTester.causal_lm_class = OlmoeForCausalLM
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, did not see this but let's not do that 😅 Let's put it directly in the class above!

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
@github-actions
Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: olmoe

@ArthurZucker ArthurZucker added this pull request to the merge queue Mar 26, 2026
Merged via the queue into main with commit da37a4d Mar 26, 2026
23 checks passed
@ArthurZucker ArthurZucker deleted the olmoe-tp-plan branch March 26, 2026 13:58
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Mar 27, 2026
* Rebase: Add base_model_tp_plan to OlmoeConfig (dataclass style)

Rebased onto main after configs were migrated to dataclasses.
Adds base_model_tp_plan as a class attribute and TensorParallelTesterMixin
to the OLMoE test suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Apply repo consistency fixes

* review: update src/transformers/models/olmoe/configuration_olmoe.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* review: use correct pattern for OlmoeModelTester class

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Mar 30, 2026
* Rebase: Add base_model_tp_plan to OlmoeConfig (dataclass style)

Rebased onto main after configs were migrated to dataclasses.
Adds base_model_tp_plan as a class attribute and TensorParallelTesterMixin
to the OLMoE test suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Apply repo consistency fixes

* review: update src/transformers/models/olmoe/configuration_olmoe.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* review: use correct pattern for OlmoeModelTester class

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add base_model_tp_plan to OlmoeConfig

5 participants