Fix Mixtral aux_loss not computed when output_router_logits=False #44586
mvanhorn wants to merge 1 commit into huggingface:main from
Conversation
Decouple router logits collection from output visibility.

When `router_aux_loss_coef > 0`, always collect router logits internally to compute `aux_loss` during training, regardless of the `output_router_logits` setting. Only include `router_logits` in the model output when `output_router_logits=True`.

This fix propagates to all MoE models inheriting from Mixtral via modular conversion (ernie4_5_moe, flex_olmo, gpt_oss, jamba, minimax, olmoe, phimoe, qwen2_moe, qwen3_5_moe, qwen3_next).

Fixes huggingface#44242

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
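A minimal sketch of the decoupling described above, assuming a helper like the one below (function and variable names are illustrative, not the actual transformers implementation):

```python
# Sketch of the decoupled policy: collection of router logits is driven by
# router_aux_loss_coef, while output_router_logits only controls visibility.
def router_logits_policy(output_router_logits: bool,
                         router_aux_loss_coef: float,
                         training: bool) -> tuple[bool, bool]:
    """Return (collect_internally, expose_in_output)."""
    # Collect whenever the aux loss needs them, even if the caller
    # did not ask for router logits in the output.
    collect = output_router_logits or (router_aux_loss_coef > 0 and training)
    # Only surface them to the caller when explicitly requested.
    expose = output_router_logits
    return collect, expose

# Example: training with aux loss enabled but output_router_logits=False
assert router_logits_policy(False, 0.02, True) == (True, False)
```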
[For maintainers] Suggested jobs to run (before merge) run-slow: ernie4_5_moe, flex_olmo, gpt_oss, jamba, minimax, minimax_m2, mixtral, olmoe, phimoe, qwen2_moe, qwen3_5_moe, qwen3_next
cc @vasqu as well - I see so many issues and PRs about this and it would be great to finally resolve it
It's a docs issue atp, e.g. see #44242 (comment), this seems like code agent slop 👀 #44264 would be a serious solution but it's a draft for a reason
Sorry for the ping in that case! @mvanhorn be careful opening a lot of code agent PRs - although some of them do turn out to be helpful, the chance of slop causing confusion and wasting time is pretty high.
Fair point - I got a little excited about landing my first PR and jumped to a code change without reading the issue thread carefully enough. The discussion already converged on this being a docs clarification, and @vasqu has a broader approach in #44264. Point taken on the volume, @Rocketknight1 - I'll be more selective going forward and make sure I understand the maintainer consensus before submitting. Just trying to add value to the team here.
What does this PR do?
Decouples router logits collection from output visibility in Mixtral's ForCausalLM. Previously, `output_router_logits=False` (the default) prevented `aux_loss` from being computed, meaning load balancing was silently disabled during training even when `router_aux_loss_coef > 0`.

The fix:

- Always collect router logits internally when `router_aux_loss_coef > 0`
- Compute `aux_loss` when router logits are available
- Only include `router_logits` in the model output when the user explicitly sets `output_router_logits=True` (see the sketch after this list)

This affects all MoE models inheriting from Mixtral via modular conversion: ernie4_5_moe, flex_olmo, gpt_oss, jamba, minimax, minimax_m2, olmoe, phimoe, qwen2_moe, qwen3_5_moe, qwen3_next.
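A quick way to exercise the intended behavior (a minimal sketch; the tiny config values are arbitrary placeholders, and it assumes this PR's change is applied):

```python
# Illustrative check: with the fix, aux_loss is computed during training even
# though output_router_logits is left at its default of False.
import torch
from transformers import MixtralConfig, MixtralForCausalLM

config = MixtralConfig(
    vocab_size=128, hidden_size=64, intermediate_size=128,
    num_hidden_layers=2, num_attention_heads=4, num_key_value_heads=2,
    num_local_experts=4, num_experts_per_tok=2,
    router_aux_loss_coef=0.02,  # load balancing enabled
)
model = MixtralForCausalLM(config).train()
input_ids = torch.randint(0, config.vocab_size, (1, 16))

out = model(input_ids, labels=input_ids)  # output_router_logits not set
print(out.aux_loss)       # expected: a tensor, not None, with this fix
print(out.router_logits)  # expected: None, since it was not requested
```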
Fixes #44242
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. Load balancing loss not added when output_router_logits=False #44242
Who can review?
@SunMarc @ArthurZucker @Cyrilvallez (MoE models, training)
This contribution was developed with AI assistance (Claude Code).