
[Moe] Enable aux loss automatically when in training + coef is not 0 #44264

Draft
vasqu wants to merge 6 commits into huggingface:main from vasqu:moe-defaults-loss

Conversation

@vasqu
Contributor

@vasqu vasqu commented Feb 24, 2026

As per title, WIP --> needs a test

Comment on lines +910 to +917
original_output_router_logits = None
if (
    self.training
    and (router_aux_loss_coef := getattr(self.config, "router_aux_loss_coef", None))
    and router_aux_loss_coef != 0
):
    original_output_router_logits = self.config.output_router_logits
    self.config.output_router_logits = True
Contributor Author

Note: Conditioned on training because otherwise we would always output the logits
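For readers following the thread, here is a minimal sketch of the pattern in the diff excerpt above. It assumes the saved value is restored once the forward pass is done; the excerpt only shows the override, so the restore step is inferred from the `original_output_router_logits` name. The helper `force_router_logits` and the `SimpleNamespace` stand-in for the config are hypothetical and are not part of transformers.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def force_router_logits(config, training):
    """Hypothetical helper: enable router logits only while training with a nonzero coef."""
    coef = getattr(config, "router_aux_loss_coef", None)
    # None and 0 are both falsy, so a single truthiness check covers both cases.
    should_force = bool(training and coef)
    if not should_force:
        # Inference, or no aux loss configured: leave the config untouched.
        yield False
        return
    original = config.output_router_logits
    config.output_router_logits = True
    try:
        yield True
    finally:
        # Assumed restore step (not shown in the PR excerpt).
        config.output_router_logits = original


# Stand-in config object; a real model config would carry many more fields.
cfg = SimpleNamespace(router_aux_loss_coef=0.01, output_router_logits=False)

with force_router_logits(cfg, training=True) as forced:
    during_training = (forced, cfg.output_router_logits)
after_training = cfg.output_router_logits

with force_router_logits(cfg, training=False) as forced:
    during_eval = (forced, cfg.output_router_logits)
```

Gating on the training flag keeps inference outputs unchanged, which is the point of the note above: without that condition the router logits would always be returned.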

Collaborator

It makes sense, though I don't remember the argument for people not passing output_router_logits. Is it not intuitive?

Contributor Author

Honestly not sure, it's based on the issue from #44242 and seemingly @Rocketknight1 seeing more of these; so seems convenient enough. Will check with the user where he found it in the docs tho 👀

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu vasqu changed the title from "[Moe] Enable aux loss automatically when in training when coef is not 0" to "[Moe] Enable aux loss automatically when in training + coef is not 0" on Feb 24, 2026
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: dbrx, doge, ernie4_5_moe, ernie4_5_vl_moe, flex_olmo, glm4v_moe, gpt_oss, granitemoe, granitemoehybrid, granitemoeshared, jamba, minimax, minimax_m2, mixtral, nllb_moe, olmoe

Collaborator

@ArthurZucker ArthurZucker left a comment

lgtm just let's document why we are enabling this

@winglian
Collaborator

Removing output_router_logits is likely going to break things downstream, most notably Liger-Kernel.

@vasqu
Contributor Author

vasqu commented Feb 25, 2026

It is hidden behind the decorator, not removed. Similar to output_hidden_states
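To illustrate what "hidden behind the decorator" could mean in practice, here is a rough sketch under stated assumptions. `resolve_output_flags` and `ToyMoe` are hypothetical names invented for this example; this is not the actual transformers decorator, only one way such a decorator might fill in a flag from the config when the caller does not pass it.

```python
import functools
from types import SimpleNamespace


def resolve_output_flags(*flag_names):
    """Hypothetical decorator: fill each flag from kwargs, else fall back to self.config."""
    def decorator(forward):
        @functools.wraps(forward)
        def wrapper(self, *args, **kwargs):
            for name in flag_names:
                # An explicit keyword argument wins; None means "use the config value".
                if kwargs.get(name) is None:
                    kwargs[name] = getattr(self.config, name, False)
            return forward(self, *args, **kwargs)
        return wrapper
    return decorator


class ToyMoe:
    """Stand-in model; it only echoes the resolved flags."""

    def __init__(self, config):
        self.config = config

    @resolve_output_flags("output_router_logits", "output_hidden_states")
    def forward(self, x, output_router_logits=None, output_hidden_states=None):
        return {
            "router_logits": output_router_logits,
            "hidden_states": output_hidden_states,
        }


model = ToyMoe(SimpleNamespace(output_router_logits=True, output_hidden_states=False))
from_config = model.forward(1)
overridden = model.forward(1, output_router_logits=False)
```

In this sketch a downstream caller that still passes `output_router_logits=...` explicitly keeps working, which matches the claim that the argument is not removed.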
