[Moe] Enable aux loss automatically when in training + coef is not 0 #44264
vasqu wants to merge 6 commits into huggingface:main
original_output_router_logits = None
if (
    self.training
    and (router_aux_loss_coef := getattr(self.config, "router_aux_loss_coef", None))
    and router_aux_loss_coef != 0
):
    original_output_router_logits = self.config.output_router_logits
    self.config.output_router_logits = True
Note: Conditioned on training because otherwise we would always output the logits
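A minimal sketch of the gating condition, for illustration only. `should_force_router_logits` and the `SimpleNamespace` configs are hypothetical stand-ins, not part of the PR. It shows why the training check matters: without it, any config with a non-zero coefficient would return router logits at inference too.

```python
from types import SimpleNamespace


def should_force_router_logits(training: bool, config) -> bool:
    """Hypothetical helper mirroring the condition in the diff."""
    # A None default covers configs that define no coefficient at all.
    coef = getattr(config, "router_aux_loss_coef", None)
    # None, 0 and 0.0 are all falsy, so only a non-zero coefficient passes.
    return bool(training and coef)


moe_config = SimpleNamespace(router_aux_loss_coef=0.01, output_router_logits=False)
zero_coef_config = SimpleNamespace(router_aux_loss_coef=0.0, output_router_logits=False)
dense_config = SimpleNamespace()  # no router_aux_loss_coef attribute

forced_in_training = should_force_router_logits(True, moe_config)
forced_in_eval = should_force_router_logits(False, moe_config)
forced_with_zero_coef = should_force_router_logits(True, zero_coef_config)
forced_without_coef = should_force_router_logits(True, dense_config)
```

For plain numeric coefficients the truthiness test already excludes 0, so the explicit `!= 0` in the diff mainly documents intent.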
It makes sense, though I don't remember: what's the argument for people not passing output_router_logits? Is it not intuitive?
Honestly not sure. It's based on the issue in #44242, and @Rocketknight1 is seemingly seeing more of these, so it seems convenient enough. Will check with the user where he found it in the docs though 👀
Title changed from "[Moe] Enable aux loss automatically when in training when coef is not 0" to "[Moe] Enable aux loss automatically when in training + coef is not 0"
[For maintainers] Suggested jobs to run (before merge): run-slow: dbrx, doge, ernie4_5_moe, ernie4_5_vl_moe, flex_olmo, glm4v_moe, gpt_oss, granitemoe, granitemoehybrid, granitemoeshared, jamba, minimax, minimax_m2, mixtral, nllb_moe, olmoe
ArthurZucker left a comment
LGTM, just let's document why we are enabling this.
removing
It is hidden behind the decorator, not removed. Similar to
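One way to read "hidden behind the decorator" is sketched below. This is not the PR's implementation: `auto_router_logits` and `ToyMoe` are hypothetical names. It also assumes the saved flag is restored after the forward pass, which the diff suggests by storing `original_output_router_logits` but does not show.

```python
import functools
from types import SimpleNamespace


def auto_router_logits(forward):
    """Hypothetical decorator: force router logits on for one training forward pass."""

    @functools.wraps(forward)
    def wrapper(self, *args, **kwargs):
        coef = getattr(self.config, "router_aux_loss_coef", None)
        if not (self.training and coef):
            # Eval mode, or no usable coefficient: leave the config untouched.
            return forward(self, *args, **kwargs)
        original = self.config.output_router_logits
        self.config.output_router_logits = True
        try:
            return forward(self, *args, **kwargs)
        finally:
            # Restore the user's setting even if forward raises.
            self.config.output_router_logits = original

    return wrapper


class ToyMoe:
    """Toy stand-in for a MoE model; only tracks the flags used above."""

    def __init__(self, coef, training=True):
        self.training = training
        self.config = SimpleNamespace(router_aux_loss_coef=coef, output_router_logits=False)

    @auto_router_logits
    def forward(self):
        # Report what the model body would see for the flag.
        return self.config.output_router_logits


model = ToyMoe(coef=0.01)
seen_during_forward = model.forward()
flag_after_forward = model.config.output_router_logits
```

The `try`/`finally` keeps the override scoped to a single call, so a user who left `output_router_logits=False` still sees that value on the config afterwards.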
As per title, WIP --> needs a test