feat(trainer): log individual losses from loss_dict #45558
Closed
Abdeltoto wants to merge 2 commits into huggingface:main from
Conversation
- Accumulate scalar terms from outputs.loss_dict and optional top-level *_loss fields
- Apply the same DP mean and GA scaling as the main training loss before logging
- Clear auxiliary buffers each log step; add integration test with RegressionPreTrainedModelWithLossDict

Made-with: Cursor
Member

Sorry, we really don't want code agent PRs on old issues like this!
What does this PR do?
Fixes #31081.
When a model returns auxiliary losses alongside the main loss (e.g. via a `loss_dict` field in its `ModelOutput`), the Trainer currently only logs the combined loss. That makes debugging multi-term objectives painful: you can see the total loss going down without knowing which term is actually moving.

This PR teaches the Trainer to also log each scalar term it finds in `outputs.loss_dict`, plus any top-level `*_loss` scalar attribute on the output, under namespaced keys like `loss_dict_<name>` and `loss_<name>`. Behaviour is opt-in by virtue of the model itself: if no extra losses are returned, nothing changes. The main `loss` value and all existing logs are unchanged.
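To make the key naming concrete, here is a small self-contained illustration (not code from this diff; the plain `outputs` dict below stands in for a real `ModelOutput`):

```python
import torch

# Stand-in for a model output carrying a main loss, a loss_dict, and a *_loss field.
outputs = {
    "loss": torch.tensor(0.60),
    "loss_dict": {"mse": torch.tensor(0.42), "l1": torch.tensor(0.13)},
    "extra_loss": torch.tensor(0.05),
}

log_entry = {}
for name, value in outputs["loss_dict"].items():
    log_entry[f"loss_dict_{name}"] = value.item()  # -> loss_dict_mse, loss_dict_l1
for key, value in outputs.items():
    if key.endswith("_loss") and torch.is_tensor(value) and value.ndim == 0:
        log_entry[f"loss_{key[: -len('_loss')]}"] = value.item()  # -> loss_extra

print(log_entry)
# {'loss_dict_mse': 0.42..., 'loss_dict_l1': 0.13..., 'loss_extra': 0.05...}
```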
Implementation notes
- A new `self._aux_losses_accumulator` on the `Trainer`, mirroring how `_total_loss_scalar` already works.
- In `training_step`, after `compute_loss(..., return_outputs=True)`, scalar tensor entries from `outputs.loss_dict` and from `outputs.<name>_loss` are detached, gradient-accumulation-scaled, and accumulated. Same DP mean and same `num_items_in_batch` normalization as the main loss, so the numbers are comparable (see the sketch after this list).
- In `_maybe_log_save_evaluate`, accumulators are gathered across processes with `nested_gather` (consistent with the main `tr_loss` path), averaged over the logging window, and added to `logs`. Then the buffers are reset.
- Only scalar tensors are accumulated, so arbitrary metadata in `loss_dict` won't crash the Trainer.
- No public API change. No new dependency. The diff is mostly localized to two methods in `trainer.py`.
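A rough, single-process sketch of that accumulate-then-average pattern (the buffer name is taken from the description above; the distributed `nested_gather` and `num_items_in_batch` normalization are omitted, so this is not the actual diff):

```python
import torch

# Sketch only: mirrors the accumulate-per-step / average-per-log-window behaviour
# described above, without the DP gather or num_items_in_batch normalization that
# the real Trainer path applies.
class AuxLossTracker:
    def __init__(self):
        self._aux_losses_accumulator = {}
        self._steps_since_last_log = 0

    def accumulate(self, loss_dict, gradient_accumulation_steps):
        # Called once per training_step, after compute_loss(..., return_outputs=True).
        for name, value in (loss_dict or {}).items():
            if not (torch.is_tensor(value) and value.ndim == 0):
                continue  # only scalar tensors; arbitrary metadata is ignored
            scaled = value.detach() / gradient_accumulation_steps
            key = f"loss_dict_{name}"
            prev = self._aux_losses_accumulator.get(key)
            self._aux_losses_accumulator[key] = scaled if prev is None else prev + scaled
        self._steps_since_last_log += 1

    def pop_logs(self):
        # Called at log time: average over the logging window, then reset the buffers.
        denom = max(self._steps_since_last_log, 1)
        logs = {k: (v / denom).item() for k, v in self._aux_losses_accumulator.items()}
        self._aux_losses_accumulator.clear()
        self._steps_since_last_log = 0
        return logs
```

At log time, the resulting dict would simply be merged into `logs` next to the existing `loss` entry.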
Tests
`tests/trainer/test_trainer.py::TrainerIntegrationTest::test_trainer_logs_auxiliary_losses_from_loss_dict`

A small `RegressionPreTrainedModelWithLossDict` (added to `tests/trainer/trainer_test_utils.py`) returns `loss`, `loss_dict={'mse', 'l1'}`, and a top-level `extra_loss`. The test runs a tiny `Trainer.train()` and asserts the resulting log entries contain the expected `loss_dict_mse`, `loss_dict_l1`, and `loss_extra` keys with finite, positive values. Ran locally:
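For reference, a minimal sketch of a model in the spirit of that fixture; the class names, shapes, and loss weighting below are illustrative assumptions, not the test code added in this PR:

```python
from dataclasses import dataclass
from typing import Optional

import torch
from torch import nn
from transformers.modeling_outputs import ModelOutput


# Sketch only: a tiny regression model whose forward returns a main loss, a
# loss_dict with per-term scalars, and a top-level *_loss attribute.
@dataclass
class RegressionOutputWithLossDict(ModelOutput):
    loss: Optional[torch.Tensor] = None
    loss_dict: Optional[dict] = None
    extra_loss: Optional[torch.Tensor] = None
    logits: Optional[torch.Tensor] = None


class TinyRegressionWithLossDict(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, input_x=None, labels=None, **kwargs):
        preds = self.linear(input_x)
        mse = nn.functional.mse_loss(preds, labels)
        l1 = nn.functional.l1_loss(preds, labels)
        extra = 0.01 * preds.pow(2).mean()  # arbitrary auxiliary regularizer
        return RegressionOutputWithLossDict(
            loss=mse + l1 + extra,
            loss_dict={"mse": mse, "l1": l1},
            extra_loss=extra,
            logits=preds,
        )
```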
Code Agent Policy
I used Cursor as a coding assistant (the commit trailer says `Made-with: Cursor`), but I read every diff, ran the tests locally, wrote the description myself, and own the change. Happy to iterate on review feedback.
Before submitting
No user-facing docs change — the new keys appear automatically when a model already returns `loss_dict`.

Who can review?
@SunMarc — Trainer maintainer per the template.