feat(trainer): log individual losses from loss_dict #45558
Closed
Abdeltoto wants to merge 2 commits into huggingface:main from
Conversation
- Accumulate scalar terms from outputs.loss_dict and optional top-level *_loss fields
- Apply the same DP mean and GA scaling as the main training loss before logging
- Clear auxiliary buffers each log step; add integration test with RegressionPreTrainedModelWithLossDict

Made-with: Cursor
Member

Sorry, we really don't want code agent PRs on old issues like this!
What does this PR do?
Fixes #31081.
When a model returns auxiliary losses alongside the main loss (e.g. via a `loss_dict` field in its `ModelOutput`), the Trainer currently only logs the combined loss. That makes debugging multi-term objectives painful: you can see the total loss going down without knowing which term is actually moving.

This PR teaches the Trainer to also log each scalar term it finds in `outputs.loss_dict`, plus any top-level `*_loss` scalar attribute on the output, under namespaced keys like `loss_dict_<name>` and `loss_<name>`. Behaviour is opt-in by virtue of the model itself: if no extra losses are returned, nothing changes. The main `loss` value and all existing logs are unchanged.
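To make the key naming concrete, here is a small self-contained illustration (not code from this diff; the plain `outputs` dict below stands in for a real `ModelOutput`):

```python
import torch

# Stand-in for a model output carrying a main loss, a loss_dict, and a *_loss field.
outputs = {
    "loss": torch.tensor(0.60),
    "loss_dict": {"mse": torch.tensor(0.42), "l1": torch.tensor(0.13)},
    "extra_loss": torch.tensor(0.05),
}

log_entry = {}
for name, value in outputs["loss_dict"].items():
    log_entry[f"loss_dict_{name}"] = value.item()  # -> loss_dict_mse, loss_dict_l1
for key, value in outputs.items():
    if key.endswith("_loss") and torch.is_tensor(value) and value.ndim == 0:
        log_entry[f"loss_{key[: -len('_loss')]}"] = value.item()  # -> loss_extra

print(log_entry)
# {'loss_dict_mse': 0.42..., 'loss_dict_l1': 0.13..., 'loss_extra': 0.05...}
```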
Implementation notes
- A new `self._aux_losses_accumulator` on the `Trainer`, mirroring how `_total_loss_scalar` already works.
- In `training_step`, after `compute_loss(..., return_outputs=True)`, scalar tensor entries from `outputs.loss_dict` and from `outputs.<name>_loss` are detached, gradient-accumulation-scaled, and accumulated. Same DP mean and same `num_items_in_batch` normalization as the main loss, so the numbers are comparable (see the sketch after this list).
- In `_maybe_log_save_evaluate`, accumulators are gathered across processes with `nested_gather` (consistent with the main `tr_loss` path), averaged over the logging window, and added to `logs`. Then the buffers are reset.
- Only scalar tensors are accumulated, so arbitrary metadata in `loss_dict` won't crash the Trainer.
- No public API change. No new dependency. The diff is mostly localized to two methods in `trainer.py`.
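A rough, single-process sketch of that accumulate-then-average pattern (the buffer name is taken from the description above; the distributed `nested_gather` and `num_items_in_batch` normalization are omitted, so this is not the actual diff):

```python
import torch

# Sketch only: mirrors the accumulate-per-step / average-per-log-window behaviour
# described above, without the DP gather or num_items_in_batch normalization that
# the real Trainer path applies.
class AuxLossTracker:
    def __init__(self):
        self._aux_losses_accumulator = {}
        self._steps_since_last_log = 0

    def accumulate(self, loss_dict, gradient_accumulation_steps):
        # Called once per training_step, after compute_loss(..., return_outputs=True).
        for name, value in (loss_dict or {}).items():
            if not (torch.is_tensor(value) and value.ndim == 0):
                continue  # only scalar tensors; arbitrary metadata is ignored
            scaled = value.detach() / gradient_accumulation_steps
            key = f"loss_dict_{name}"
            prev = self._aux_losses_accumulator.get(key)
            self._aux_losses_accumulator[key] = scaled if prev is None else prev + scaled
        self._steps_since_last_log += 1

    def pop_logs(self):
        # Called at log time: average over the logging window, then reset the buffers.
        denom = max(self._steps_since_last_log, 1)
        logs = {k: (v / denom).item() for k, v in self._aux_losses_accumulator.items()}
        self._aux_losses_accumulator.clear()
        self._steps_since_last_log = 0
        return logs
```

At log time, the resulting dict would simply be merged into `logs` next to the existing `loss` entry.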
Tests
`tests/trainer/test_trainer.py::TrainerIntegrationTest::test_trainer_logs_auxiliary_losses_from_loss_dict`

A small `RegressionPreTrainedModelWithLossDict` (added to `tests/trainer/trainer_test_utils.py`) returns `loss`, `loss_dict={'mse', 'l1'}`, and a top-level `extra_loss`. The test runs a tiny `Trainer.train()` and asserts the resulting log entries contain the expected `loss_dict_mse`, `loss_dict_l1`, and `loss_extra` keys with finite, positive values. Ran locally:
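For reference, a minimal sketch of a model in the spirit of that fixture; the class names, shapes, and loss weighting below are illustrative assumptions, not the test code added in this PR:

```python
from dataclasses import dataclass
from typing import Optional

import torch
from torch import nn
from transformers.modeling_outputs import ModelOutput


# Sketch only: a tiny regression model whose forward returns a main loss, a
# loss_dict with per-term scalars, and a top-level *_loss attribute.
@dataclass
class RegressionOutputWithLossDict(ModelOutput):
    loss: Optional[torch.Tensor] = None
    loss_dict: Optional[dict] = None
    extra_loss: Optional[torch.Tensor] = None
    logits: Optional[torch.Tensor] = None


class TinyRegressionWithLossDict(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, input_x=None, labels=None, **kwargs):
        preds = self.linear(input_x)
        mse = nn.functional.mse_loss(preds, labels)
        l1 = nn.functional.l1_loss(preds, labels)
        extra = 0.01 * preds.pow(2).mean()  # arbitrary auxiliary regularizer
        return RegressionOutputWithLossDict(
            loss=mse + l1 + extra,
            loss_dict={"mse": mse, "l1": l1},
            extra_loss=extra,
            logits=preds,
        )
```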
Code Agent Policy
I used Cursor as a coding assistant (the commit trailer says `Made-with: Cursor`), but I read every diff, ran the tests locally, wrote the description myself, and own the change. Happy to iterate on review feedback.
Before submitting
No user-facing docs change — the new keys appear automatically when a model already returns `loss_dict`.

Who can review?
@SunMarc — Trainer maintainer per the template.