
Add _loss_is_scaled_for_ga to allow custom trainers to control gradient accumulation loss scaling #43651

Open
abigailtech wants to merge 2 commits into huggingface:main from abigailtech:fix-loss-scaling-condition

Conversation

@abigailtech

Added a _loss_is_scaled_for_ga property that custom trainers can override to explicitly control gradient accumulation loss scaling. The default implementation preserves backward compatibility, so existing trainers are unaffected; a custom trainer can now simply override the property to return False instead of manipulating model_accepts_loss_kwargs.

Fixes #43604
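For illustration, a minimal sketch of how a custom trainer might use the proposed property (the subclass and override shown here are an assumption based on the description above, not code taken from this PR):

```python
from transformers import Trainer

class MyTrainer(Trainer):
    @property
    def _loss_is_scaled_for_ga(self) -> bool:
        # Signal that the loss computed by this trainer is NOT already scaled
        # for gradient accumulation, so the base Trainer should apply its own
        # scaling instead of inferring the behaviour from model_accepts_loss_kwargs.
        return False
```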

@Rocketknight1
Member

cc @qgallouedec

@qgallouedec
Member

If I understand correctly, model_accepts_loss_kwargs checks whether the model forward has kwargs, but it is actually used to decide whether the loss should be scaled or not, am I right? The underlying assumption is that if the model accepts kwargs, then it takes num_items_in_batch as an argument, which means that it scales the loss by itself.
TBH, I'm wondering if we should instead aim for a more ambitious refactor here.
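For example, the assumed contract looks roughly like this (a toy sketch of the idea, not the actual transformers implementation):

```python
import torch.nn.functional as F

def forward(logits, labels, **loss_kwargs):
    # If the Trainer passes num_items_in_batch, the model scales the loss
    # itself; otherwise it falls back to a plain mean over the labels.
    num_items_in_batch = loss_kwargs.get("num_items_in_batch")
    loss = F.cross_entropy(logits, labels, reduction="sum")
    if num_items_in_batch is not None:
        loss = loss / num_items_in_batch
    else:
        loss = loss / labels.numel()
    return loss
```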

@abigailtech
Author

If I understand correctly, model_accepts_loss_kwargs checks whether the model forward has kwargs, but it is actually used to decide whether the loss should be scaled or not, am I right? The underlying assumption is that if the model accepts kwargs, then it takes num_items_in_batch as an argument, which means that it scales the loss by itself. TBH, I'm wondering if we should instead aim for a more ambitious refactor here.

Yes, that's right. I'd be open to a more ambitious refactor; do you have a specific direction in mind?



Development

Successfully merging this pull request may close these issues.

Revisit the condition for scaling the loss

3 participants