Clarify that causal LM labels are shifted internally by joshuaswanson · Pull Request #44642 · huggingface/transformers

joshuaswanson · 2026-03-12T23:47:11Z

The generic labels docstring in ModelArgs says "masked language modeling loss" and doesn't mention that causal LM models shift labels internally. This has tripped up a lot of users who pre-shift their labels and end up training next-next-token prediction by accident.

Updates the shared docstring to say "language modeling loss" (since it's used by causal LM models too, not just masked LM) and adds a note explaining that causal LM models handle the shifting, so users should pass labels = input_ids.
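
To illustrate the pitfall, here is a minimal pure-Python sketch of the shift, not the library's actual code. `shifted_pairs` is a hypothetical helper written for this example; it assumes the usual convention that the prediction at position i is scored against labels[i + 1], and that -100 is the ignore index used for masked-out label positions.

```python
IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def shifted_pairs(input_ids, labels):
    """Pair each input token with the label it would be scored against.

    Hypothetical helper: mimics an internal shift where the prediction
    made at position i is compared with labels[i + 1].
    """
    return list(zip(input_ids[:-1], labels[1:]))


input_ids = [10, 11, 12, 13, 14]

# Intended usage: labels are a copy of input_ids, and the shift
# produces (token, next token) pairs.
correct = shifted_pairs(input_ids, labels=list(input_ids))

# The mistake: labels are shifted by hand first, then shifted again
# internally, so each token is paired with the token two steps ahead.
pre_shifted = input_ids[1:] + [IGNORE_INDEX]
double_shifted = shifted_pairs(input_ids, labels=pre_shifted)

print(correct)
print(double_shifted)
```

With pre-shifted labels every input token ends up paired with the token two positions ahead, which is the accidental next-next-token objective described above.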

Fixes #32944

github-actions · 2026-03-13T00:02:30Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44642&sha=1cb1e5

Clarify that causal LM labels are shifted internally

1cb1e5d

This was referenced Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#43

Open

joshuaswanson wants to merge 1 commit into huggingface:main from joshuaswanson:fix/causal-lm-labels-docstring
