Clarify that causal LM labels are shifted internally#44642

Open
joshuaswanson wants to merge 1 commit into huggingface:main from joshuaswanson:fix/causal-lm-labels-docstring

@joshuaswanson

The generic `labels` docstring in `ModelArgs` says "masked language modeling loss" and doesn't mention that causal LM models shift labels internally. This has tripped up a lot of users who pre-shift their labels and end up training next-next-token prediction by accident.

Updates the shared docstring to say "language modeling loss" (since it's used by causal LM models too, not just masked LM) and adds a note explaining that causal LM models handle the shifting, so users should pass `labels = input_ids`.
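
To make the failure mode concrete, here is a minimal pure-Python sketch of the shift described above. It is not the library's code: `internal_shift_pairs` is a hypothetical helper that assumes the loss compares the prediction at position `t` against `labels[t + 1]`, and `-100` is assumed to be the ignore index used for padding.

```python
IGNORE_INDEX = -100  # assumed ignore index for positions excluded from the loss


def internal_shift_pairs(input_ids, labels):
    """Hypothetical helper: pair each prefix of input_ids with the label
    the loss would compare it against, assuming the model shifts internally
    (prediction at position t is scored against labels[t + 1])."""
    pairs = []
    for t in range(len(input_ids) - 1):
        context = tuple(input_ids[: t + 1])
        pairs.append((context, labels[t + 1]))
    return pairs


input_ids = [10, 11, 12, 13, 14]

# Intended usage: pass labels identical to input_ids.
correct = internal_shift_pairs(input_ids, labels=list(input_ids))

# Common mistake: shift the labels by hand before passing them in.
pre_shifted = input_ids[1:] + [IGNORE_INDEX]
wrong = internal_shift_pairs(input_ids, labels=pre_shifted)

for (context, ok_target), (_, bad_target) in zip(correct, wrong):
    print(context, "-> correct target:", ok_target, "| pre-shifted target:", bad_target)
```

With `labels = input_ids`, the prefix `(10,)` is scored against `11`, the next token. With pre-shifted labels, the same prefix is scored against `12`, which is the next-next-token objective mentioned above.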

Fixes #32944

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44642&sha=1cb1e5

Development

Successfully merging this pull request may close these issues.

clarify the label shifting behavior of llama models when labels is given.
