
[Trainer] use output.loss when using liger-kernel #42444

Merged
SunMarc merged 6 commits into main from issue-42414 on Nov 28, 2025

Conversation

@kashif (Contributor) commented Nov 27, 2025

What does this PR do?

Handle loss computation for models using Liger-kernel.

Fixes #42414
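The gist of the change can be sketched as follows. This is a minimal illustration, not the actual trainer.py diff: the `select_loss` helper, its parameters, and the fallback path are simplified assumptions. The underlying idea is that Liger's fused linear + cross-entropy kernel computes the loss inside the model forward (and may not even materialize full logits), so the Trainer should trust `outputs.loss` rather than recomputing the loss from logits.

```python
# Hypothetical sketch of the loss-selection logic, not the real Trainer code.
def select_loss(outputs, labels, use_liger_kernel, compute_loss_func=None):
    if use_liger_kernel and getattr(outputs, "loss", None) is not None:
        # Liger's fused kernel already produced the loss inside the
        # model forward, so use it directly.
        return outputs.loss
    if compute_loss_func is not None:
        # Otherwise fall back to an externally supplied loss function.
        return compute_loss_func(outputs, labels)
    return outputs.loss
```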

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Handle loss computation for models using Liger-kernel.
fixes #42414
@kashif kashif requested a review from Rocketknight1 November 27, 2025 08:52
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc (Member) left a comment

Thanks, I left a few ideas to fix this!

Comment thread on src/transformers/trainer.py (outdated)
@kashif (Contributor, Author) commented Nov 27, 2025

@SunMarc I'll update the accelerate docs based on this change when you approve.

@SunMarc (Member) left a comment

Much cleaner, thanks!

@SunMarc SunMarc merged commit 6db4332 into main Nov 28, 2025
25 checks passed
@SunMarc SunMarc deleted the issue-42414 branch November 28, 2025 11:00
sarathc-cerebras pushed a commit to sarathc-cerebras/transformers that referenced this pull request Dec 7, 2025
* use output.loss when using liger

Handle loss computation for models using Liger-kernel.
fixes huggingface#42414

* Clarify Liger-kernel loss computation in comments

* Both standard transformers and Liger models handle shift_labels correctly via **kwargs

* removed unused shift_labels reference in loss computation

* Remove unused model unwrapping
@zhangwj618

With the latest code and without liger kernel, I ran into an error caused by outputs.loss being None, as models (e.g. Qwen3ForCausalLM) won't calculate loss if labels is None. @kashif @SunMarc

@SunMarc (Member) commented Dec 10, 2025

> With the latest code and without liger kernel, I ran into an error caused by outputs.loss being None, as models (e.g. Qwen3ForCausalLM) won't calculate loss if labels is None. @kashif @SunMarc

How come there are no labels?

@zhangwj618

@SunMarc UlyssesSPDataLoaderAdapter removed labels from the inputs while inserting shift_labels.
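The failure mode described above can be illustrated with a minimal sketch. Both functions below are simplified stand-ins for illustration only, not the actual transformers or adapter code: `model_forward` mimics a causal LM (like Qwen3ForCausalLM) that only computes a loss when `labels` is present, and `ulysses_adapt` mimics the reported adapter behavior of popping `labels` and supplying pre-shifted `shift_labels` instead.

```python
def model_forward(inputs):
    # Stand-in for a causal LM forward: the loss is only computed
    # when `labels` is in the inputs.
    labels = inputs.get("labels")
    loss = 0.0 if labels is not None else None  # 0.0 stands in for a real CE loss
    return {"loss": loss}

def ulysses_adapt(inputs):
    # Sketch of the reported sequence-parallel adapter behavior:
    # it pops `labels` and provides pre-shifted `shift_labels`.
    inputs = dict(inputs)
    labels = inputs.pop("labels")
    inputs["shift_labels"] = labels[1:]
    return inputs

batch = {"input_ids": [1, 2, 3], "labels": [1, 2, 3]}
out = model_forward(ulysses_adapt(batch))
# out["loss"] is None here, so any code that unconditionally reads
# outputs.loss (e.g. to call loss.backward()) fails.
```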

@SunMarc (Member) commented Dec 11, 2025

Indeed @zhangwj618... thanks for spotting this! I will open a PR with a quick fix.

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026

Development

Successfully merging this pull request may close these issues.

use_liger_kernel is not compatible with sequence parallel

4 participants