Paligemma: fix generation with Gemma2 #36044
Conversation
ArthurZucker
left a comment
we can just use kwargs, no?
I think making it explicit that kwargs will be used only by an LM was better
Cyrilvallez
left a comment
Fine with me! Thanks a lot!
let's say that an integration test is most welcome as well!
yeah, was quite low in priority for the patch so I decided to skip it for now :)
For transparency, this commit needs to be modified for the patch, only applying changes for PaliGemma2
* fix paligemma
* nit
* use `kwargs` in models that can load any LM
* update changes to only affect Paligemma
Hi @zucchini-nlp @ArthurZucker, I find this PR can lead to an abnormal loss value if gradient accumulation is enabled. Initially reported in hiyouga/LlamaFactory#7443, the trainer assumes these models accept loss kwargs because of the existence of `**kwargs` in the forward signature.
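(For context, a minimal sketch of the kind of signature check the Trainer relies on; `accepts_loss_kwargs` is an illustrative name, not the exact transformers implementation:)

```python
import inspect

def accepts_loss_kwargs(model) -> bool:
    # A forward signature with a **kwargs catch-all is assumed to accept
    # loss kwargs such as `num_items_in_batch` and forward them to the loss.
    params = inspect.signature(model.forward).parameters
    return any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values())
```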
Yeah, it is true that most LMs accept such kwargs, but multimodal models compute the loss themselves, unless we reuse the loss computed by the language model part.
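To illustrate the mismatch, a hedged sketch (helper names are illustrative, label shifting omitted): an LM-style loss honors the gradient-accumulation normalizer `num_items_in_batch`, while a VLM that computes its own mean loss silently drops it:

```python
import torch.nn.functional as F

def lm_style_loss(logits, labels, num_items_in_batch=None):
    # Sum, then divide by the global token count so gradient accumulation
    # produces the same scale as one big batch.
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="sum"
    )
    denom = num_items_in_batch if num_items_in_batch is not None else labels.numel()
    return loss / denom

def vlm_style_loss(logits, labels, **kwargs):
    # **kwargs (including num_items_in_batch) is accepted but ignored, so
    # each micro-batch is mean-normalized independently and the accumulated
    # loss ends up mis-scaled.
    return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
```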
A similar bug report regarding the gemma3 model: hiyouga/LlamaFactory#7416
Yep, I will look into that and find a better solution
@hiyouga sorry, just got to look at the issue in detail. Indeed this causes problems when accumulating grads. I think the best solution would be to make all VLMs (touched by this PR first, and then others not touched as well) compute loss with kwargs. Would you like to submit a PR for that?
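A minimal sketch of what that fix could look like, assuming the loss kwargs (e.g. `num_items_in_batch`) are threaded into the loss computation; `vlm_loss_with_kwargs` is hypothetical, not the actual PaliGemma code:

```python
import torch.nn.functional as F

def vlm_loss_with_kwargs(logits, labels, **kwargs):
    # Hypothetical fix: forward the Trainer's loss kwargs into the loss
    # computation instead of dropping them (label shifting omitted for brevity).
    num_items = kwargs.get("num_items_in_batch")
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="sum"
    )
    return loss / num_items if num_items is not None else loss / labels.numel()
```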
@zucchini-nlp Hi, I'm currently deeply occupied with writing my thesis and really don't have much spare time at the moment. Unfortunately, I won't be able to submit the PR. Thanks for understanding!
@hiyouga yeah, no problem :)
What does this PR do?
Fixes #36029 and adds tests for the model; imo we need tests with different LM backbones because Gemma-2 is special
This is a quick fix, but I think we should make this kind of LM-side fix work out-of-the-box, for example by adding it as `kwargs`. Most LMs accept `loss_kwargs`, thus we can make all multimodal models also accept kwargs that are simply passed further to the LM. WDYT?
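A hedged sketch of that idea (`AnyLMWrapper` is illustrative, not the actual PaliGemma class): the multimodal wrapper stops enumerating backbone-specific arguments and simply forwards `**kwargs` to whichever LM it loaded:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

class AnyLMWrapper(nn.Module):
    """Illustrative multimodal-style wrapper around an arbitrary LM backbone."""

    def __init__(self, lm_name: str):
        super().__init__()
        self.language_model = AutoModelForCausalLM.from_pretrained(lm_name)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        # Backbone-specific arguments (e.g. anything Gemma-2 needs) are not
        # inspected here, just passed straight through to the LM.
        return self.language_model(
            input_ids=input_ids, attention_mask=attention_mask, **kwargs
        )
```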