Paligemma: fix generation with Gemma2 #36044
Conversation
ArthurZucker
left a comment
we can just use kwargs, no?
I think making it explicit that kwargs will be used only by an LM was better
Cyrilvallez
left a comment
Fine with me! Thanks a lot!
let's say that an integration test is most welcome as well!
yeah, was quite low in priority for the patch so I decided to skip it for now :)
For transparency, this commit needs to be modified for the patch, only applying changes for PaliGemma2
* fix paligemma
* nit
* use `kwargs` in models that can load any LM
* update changes to only affect Paligemma
Hi @zucchini-nlp @ArthurZucker, I find this PR can lead to an abnormal loss value if gradient accumulation is enabled. Initially reported in hiyouga/LlamaFactory#7443, the trainer assumes these models accept loss kwargs because of the existence of `**kwargs` in the forward signature.
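(For context, a minimal sketch of the kind of signature check the Trainer relies on; `accepts_loss_kwargs` is an illustrative name, not the exact transformers implementation:)

```python
import inspect

def accepts_loss_kwargs(model) -> bool:
    # A forward signature with a **kwargs catch-all is assumed to accept
    # loss kwargs such as `num_items_in_batch` and forward them to the loss.
    params = inspect.signature(model.forward).parameters
    return any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values())
```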
Yeah, it is true that most LMs accept such kwargs, but multimodal models compute the loss themselves, unless we reuse the loss computed by the language model part.
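To illustrate the mismatch, a hedged sketch (helper names are illustrative, label shifting omitted): an LM-style loss honors the gradient-accumulation normalizer `num_items_in_batch`, while a VLM that computes its own mean loss silently drops it:

```python
import torch.nn.functional as F

def lm_style_loss(logits, labels, num_items_in_batch=None):
    # Sum, then divide by the global token count so gradient accumulation
    # produces the same scale as one big batch.
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="sum"
    )
    denom = num_items_in_batch if num_items_in_batch is not None else labels.numel()
    return loss / denom

def vlm_style_loss(logits, labels, **kwargs):
    # **kwargs (including num_items_in_batch) is accepted but ignored, so
    # each micro-batch is mean-normalized independently and the accumulated
    # loss ends up mis-scaled.
    return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
```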
A similar bug report regarding the gemma3 model: hiyouga/LlamaFactory#7416
Yep, I will look into that and find a better solution
@hiyouga sorry, just got to look at the issue in detail. Indeed this causes problems when accumulating grads. I think the best solution would be to make all VLMs (touched by this PR first, and then others not touched as well) compute loss with kwargs. Would you like to submit a PR for that?
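A minimal sketch of what that fix could look like, assuming the loss kwargs (e.g. `num_items_in_batch`) are threaded into the loss computation; `vlm_loss_with_kwargs` is hypothetical, not the actual PaliGemma code:

```python
import torch.nn.functional as F

def vlm_loss_with_kwargs(logits, labels, **kwargs):
    # Hypothetical fix: forward the Trainer's loss kwargs into the loss
    # computation instead of dropping them (label shifting omitted for brevity).
    num_items = kwargs.get("num_items_in_batch")
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="sum"
    )
    return loss / num_items if num_items is not None else loss / labels.numel()
```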
@zucchini-nlp Hi, I'm currently deeply occupied with writing my thesis and really don't have much spare time at the moment. Unfortunately, I won't be able to submit the PR. Thanks for understanding!
@hiyouga yeah, no problem :)
What does this PR do?
Fixes #36029 and adds tests for the model; imo we need tests with different LM backbones because Gemma-2 is special
This is a quick fix, but I think we should make this kind of LM-side fix work out-of-the-box, for example by adding it as `kwargs`. Most LMs accept `loss_kwargs`, thus we can make all multimodal models also accept kwargs that are simply passed further to the LM. WDYT?
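A hedged sketch of that idea (`AnyLMWrapper` is illustrative, not the actual PaliGemma class): the multimodal wrapper stops enumerating backbone-specific arguments and simply forwards `**kwargs` to whichever LM it loaded:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

class AnyLMWrapper(nn.Module):
    """Illustrative multimodal-style wrapper around an arbitrary LM backbone."""

    def __init__(self, lm_name: str):
        super().__init__()
        self.language_model = AutoModelForCausalLM.from_pretrained(lm_name)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        # Backbone-specific arguments (e.g. anything Gemma-2 needs) are not
        # inspected here, just passed straight through to the LM.
        return self.language_model(
            input_ids=input_ids, attention_mask=attention_mask, **kwargs
        )
```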