Fix GRPO VLM prompt handling for string prompts#5064
Fix GRPO VLM prompt handling for string prompts#5064akshan-main wants to merge 1 commit intohuggingface:mainfrom
Conversation
14b957e to
53a697f
Compare
|
thanks for the fix! |
53a697f to
7015d50
Compare
|
Fixes three VLM GRPO crashes in
Fixes #4746 Related #5041 , this issue asks for complete reliability which I'm not sure of because if exact issue can be raised I can try fixing it but I can guarantee the three issues along with what was raised in the comments of those issues are fixed reliably and tested out too. Tests are written too, please review |
|
after review, it seems like the fix would just make the training silently ignore the images. The issue actually comes from the fact that the user (and the notebook) tries to apply the chat template before themselves, instead of letting the trainer do it. I'll add a more explicit error message. |
it's easier to review when we have 1 PR per item. Currently
I'd be opposed to this change. It's not up to the trainer to handle reward failure. The user can still wrap its reward function like this: import random
def maybe_float_maybe_fail():
if random.random() < 0.1:
raise Exception()
else:
return random.random()
def reward_func_before(completion_ids, **kwargs):
return [maybe_float_maybe_fail() for _ in completion_ids]
def reward_func_after(completion_ids, **kwargs):
rewards = []
for _ in completion_ids:
try:
reward = maybe_float_maybe_fail()
except Exception:
reward = None
rewards.append(reward)
return rewards |
|
for 1, please see #5067 |
Fixes #4746
Fixes #4870
Fixes #4451
Related: #5041
What does this PR do?
When prompts are strings (produced by
processor.apply_chat_template),GRPOTrainerunconditionally passes them toprepare_multimodal_messageswhenever images are present. That function expects a list of{role, content}dicts, so it crashes withTypeError: string indices must be integers.This PR adds an
isinstance(prompt, list)guard soprepare_multimodal_messagesis only called on conversational prompts. String prompts pass through unchanged.Note: This fixes the immediate crash. Other VLM-related issues (bf16/float mismatch, reward function errors) mentioned in #5041 are separate bugs.
This is PR closes those three issues now
Fixes # (issue)
Before submitting
Pull Request section?
to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.