fix: Passing empty list to images/videos for some multi-modal models #40569
HollowMan6 wants to merge 1 commit into huggingface:main from HollowMan6:noimage

Conversation
zucchini-nlp left a comment
Thanks for the PR @HollowMan6!
We had another one in progress here (#36682) which is more complete and fixes all models. I'd prefer that one to be merged.
Hi @zucchini-nlp, thanks for your reply! I think #36682 doesn't consider the situation where there's no image at all and the batch size is 1, i.e., when `images=[]` is passed.
@HollowMan6 If there are no images at all, it is recommended to simply pass `None`.
Yeah, that's what I had proposed in verl-project/verl#3281, but it might also be good to enforce some checks on the transformers library side.
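For context, a minimal caller-side guard along those lines might look like the sketch below (hypothetical, not the actual verl-project/verl#3281 patch; the helper name and signature are made up for illustration):

```python
# Hypothetical caller-side guard -- not the actual verl-project/verl#3281 change.
# Most multi-modal processors treat omitted/None images as "text-only input",
# while an empty list may still be routed through the image-processing path.
def build_model_inputs(processor, raw_prompt, images=None, videos=None):
    kwargs = {"text": [raw_prompt], "return_tensors": "pt"}
    if images:  # only forward images when the list is non-empty
        kwargs["images"] = images
    if videos:  # likewise for videos
        kwargs["videos"] = videos
    return processor(**kwargs)
```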
For example, this PR fixes the following error when we pass an empty list
of `images` to Qwen2.5-VL:
```log
File "torchdata/stateful_dataloader/worker.py", line 242, in _worker_loop
data = fetcher.fetch(index) # type: ignore[union-attr]
File "torch/utils/data/_utils/fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "verl/utils/dataset/rl_dataset.py", line 248, in __getitem__
model_inputs = self.processor(text=[raw_prompt], images=images, videos=videos, return_tensors="pt")
File "transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 150, in __call__
image_inputs = self.image_processor(images=images, **output_kwargs["images_kwargs"])
File "transformers/image_processing_utils_fast.py", line 637, in __call__
return self.preprocess(images, *args, **kwargs)
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 151, in preprocess
return super().preprocess(images, videos, **kwargs)
File "transformers/image_processing_utils_fast.py", line 662, in preprocess
return self._preprocess_image_like_inputs(
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 173, in _preprocess_image_like_inputs
batch_feature = self._preprocess(images, **kwargs)
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 211, in _preprocess
grouped_images, grouped_images_index = group_images_by_shape(images, disable_grouping=disable_grouping)
File "transformers/image_transforms.py", line 917, in group_images_by_shape
device = images[0][0].device if is_nested else images[0].device
IndexError: list index out of range
```
Signed-off-by: Hollow Man <hollowman@opensuse.org>
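For reference, a minimal repro sketch of the failure path in the traceback (the checkpoint name and prompt are placeholders; assumes a transformers version that still routes empty lists through the fast Qwen2-VL image processor):

```python
from transformers import AutoProcessor

# Placeholder checkpoint; any Qwen2.5-VL processor exercises the same code path.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# Passing empty lists instead of None sends the call into
# group_images_by_shape, which indexes images[0] and raises the
# IndexError above on affected versions.
model_inputs = processor(
    text=["Describe the weather today."],
    images=[],
    videos=[],
    return_tensors="pt",
)
```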
[For maintainers] Suggested jobs to run (before merge): run-slow: glm4v, llama4, qwen2_vl, qwen3_vl
Hey @HollowMan6, the feature was already added in transformers in one of the earlier PRs. I think you can close this one now. Passing empty lists for images should work.
Oh okay, good to know! Thanks!
What does this PR do?
Alternative root fix for verl-project/verl#3281
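At a high level, the kind of check being enforced can be sketched as follows (a hypothetical, simplified illustration, not the actual diff of this PR; the helper name is made up): treat an empty list the same as `None` before the image-processing path is entered.

```python
from typing import Optional


def normalize_visual_inputs(images: Optional[list]) -> Optional[list]:
    """Hypothetical helper, not the actual transformers change in this PR.

    Treating an empty list like None keeps downstream steps such as
    group_images_by_shape (which indexes images[0]) from running on
    text-only inputs.
    """
    if not images:
        return None
    return images
```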
For example, this PR fixes the `IndexError` shown in the traceback above when we pass an empty list of `images` to Qwen2.5-VL.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker, @amyeroberts, @qubvel