Bug Description
Qwen2_5_VLProcessor.apply_chat_template raises ValueError: setting an array element with a sequence when processing a batch of ≥2 conversations that include images, under the default padding=False setting.
Root cause: mm_token_type_ids was built by calling np.array(text_inputs["input_ids"]) on a ragged list (with padding=False the sequences have different lengths). NumPy ≥ 1.24 raises ValueError for such inhomogeneous shapes instead of falling back to an object array.
This is distinct from #44521, which concerns assistant_masks being all zeros for multimodal inputs.
Reproduction
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
batch_messages = [
[{"role": "user", "content": [{"type": "image", "image": "img1.jpg"}, {"type": "text", "text": "Describe."}]}],
[{"role": "user", "content": [{"type": "image", "image": "img2.jpg"}, {"type": "text", "text": "What is this? Give a detailed answer."}]}],
]
processor.apply_chat_template(batch_messages, padding=False, tokenize=True, return_dict=True)
# raises ValueError: setting an array element with a sequence
Expected Behavior
The processor should handle batched inputs without crashing when padding=False.
Fix
A fix is implemented in PR #44535.
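For illustration only, one way to sidestep the ragged np.array conversion is to build the token-type mask one sequence at a time. This is a minimal sketch and not necessarily what PR #44535 does: the helper name build_mm_token_type_ids and the placeholder IMAGE_TOKEN_ID are hypothetical, and the real image token id would come from the tokenizer.

```python
from typing import List

# Hypothetical placeholder; the real id must be looked up from the tokenizer.
IMAGE_TOKEN_ID = 99


def build_mm_token_type_ids(
    input_ids: List[List[int]], image_token_id: int = IMAGE_TOKEN_ID
) -> List[List[int]]:
    """Mark image tokens with 1 and every other token with 0.

    Each row is processed independently, so rows of different lengths
    (padding=False) never have to be packed into one rectangular array.
    """
    return [
        [1 if token == image_token_id else 0 for token in sequence]
        for sequence in input_ids
    ]


# Two sequences of different lengths, as produced with padding=False.
ragged_input_ids = [
    [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2],
    [1, IMAGE_TOKEN_ID, 2, 3, 4, 5],
]
mm_token_type_ids = build_mm_token_type_ids(ragged_input_ids)
```

Because the result stays a list of lists, it mirrors the shape of the unpadded input_ids and can be converted to tensors later, once padding has been applied.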