Qwen2_5_VLProcessor.apply_chat_template crashes on batched input when padding=False #44545

@Anakintano

Bug Description

Qwen2_5_VLProcessor.apply_chat_template raises ValueError: setting an array element with a sequence when processing a batch of ≥2 conversations that include images, under the default padding=False setting.

Root cause: mm_token_type_ids was built by calling np.array(text_inputs["input_ids"]) on a ragged list (with padding=False the tokenized sequences have different lengths). NumPy ≥ 1.24 raises ValueError for such inhomogeneous shapes unless dtype=object is passed explicitly.
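The NumPy behavior behind the root cause can be checked in isolation, without loading the processor. The snippet below is only a minimal sketch: the token ids are made-up placeholders standing in for text_inputs["input_ids"], and the try/except is there so the snippet also runs on older NumPy releases, which may build an object array instead of raising.

```python
import numpy as np

# Placeholder token ids (not real Qwen2.5-VL ids): two sequences of
# different lengths, as a tokenizer returns them with padding=False.
ragged_input_ids = [[101, 7, 8, 9], [101, 7]]

try:
    arr = np.array(ragged_input_ids)
    # Older NumPy releases may reach this branch and build an object array.
    outcome = f"object array built (dtype={arr.dtype})"
except ValueError as exc:
    # NumPy >= 1.24 reaches this branch.
    outcome = f"ValueError: {exc}"

print(outcome)
```

With padding enabled, every sequence has the same length, so the same np.array call yields a regular 2-D integer array and the error does not appear.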

This is distinct from #44521, which concerns assistant_masks being all zeros for multimodal inputs.

Reproduction

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

batch_messages = [
    [{"role": "user", "content": [{"type": "image", "image": "img1.jpg"}, {"type": "text", "text": "Describe."}]}],
    [{"role": "user", "content": [{"type": "image", "image": "img2.jpg"}, {"type": "text", "text": "What is this? Give a detailed answer."}]}],
]

processor.apply_chat_template(batch_messages, padding=False, tokenize=True, return_dict=True)
# raises ValueError: setting an array element with a sequence

Expected Behavior

The processor should handle batched inputs without crashing when padding=False.
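One way to get this behavior is to compute the token-type ids one sequence at a time, so a rectangular array is never required. The sketch below illustrates that idea only and is not necessarily what PR #44535 does: the helper name build_mm_token_type_ids, its signature, and the token id 999 are all hypothetical.

```python
from typing import List, Set


def build_mm_token_type_ids(
    input_ids: List[List[int]], mm_token_ids: Set[int]
) -> List[List[int]]:
    """Hypothetical helper: mark multimodal tokens with 1 and text tokens with 0.

    Works on ragged input because each sequence is processed independently.
    """
    return [
        [1 if token in mm_token_ids else 0 for token in sequence]
        for sequence in input_ids
    ]


# Placeholder ids: 999 stands in for an image placeholder token.
ragged_ids = [[10, 999, 999, 11], [10, 999, 12, 13, 14, 15]]
mask = build_mm_token_type_ids(ragged_ids, mm_token_ids={999})
print(mask)
```

Each output list has the same length as its input sequence, so the result stays valid whether or not the batch is padded afterwards.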

Fix

A fix is implemented in PR #44535.
