fix: handle ragged batch inputs in Qwen2_5_VLProcessor mm_token_type_ids computation#44919
Closed
s-zx wants to merge 1 commit into huggingface:main from
Conversation
…_ids When processing batched inputs without padding (padding=False), the tokenizer returns lists of different lengths. Calling np.array() on such a ragged list produces an object array, making the subsequent element-wise comparisons (== image_token_id, == video_token_id) fail. Fix by detecting batch inputs (list of lists) and processing each sequence individually, which works for both padded and unpadded batches. Fixes huggingface#44514
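The failure mode described in the commit message can be reproduced with a minimal standalone snippet (the hard-coded ragged list below stands in for unpadded `input_ids`; it is illustrative, not the processor's actual data):

```python
import numpy as np

# A ragged batch, as a tokenizer returns with padding=False.
ragged = [[1, 2, 3], [4, 5]]

# Recent NumPy raises ValueError for ragged input unless dtype=object
# is passed explicitly; older versions silently built an object array.
arr = np.array(ragged, dtype=object)

# On an object array, == compares each whole sub-list to the scalar,
# so the per-token boolean mask the processor expects is never produced.
mask = arr == 3
print(mask)  # [False False] instead of a 2D per-token mask
```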
Contributor
[For maintainers] Suggested jobs to run (before merge) run-slow: qwen2_5_vl
Member
Will close in favor of #44563 (comment), which solves the issue for all models at once.
What does this PR do?
Fixes a crash in Qwen2_5_VLProcessor.__call__ when processing batched inputs without padding (padding=False).

Root cause: When the tokenizer returns sequences of different lengths (a ragged list), np.array(text_inputs["input_ids"]) creates an object array instead of a 2D integer array. The subsequent element-wise comparisons (== image_token_id, == video_token_id) then fail or produce wrong results.

Fix: Detect batch inputs (list of lists) and process each sequence individually, which correctly handles both padded (uniform length) and unpadded (ragged) batches.
Before:
After:
Fixes #44514
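A minimal sketch of the fix described above. This is not the actual patch: the helper name `compute_mm_token_type_ids` and the sentinel token IDs are hypothetical, and the real change lives inside Qwen2_5_VLProcessor. It only illustrates the list-of-lists detection and per-sequence processing:

```python
import numpy as np

def compute_mm_token_type_ids(input_ids, image_token_id, video_token_id):
    # Hypothetical helper illustrating the fix: accept either a single
    # token sequence or a (possibly ragged) batch of sequences.
    if input_ids and isinstance(input_ids[0], list):
        # Batch input: recurse per sequence, so np.array() is only ever
        # called on a single flat list and never on a ragged batch.
        return [
            compute_mm_token_type_ids(seq, image_token_id, video_token_id)
            for seq in input_ids
        ]
    ids = np.array(input_ids)
    token_type_ids = np.zeros_like(ids)
    token_type_ids[ids == image_token_id] = 1  # mark image placeholder tokens
    token_type_ids[ids == video_token_id] = 2  # mark video placeholder tokens
    return token_type_ids.tolist()
```

Because each sequence is converted to a NumPy array on its own, the element-wise comparisons behave identically for padded and unpadded batches, which is the property the PR's fix relies on.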