Modularize ProcessorMixin into smaller components#45493
zucchini-nlp wants to merge 34 commits into huggingface:main
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
run-slow: aria, audioflamingo3, aya_vision, blip, chameleon, cohere2_vision, cohere_asr, colmodernvbert, gemma3, gemma4, glm46v, glm4v, glmasr, idefics3, qwen2_5_vl, qwen3_vl, llava, musicflamingo |
|
This comment contains models: ["models/aria", "models/audioflamingo3", "models/aya_vision", "models/blip", "models/chameleon", "models/cohere2_vision", "models/cohere_asr", "models/colmodernvbert", "models/gemma3", "models/gemma4", "models/glm46v", "models/glm4v", "models/glmasr", "models/idefics3", "models/llava", "models/musicflamingo", "models/qwen2_5_vl", "models/qwen3_vl"] |
|
Huh, all audio models failed, needs a fix |
|
run-slow: audioflamingo3, cohere_asr, gemma4, glmasr, musicflamingo |
|
This comment contains models: ["models/audioflamingo3", "models/cohere_asr", "models/gemma4", "models/glmasr", "models/musicflamingo"] |
CI Results
Commit Info
Model CI Report
❌ 7 new failed tests from this PR 😭
|
|
Btw, the failing tests are fixed and pass for me locally. I think the CI fetched the wrong commit, judging by the error logs |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: aria, audioflamingo3, aya_vision, blip, chameleon, cohere2_vision, cohere_asr, colmodernvbert, florence2, fuyu, gemma3, gemma4, glm46v, glm4v, glmasr, idefics3 |
    @classmethod
    def setUpClass(cls):
        # Ensure local assets are used instead of remote URLs to avoid network access in tests
        from tests.test_processing_common import MODALITY_INPUT_DATA

        repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..", ".."))
        local_image = os.path.join(repo_root, "coco_sample.png")
        if not os.path.isfile(local_image):
            import numpy as np
            from PIL import Image

            Image.fromarray((np.random.rand(64, 64, 3) * 255).astype("uint8")).save(local_image)

        local_tiny_video = os.path.join(repo_root, "tiny_video.mp4")
        if not os.path.isfile(local_tiny_video):
Zero idea why this was added; it forces a video to be downloaded to the root dir. The mixin already uses url_to_video, so there is no need to override.
What does this PR do?
Modularizes ProcessorMixin to make it easier for new processors to override smaller functions rather than the whole __call__. Splits __call__ into smaller functions such as validation, input preparation, and replacing multimodal placeholders, plus a few properties for common special tokens.

In simple cases like llava or qwen2-vl, the processor only has to override one method -> replace_image_tokens. It takes a single image input and returns the corresponding placeholder text. More complicated models can override and add their own validation and input preparation, e.g. gemma3 requires nested images and has lots of sanity checks.

Converted a bunch of processors with different modalities to check that it works. I think for the rest we can either ask the community to contribute or do it in a separate PR. This PR is already bloating up.

Best way to review:
non-model files -> llava -> gemma3 -> audioflamingo -> idefics3 -> gemma4 -> test files (already includes a variety of processor types)
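
For readers skimming the description, here is a rough, self-contained sketch of the override pattern it describes: a base class whose __call__ is split into small steps, and a subclass that only overrides replace_image_tokens. This is a toy illustration, not the actual ProcessorMixin code from this PR. Only the method name replace_image_tokens and its contract (one image in, placeholder text out) come from the description; ToyProcessorBase, ToyPatchProcessor, validate_inputs, prepare_inputs, image_token, patch_size and the dict-shaped image are all hypothetical names chosen for the example.

```python
import re


class ToyProcessorBase:
    """Toy stand-in for a modular processor mixin (NOT the real ProcessorMixin)."""

    image_token = "<image>"  # hypothetical special-token attribute

    def prepare_inputs(self, text, images):
        # Hypothetical preparation step: normalize a single string to a batch.
        if isinstance(text, str):
            text = [text]
        return text, list(images)

    def validate_inputs(self, text, images):
        # Hypothetical validation step: expect one placeholder per image.
        n_placeholders = sum(t.count(self.image_token) for t in text)
        if n_placeholders != len(images):
            raise ValueError(
                f"Found {n_placeholders} image placeholders but {len(images)} images"
            )

    def replace_image_tokens(self, image):
        # Default behaviour: keep a single placeholder per image.
        # Simple processors would override only this method.
        return self.image_token

    def __call__(self, text, images=()):
        text, images = self.prepare_inputs(text, images)
        self.validate_inputs(text, images)
        image_iter = iter(images)
        pattern = re.escape(self.image_token)
        # Replace each placeholder with the text returned for the matching image.
        return [
            re.sub(pattern, lambda _: self.replace_image_tokens(next(image_iter)), t)
            for t in text
        ]


class ToyPatchProcessor(ToyProcessorBase):
    patch_size = 2  # hypothetical value, for illustration only

    def replace_image_tokens(self, image):
        # One placeholder token per patch of the (toy, dict-shaped) image.
        rows = image["height"] // self.patch_size
        cols = image["width"] // self.patch_size
        return self.image_token * (rows * cols)


processor = ToyPatchProcessor()
expanded = processor("Describe <image> please", images=[{"height": 4, "width": 4}])
print(expanded)
```

In this sketch the subclass never touches __call__, validation, or input preparation; a more complicated model would override those steps as well, as the description notes for gemma3.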