Skip to content

Modularize ProcessorMixin into smaller components#45493

Open
zucchini-nlp wants to merge 34 commits intohuggingface:mainfrom
zucchini-nlp:replace-image-tokens
Open

Modularize ProcessorMixin into smaller components#45493
zucchini-nlp wants to merge 34 commits intohuggingface:mainfrom
zucchini-nlp:replace-image-tokens

Conversation

@zucchini-nlp
Copy link
Copy Markdown
Member

@zucchini-nlp zucchini-nlp commented Apr 17, 2026

What does this PR do?

Modularizes ProcessorMixin to make it easier for new processors to override smaller fn rather than the whole __call__. Splits __call__ into smaller functions such as validation, input preparation, replacing multimodal placeholders, and a few properties for common special tokens

In simple cases like llava or qwen2-vl, the processor only has to override one method -> replace_image_tokens. It takes a single image input and returns the corresponding placeholder text. More complicated models can override and add their own validation and input preparation, e.g. gemma3 requires nested images and has lots of sanity checks

Converted a bunch of processors with different modalities to check that it works. I think for the rest we can either ask community to contrib or do it in a separate PR. This PR is already bloating up

Best way to review: non-model files -> llava -> gemma3 -> audioflamingo -> idefics3 -> gemma4 -> test files (already includes variety of processor types)

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title [WIP] Major processing refactor Major processing refactor Apr 23, 2026
@zucchini-nlp zucchini-nlp requested a review from yonigozlan April 23, 2026 13:39
@zucchini-nlp
Copy link
Copy Markdown
Member Author

run-slow: aria, audioflamingo3, aya_vision, blip, chameleon, cohere2_vision, cohere_asr, colmodernvbert, gemma3, gemma4, glm46v, glm4v, glmasr, idefics3, qwen2_5_vl, qwen3_vl, llava, musicflamingo

@zucchini-nlp zucchini-nlp requested a review from vasqu April 23, 2026 13:40
@github-actions
Copy link
Copy Markdown
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/aria", "models/audioflamingo3", "models/aya_vision", "models/blip", "models/chameleon", "models/cohere2_vision", "models/cohere_asr", "models/colmodernvbert", "models/gemma3", "models/gemma4", "models/glm46v", "models/glm4v", "models/glmasr", "models/idefics3", "models/llava", "models/musicflamingo", "models/qwen2_5_vl", "models/qwen3_vl"]
quantizations: []

@zucchini-nlp zucchini-nlp requested a review from eustlb April 23, 2026 13:49
@zucchini-nlp
Copy link
Copy Markdown
Member Author

Huh, all audio models failed, needs a fix

@zucchini-nlp
Copy link
Copy Markdown
Member Author

run-slow: audioflamingo3, cohere_asr, gemma4, glmasr, musicflamingo

@github-actions
Copy link
Copy Markdown
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/audioflamingo3", "models/cohere_asr", "models/gemma4", "models/glmasr", "models/musicflamingo"]
quantizations: []

@github-actions
Copy link
Copy Markdown
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 4115361c workflow commit (merge commit)
PR 26c037c7 branch commit (from PR)
main 57f9936a base commit (on main)

Model CI Report

7 new failed tests from this PR 😭

  • cohere_asr:
    tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_batched_mixed_lengths (✅ ⟹ ❌)
    tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_longform_english (✅ ⟹ ❌)
    tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_non_english_with_punctuation (❌ ⟹ ❌)
    tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_shortform_english (✅ ⟹ ❌)
    tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_shortform_english_no_punctuation (✅ ⟹ ❌)

  • gemma4:
    tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)
    tests/models/gemma4/test_processing_gemma4.py::Gemma4ProcessorTest::test_processor_with_multiple_inputs (✅ ⟹ ❌)

@zucchini-nlp zucchini-nlp changed the title Major processing refactor Modularize ProcessorMixin into smaller components Apr 24, 2026
Comment thread src/transformers/processing_utils.py Outdated
@zucchini-nlp
Copy link
Copy Markdown
Member Author

Btw, failing tests were fixed and work for me locally, I think the CI fetched wrong commit as per error logs

Copy link
Copy Markdown
Contributor

@eustlb eustlb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason this didn't push with the above review...
That will be super usefull and simplify further overwrites! Thanks !!

Comment thread src/transformers/processing_utils.py
Comment thread src/transformers/processing_utils.py Outdated
Comment thread src/transformers/feature_extraction_sequence_utils.py Outdated
Comment thread src/transformers/audio_utils.py
Comment thread src/transformers/models/audioflamingo3/processing_audioflamingo3.py Outdated
Comment thread src/transformers/models/audioflamingo3/processing_audioflamingo3.py Outdated
@eustlb eustlb self-requested a review April 27, 2026 14:25
@github-actions
Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, audioflamingo3, aya_vision, blip, chameleon, cohere2_vision, cohere_asr, colmodernvbert, florence2, fuyu, gemma3, gemma4, glm46v, glm4v, glmasr, idefics3

Comment on lines -38 to -52
@classmethod
def setUpClass(cls):
# Ensure local assets are used instead of remote URLs to avoid network access in tests
from tests.test_processing_common import MODALITY_INPUT_DATA

repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..", ".."))
local_image = os.path.join(repo_root, "coco_sample.png")
if not os.path.isfile(local_image):
import numpy as np
from PIL import Image

Image.fromarray((np.random.rand(64, 64, 3) * 255).astype("uint8")).save(local_image)

local_tiny_video = os.path.join(repo_root, "tiny_video.mp4")
if not os.path.isfile(local_tiny_video):
Copy link
Copy Markdown
Member Author

@zucchini-nlp zucchini-nlp Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zero idea why it was added, it forces a vdeo to be downloaded at root dir. Mixin already uses url_to_video, so no need to override

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants