Modularize `ProcessorMixin` into smaller components by zucchini-nlp · Pull Request #45493 · huggingface/transformers

zucchini-nlp · 2026-04-17T13:19:29Z

What does this PR do?

Modularizes `ProcessorMixin` so that new processors can override small functions rather than the whole `__call__`. Splits `__call__` into smaller functions for validation, input preparation, and replacing multimodal placeholders, and adds a few properties for common special tokens.

In simple cases like llava or qwen2-vl, the processor only has to override one method: `replace_image_tokens`. It takes a single image input and returns the corresponding placeholder text. More complicated models can override the defaults and add their own validation and input preparation, e.g. gemma3 requires nested images and has lots of sanity checks.
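To make the override pattern concrete, here is a minimal, self-contained sketch of the idea. It is not the actual `ProcessorMixin` code: only the method name `replace_image_tokens` comes from the description above, while `ToyProcessorMixin`, `ToyImage`, `validate_inputs`, `prepare_inputs`, `expand_placeholders`, and the patch-count formula are hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyImage:
    """Hypothetical stand-in for one preprocessed image."""

    height: int
    width: int


class ToyProcessorMixin:
    """Illustrative base class only; NOT the real transformers ProcessorMixin."""

    image_token = "<image>"

    def __call__(self, text, images=None):
        # The call is split into small steps that subclasses can override one by one.
        images = images or []
        self.validate_inputs(text, images)
        text, images = self.prepare_inputs(text, images)
        return self.expand_placeholders(text, images)

    def validate_inputs(self, text, images):
        # Default check: one placeholder per image across the whole batch.
        num_placeholders = sum(sample.count(self.image_token) for sample in text)
        if num_placeholders != len(images):
            raise ValueError(
                f"Found {num_placeholders} placeholders but {len(images)} images."
            )

    def prepare_inputs(self, text, images):
        # Default preparation: copy the inputs so later steps cannot mutate them.
        return list(text), list(images)

    def expand_placeholders(self, text, images):
        # Replace each placeholder, in order, with the text returned for its image.
        remaining = iter(images)
        expanded = []
        for sample in text:
            parts = sample.split(self.image_token)
            out = parts[0]
            for part in parts[1:]:
                out += self.replace_image_tokens(next(remaining)) + part
            expanded.append(out)
        return expanded

    def replace_image_tokens(self, image):
        # Default: keep a single placeholder per image.
        return self.image_token


class ToyPatchProcessor(ToyProcessorMixin):
    """A simple model overrides only the per-image hook."""

    patch_size = 14  # assumed value, for illustration only

    def replace_image_tokens(self, image):
        # Assumption: one placeholder token per full patch in the image.
        rows = image.height // self.patch_size
        cols = image.width // self.patch_size
        return self.image_token * (rows * cols)
```

With this shape, a simple processor touches a single method, while a model with stricter requirements could additionally override the hypothetical `validate_inputs` or `prepare_inputs` without reimplementing `__call__`.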

Converted a number of processors across different modalities to check that it works. For the rest, I think we can either ask the community to contribute or do it in a separate PR. This PR is already getting too large.

Best way to review: non-model files -> llava -> gemma3 -> audioflamingo -> idefics3 -> gemma4 -> test files (already includes variety of processor types)

HuggingFaceDocBuilderDev · 2026-04-17T13:34:31Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp · 2026-04-23T13:40:52Z

run-slow: aria, audioflamingo3, aya_vision, blip, chameleon, cohere2_vision, cohere_asr, colmodernvbert, gemma3, gemma4, glm46v, glm4v, glmasr, idefics3, qwen2_5_vl, qwen3_vl, llava, musicflamingo

github-actions · 2026-04-23T13:42:17Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/aria", "models/audioflamingo3", "models/aya_vision", "models/blip", "models/chameleon", "models/cohere2_vision", "models/cohere_asr", "models/colmodernvbert", "models/gemma3", "models/gemma4", "models/glm46v", "models/glm4v", "models/glmasr", "models/idefics3", "models/llava", "models/musicflamingo", "models/qwen2_5_vl", "models/qwen3_vl"]
quantizations: []

zucchini-nlp · 2026-04-23T15:21:09Z

Huh, all audio models failed, needs a fix

zucchini-nlp · 2026-04-23T15:51:12Z

run-slow: audioflamingo3, cohere_asr, gemma4, glmasr, musicflamingo

github-actions · 2026-04-23T15:52:41Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/audioflamingo3", "models/cohere_asr", "models/gemma4", "models/glmasr", "models/musicflamingo"]
quantizations: []

github-actions · 2026-04-23T16:47:34Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	4115361c	workflow commit (merge commit)
PR	26c037c7	branch commit (from PR)
main	57f9936a	base commit (on `main`)

Model CI Report

❌ 7 new failed tests from this PR 😭

cohere_asr:
tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_batched_mixed_lengths (✅ ⟹ ❌)
tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_longform_english (✅ ⟹ ❌)
tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_non_english_with_punctuation (❌ ⟹ ❌)
tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_shortform_english (✅ ⟹ ❌)
tests/models/cohere_asr/test_modeling_cohere_asr.py::CohereAsrIntegrationTest::test_shortform_english_no_punctuation (✅ ⟹ ❌)
gemma4:
tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)
tests/models/gemma4/test_processing_gemma4.py::Gemma4ProcessorTest::test_processor_with_multiple_inputs (✅ ⟹ ❌)

zucchini-nlp · 2026-04-27T12:59:06Z

Btw, the failing tests were fixed and pass for me locally. I think the CI fetched the wrong commit, as per the error logs.

eustlb

For some reason this didn't push with the above review...
That will be super useful and will simplify further overrides! Thanks!!

github-actions · 2026-04-28T10:51:17Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, audioflamingo3, aya_vision, blip, chameleon, cohere2_vision, cohere_asr, colmodernvbert, florence2, fuyu, gemma3, gemma4, glm46v, glm4v, glmasr, idefics3

zucchini-nlp · 2026-04-28T10:52:25Z

-    @classmethod
-    def setUpClass(cls):
-        # Ensure local assets are used instead of remote URLs to avoid network access in tests
-        from tests.test_processing_common import MODALITY_INPUT_DATA
-
-        repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..", ".."))
-        local_image = os.path.join(repo_root, "coco_sample.png")
-        if not os.path.isfile(local_image):
-            import numpy as np
-            from PIL import Image
-
-            Image.fromarray((np.random.rand(64, 64, 3) * 255).astype("uint8")).save(local_image)
-
-        local_tiny_video = os.path.join(repo_root, "tiny_video.mp4")
-        if not os.path.isfile(local_tiny_video):


Zero idea why it was added; it forces a video to be downloaded to the root dir. The mixin already uses url_to_video, so there is no need to override.

zucchini-nlp added 10 commits March 10, 2026 11:11

tmp

ffb8cd9

more

19ca911

Merge remote-tracking branch 'upstream/main' into replace-image-tokens

efd9dbb

.

be4d458

.

147a36f

qwen

482d12a

apply to video with timestamps processing

7ef4447

mllama with no image tokens

5c16ce5

main

ea62529

delete dups

43ba1cd

zucchini-nlp added 16 commits April 17, 2026 15:43

split modality fn

2c35f3d

stricter check and consistent naming

5006e50

fix videos and audio

fb61ab1

bc for non-MLLM processors

e2e7c40

some renaming and reordering

89647f2

two more models

671c88a

check wih audio processor

83fe02a

oops

1923acb

delete more similar code

0c94f76

two more models

fcdb68b

a bit more

8967a6f

more models

b0c7bc8

fix tests

db4d9b8

Merge branch 'main' into replace-image-tokens

2e87a97

fix idefics

c131de5

fix repo

4ef846d

zucchini-nlp changed the title ~~[WIP] Major processing refactor~~ Major processing refactor Apr 23, 2026

now it should pass CI

16c34fe

zucchini-nlp requested a review from yonigozlan April 23, 2026 13:39

zucchini-nlp requested a review from vasqu April 23, 2026 13:40

zucchini-nlp requested a review from eustlb April 23, 2026 13:49

omg, a typo

26c037c

fix tests

ed1b857

zucchini-nlp changed the title ~~Major processing refactor~~ Modularize ProcessorMixin into smaller components Apr 24, 2026

eustlb reviewed Apr 27, 2026

Comment thread src/transformers/processing_utils.py Outdated

eustlb reviewed Apr 27, 2026

eustlb self-requested a review April 27, 2026 14:25

zucchini-nlp added 5 commits April 28, 2026 11:53

comments from eustlb

0068e97

oops, typo

f35f87a

delete url from images utils

8e2f30c

fix

e9cb5bb

Merge branch 'main' into replace-image-tokens

a867768

zucchini-nlp commented Apr 28, 2026

huggingface deleted a comment from github-actions Bot Apr 28, 2026

evalstate mentioned this pull request Apr 28, 2026

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#41

Open
