Any to any pipeline and auto-mapping#40884
Conversation
merveenoyan
left a comment
From what I understand from the code, what we do is make it possible to load an any-to-any model and still do everything we do with image-text-to-text tasks with it. For me it's a bit confusing, but if we write the docs well it should be OK!
@merveenoyan if you have time to look at the docs section, your advice would be appreciated. Do you think there is anything we should add or highlight? I added basic functionality with examples for now.
Okay, I think this one is ready now, as long as CI turns green.
Thank you! Looking forward to this getting merged 🙏🏻
Cyrilvallez
left a comment
Nice! Left a few comments! Tagging @ArthurZucker as well, since the names we choose for the pipelines and mappings are important here. We will likely be stuck with them for some time, so let's make sure we like them and that they are descriptive enough!
ArthurZucker
left a comment
Overall very nice, but not sure about the naming yet!
jackzhxng
left a comment
Solves our use case perfectly; output_modalities is also very useful to have. Thanks @zucchini-nlp 🙏🏻
Leaving to @ArthurZucker and @Cyrilvallez for approval
Test failures not related!

Test failures not related, kind ping @ArthurZucker whenever you have time.
ArthurZucker
left a comment
Very, very nice, sorry that it took so long to come back to it!
Fan of in/out modalities! Shaping up well!
* initial commit
* fix tests
* fix copies, tests and rename pipe
* another rename
* fix copies again
* activate pipeline mixin in some models
* audio loading
* typo
* fix the test
* stupid typo in filename
* fix copies
* docs
* forgot
* fix pipe tests
* fix copies
* fix test
* lets not pass it explicitly
* final fix
* rename in test files as well
* fix again after reordering...
* add qwen2 audio
* add qwen3-omni
* wait, I didn't push it last time?
* it's only torch from now on
* how was the model merged with docstring issues?
* make style
* requires backend depends on input modalities
* add repr
* fix copies
* fox copies, new models were added
* and now fix copies
What does this PR do?
Adds any-to-any as a pipeline and to the auto classes so that we can have a single mapping for all multimodal models. The model mapping is almost the same as for image-text-to-text, with the inclusion of audio LLMs and omni LLMs. I hope I added all the audio models, but let me know if anything is missing from recent ones.
Fixes #40302 and fixes #37794