feat[vLLM × v5]: Add vLLM compatibility for audio models #45326

Merged
ArthurZucker merged 4 commits into huggingface:main from harshaljanjani:feat/audio-vllm-attention-backend
Apr 21, 2026

Conversation

harshaljanjani (Contributor) commented Apr 8, 2026

What does this PR do?

→ This PR introduces compat fixes across several audio models so that they can be loaded and used by a companion vLLM PR. These changes are deliberate, and the vLLM PR, which adds audio backend compatibility to vLLM, is blocked on them. Once this PR is merged, the other PR will be marked ready for review!
→ Outlining the design choices of one PR without context from the other didn't make much sense to me, so I wrote a doc that covers both sets of changes together and explains why each is deliberate, among other things.
→ The v5 tracker doesn’t mention the audio backend, but it is certainly a significant gap that needs to be addressed. After this is merged, I'll open an issue tracker for the Transformers audio backend work in vLLM so the efforts can stay organized.

Please refer to the document for the reasoning behind these changes in context with the vLLM PR!
Document: v5 x vLLM Audio Backend Support Document

Related Issues:

→ Current v5 tracker: vllm-project/vllm#38379
vllm-project/vllm#38902

@vasqu @ArthurZucker

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.

harshaljanjani (Contributor Author) commented Apr 9, 2026

The CI failures are unrelated to this PR (the GraniteSpeech failure is likely a pre-existing issue as I documented).

Rocketknight1 (Member) commented

cc @hmellor as well

ArthurZucker (Collaborator) left a comment

LGTM, but can you make sure it's tested?

eustlb (Contributor) left a comment

Amazing to see it's working out of the box! 🔥 How did you test?
Edit: Okay I see it's there

github-actions (Contributor) commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, auto, glmasr, granite_speech, musicflamingo, vibevoice_acoustic_tokenizer, vibevoice_asr

harshaljanjani (Contributor Author) commented Apr 14, 2026

> LGTM, but can you make sure it's tested?

@ArthurZucker I re-ran the tests for the changed models (Granite Speech, Audio Flamingo 3, GLM-ASR, and VibeVoice-ASR) to verify that they pass, and they do, except for the one I mentioned in the document, which isn't related to this change. That test is missing a skip that AudioFlamingo3 and Voxtral already have. It doesn't hurt to add it, so I've included it in this PR, and CI is green after that. (I'm not sure about the local issue on my side when fetching the URL, since the URL is actually valid; it didn't happen in the last run.)

Before the necessary skip (GraniteSpeechForConditionalGenerationModelTest::test_inputs_embeds_matches_input_ids):

image

After the necessary skip (all model tests):

RUN_SLOW=1 pytest -q -o log_cli=false tests/models/granite_speech/test_modeling_granite_speech.py tests/models/audioflamingo3/test_modeling_audioflamingo3.py tests/models/glmasr/test_modeling_glmasr.py tests/models/vibevoice_asr/test_modeling_vibevoice_asr.py

image
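
For readers unfamiliar with the pattern, here is a minimal sketch of how an inherited test is typically skipped for one model using the standard `unittest.skip` decorator. The mixin, the class names, and the skip reason below are hypothetical stand-ins written for illustration; they are not taken from this PR's diff or from the real Transformers test suite.

```python
import io
import unittest


class SharedModelTestsSketch:
    """Hypothetical stand-in for a shared test mixin (not the real Transformers class)."""

    def test_inputs_embeds_matches_input_ids(self):
        # Stand-in for a shared check that does not apply to every model.
        raise AssertionError("shared check is not applicable to this model")

    def test_forward_runs(self):
        # Stand-in for a shared check that does apply.
        self.assertTrue(True)


class SpeechModelSketchTest(SharedModelTestsSketch, unittest.TestCase):
    # Overriding the inherited test with a skipped no-op keeps the rest of the
    # shared suite running. The reason string is an assumption for illustration.
    @unittest.skip(reason="Hypothetical: inputs_embeds path not comparable for this model")
    def test_inputs_embeds_matches_input_ids(self):
        pass


def run_sketch():
    """Run the sketch test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SpeechModelSketchTest)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

Running `run_sketch()` reports the overridden test as skipped rather than failed, which is the effect described above for the Granite Speech test.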

> Amazing to see it's working out of the box! 🔥 How did you test?
> Edit: Okay I see it's there

@eustlb Yupp it's tested :)
Also, I think you'd find this gist valuable: it covers how I benchmarked the models and documents the quirks I encountered. I'll mark the vLLM PR ready for review once this is out of the way.

harshaljanjani (Contributor Author) commented

Good day @ArthurZucker @eustlb, just checking in to see if there have been any updates so that the vLLM PR can be unblocked :)

ArthurZucker (Collaborator) left a comment

Let's go, I'd be happy if we can see some stuff that can be taken from the vLLM PR to here to help standardize! 🤗

@ArthurZucker ArthurZucker enabled auto-merge April 20, 2026 09:12
@ArthurZucker ArthurZucker added this pull request to the merge queue Apr 20, 2026
HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 20, 2026
harshaljanjani (Contributor Author) commented

Got kicked out of the merge queue 😓

@ArthurZucker ArthurZucker merged commit a6dab9f into huggingface:main Apr 21, 2026
28 checks passed
@harshaljanjani harshaljanjani deleted the feat/audio-vllm-attention-backend branch April 21, 2026 07:11
artem-spector pushed a commit to artem-spector/transformers that referenced this pull request Apr 21, 2026
…#45326)

* chore: Add vLLM compat for audio models

* fix: Fix ci/circleci: check_repository_consistency

* nit: Skip incompatible test