Add Vocos model #39403

Open

Manalelaidouni wants to merge 127 commits into huggingface:main from Manalelaidouni:add-vocos-model

Contributor

Manalelaidouni commented Jul 14, 2025

What does this PR do?

This PR integrates the Vocos model into transformers.

Vocos is a neural vocoder designed for high-quality audio synthesis in TTS pipelines and related tasks. It outperforms HiFi-GAN and is significantly faster. It has two main variants:

VocosModel: can be used as a standalone vocoder in an audio generation pipeline; the goal is to use it as a drop-in vocoder in the YuE model. It can also be used together with VocosFeatureExtractor to synthesize audio from mel-spectrogram features.
VocosWithEncodecModel: integrates the EnCodec neural audio codec model into Vocos for end-to-end audio compression and reconstruction.

This is a continuation of integrating model components for the new YuE model (mentioned in #36784).

Who can review?

Anyone in the community is free to review the PR once the tests have passed.
@ArthurZucker @eustlb @ylacombe

Manalelaidouni marked this pull request as draft

July 14, 2025 22:50

ArthurZucker added the New model label

Manalelaidouni force-pushed the add-vocos-model branch from cfaf1d4 to 4d9b6ac Compare

July 16, 2025 04:13

ArthurZucker reviewed

Collaborator

ArthurZucker left a comment

Nice! My main comment is to remove the hidden states post processing!

src/transformers/models/vocos/modeling_vocos.py Outdated

ArthurZucker requested a review from eustlb

July 16, 2025 13:33

Manalelaidouni added 2 commits

July 19, 2025 12:38


          add working vocos

7b502bb


          update vocos

30a17e7

Manalelaidouni force-pushed the add-vocos-model branch from 4d9b6ac to 30a17e7 Compare

July 19, 2025 11:40

Manalelaidouni added 5 commits

July 19, 2025 13:20


          refactor vocos head

33a715e


          fix docstring

f6026d9

nit

7fa04e0


          Merge branch 'huggingface:main' into add-vocos-model

910c500


          fix output mismatch

09cf23f

Manalelaidouni marked this pull request as ready for review

July 22, 2025 13:07

Manalelaidouni marked this pull request as draft

July 22, 2025 13:26

Manalelaidouni marked this pull request as ready for review

July 22, 2025 15:29

Contributor Author

Manalelaidouni commented Jul 22, 2025 (edited)

Thanks for reviewing! The failing tests seem unrelated to my changes, but I realized that the latest datasets release (4.0.0) loads different audio samples than earlier versions, which was causing the integration tests to fail in CI.

Manalelaidouni added 8 commits

July 24, 2025 01:01


          update checkpoint conversions

d909e64


          add working vocos

6709a97


          update vocos

01913f5


          refactor vocos head

c42d8b9


          fix docstring

b987b13

nit

ea73b18


          fix output mismatch

1fab4f5


          update checkpoint conversions

e81574a

Manalelaidouni force-pushed the add-vocos-model branch from d909e64 to e81574a Compare

July 24, 2025 00:13

ArthurZucker approved these changes

Collaborator

ArthurZucker left a comment

Sorry for my late review!

src/transformers/models/vocos/modeling_vocos.py Outdated

src/transformers/models/vocos/modeling_vocos.py Outdated

src/transformers/models/vocos/modeling_vocos.py Outdated

src/transformers/models/vocos/feature_extraction_vocos.py Outdated

Collaborator

ArthurZucker commented Aug 11, 2025

If you can merge main and address the small comment, we can merge!

Manalelaidouni added 2 commits

August 11, 2025 23:13


          Merge branch 'main' into add-vocos-model

324b7c7


          Merge branch 'add-vocos-model' of https://github.com/Manalelaidouni/transformers into add-vocos-model

3f469f7


          ruff styling

5a86436

Contributor

github-actions Bot commented Jan 14, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor

github-actions Bot commented Jan 14, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=5a8643


          update modular 2

bd47278

Contributor

github-actions Bot commented Jan 15, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor

github-actions Bot commented Jan 15, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=bd4727


          nits

b7bac40

Contributor

github-actions Bot commented Jan 15, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor

github-actions Bot commented Jan 15, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=b7bac4


          update auto mapping + tests

d55b0f1

Contributor

github-actions Bot commented Jan 20, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor

github-actions Bot commented Jan 20, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=d55b0f

Manalelaidouni added 2 commits

January 20, 2026 17:53


          add codebook_weights buffer to initialization

0150a11


          allow unused config attribute

c504d3b

Contributor

github-actions Bot commented Jan 20, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor

github-actions Bot commented Jan 20, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=c504d3


          Merge branch 'main' into add-vocos-model

4cc6166

Contributor

github-actions Bot commented Jan 22, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor

github-actions Bot commented Jan 22, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=4cc616

Manalelaidouni added 2 commits

January 23, 2026 14:31


          skip training tests

4b0aec4


          update docs

39af5a1

Contributor

github-actions Bot commented Jan 23, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor

github-actions Bot commented Jan 23, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=39af5a


          add test decorators

3e8df70

Contributor

github-actions Bot commented Jan 23, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor

github-actions Bot commented Jan 23, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=3e8df7


          Merge branch 'main' into add-vocos-model

373a5d2

Contributor

github-actions Bot commented Jan 26, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, vocos, vocos_encodec

Contributor Author

Manalelaidouni commented Jan 26, 2026

Hey @eustlb @ebezzam, I pushed a few changes and the PR is in a mergeable shape again. I would appreciate your review when you have a moment. The CI failures look unrelated, except for the date update in the docs. The changes are:

Removed return_audio_only from the feature extractor and simplified the processor flow.
The feature extractor and processor now return an attention_mask so that batch outputs can be trimmed consistently. Both VocosModel and VocosEncodecModel accept it and pass it through, so output trimming for batched audio uses the mask (similar to Parakeet-style handling).
Updated the tests, fixtures, gist, and model doc cards accordingly.
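
For readers unfamiliar with mask-based trimming, here is a minimal sketch of the idea in plain Python. It is not the code in this PR: the function name `trim_batched_audio`, the `hop_length` parameter, and the assumption that every valid feature frame maps to exactly `hop_length` output samples are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def trim_batched_audio(
    waveforms: Sequence[Sequence[float]],
    attention_mask: Sequence[Sequence[int]],
    hop_length: int,
) -> List[List[float]]:
    """Trim each padded waveform to the length implied by its frame mask.

    Assumption (hypothetical): each valid frame (mask == 1) corresponds to
    `hop_length` output samples. The real model may map frames to samples
    differently.
    """
    trimmed = []
    for wave, mask in zip(waveforms, attention_mask):
        valid_frames = sum(mask)  # number of non-padded frames
        valid_samples = valid_frames * hop_length
        trimmed.append(list(wave[:valid_samples]))
    return trimmed


# Two items padded to 4 frames each; the second has only 2 real frames.
batch = [[0.1] * 8, [0.2] * 8]
mask = [[1, 1, 1, 1], [1, 1, 0, 0]]
outputs = trim_batched_audio(batch, mask, hop_length=2)
print([len(o) for o in outputs])  # [8, 4]
```

Deriving the trimmed length from the mask, rather than from the padded tensor shape, is what lets each item in a batch come out at its own length.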

evalstate mentioned this pull request

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open


Labels

Audio, New model