Add Vocos model #39403
ArthurZucker
left a comment
Nice! My main comment is to remove the hidden states post processing!
Thanks for reviewing! The failing tests seem unrelated to my changes. However, I realized that the latest datasets 4.0.0 loads different audio samples than earlier versions, which was causing the integration tests to fail in CI.
ArthurZucker
left a comment
Sorry for my late review!
If you can merge main and address the small comment, we can merge!
[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, vocos, vocos_encodec
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=39403&sha=5a8643
Hey @eustlb @ebezzam, I pushed a few changes and the PR is in mergeable shape again. I would appreciate your review when you have a moment. The CI failures look unrelated, except for the date update in the docs.
What does this PR do?
This PR integrates the Vocos model into transformers. Vocos is a neural vocoder designed for high-quality audio synthesis in TTS pipelines and related tasks. It outperforms HifiGan and is significantly faster. It has two main variants:
- VocosModel: can be used as a standalone vocoder in an audio generation pipeline; the goal is to use it as a drop-in vocoder in the YuE model. It can also be used together with VocosFeatureExtractor to synthesize audio from mel-spectrogram features.
- VocosWithEncodecModel: integrates the EnCodec neural audio codec model into Vocos for end-to-end audio compression and reconstruction.

This is a continuation of integrating model components for the new YuE model (mentioned in #36784).
Who can review?
Anyone in the community is free to review the PR once the tests have passed.
@ArthurZucker @eustlb @ylacombe