Add xcodec2 model #37868
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Feel free to ping me once this is ready!
    return spectrogram_list


def spectrogram_torch(
Added a Torch equivalent of spectrogram_batched (namely Mel feature extraction with Kaldi-style pre-processing, which I didn't see supported in other Torch implementations) so that Torch/GPU is supported by the feature extractor. Could also update SeamlessM4T?
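For context on what "Kaldi-style pre-processing" usually involves, here is a minimal NumPy sketch. It is not the code from this PR: the function names and the default values (400-sample frames, 160-sample hop, 512-point FFT, 0.97 pre-emphasis) are illustrative assumptions, and dithering and the mel filter bank are left out.

```python
import numpy as np


def povey_window(frame_length: int) -> np.ndarray:
    # Kaldi's "povey" window: a symmetric Hann window raised to the power 0.85.
    n = np.arange(frame_length)
    hann = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / (frame_length - 1))
    return hann ** 0.85


def kaldi_style_power_spectrum(
    waveform: np.ndarray,
    frame_length: int = 400,   # assumed: 25 ms at 16 kHz
    hop_length: int = 160,     # assumed: 10 ms at 16 kHz
    fft_length: int = 512,
    preemphasis: float = 0.97,
) -> np.ndarray:
    if len(waveform) < frame_length:
        raise ValueError("waveform is shorter than one frame")

    # Frame without centre padding: only frames that fit entirely are kept.
    num_frames = 1 + (len(waveform) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(num_frames)[:, None]
    frames = waveform[idx].astype(np.float64)

    # 1) Remove the DC offset of each frame.
    frames = frames - frames.mean(axis=1, keepdims=True)

    # 2) Pre-emphasis inside each frame; the first sample is its own predecessor.
    previous = np.concatenate([frames[:, :1], frames[:, :-1]], axis=1)
    frames = frames - preemphasis * previous

    # 3) Window, zero-pad to fft_length, and take the power spectrum.
    frames = frames * povey_window(frame_length)
    spectrum = np.fft.rfft(frames, n=fft_length, axis=1)
    return np.abs(spectrum) ** 2  # shape: (num_frames, fft_length // 2 + 1)
```

A Torch version would follow the same three steps with tensor ops, which is what makes it usable on GPU; a mel filter bank and a log would then be applied to the returned power spectrum.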
    """
-   Get mel-filter bank features using TorchAudio. Note that TorchAudio requires 16-bit signed integers as inputs
-   and hence the waveform should not be normalized before feature extraction.
+   Get mel-filter bank features using Numpy method to mimic Kaldi.
Update docstring since it wasn't using TorchAudio!
    return y


class ISTFTHead(nn.Module):
Note to self: could be imported from Vocos when merged: #39403
run-slow: xcodec2
This comment contains run-slow, running the specified jobs: models: ['models/xcodec2']
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, dac, seamless_m4t, xcodec, xcodec2
        raise ValueError("Padding must be 'center' or 'same'.")


class Xcodec2ISTFTHead(nn.Module):
Note to self: this and Xcodec2ISTFT can be imported from Vocos when merged: #39403
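For readers who haven't seen a Vocos-style ISTFT head, the rough idea (as I understand it, not taken from this PR's code) is that a projection outputs n_fft + 2 values per frame, which are split into a log-magnitude half and a phase half, turned into complex STFT coefficients, and inverted with overlap-add. A minimal NumPy sketch follows; the function name, the sizes, and the magnitude clip of 1e2 are assumptions for illustration, and the 'center'/'same' trimming is omitted.

```python
import numpy as np


def istft_head_decode(head_output: np.ndarray, n_fft: int = 16, hop_length: int = 4) -> np.ndarray:
    # head_output: (num_frames, n_fft + 2), standing in for the output of a linear projection.
    log_mag, phase = np.split(head_output, 2, axis=1)  # each (num_frames, n_fft // 2 + 1)

    # Magnitude is predicted in the log domain; clip it to avoid huge values (assumed limit).
    mag = np.minimum(np.exp(log_mag), 1e2)
    spec = mag * (np.cos(phase) + 1j * np.sin(phase))

    # Inverse FFT per frame, then apply the synthesis window (periodic Hann).
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window

    # Overlap-add, keeping track of the summed squared window for normalisation.
    num_frames = frames.shape[0]
    out_len = n_fft + hop_length * (num_frames - 1)
    audio = np.zeros(out_len)
    envelope = np.zeros(out_len)
    for t in range(num_frames):
        start = t * hop_length
        audio[start:start + n_fft] += frames[t]
        envelope[start:start + n_fft] += window ** 2

    # Normalise only where the window envelope is non-zero.
    nonzero = envelope > 1e-11
    audio[nonzero] /= envelope[nonzero]
    return audio
```

In the actual model this would be a Torch module so it stays differentiable; the NumPy version is only meant to show the data flow from head output to waveform.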
run-slow: xcodec2
This comment contains run-slow, running the specified jobs: models: ['models/xcodec2']
I see that the [...] Could you please provide an estimated timeline for when this PR is expected to be merged into the main branch of transformers?
What does this PR do?
This PR adds support for XCodec2, a high-fidelity general neural audio codec used in Llasa, a text-to-speech model, to the Transformers library.
This model is composed of 5 components:
This is still a draft PR. Work done so far:
modeling_xcodec2.py and modular_xcodec2.py.
Todo
Who can review?
cc: @ArthurZucker
cc: @eustlb @Vaibhavs10 for visibility