Optimize Parakeet feature extraction on CUDA #45134
milesial wants to merge 3 commits into huggingface:main
Hi @eustlb, thanks for linking your PR! Some rough numbers, processing a 30-minute audio clip at 16 kHz:
Several changes in this PR:
I see your PR is a major refactor of the audio processors. Does it address these points as well?
My other question is about timelines. I'm guessing your PR may take a while to get merged, whereas this smaller one could be merged sooner and unblock us in the meantime. What do you think?
Signed-off-by: milesial <milesial@users.noreply.github.com>
b5a601d to 6703a6d
[For maintainers] Suggested jobs to run (before merge)
run-slow: parakeet
What does this PR do?
Adds CUDA support to the Parakeet preprocessor, so that STFT and mel spectrogram extraction can run on the GPU.
This refactor also speeds up the CPU implementation.
Tested on nvidia/parakeet-ctc-0.6b, B200, 300s audio:
Before this PR, CPU: 28ms
After this PR, CPU: 21ms
After this PR, GPU: 1.7ms
No impact on accuracy (VoxPopuli).
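For reference, the relative speedups implied by the three timings above can be worked out with a few lines of Python. This is only arithmetic on the numbers reported in this description; the dictionary keys are labels chosen here for illustration.

```python
# Timings reported above (nvidia/parakeet-ctc-0.6b, B200, 300s audio), in milliseconds.
timings_ms = {"CPU before": 28.0, "CPU after": 21.0, "GPU after": 1.7}

# Speedup of each configuration relative to the CPU path before this PR.
baseline = timings_ms["CPU before"]
speedups = {name: baseline / t for name, t in timings_ms.items()}

for name, s in speedups.items():
    print(f"{name}: {timings_ms[name]} ms, {s:.1f}x vs. CPU before")
```

So the CPU refactor alone is roughly a 1.3x improvement, and the GPU path is roughly 16x faster than the original CPU path on this test.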
The context for this PR is accelerating vLLM for our multimodal Nemotron model, where audio input processing is bottlenecked by this CPU feature extractor. This PR does a small refactor that speeds up the CPU path (28ms to 21ms in the test above) and also enables a CUDA backend for further acceleration.
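To illustrate the general idea of running STFT and mel extraction on whichever device holds the audio, here is a minimal PyTorch sketch. It is not the code from this PR: the function names (`mel_filterbank`, `log_mel`) are hypothetical, and the parameters (512-point FFT, 400-sample Hann window, 160-sample hop, 80 mel bins, HTK-style mel scale) are assumptions picked for the example rather than values taken from the Parakeet configuration.

```python
import math

import torch


def mel_filterbank(n_fft, n_mels, sample_rate, device):
    """Triangular mel filters, shape (n_fft // 2 + 1, n_mels). HTK mel scale (assumption)."""
    n_freqs = n_fft // 2 + 1
    freqs = torch.linspace(0.0, sample_rate / 2, n_freqs, device=device)
    # Mel value of the Nyquist frequency; the lower edge (0 Hz) maps to 0 mel.
    max_mel = 2595.0 * math.log10(1.0 + (sample_rate / 2) / 700.0)
    mel_edges = torch.linspace(0.0, max_mel, n_mels + 2, device=device)
    hz_edges = 700.0 * (10.0 ** (mel_edges / 2595.0) - 1.0)
    lower, center, upper = hz_edges[:-2], hz_edges[1:-1], hz_edges[2:]
    rising = (freqs[:, None] - lower) / (center - lower)    # (n_freqs, n_mels)
    falling = (upper - freqs[:, None]) / (upper - center)   # (n_freqs, n_mels)
    return torch.clamp(torch.minimum(rising, falling), min=0.0)


def log_mel(waveform, sample_rate=16000, n_fft=512, win_length=400,
            hop_length=160, n_mels=80):
    """Log-mel features for a (batch, samples) tensor, computed on waveform.device."""
    device = waveform.device
    window = torch.hann_window(win_length, device=device)
    spec = torch.stft(
        waveform,
        n_fft=n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window,
        return_complex=True,
    )                                                   # (batch, n_freqs, frames)
    power = spec.abs() ** 2
    fbank = mel_filterbank(n_fft, n_mels, sample_rate, device)
    mel = torch.matmul(power.transpose(1, 2), fbank)    # (batch, frames, n_mels)
    return torch.log(mel + 1e-10)                       # small floor avoids log(0)


# Use the GPU when available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
audio = torch.randn(1, 16000, device=device)  # 1 s of noise at the assumed 16 kHz
features = log_mel(audio)
print(features.shape, features.device)
```

Because the window, filterbank, and matrix product are all created on `waveform.device`, the same function runs on CPU or CUDA without any host/device copies in between, which is the property that matters when the extractor sits in front of a GPU inference engine.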
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.