Fix DAC conversion script #39793
| 1. The Transformers model does not use weight norm (for speed-up). During model conversion, weight norm was removed on | ||
| CPU (old script: https://github.com/huggingface/transformers/blob/8e077a3e452e8cab94ef62b37d68258bd3dcffed/src/transformers/models/dac/convert_dac_checkpoint.py#L230) | ||
| This leads to slightly different weights (1e-8), and the error accumulates. Removing weight norm on GPU would produce | ||
| equivalent weights (current conversion script). | ||
| 2. The original version uses the Snake1D activation with JIT: https://github.com/descriptinc/descript-audio-codec/blob/c7cfc5d2647e26471dc394f95846a0830e7bec34/dac/nn/layers.py#L18 | ||
| The Transformers version does not use JIT, so outputs are slightly different. |
Updated (definite) reason for high tolerances
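To make point 2 a bit more concrete, here is a minimal, torch-free sketch. It assumes the Snake activation has the form x + sin²(αx) / (α + 1e-9), which is my reading of the linked layer, and it assumes that a compiled/JIT path may evaluate the same expression with a different grouping of operations than eager code. Neither assumption is confirmed in this PR. The function names are made up for illustration.

```python
import math

def snake_reciprocal(x, alpha):
    # Multiply by a precomputed reciprocal: x + (alpha + 1e-9)^-1 * sin(alpha * x)^2
    return x + (1.0 / (alpha + 1e-9)) * math.sin(alpha * x) ** 2

def snake_divide(x, alpha):
    # Same value on paper, but with one division at the end instead
    return x + math.sin(alpha * x) ** 2 / (alpha + 1e-9)

# Compare the two groupings over a small grid of inputs
xs = [i / 100.0 for i in range(-300, 301)]
alpha = 0.7
max_diff = max(abs(snake_reciprocal(x, alpha) - snake_divide(x, alpha)) for x in xs)
print(f"max difference between the two groupings: {max_diff:.3e}")
```

Both functions compute the same thing on paper; any gap printed is pure floating-point rounding. Python floats are 64-bit, so the gap here is far smaller than what float32 tensors would show.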
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
vasqu
left a comment
We should not update the conversion if we don't change the hub. Having a legacy path is not ideal and makes it confusing for the average user, since the hub weights differ from what the script here produces.
There are two options imo:
- Change to new conversion (no extra flags) and update hub weights
- Only leave the description where the differences stem from
I'd prefer option 1 even if it is breaking, tbh. I would wait on Eustache here.
[For maintainers] Suggested jobs to run (before merge): run-slow: dac
thanks @vasqu! 🚨 @eustlb (when you're back), @vasqu and I spoke offline that it would be better to:
The main reason is that several models depend on DAC (XCodec, Dia, Higgs Boson, maybe more), and it would be better if they didn't depend on a model with minor output differences. Otherwise, model addition/integration will be trickier, since we may not be able to isolate whether differences come from DAC or from the implementation of the new model.
What does this PR do
Reproducer to show weight norm difference when doing weight removal on a different device: https://gist.github.com/ebezzam/c83f186dcfeaab8cac040c960eb474cd
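The gist above is the actual reproducer. As a rough, torch-free illustration of the same effect, the sketch below folds weight-norm parameters into a plain weight using w = g * v / ||v|| for a single output channel. The `reverse` flag only changes the order in which the squares are summed. That is a stand-in I'm assuming for "the reduction runs differently on another device", not a description of how the PyTorch kernels work. `fold_weight_norm` is a hypothetical helper, not part of the conversion script.

```python
import math
import random

def fold_weight_norm(g, v, reverse=False):
    """Fold weight-norm parameters (g, v) into a plain weight w = g * v / ||v||.

    `reverse` flips the summation order used for the norm, mimicking a
    different reduction order (an assumption made for illustration only).
    """
    squares = [x * x for x in v]
    if reverse:
        squares = squares[::-1]
    norm = math.sqrt(sum(squares))
    return [g * x / norm for x in v]

random.seed(0)
v = [random.uniform(-1.0, 1.0) for _ in range(1000)]  # direction parameter
g = 2.5                                               # magnitude parameter

w_a = fold_weight_norm(g, v)
w_b = fold_weight_norm(g, v, reverse=True)

# Largest element-wise gap between the two folded weights
max_diff = max(abs(a - b) for a, b in zip(w_a, w_b))
print(f"max |w_a - w_b| = {max_diff:.3e}")
```

Any gap is tiny for one layer, but each converted layer can carry its own small discrepancy, which fits the accumulation described in the PR. Python floats are 64-bit, so the numbers here are much smaller than the float32 figure (1e-8) quoted above.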