For some reason I'm noticing a very slow model instantiation time.
For example, loading `shleifer/distill-mbart-en-ro-12-4` takes:
- 21 secs to instantiate the model
- 0.5 secs to `torch.load` its weights
Since I'm not changing how the model is created and want to quickly fast-forward to the area I'm debugging, how could these slow parts be cached rather than rebuilt anew again and again?
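One direction for such caching would be serializing the fully built skeleton once and reloading it on subsequent runs. This is only a minimal sketch of the idea with a dummy stand-in class, not the Transformers API; `DummyModel`, `CACHE_PATH`, and `load_model_cached` are all hypothetical names invented here:

```python
import os
import pickle
import tempfile

# Hypothetical cache location for the pickled skeleton.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "model_skeleton_cache.pkl")

class DummyModel:
    """Stand-in for an expensive-to-instantiate model skeleton."""
    def __init__(self):
        # Imagine this is the slow __init__/init_weights work.
        self.weights = [i * 0.5 for i in range(1_000)]

def load_model_cached():
    """Return a cached skeleton if one exists, else build it and cache it."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    model = DummyModel()
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(model, f)
    return model
```

The first call pays the full construction cost; later calls (and later processes, as long as the class is importable for unpickling) only pay for deserialization.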
But it also looks like we are doing a completely wasteful `init_weights` operation, whose results are immediately overwritten by the pretrained model weights (#9205 (comment)) (for the pre-trained model use case).
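The wasted-work pattern, and the obvious fix of skipping the initialization when a checkpoint will overwrite it anyway, can be sketched with a toy layer. This is an illustration of the idea only; `Layer`, the `init_weights` flag, and `load_from_checkpoint` are hypothetical names, not the actual Transformers internals:

```python
class Layer:
    """Toy layer; init_weights=False skips the throwaway initialization."""
    def __init__(self, n, init_weights=True):
        # In the pretrained-load path this init is pure waste:
        # every value is replaced a moment later by the checkpoint.
        self.w = [float(i) for i in range(n)] if init_weights else None

def load_from_checkpoint(checkpoint):
    """Build the layer without initialization, then copy in saved weights."""
    layer = Layer(len(checkpoint), init_weights=False)
    layer.w = list(checkpoint)  # pretrained weights fill the layer instead
    return layer
```

Training from scratch would still use `Layer(n)` with the default initialization; only the load-from-checkpoint path skips it.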
(I initially made a mistake and thought that it was `torch.load` that had an issue, but it's `cls(config, *model_args, **model_kwargs)` - thank you, @sgugger - so this post has been edited to reflect reality. If you're joining later, you can skip the comments up to #9205 (comment) and continue from there.)
@patrickvonplaten, @sgugger, @LysandreJik