For some reason I'm noticing a very slow model instantiation time.
For example, loading `shleifer/distill-mbart-en-ro-12-4` takes:
- 21 secs to instantiate the model
- 0.5 secs to `torch.load` its weights
Since I'm not changing how the model is created and want to quickly fast-forward to the area I'm debugging, how could these slow parts be cached rather than rebuilt anew again and again?
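One direction for such caching would be serializing the fully built skeleton once and reloading it on subsequent runs. This is only a minimal sketch of the idea with a dummy stand-in class, not the Transformers API; `DummyModel`, `CACHE_PATH`, and `load_model_cached` are all hypothetical names invented here:

```python
import os
import pickle
import tempfile

# Hypothetical cache location for the pickled skeleton.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "model_skeleton_cache.pkl")

class DummyModel:
    """Stand-in for an expensive-to-instantiate model skeleton."""
    def __init__(self):
        # Imagine this is the slow __init__/init_weights work.
        self.weights = [i * 0.5 for i in range(1_000)]

def load_model_cached():
    """Return a cached skeleton if one exists, else build it and cache it."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    model = DummyModel()
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(model, f)
    return model
```

The first call pays the full construction cost; later calls (and later processes, as long as the class is importable for unpickling) only pay for deserialization.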
But it also looks like we are doing a completely wasteful `init_weights` operation, whose results are immediately overwritten by the pretrained model weights (#9205 (comment)) (for the pre-trained model use case).
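The wasted-work pattern, and the obvious fix of skipping the initialization when a checkpoint will overwrite it anyway, can be sketched with a toy layer. This is an illustration of the idea only; `Layer`, the `init_weights` flag, and `load_from_checkpoint` are hypothetical names, not the actual Transformers internals:

```python
class Layer:
    """Toy layer; init_weights=False skips the throwaway initialization."""
    def __init__(self, n, init_weights=True):
        # In the pretrained-load path this init is pure waste:
        # every value is replaced a moment later by the checkpoint.
        self.w = [float(i) for i in range(n)] if init_weights else None

def load_from_checkpoint(checkpoint):
    """Build the layer without initialization, then copy in saved weights."""
    layer = Layer(len(checkpoint), init_weights=False)
    layer.w = list(checkpoint)  # pretrained weights fill the layer instead
    return layer
```

Training from scratch would still use `Layer(n)` with the default initialization; only the load-from-checkpoint path skips it.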
(I initially made a mistake and thought that it was `torch.load` that had an issue, but it's `cls(config, *model_args, **model_kwargs)` - thank you, @sgugger - so this post has been edited to reflect reality. If you're joining later, you can skip the comments up to #9205 (comment) and continue from there.)
@patrickvonplaten, @sgugger, @LysandreJik