[model_utils] very slow model instantiation #9205

@stas00

For some reason I'm noticing a very slow model instantiation time.

For example, loading shleifer/distill-mbart-en-ro-12-4 takes:

  • 21 sec to instantiate the model
  • 0.5 sec to torch.load its weights
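
For anyone wanting to reproduce this kind of split measurement, here is a minimal sketch of timing the two phases separately. The `timed` helper and the `timings` dict are hypothetical names made up for illustration, and the two workloads are stand-ins: in the real case the first block would wrap the model constructor call and the second would wrap torch.load.

```python
import time
from contextlib import contextmanager

timings = {}  # phase name -> elapsed wall-clock seconds


@contextmanager
def timed(name):
    """Record the time spent inside the `with` block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


# Stand-in workloads (assumptions, not the real calls): replace the bodies
# with the model constructor and torch.load to measure the actual phases.
with timed("instantiate"):
    sum(i * i for i in range(200_000))

with timed("load_weights"):
    sum(range(1_000))

for name, secs in timings.items():
    print(f"{name}: {secs:.4f}s")
```

Measuring each phase in its own block is what makes it possible to tell the constructor apart from the weight loading, instead of only seeing the total.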

If I'm not changing how the model is created and just want to fast-forward to the area I'm debugging, how could these slow parts be cached rather than rebuilt again and again?

It also looks like we are doing a completely wasteful init_weights operation: in the pre-trained model use case, the freshly initialized weights are immediately overwritten with the pretrained model weights (#9205 (comment)).
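
To illustrate the redundancy, here is a toy sketch in plain Python. `ToyModel`, its `init_weights` flag, and `load_state` are hypothetical and are not the transformers API; they only mimic the pattern of "randomly initialize, then overwrite with pretrained values".

```python
import random


class ToyModel:
    """Hypothetical stand-in for a model class (not the transformers API)."""

    def __init__(self, size, init_weights=True):
        if init_weights:
            # Costly random initialization, one draw per parameter.
            self.weights = [random.gauss(0.0, 0.02) for _ in range(size)]
        else:
            # Cheap placeholder: values are about to be overwritten anyway.
            self.weights = [0.0] * size

    def load_state(self, state):
        """Overwrite all weights with the given pretrained values."""
        if len(state) != len(self.weights):
            raise ValueError("size mismatch")
        self.weights = list(state)


pretrained = [0.5] * 1000  # stand-in for weights read from a checkpoint

# Path 1: random init, then overwrite (the wasteful pattern described above).
slow = ToyModel(1000)
slow.load_state(pretrained)

# Path 2: skip the random init, then overwrite.
fast = ToyModel(1000, init_weights=False)
fast.load_state(pretrained)

# Both paths end with identical weights, so the random init was wasted work.
assert slow.weights == fast.weights
```

The skip is only safe when every parameter is present in the loaded weights; any parameter missing from the checkpoint would still need a proper initialization.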

(I initially made a mistake and thought that torch.load was the culprit, but it is actually cls(config, *model_args, **model_kwargs). Thank you, @sgugger. This post has been edited to reflect reality, so if you're joining later you can skip the comments up to #9205 (comment) and continue from there.)

@patrickvonplaten, @sgugger, @LysandreJik

Labels: Performance, WIP
