🐛 Bug
As spotted in huggingface/transformers#9156 (comment), some models expose an unusual structure: one very big tensor (which is not optimized) that is larger than the sum of all the other tensors. In that case one rank gets no grads at all, and a couple of dependent functions do not handle that gracefully.
Command
huggingface/transformers#9156 (comment)
A basic seq2seq model run on 2 ranks: one rank gets the single static tensor, the other one gets all the grads. Gradient clipping breaks and the computation is poorly balanced.
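The imbalance above can be illustrated with a minimal sketch of a greedy, size-based parameter partitioner (the function, names, and sizes below are assumptions for illustration only, not fairscale's actual sharding code):

```python
# Hypothetical greedy partitioner: sort tensors by size, assign each one
# to the currently lightest rank (a common OSS-style heuristic, assumed here).
def partition(tensors, world_size=2):
    shards = [[] for _ in range(world_size)]
    loads = [0] * world_size
    for name, numel, requires_grad in sorted(tensors, key=lambda t: -t[1]):
        rank = loads.index(min(loads))  # lightest rank so far
        shards[rank].append((name, requires_grad))
        loads[rank] += numel
    return shards

# One huge static (non-trainable) tensor dominates everything else:
tensors = [
    ("big_static", 50_000_000, False),  # hypothetical frozen tensor
    ("w1", 1_000_000, True),
    ("w2", 1_000_000, True),
    ("w3", 1_000_000, True),
]
shards = partition(tensors)
# shards[0] holds only the static tensor, so rank 0 never sees a gradient;
# shards[1] holds every trainable parameter.
```

With this structure, rank 0's shard produces no gradients, which is exactly the state that gradient clipping (and other functions that assume every rank owns at least one grad) does not expect.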