[OSS] Partitioning can fail with very imbalanced parameter sizes #261

@blefaudeux

Description

🐛 Bug

As spotted in huggingface/transformers#9156 (comment), some models have an unusual structure: one very big tensor (which is not optimized) that is larger than all the other tensors combined. In that case, one rank gets no gradients at all, and a couple of dependent functions do not handle that.
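To illustrate how a single oversized tensor can starve a rank, here is a minimal sketch. It assumes a simplified greedy scheme (each tensor goes to the rank with the smallest total size so far), which is not necessarily fairscale's exact partitioning code. `Param`, `greedy_partition` and the tensor names and sizes are hypothetical, and plain Python objects stand in for real tensors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Param:
    """Hypothetical stand-in for a tensor: only its size and whether it gets grads."""
    name: str
    numel: int
    requires_grad: bool = True


def greedy_partition(params: List[Param], world_size: int) -> List[List[Param]]:
    """Assumed scheme: assign each tensor to the rank with the smallest load so far."""
    partitions: List[List[Param]] = [[] for _ in range(world_size)]
    sizes = [0] * world_size
    for p in params:
        rank = sizes.index(min(sizes))  # first rank with the smallest total size
        partitions[rank].append(p)
        sizes[rank] += p.numel
    return partitions


# Hypothetical model: one big frozen tensor, larger than all the others combined.
params = [Param("big_static_tensor", 50_000_000, requires_grad=False)] + [
    Param(f"layer{i}.weight", 1_000_000) for i in range(20)
]
parts = greedy_partition(params, world_size=2)
for rank, part in enumerate(parts):
    trainable = [p.name for p in part if p.requires_grad]
    print(f"rank {rank}: {len(part)} tensor(s), {len(trainable)} with grads")
```

Because the big tensor lands on rank 0 first and the small tensors never add up to its size, every trainable tensor ends up on rank 1, while rank 0 holds only the static tensor and receives no gradients.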

Command

huggingface/transformers#9156 (comment)
Basic seq2seq model run on 2 ranks: one rank gets a single static tensor, while the other gets all the gradients. Gradient clipping breaks, and the computation is not really optimized, since one rank does all the optimizer work.
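A sharded step such as gradient clipping can break when it assumes every rank holds at least one gradient. For example, `torch.stack` on an empty list raises an error. The sketch below shows one way to make a sharded global-norm computation tolerate an empty shard: each rank contributes its local sum of squares (0.0 when it has no gradients), and the contributions are summed as an all-reduce would do. This is not fairscale's implementation. `local_sq_norm` and `clip_grad_norm_sharded` are hypothetical names, the all-reduce is simulated with a plain `sum`, and gradients are plain Python lists. The `1e-6` term mirrors the epsilon used by `torch.nn.utils.clip_grad_norm_`.

```python
import math
from typing import List, Optional, Sequence

Grad = Optional[List[float]]  # None means the parameter has no gradient (frozen)


def local_sq_norm(grads: Sequence[Grad]) -> float:
    """Sum of squared gradient entries on one rank; 0.0 if the shard has no grads."""
    total = 0.0
    for g in grads:
        if g is None:  # static tensor: nothing to clip
            continue
        total += sum(x * x for x in g)
    return total


def clip_grad_norm_sharded(shards: List[List[Grad]], max_norm: float) -> float:
    """Global L2 norm across all shards, then in-place scaling if it exceeds max_norm."""
    # Simulated all-reduce(SUM) of the per-rank squared norms.
    total_norm = math.sqrt(sum(local_sq_norm(s) for s in shards))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for shard in shards:
            for g in shard:
                if g is not None:
                    g[:] = [x * clip_coef for x in g]
    return total_norm


# Two ranks, mirroring the report: rank 0 has only the static tensor.
shards: List[List[Grad]] = [
    [None],        # rank 0: no gradients at all
    [[3.0, 4.0]],  # rank 1: every gradient
]
norm = clip_grad_norm_sharded(shards, max_norm=1.0)
print(norm, shards[1][0])
```

The empty shard simply contributes zero to the global norm instead of raising, so both ranks still agree on the same clipping coefficient.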

Metadata

Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests