[BUG]: Cannot synchronize grads of shared parameters cross pipeline stages when using ZERO-3 #1249

@VenAlone

🐛 Describe the bug

@FrankLeeeee @ver217
Hi, in line 36 of _pipeline_parallel_gradient_handler.py:

if param.requires_grad and param.grad is not None and group is not None:

the condition "param.grad is not None" does not work properly with ZERO-3. After ZERO-3 synchronizes the grads, every parameter's grad is kept in its "colo_attr", so param.grad is None for all parameters and the buckets are empty here!
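
To make the failure mode concrete, here is a minimal sketch in plain Python. It does not use ColossalAI: FakeParam, FakeColoAttr and the stored_grad field are hypothetical stand-ins that I made up for illustration, and I am assuming the grad is reachable through colo_attr as described above. collect_bucket mirrors the quoted condition, and collect_bucket_zero_aware is only one possible way to loosen the check, not a proposed patch.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class FakeColoAttr:
    # Hypothetical stand-in for the per-parameter ZERO-3 attribute.
    # The field name `stored_grad` is an assumption, not the real API.
    stored_grad: Optional[Any] = None


@dataclass
class FakeParam:
    # Hypothetical stand-in for a model parameter.
    requires_grad: bool = True
    grad: Optional[Any] = None
    colo_attr: Optional[FakeColoAttr] = None


def collect_bucket(params: List[FakeParam], group: Any) -> List[FakeParam]:
    """Mirror the condition quoted from the gradient handler."""
    return [
        p for p in params
        if p.requires_grad and p.grad is not None and group is not None
    ]


def collect_bucket_zero_aware(params: List[FakeParam], group: Any) -> List[FakeParam]:
    """Illustrative variant: also accept a grad that lives on colo_attr."""
    bucket = []
    for p in params:
        has_grad = p.grad is not None or (
            p.colo_attr is not None and p.colo_attr.stored_grad is not None
        )
        if p.requires_grad and has_grad and group is not None:
            bucket.append(p)
    return bucket


# A parameter in the state described above: param.grad is None,
# the grad is only reachable through colo_attr.
zero3_param = FakeParam(grad=None, colo_attr=FakeColoAttr(stored_grad=[0.1, 0.2]))
# A parameter without ZERO-3: the grad sits on param.grad as usual.
plain_param = FakeParam(grad=[0.1, 0.2])
group = object()  # placeholder for a non-None process group

print(len(collect_bucket([zero3_param], group)))             # bucket stays empty
print(len(collect_bucket_zero_aware([zero3_param], group)))  # parameter is collected
```

With the original condition the ZERO-3 style parameter is skipped, so nothing is left to all-reduce across pipeline stages.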

This line also has the problem:

Environment

colossalai latest version
