🐛 Describe the bug
@FrankLeeeee @ver217
Hi, in line 36 of colossalai/engine/gradient_handler/_pipeline_parallel_gradient_handler.py (commit 1aad903):

    if param.requires_grad and param.grad is not None and group is not None:

the condition "param.grad is not None" does not work properly with ZeRO-3. After ZeRO-3 synchronizes the gradients, every parameter's gradient is stored in "colo_attr", so param.grad is None for all parameters and the buckets end up empty here!
Line 43 of the same file has the same problem, since it reads param.grad directly:

    grads = [param.grad.data for param in bucket]

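One possible direction for a fix, as a minimal sketch only: look the gradient up through a small helper that falls back to "colo_attr" when param.grad is None, and use that helper both in the bucketing condition and when collecting the grads. The sketch uses stand-in objects instead of real torch parameters so that it runs on its own. The helper name `get_grad`, the function `build_buckets`, the attribute name `saved_grad`, and the string keys used for dtype and group are all assumptions made for illustration, not the actual ColossalAI API.

```python
from collections import defaultdict
from types import SimpleNamespace


def get_grad(param):
    """Return the gradient wherever it lives, or None if there is none."""
    if param.grad is not None:
        return param.grad
    # Assumption: under ZeRO-3 the gradient is kept on `colo_attr`.
    # The attribute name `saved_grad` is hypothetical, chosen for this sketch.
    colo_attr = getattr(param, "colo_attr", None)
    return getattr(colo_attr, "saved_grad", None)


def build_buckets(params, group):
    """Bucket parameters that have a gradient, keyed by a stand-in dtype."""
    buckets = defaultdict(list)
    for param in params:
        # Same shape as the original condition, but using get_grad(param)
        # instead of reading param.grad directly.
        if param.requires_grad and get_grad(param) is not None and group is not None:
            buckets[param.dtype].append(param)
    return buckets


# Stand-in objects (not real torch parameters).
plain = SimpleNamespace(requires_grad=True, grad=[1.0], dtype="fp32")
zero3 = SimpleNamespace(
    requires_grad=True,
    grad=None,  # what the issue describes after ZeRO-3 sync
    dtype="fp32",
    colo_attr=SimpleNamespace(saved_grad=[2.0]),
)

buckets = build_buckets([plain, zero3], group="pp_group")
# Counterpart of line 43: collect grads through the helper as well.
grads = [get_grad(p) for p in buckets["fp32"]]
```

With the original condition the `zero3` stand-in would be skipped and its bucket entry would be missing; with the helper both parameters are bucketed. Whether the real handler should read the gradient this way depends on how ZeRO-3 actually exposes it, so the maintainers would need to confirm.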
Environment
colossalai latest version