[FEATURE]: adding grad_norm logging in training process #2684

@pluiez

Description

Describe the feature

I have been using the colossalai framework for my project, and I noticed that there is no way to obtain the grad_norm after the backward pass during training. For example, ZeroOptimizer computes the grad_norm via its _calc_global_norm method inside step() and then clips the gradients, so the value is never exposed. If I want to obtain the grad_norm separately, I have to call _calc_global_norm again, which duplicates the computation. In PyTorch, by contrast, clip_grad_norm_() and optimizer.step() are decoupled, so this issue does not exist.
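To illustrate the cost, here is a minimal pure-Python sketch of the global-norm reduction that step() performs internally (an illustration, not the actual _calc_global_norm implementation): calling it a second time just to log the value repeats this entire reduction over every gradient.

```python
def calc_global_norm(grad_lists, norm_type=2.0):
    # Global norm across all parameter gradients: the same quantity the
    # optimizer derives inside step() before clipping.
    total = sum(abs(g) ** norm_type for grads in grad_lists for g in grads)
    return total ** (1.0 / norm_type)

# Two parameters with flattened gradients [3, 4] and [0]:
print(calc_global_norm([[3.0, 4.0], [0.0]]))  # → 5.0
```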

I would like to request a logging mechanism for the grad_norm during the training process, e.g. via the logging module or TensorBoard. This would make it easier to monitor training and verify that gradients stay within the desired range.
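One possible shape for this mechanism (a hypothetical sketch, not colossalai's actual API): have the optimizer cache the norm it already computed inside step() and expose it, so training code can log it without a second _calc_global_norm call.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("train")

class GradNormLoggingOptimizer:
    """Hypothetical wrapper: step() caches the grad norm it already
    computed for clipping, so callers can read it for logging."""

    def __init__(self, flat_grads):
        self.flat_grads = flat_grads
        self.last_grad_norm = None

    def step(self):
        # The norm is computed once here (needed for clipping anyway)
        # and cached; no recomputation is required for logging.
        self.last_grad_norm = sum(g * g for g in self.flat_grads) ** 0.5
        # ... clip gradients and apply the parameter update here ...
        return self.last_grad_norm

opt = GradNormLoggingOptimizer([3.0, 4.0])
grad_norm = opt.step()
logger.info("grad_norm=%.4f", grad_norm)  # logs grad_norm=5.0000
```

The same cached value could equally be written to TensorBoard via SummaryWriter.add_scalar instead of the logging module.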

Labels: enhancement (New feature or request)
