Describe the feature
I have been using the ColossalAI framework for my project, and I noticed that there is no way to obtain the `grad_norm` after the backward pass during training. For example, `ZeroOptimizer` computes the gradient norm with its `_calc_global_norm` method inside `step()` and then clips the gradients. If I want to obtain the `grad_norm` separately, I have to call `_calc_global_norm` again, which results in unnecessary extra computation. In PyTorch, by contrast, `clip_grad_norm_()` and `optimizer.step()` are decoupled (and `clip_grad_norm_()` returns the total norm), so this issue does not arise.
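For reference, this is the decoupled pattern in plain PyTorch: `torch.nn.utils.clip_grad_norm_` returns the total norm it computed before clipping, so the value can be logged without any extra pass over the gradients (the toy model and tensor sizes here are purely illustrative):

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10)).sum()
loss.backward()

# clip_grad_norm_ returns the total gradient norm computed *before*
# clipping, so the value can be logged without recomputing it.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()

print(f"grad norm at this step: {grad_norm.item():.4f}")
```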
I would like to request the addition of a logging mechanism for the `grad_norm` during training. This could be achieved with the `logging` module or TensorBoard. It would make it easier to monitor training and verify that the gradients stay within the desired range.
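A minimal sketch of what such logging could look like, written against plain PyTorch and TensorBoard's `SummaryWriter` since ColossalAI does not currently expose the norm; the model, training loop, and log directory are placeholders. The point is that the norm computed once per step is reused for logging, which is exactly what an accessor on `ZeroOptimizer` would enable:

```python
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
writer = SummaryWriter(log_dir="./runs/grad_norm_demo")  # placeholder path

for step in range(100):
    loss = model(torch.randn(32, 10)).pow(2).mean()
    loss.backward()

    # The norm is computed once here and reused for logging; the feature
    # request asks ZeroOptimizer to expose the value it already computes
    # in _calc_global_norm, so no second pass over the gradients is needed.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    writer.add_scalar("train/grad_norm", grad_norm.item(), step)

    optimizer.step()
    optimizer.zero_grad()

writer.close()
```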