Describe the feature
In some VAE training setups, users may use a weight-adaptive loss that computes the gradient of some parameters more than once per step, for example:

This will trigger the backward hook (registered via `Tensor.register_hook`) twice. Based on PyTorch's documentation, we could instead use the post-accumulate-grad hook (`Tensor.register_post_accumulate_grad_hook`, available since PyTorch 2.1) to solve this problem, since it only fires after the gradient has actually been accumulated into `.grad`.
