Conversation
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
timmoon10 left a comment
I can reproduce this error when I enable the quantized backward activation kernels in the te.Sequential tests. This PR looks correct to me, but the logic in quantize_helper is convoluted enough that I can't be fully confident. Pipeline 23564490 is passing so far.
input_tensor = reinterpret_cast<const Tensor *>(grad);
activation_input_tensor = reinterpret_cast<const Tensor *>(input);
Yes, this is hell. We need to change it. CC @Oleg-Goncharov
void quantize_helper(const NVTETensor input, const NVTETensor grad, const NVTETensor noop,
                     NVTETensor output, NVTETensor dbias, NVTETensor workspace,
                     cudaStream_t stream) {
The confusion caused by this function is not worth the code reuse. Better to split it into three functions: quantize_helper, forward_activation_helper, backward_activation_helper.
float elt = static_cast<float>(in.data.elt[j]);
if constexpr (IS_ACT || IS_DACT) {
  if constexpr (IS_ACT) {
    elt = OP(elt, {});
  }
So if I understand correctly, this is the bug we're trying to fix. If the forward pass is y = f(x), we were previously computing dx = x * df(dy) instead of dx = dy * df(x).
Description
This PR supersedes PR #1460. A bug was introduced in the dActivation kernels for Blackwell where the activation input and the gradient input were swapped. Fixing it also uncovered an issue in the tests: different tensors were seeded with identical values, so the swap went undetected.