Conversation
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
timmoon10 left a comment
I can reproduce this error when I enable the quantized backward activation kernels in the te.Sequential tests. This PR looks correct to me, but the logic in quantize_helper is convoluted enough that I can't be fully confident. Pipeline 23564490 is passing so far.
input_tensor = reinterpret_cast<const Tensor *>(grad);
activation_input_tensor = reinterpret_cast<const Tensor *>(input);
Yes, this is hell. We need to change it. CC @Oleg-Goncharov
void quantize_helper(const NVTETensor input, const NVTETensor grad, const NVTETensor noop,
                     NVTETensor output, NVTETensor dbias, NVTETensor workspace,
                     cudaStream_t stream) {
The confusion caused by this function is not worth the code reuse. Better to split it into three functions: quantize_helper, forward_activation_helper, backward_activation_helper.
float elt = static_cast<float>(in.data.elt[j]);
if constexpr (IS_ACT || IS_DACT) {
  if constexpr (IS_ACT) {
    elt = OP(elt, {});
  }
So if I understand correctly, this is the bug we're trying to fix. If the forward pass is y = f(x), we were previously computing dx = x * df(dy) instead of dx = dy * df(x).
Description
This PR supersedes PR #1460. A bug was introduced in the dActivation kernels for Blackwell where the activation input and the gradient input were swapped. Fixing it also uncovered an issue in the tests: different tensors were seeded with identical values, so the swap went undetected.