
Fix dActivation #1462

Merged
ptrendx merged 11 commits into NVIDIA:release_v2.0 from ptrendx:pr_seed_tests
Feb 7, 2025

Conversation

Member

@ptrendx ptrendx commented Feb 7, 2025

Description

This PR supersedes PR #1460. A bug was introduced in the dActivation kernels for Blackwell where the activation input and the gradient input were swapped. Fixing it uncovered an issue with the tests, where different tensors were seeded with the same values, so the swapped inputs went undetected.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Properly seed the tensors in C++ tests
  • Fix the dActivation logic

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
@ptrendx ptrendx requested a review from timmoon10 February 7, 2025 00:25
Collaborator

@timmoon10 timmoon10 left a comment


I can reproduce this error when I enable the quantized backward activation kernels in the te.Sequential tests. This PR looks correct to me, but the logic in quantize_helper is convoluted enough that I'm not fully confident. Pipeline 23564490 is passing so far.

Comment on lines +1206 to +1207
input_tensor = reinterpret_cast<const Tensor *>(grad);
activation_input_tensor = reinterpret_cast<const Tensor *>(input);
Collaborator


😱

Member Author


Yes, this is hell. We need to change it. CC @Oleg-Goncharov

Comment on lines +1199 to +1201
void quantize_helper(const NVTETensor input, const NVTETensor grad, const NVTETensor noop,
NVTETensor output, NVTETensor dbias, NVTETensor workspace,
cudaStream_t stream) {
Collaborator


The confusion caused by this function is not worth the code reuse. It would be better to split it into three functions: quantize_helper, forward_activation_helper, and backward_activation_helper.

Member Author


💯

Comment on lines -251 to 253
float elt = static_cast<float>(in.data.elt[j]);
if constexpr (IS_ACT || IS_DACT) {
if constexpr (IS_ACT) {
elt = OP(elt, {});
}
Collaborator


So if I understand correctly, this is the bug we're trying to fix. If the forward pass is y = f(x), we were previously computing dx = x * df(dy) instead of dx = dy * df(x).

Member Author


That's correct.

@ptrendx ptrendx merged commit 2d058d6 into NVIDIA:release_v2.0 Feb 7, 2025
11 checks passed