[Pytorch] NVIDIA-DL-Framework-Inspect support – part 1 – core #1614

Merged
pggPL merged 39 commits into NVIDIA:main from pggPL:nvinspect_core on Apr 16, 2025

Conversation

@pggPL (Collaborator) commented Mar 25, 2025

Description

Core part of the NVIDIA-DL-Framework-Inspect support, affecting the core Transformer Engine (TE) files.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

pggPL added 3 commits March 25, 2025 14:22
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator Author) commented Mar 25, 2025

/te-ci pytorch L1

pggPL and others added 5 commits April 1, 2025 10:45
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator Author) commented Apr 1, 2025

/te-ci pytorch L1

tensor: torch.Tensor,
*,
out: Optional[Union[torch.Tensor, DebugQuantizedTensor]] = None,
dtype: Optional[torch.dtype] = None,
Member:

What is the use case for this?

Collaborator Author:

Weight caching for fake quantization, for example. We want to save the weights in the correct precision.

Member:

I don't think I understand this use case, TBH. Wouldn't you still want such a tensor to have the same dtype as the input tensor to the quantize call?

Collaborator Author:

Suppose the weight is stored in FP32, but the activation is in BF16. Then we want the weight after fake quantization to be in BF16, to avoid a cast on each forward pass.

Member:

OK, so that's the AMP scenario. The regular torch.nn.Linear just fails when given weight and activation in different types.

@pggPL (Collaborator Author), Apr 15, 2025:

We do it here anyway:

weightmat = cast_if_needed(weight, activation_dtype)
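To illustrate the use case from this thread, here is a minimal, hypothetical sketch (the `fake_quantize` helper and its scale handling are illustrative, not the TE API): when quantize() accepts an explicit `dtype`, the fake-quantized FP32 master weight can be materialized directly in the activation dtype and cached, instead of being cast on every forward pass.

```python
from typing import Optional

import torch


def fake_quantize(tensor: torch.Tensor, scale: float,
                  dtype: Optional[torch.dtype] = None) -> torch.Tensor:
    """Round-trip through a coarse grid, emitting the result in `dtype`."""
    out_dtype = dtype if dtype is not None else tensor.dtype
    # Clamp to an FP8-E4M3-like representable range after scaling.
    q = torch.clamp(torch.round(tensor * scale), -448.0, 448.0)
    return (q / scale).to(out_dtype)


# FP32 master weight, BF16 activations: cache the fake-quantized weight
# in BF16 once, so no per-forward cast is needed.
weight = torch.randn(4, 4, dtype=torch.float32)
cached = fake_quantize(weight, scale=16.0, dtype=torch.bfloat16)
assert cached.dtype == torch.bfloat16
```

Without the `dtype` argument, the cached tensor would come back in FP32 and still require a cast to BF16 each iteration.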

pggPL and others added 9 commits April 2, 2025 11:35
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator Author) commented Apr 2, 2025

/te-ci pytorch L1

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
pggPL added 3 commits April 7, 2025 13:14
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator Author) commented Apr 7, 2025

/te-ci pytorch L1


pggPL and others added 6 commits April 8, 2025 11:49
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
return [None] * 5
return [None] * 6
grad_input_quantizer = None
grad_weight_quantizer = None
Collaborator:

I strongly oppose adding a quantizer for the grad weight. It doesn't make sense logically since we never plan to output a quantized grad weight, barring some major development in optimizers. I get that the asymmetry is weird, but the real problem is that output_quantizer and grad_input_quantizer are also illogical. We should be using the input_quantizer/grad_output_quantizer from the layer that consumes them, like what te.Sequential does, but that's very tricky to implement.

Collaborator Author:

OK, but one may want to experiment with weight quantization, and the debug tools help with that. Moreover, this quantizer enables us to log statistics.

Member:

@timmoon10 Actually, I would argue that grad_weight_quantizer makes more sense than the other quantizers you listed, since it is actually "local", whereas the other ones logically belong to other operations. Even in te.Sequential, the grad_weight quantizer would belong to the op that holds those weights.
I also don't agree that we never plan to output a quantized grad weight - while the convergence story would need to be understood, projects like MS-AMP have tried to provide FP8 gradient functionality, since it saves a lot of memory during training. So it is not completely out of the question that we will want to provide that as well.

@timmoon10 (Collaborator), Apr 16, 2025:

The MS-AMP case is a good example. I suppose the right answer would be to gracefully support multiple quantizer types in a layer. In most cases we would have grad_weight_quantizer=None; it would only be non-trivial if we want quantized grads or are doing something unusual, like debug mode.

Unresolving for future visibility.
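The arity point behind the `[None] * 5` → `[None] * 6` change in the diff can be sketched with a hypothetical autograd function (illustrative only, not the actual TE implementation): `backward` must return one gradient slot per `forward` input, so adding a `grad_weight_quantizer` argument adds one more `None` for non-tensor inputs, while also giving a hook to quantize or log the weight gradient.

```python
import torch


class LinearWithQuantizers(torch.autograd.Function):
    """Toy linear op whose forward takes four (possibly None) quantizer slots."""

    @staticmethod
    def forward(ctx, inp, weight, input_quantizer, weight_quantizer,
                grad_output_quantizer, grad_weight_quantizer):
        ctx.save_for_backward(inp, weight)
        ctx.grad_weight_quantizer = grad_weight_quantizer
        return inp @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        inp, weight = ctx.saved_tensors
        grad_input = grad_out @ weight
        grad_weight = grad_out.t() @ inp
        if ctx.grad_weight_quantizer is not None:
            # Hook point: fake-quantize or log statistics of the weight grad.
            grad_weight = ctx.grad_weight_quantizer(grad_weight)
        # One gradient slot per forward() input: 2 tensors + 4 quantizers.
        # Adding a quantizer argument is what grows the all-None early-exit
        # return from [None] * 5 to [None] * 6.
        return (grad_input, grad_weight) + (None,) * 4


x = torch.randn(3, 4, requires_grad=True)
w = torch.randn(5, 4, requires_grad=True)
out = LinearWithQuantizers.apply(x, w, None, None, None, None)
out.sum().backward()
```

With all quantizer slots set to None the op behaves like a plain linear layer; a non-None `grad_weight_quantizer` only changes what is stored into `w.grad`.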

@timmoon10 timmoon10 self-requested a review April 12, 2025 01:15
pggPL and others added 9 commits April 14, 2025 10:04
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
@pggPL (Collaborator Author) commented Apr 15, 2025

/te-ci pytorch L1

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@ptrendx (Member) left a comment:

From my side LGTM.

@timmoon10 (Collaborator) left a comment:

I haven't fully gone through the code, but things look reasonable to me. I would be fine merging.

@timmoon10 timmoon10 self-requested a review April 16, 2025 00:35
@pggPL (Collaborator Author) commented Apr 16, 2025

/te-ci pytorch L1

@pggPL pggPL merged commit beaecf8 into NVIDIA:main Apr 16, 2025
24 of 29 checks passed
4 participants