[Pytorch] NVIDIA-DL-Framework-Inspect support – part 1 – core#1614
pggPL merged 39 commits into NVIDIA:main
Conversation
/te-ci pytorch L1
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
transformer_engine/pytorch/tensor/_internal/float8_tensor_base.py
tensor: torch.Tensor,
*,
out: Optional[Union[torch.Tensor, DebugQuantizedTensor]] = None,
dtype: torch.dtype = None,
Weight caching for fake quantization, for example. We want to save the weights in the correct precision.

I don't think I understand this use case, to be honest. Wouldn't you still want such a tensor to have the same dtype as the input tensor to the quantize call?

Suppose the weight is stored in FP32, but the activation is in BF16. Then we want the weight after fake quantization to be in BF16, to avoid a cast on each forward pass.

Ok, so that's the AMP scenario. The regular torch.nn.Linear just fails when given weight and activation of different dtypes.

We do it here anyways.
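The scenario above can be sketched as follows. This is an illustrative sketch, not TE's real API: `fake_quantize` is a hypothetical helper, and the BF16 round-trip merely stands in for a real quantization step. The point is the role of the `dtype` argument: letting the caller pick the output precision so a cached fake-quantized weight can match the activation dtype.

```python
import torch
from typing import Optional


def fake_quantize(tensor: torch.Tensor, dtype: Optional[torch.dtype] = None) -> torch.Tensor:
    # Hypothetical sketch: round-trip through a lower precision to simulate
    # quantization error, then land in `dtype` (input dtype if None).
    out_dtype = dtype if dtype is not None else tensor.dtype
    return tensor.to(torch.bfloat16).to(out_dtype)


# Scenario from the thread: weight stored in FP32, activations in BF16.
# Requesting dtype=torch.bfloat16 makes the cached fake-quantized weight
# match the activation dtype, avoiding a cast on every forward pass.
weight_fp32 = torch.randn(4, 4, dtype=torch.float32)
cached = fake_quantize(weight_fp32, dtype=torch.bfloat16)
```

Without the explicit `dtype`, the output would stay in FP32 and each forward would pay the FP32→BF16 cast again.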
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
return [None] * 5
return [None] * 6
grad_input_quantizer = None
grad_weight_quantizer = None
I strongly oppose adding a quantizer for the grad weight. It doesn't make sense logically, since we never plan to output a quantized grad weight, barring some major development in optimizers. I get that the asymmetry is weird, but the real problem is that output_quantizer and grad_input_quantizer are also illogical. We should be using the input_quantizer/grad_output_quantizer from the layer that consumes them, like te.Sequential does, but that's very tricky to implement.

Ok, but maybe one wants to do some experiments with weight quantization, and the debug tools help with that. Moreover, this quantizer enables us to log the statistics.

@timmoon10 Actually, I would argue that grad_weight_quantizer makes more sense than the other quantizers you listed, since it is actually "local", whereas the others logically belong to other operations. Even in te.Sequential, the grad_weight quantizer would belong to the op that holds those weights.

I also don't agree that we never plan to output a quantized grad weight. While the convergence story would need to be understood, projects like MS-AMP tried to provide FP8 gradient functionality, since it saves a lot of memory during training. So it is not completely out of the question that we will want to provide that as well.

The MS-AMP case is a good example. I suppose the right answer would be to gracefully support multiple quantizer types in a layer. In most cases we would have grad_weight_quantizer=None; grad_weight_quantizer would only be non-trivial if we want quantized grads or we're doing something weird like debug mode.

Unresolving for future visibility.
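The `[None] * 5` → `[None] * 6` change in the diff above follows from how `torch.autograd.Function.backward` works: it must return one gradient slot per forward input, so adding a `grad_weight_quantizer` argument grows the padding of `None`s by one. The sketch below illustrates that pattern with a minimal linear op; all quantizer names and the signature are illustrative, not TE's real code.

```python
import torch


class _LinearFunc(torch.autograd.Function):
    # Illustrative sketch of the pattern under discussion, not TE's real
    # implementation: backward returns one entry per forward input, so each
    # extra (non-tensor) quantizer argument adds one None to the return.

    @staticmethod
    def forward(ctx, inp, weight, input_quantizer, grad_output_quantizer,
                grad_input_quantizer, grad_weight_quantizer):
        ctx.save_for_backward(inp, weight)
        ctx.grad_weight_quantizer = grad_weight_quantizer
        return inp @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        inp, weight = ctx.saved_tensors
        grad_input = grad_out @ weight
        grad_weight = grad_out.t() @ inp
        # Usually None; only debug/experimental runs would quantize grads.
        if ctx.grad_weight_quantizer is not None:
            grad_weight = ctx.grad_weight_quantizer(grad_weight)
        # The four quantizer inputs get None gradients: hence [None] * 4 here,
        # and one more None whenever another quantizer argument is added.
        return grad_input, grad_weight, *([None] * 4)


x = torch.randn(3, 4, requires_grad=True)
w = torch.randn(5, 4, requires_grad=True)
y = _LinearFunc.apply(x, w, None, None, None, None)
y.sum().backward()
```

With `grad_weight_quantizer=None` the op behaves exactly like a plain linear backward, which matches the "non-trivial only in debug mode" expectation from the thread.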
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
timmoon10 left a comment:

I haven't fully gone through the code, but things look reasonable to me. I would be fine merging.
Description
The core part of NVIDIA-DL-Framework-Inspect support, which affects the core TE files.
Checklist: