Add ONNX export support for TE modules by asfiyab-nvidia · Pull Request #41 · NVIDIA/TransformerEngine

asfiyab-nvidia · 2022-12-14T03:18:01Z

Add TorchScript Operators
Add symbolic methods to ONNX exporter
Add tests for the ONNX export

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

ptrendx · 2022-12-14T18:10:05Z

Hi @asfiyab-nvidia, what is that libcustom so file?

asfiyab-nvidia · 2022-12-14T18:19:35Z

@ptrendx it contains the ONNX Runtime (ORT) implementations of the FP8 functionality. It is used to test the ONNX export and to validate the ORT outputs against the TE outputs (code under tests/test_onnx_export.py).
The .so is included in the PR so that there are no dependencies on external sources.
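
As a rough illustration of the validation flow described in this comment, the sketch below registers a custom-op shared library with an ONNX Runtime session and compares two sets of outputs. The function names, the `CPUExecutionProvider` default, and the tolerance are assumptions made for illustration; this is not the code in tests/test_onnx_export.py.

```python
from typing import Sequence


def make_ort_session(model_path: str, custom_ops_lib: str,
                     providers: Sequence[str] = ("CPUExecutionProvider",)):
    """Create an ORT session with a custom-op library registered.

    onnxruntime is imported lazily so the comparison helper below can be
    used without it. Both paths are supplied by the caller (hypothetical).
    """
    import onnxruntime as ort

    options = ort.SessionOptions()
    # Make the custom operators (here: the FP8 ops) visible to the session.
    options.register_custom_ops_library(custom_ops_lib)
    return ort.InferenceSession(model_path, options, providers=list(providers))


def outputs_match(te_out: Sequence[float], ort_out: Sequence[float],
                  atol: float = 1e-3) -> bool:
    """Element-wise absolute-tolerance check over two flat float sequences."""
    if len(te_out) != len(ort_out):
        return False
    return all(abs(a - b) <= atol for a, b in zip(te_out, ort_out))
```

In a test, the flattened TE output and the flattened result of `session.run(None, feeds)` would be passed to `outputs_match`; a suitable tolerance would likely depend on the precision being tested.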

ptrendx · 2022-12-14T18:33:26Z

Does that have to be closed source? If so, can we at least move it to the tests directory instead of the top-level one? If it does not have to be closed source, then maybe we can keep the source inside tests and compile it on the fly?

asfiyab-nvidia · 2022-12-14T20:31:07Z

Moving the .so to the tests directory seems to be the better approach at the moment. We can potentially include the source code in a follow-up PR.

ptrendx · 2023-01-05T21:05:14Z

/te-ci

ptrendx · 2023-01-06T21:20:56Z

Please fix the tests (see the results for commit 4812408). The biggest problem is that the tests requiring FP8 are run on non-Hopper GPUs, which triggers an assertion failure. I am working on enabling a Hopper GPU in CI, so we should be able to get the FP8 tests running soon too.

netaz · 2023-01-08T16:09:38Z

@ptrendx is there some code in TE we can leverage to query the SM version, or do you recommend that we install a library (e.g. pynvml)?
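
One way to approach the question above without an extra dependency is to read the compute capability through PyTorch. The sketch below is a minimal illustration, not the check that was merged: the helper names are hypothetical, and it assumes that Hopper corresponds to compute capability 9.0.

```python
from typing import Optional, Tuple

# Assumption: Hopper GPUs report compute capability (9, 0).
HOPPER_SM = (9, 0)


def is_hopper_or_newer(capability: Optional[Tuple[int, int]]) -> bool:
    """Return True when the (major, minor) capability is at least 9.0."""
    return capability is not None and tuple(capability) >= HOPPER_SM


def current_capability() -> Optional[Tuple[int, int]]:
    """Query the active CUDA device via PyTorch; None if it is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()
```

A test module could then guard its FP8 cases with something like `pytest.mark.skipif(not is_hopper_or_newer(current_capability()), reason="FP8 requires Hopper")`.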

asfiyab-nvidia · 2023-01-10T17:25:40Z

/te-ci

ptrendx · 2023-01-10T18:39:28Z

/te-ci

asfiyab-nvidia · 2023-01-11T19:21:21Z

@ptrendx can you please authorize a pipeline run for the latest commit? It contains fixes for the failures from the last run. Thanks

ptrendx · 2023-01-11T20:50:47Z

/te-ci

* Increase layernorm FP16 threshold
* Normalize onnx file names: _ separates configs; - separates words in a single config
* Add get_attn_mask_str and fix mask string
* Add missing ONNX files
* Moved generated ONNX files to tests/gen_onnx_models/

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
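
The file-naming rule in this commit message (an underscore between configs, a hyphen between the words of one config) can be sketched as follows. The helper name and the example fields are hypothetical; the PR's actual naming code is not shown on this page.

```python
from typing import Iterable


def onnx_file_name(module: str, configs: Iterable[str]) -> str:
    """Join config fields with "_" and the words inside one field with "-"."""
    fields = [module]
    for config in configs:
        # Words within a single config may arrive separated by spaces or "_".
        words = config.lower().replace("_", " ").split()
        fields.append("-".join(words))
    return "_".join(fields) + ".onnx"
```

For example, `onnx_file_name("linear", ["fp8", "no bias", "causal_mask"])` yields `linear_fp8_no-bias_causal-mask.onnx`, so a name can be split on `_` to recover its individual configs.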

ksivaman

Initial comments

ksivaman · 2023-01-18T17:03:48Z

/te-ci

ksivaman · 2023-01-18T18:18:56Z

/te-ci

ksivaman

LGTM

[New API] Added support for the Reshape operation.
[New API] Added support for the DgradDreluBNBwdWeight operation.
[Minor Enhancement] Added cudnn frontend enums to simplify Resample operation creation.
[Minor Enhancement] Added alpha and beta values as keys for the plan caches.
[Bug Fix] Fixed an error that caused the reference code to fail with a segmentation fault.
[Bug Fix] Fixed an issue where stride/padding and dilation values were incorrectly cached for 2d convolutions.
[Bug Fix] Fixed issues where error statuses were not handled correctly during tensor creation.
[Samples] Added a new sample to showcase how an fMHA graph can be programmed through the FE API. This sample contains both fprop and backprop graphs.
[Samples] Added a new sample to showcase the DgradDreluBNBwdWeight operation.
[Samples] Added a modular block that models the fprop of a ResNet residual block.

Co-authored-by: Anerudhan Gopal <agopal@nvidia.com>

asfiyab-nvidia force-pushed the dev-onnx-export-support branch from 693ee53 to 8ed54a8 on December 14, 2022 18:37

asfiyab-nvidia force-pushed the dev-onnx-export-support branch from b39f87e to 0cf5e16 on December 27, 2022 19:22

asfiyab-nvidia force-pushed the dev-onnx-export-support branch from 65f4196 to b9b5477 on January 4, 2023 21:32

ptrendx reviewed Jan 5, 2023

transformer_engine/pytorch/csrc/ts_fp8_op.cpp (outdated)

ptrendx reviewed Jan 5, 2023

transformer_engine/pytorch/module.py (outdated)

asfiyab-nvidia and others added 14 commits January 17, 2023 20:18

Add ONNX export support for TE modules (#1)

7c0f5a2

* Add TorchScript Operators
* Add symbolic methods to ONNX exporter
* Add tests for the ONNX export

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

fixes for pylint tests

d8a8305

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

fix pylint warning in softmax.py

9a1961a

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

move FP8 ORT lib inside tests/

8bea101

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

enable cross attention tests

60e951b

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

fix merge conflict changes

bc19bd9

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

fix Q/DQ scale input

c94a780

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

enable FP16 config when bias is disabled

0d765d7

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

fix pylint check errors

40c438f

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

updates

6556139

1. remove List import for pylint failure
2. address comments: remove state tensors from GPU
3. address comments: Update reverse_map_dtype function and add to namespace

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

minor fix: coding guidelines

1a6cf53

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

changes:

595e943

1. skip FP8 tests on non-hopper devices
2. minor fix for C++ lint check

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

fix onnxruntime version

a088c6c

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

asfiyab-nvidia added 2 commits January 17, 2023 20:18

minor fix: add space between code and comment

ede3946

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

changes

ab4410f

1. update copyrights
2. update path to ORT .so

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

asfiyab-nvidia force-pushed the dev-onnx-export-support branch from 119a0ec to ab4410f on January 17, 2023 20:35

ksivaman reviewed Jan 18, 2023

tests/test_onnx_export.py (outdated)

transformer_engine/pytorch/__init__.py (outdated)

tests/test_onnx_export.py (outdated)

Apply suggestions from code review

b8ae986

Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: asfiyab-nvidia <117682710+asfiyab-nvidia@users.noreply.github.com>

ksivaman approved these changes Jan 18, 2023

ksivaman merged commit 6c9ce17 into NVIDIA:main Jan 18, 2023

ksivaman mentioned this pull request Feb 25, 2023

fix bug in non-FP8 nvfuser path #81

Merged

timmoon10 mentioned this pull request Oct 15, 2024

[PyTorch] Build custom ORT ops before running ONNX export tests #1252

Merged
