
[PyTorch] Build custom ORT ops before running ONNX export tests #1252

Merged

timmoon10 merged 4 commits into NVIDIA:main from timmoon10:custom-ort-ops on Oct 16, 2024

Conversation

@timmoon10
Collaborator

@timmoon10 timmoon10 commented Oct 15, 2024

Description

We have experienced failures in the ONNX export tests when running with Python 3.12 because PyPI does not have a compatible distribution of ONNX Runtime 1.13.1. Instead of manually rebuilding the custom ONNX Runtime ops at libcustom_ort_fp8_qdq_ops.so, this seems like a good time to add logic that builds the ops automatically before running the test, as sketched below.
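For illustration, a minimal sketch of the build-before-test logic; the paths and function name are hypothetical, assuming a CMake project under tests/pytorch/custom_ort_ops/ that produces the shared library:

```python
import pathlib
import subprocess

OPS_DIR = pathlib.Path(__file__).parent / "custom_ort_ops"
LIB_PATH = OPS_DIR / "libcustom_ort_fp8_qdq_ops.so"

def build_custom_ort_ops() -> pathlib.Path:
    """Build the custom ONNX Runtime ops if the library is missing."""
    if not LIB_PATH.is_file():
        build_dir = OPS_DIR / "build"
        subprocess.run(["cmake", "-S", str(OPS_DIR), "-B", str(build_dir)],
                       check=True)
        subprocess.run(["cmake", "--build", str(build_dir)], check=True)
    return LIB_PATH
```

The test session can then register the resulting library with ONNX Runtime via SessionOptions.register_custom_ops_library before running the exported models.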

Related: #41

Pinging @nzmora-nvidia and @asfiyab-nvidia.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor
  • Testing

Changes

  • Upgrade to ONNX Runtime 1.19.2 for ONNX export tests
  • Remove binary for custom ORT ops
  • Add code to build custom ORT ops
  • Export ONNX ops that perform intermediate compute in FP32, similar to TE kernels (see the sketch after this list)
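As an illustration of the last item, a hedged sketch of the upcast-compute-downcast pattern with the TorchScript ONNX exporter; the function name and the hard-coded FP16 output type are hypothetical, not the PR's actual code, and Gelu is a standard ONNX op only from opset 20 (a contrib op in ONNX Runtime before that):

```python
# ONNX TensorProto data-type codes used by the Cast op.
FLOAT = 1      # FP32
FLOAT16 = 10   # FP16

def gelu_fp32_symbolic(g, inp):
    """Hypothetical symbolic function: upcast to FP32, apply the op,
    then downcast, so the exported graph matches TE kernels, which do
    all intermediate compute in FP32."""
    x32 = g.op("Cast", inp, to_i=FLOAT)      # force FP32 intermediates
    y32 = g.op("Gelu", x32)                  # standard op in opset >= 20
    return g.op("Cast", y32, to_i=FLOAT16)   # back to storage precision
```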

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 added the bug (Something isn't working) and testing (Improvements to tests or testing infrastructure) labels on Oct 15, 2024
@timmoon10 timmoon10 requested a review from cyanguwa October 15, 2024 01:19
@timmoon10
Collaborator Author

/te-ci pytorch

Collaborator

@cyanguwa cyanguwa left a comment


LGTM

Matches internal impl of TE kernels.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

timmoon10 commented Oct 15, 2024

I'm not sure what changed when I bumped the ONNX Runtime version, but I've experienced test failures for some FP8 operations (GeLU, LayerNorm, RMSNorm). Our test tolerances basically require bit-perfect FP8 casting, which is not reasonable since the exported ONNX ops might do intermediate compute in FP16 while the TE kernels do all intermediate compute in FP32. I've modified the ONNX export infrastructure to match TE and do intermediate compute in FP32 (a small repro sketch follows). In the future we should consider loosening the numerical tolerances to handle expected numerical error with FP8.
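A minimal repro sketch (hypothetical, not from the PR, assuming PyTorch >= 2.1 for the float8 dtypes) of why bit-perfect tolerances break: the same GeLU followed by an FP8 E4M3 cast yields different bits depending on whether intermediates are FP16 or FP32.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1 << 16, dtype=torch.float16)

# Same math, different intermediate precision.
y_fp16 = torch.nn.functional.gelu(x)          # intermediates in FP16
y_fp32 = torch.nn.functional.gelu(x.float())  # intermediates in FP32

# Cast both results to FP8 E4M3 and measure how often they disagree.
# (Move tensors to CUDA if your CPU build lacks half-precision GeLU.)
q_fp16 = y_fp16.float().to(torch.float8_e4m3fn).float()
q_fp32 = y_fp32.to(torch.float8_e4m3fn).float()
frac = (q_fp16 != q_fp32).float().mean().item()
print(f"fraction of FP8 values that differ: {frac:.4%}")
```

Even a small fraction of one-ULP disagreements is enough to fail an exact-match tolerance, which is why matching TE's FP32 intermediates (or loosening tolerances) is necessary.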

@ksivaman
Member

@timmoon10 I haven't looked closely, but there are some differences between the tests/pytorch/custom_ort_ops/custom_op_library.cc here and the original version authored by @nzmora-nvidia from which we copied it. Could this be a source of some discrepancies?

@timmoon10
Collaborator Author

Perhaps, but I don't think any of those changes should have affected numerics. In any case, the changes in this PR make the ONNX export more correct, so I don't think there's much risk to merging if the tests pass.

@timmoon10 timmoon10 merged commit f6b766b into NVIDIA:main Oct 16, 2024
timmoon10 added a commit to timmoon10/TransformerEngine that referenced this pull request Nov 7, 2024
…IA#1252)

* Build custom ORT ops before running ONNX tests

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove ONNX from context parallelism tests

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Export ONNX ops that do compute in FP32

Matches internal impl of TE kernels.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add build script for custom ORT ops

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
