
[PyTorch] Build custom ORT ops before running ONNX export tests #1252

Merged

timmoon10 merged 4 commits into NVIDIA:main from timmoon10:custom-ort-ops on Oct 16, 2024

Conversation

@timmoon10
Collaborator

@timmoon10 timmoon10 commented Oct 15, 2024

Description

We have experienced failures in the ONNX export tests when running with Python 3.12 because PyPI does not have a compatible distribution of ONNX Runtime 1.13.1. Instead of manually rebuilding the custom ONNX Runtime ops at libcustom_ort_fp8_qdq_ops.so, this seems like a good time to add logic that builds the ops automatically before running the test, as sketched below.
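For illustration, a minimal sketch of the build-before-test logic; the paths and function name are hypothetical, assuming a CMake project under tests/pytorch/custom_ort_ops/ that produces the shared library:

```python
import pathlib
import subprocess

OPS_DIR = pathlib.Path(__file__).parent / "custom_ort_ops"
LIB_PATH = OPS_DIR / "libcustom_ort_fp8_qdq_ops.so"

def build_custom_ort_ops() -> pathlib.Path:
    """Build the custom ONNX Runtime ops if the library is missing."""
    if not LIB_PATH.is_file():
        build_dir = OPS_DIR / "build"
        subprocess.run(["cmake", "-S", str(OPS_DIR), "-B", str(build_dir)],
                       check=True)
        subprocess.run(["cmake", "--build", str(build_dir)], check=True)
    return LIB_PATH
```

The test session can then register the resulting library with ONNX Runtime via SessionOptions.register_custom_ops_library before running the exported models.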

Related: #41

Pinging @nzmora-nvidia and @asfiyab-nvidia.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor
  • Testing

Changes

  • Upgrade to ONNX Runtime 1.19.2 for ONNX export tests
  • Remove binary for custom ORT ops
  • Add code to build custom ORT ops
  • Export ONNX ops that perform intermediate compute in FP32, similar to TE kernels (see the sketch after this list)
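As an illustration of the last item, a hedged sketch of the upcast-compute-downcast pattern with the TorchScript ONNX exporter; the function name and the hard-coded FP16 output type are hypothetical, not the PR's actual code, and Gelu is a standard ONNX op only from opset 20 (a contrib op in ONNX Runtime before that):

```python
# ONNX TensorProto data-type codes used by the Cast op.
FLOAT = 1      # FP32
FLOAT16 = 10   # FP16

def gelu_fp32_symbolic(g, inp):
    """Hypothetical symbolic function: upcast to FP32, apply the op,
    then downcast, so the exported graph matches TE kernels, which do
    all intermediate compute in FP32."""
    x32 = g.op("Cast", inp, to_i=FLOAT)      # force FP32 intermediates
    y32 = g.op("Gelu", x32)                  # standard op in opset >= 20
    return g.op("Cast", y32, to_i=FLOAT16)   # back to storage precision
```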

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 added the bug (Something isn't working) and testing (Improvements to tests or testing infrastructure) labels on Oct 15, 2024
@timmoon10 timmoon10 requested a review from cyanguwa October 15, 2024 01:19
@timmoon10
Collaborator Author

/te-ci pytorch

Collaborator

@cyanguwa cyanguwa left a comment


LGTM

Matches internal impl of TE kernels.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Collaborator Author

/te-ci pytorch

@timmoon10
Collaborator Author

timmoon10 commented Oct 15, 2024

I'm not sure what changed when I bumped the ONNX Runtime version, but I've experienced test failures for some FP8 operations (GeLU, LayerNorm, RMSNorm). Our test tolerances basically require bit-perfect FP8 casting, which is not reasonable since the exported ONNX ops might do intermediate compute in FP16 while the TE kernels do all intermediate compute in FP32. I've modified the ONNX export infrastructure to match TE and do intermediate compute in FP32 (a small repro sketch follows). In the future we should consider loosening the numerical tolerances to handle expected numerical error with FP8.
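A minimal repro sketch (hypothetical, not from the PR, assuming PyTorch >= 2.1 for the float8 dtypes) of why bit-perfect tolerances break: the same GeLU followed by an FP8 E4M3 cast yields different bits depending on whether intermediates are FP16 or FP32.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1 << 16, dtype=torch.float16)

# Same math, different intermediate precision.
y_fp16 = torch.nn.functional.gelu(x)          # intermediates in FP16
y_fp32 = torch.nn.functional.gelu(x.float())  # intermediates in FP32

# Cast both results to FP8 E4M3 and measure how often they disagree.
# (Move tensors to CUDA if your CPU build lacks half-precision GeLU.)
q_fp16 = y_fp16.float().to(torch.float8_e4m3fn).float()
q_fp32 = y_fp32.to(torch.float8_e4m3fn).float()
frac = (q_fp16 != q_fp32).float().mean().item()
print(f"fraction of FP8 values that differ: {frac:.4%}")
```

Even a small fraction of one-ULP disagreements is enough to fail an exact-match tolerance, which is why matching TE's FP32 intermediates (or loosening tolerances) is necessary.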

@ksivaman
Member

@timmoon10 I haven't looked closely, but there are some differences between the tests/pytorch/custom_ort_ops/custom_op_library.cc here and the original version authored by @nzmora-nvidia from which we copied it. Could this be a source of some discrepancies?

@timmoon10
Collaborator Author

Perhaps, but I don't think any of those changes should have affected numerics. In any case, the changes in this PR make the ONNX export more correct, so I don't think there's much risk to merging if the tests pass.

@timmoon10 timmoon10 merged commit f6b766b into NVIDIA:main Oct 16, 2024
timmoon10 added a commit to timmoon10/TransformerEngine that referenced this pull request Nov 7, 2024
…IA#1252)

* Build custom ORT ops before running ONNX tests

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove ONNX from context parallelism tests

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Export ONNX ops that do compute in FP32

Matches internal impl of TE kernels.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add build script for custom ORT ops

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
