Debug CI tests on Ada by timmoon10 · Pull Request #397 · NVIDIA/TransformerEngine

timmoon10 · 2023-08-23T21:12:45Z

This applies the changes in #393 to the PyTorch and Paddle tests. In particular, it only runs tests involving cuDNN fused attention on compute capabilities 8.0 and 9.0.
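The gating described here can be sketched roughly as below. This is a minimal illustration and not the code from this PR: the helper name `supports_cudnn_fused_attention` and the constant `FUSED_ATTN_CAPABILITIES` are hypothetical. The capability tuple is passed in explicitly so the sketch runs without a GPU. In a PyTorch test it would presumably come from `torch.cuda.get_device_capability()`.

```python
# Hypothetical helper: decide whether cuDNN fused attention tests should run.
# The supported set mirrors the description above (compute capabilities 8.0 and 9.0).
FUSED_ATTN_CAPABILITIES = {(8, 0), (9, 0)}


def supports_cudnn_fused_attention(capability):
    """Return True if `capability`, a (major, minor) pair, is in the supported set."""
    return tuple(capability) in FUSED_ATTN_CAPABILITIES


# Ada GPUs such as the L40 report compute capability 8.9,
# so these tests would be skipped there under this rule.
print(supports_cudnn_fused_attention((8, 0)))  # True
print(supports_cudnn_fused_attention((8, 9)))  # False
```

A test could then wrap this check in a skip marker so unsupported architectures report a skip instead of a failure.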

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 · 2023-08-23T21:14:07Z

Pipeline 9489089

timmoon10 · 2023-08-24T22:23:47Z

Running on an L40, I found that the JAX FP8 GEMM tests on integer matrices were failing. It seems cuBLAS chooses a split-k kernel that prevents us from getting bit-wise correct results, although it is still within the expected FP8 error. I've changed the matrix dims to help cuBLAS pick a nicer kernel.

Pipeline 9504876.
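To illustrate why splitting the K dimension can change low-order bits even for integer inputs, here is a toy model. It is an assumption for illustration only and does not reproduce what cuBLAS actually does: partial sums are rounded to half precision before the final reduction, while the single-pass version rounds once at the end. The function names and input values are hypothetical, chosen to expose the rounding difference.

```python
import struct


def round_to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_single_pass(a, b):
    """Accumulate the whole K dimension exactly, then round once."""
    return round_to_fp16(float(sum(x * y for x, y in zip(a, b))))


def dot_split_k(a, b, num_splits):
    """Toy split-k: round each partial sum, then reduce and round again."""
    k = len(a)
    chunk = (k + num_splits - 1) // num_splits
    partials = []
    for start in range(0, k, chunk):
        exact = sum(x * y for x, y in zip(a[start:start + chunk], b[start:start + chunk]))
        partials.append(round_to_fp16(float(exact)))
    return round_to_fp16(sum(partials))


a = [8195, 2]
b = [1, 1]
print(dot_single_pass(a, b))  # 8200.0 (8197 rounded once)
print(dot_split_k(a, b, 2))   # 8192.0 (8195 -> 8192, then 8194 -> 8192)
```

Both results are within one half-precision step of the exact value 8197, but they are not bit-identical, which is the kind of mismatch an exact-equality test would flag.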

timmoon10 · 2023-09-06T04:03:56Z

Pipeline 9617488 is green.

timmoon10 · 2023-09-23T00:00:27Z

I've tweaked the PyTorch and JAX fused attention tests so we check if there's a supported backend (namely F16_arbitrary_seqlen on Ada). These pass when I run them manually on an L40, and I've launched pipeline 9938409.

#403 adds some PyTorch attention tests and #411 adds backend detection logic to Paddle. We should hold off on merging until those are in.
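The backend check mentioned in this comment could look something like the sketch below. Everything here is hypothetical: the support table, `pick_backend`, `require_backend`, and the placeholder name `some_other_backend` are illustrations, not the library's detection logic. The only detail taken from the comment is that F16_arbitrary_seqlen is available on Ada (compute capability 8.9).

```python
import unittest

# Hypothetical support table for illustration only. The real detection
# logic lives in the library and is not reproduced here.
SUPPORTED_BACKENDS = {
    (8, 9): {"F16_arbitrary_seqlen"},  # Ada, per the comment above
}


def pick_backend(capability, candidates):
    """Return the first candidate backend supported on `capability`, else None."""
    supported = SUPPORTED_BACKENDS.get(tuple(capability), set())
    for name in candidates:
        if name in supported:
            return name
    return None


def require_backend(capability, candidates):
    """Return a usable backend, or skip the calling test when none is supported."""
    backend = pick_backend(capability, candidates)
    if backend is None:
        major, minor = capability
        raise unittest.SkipTest(f"no supported fused attention backend on sm{major}{minor}")
    return backend


print(pick_backend((8, 9), ["some_other_backend", "F16_arbitrary_seqlen"]))
```

Skipping on a missing backend, as opposed to skipping on the architecture alone, lets the same test run wherever at least one backend is usable.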

timmoon10 · 2023-10-04T00:29:19Z

This PR is now good to go, pending ~~pipeline 10094388~~ pipeline 70748350.

cyanguwa

Just that one comment, otherwise looks good!

timmoon10 · 2023-10-12T17:57:35Z

Tests passed in pipeline 10211932.

ksivaman

Looks good

Debug PyTorch and Paddle tests on Ada

53991e9

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 added the bug label Aug 23, 2023

timmoon10 requested review from cyanguwa and ksivaman August 23, 2023 21:12

Only run Paddle layer tests with cuDNN fMHA on supported archs

8b8c107

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 mentioned this pull request Aug 24, 2023

[Paddle] Add parallel support #357

Merged

timmoon10 added 2 commits August 24, 2023 11:33

Debug PyTorch fMHA tests

d171496

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Reduce JAX FP8 GEMM sizes

02eb19e

Avoid split-k kernels on Ada. Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 added 3 commits August 28, 2023 10:43

Merge branch 'main' into ada-ci-debug

2081361

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into ada-ci-debug

91af369

Disable JAX fused self-attention test on Ada

1cbf0df

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into ada-ci-debug

b23008c

cyanguwa requested changes Sep 12, 2023

tests/pytorch/test_fused_attn.py (outdated, resolved)

tests/jax/test_custom_call_compute.py (resolved)

tests/jax/test_fused_attn.py (outdated, resolved)

timmoon10 added 4 commits September 13, 2023 08:51

Merge branch 'main' into ada-ci-debug

434bc2a

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into ada-ci-debug

f021843

Run supported fused attention tests on Ada

9320700

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Run supported fused attention JAX tests on Ada

b79b163

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 force-pushed the ada-ci-debug branch from 6526b44 to b79b163 September 22, 2023 23:50

timmoon10 added 2 commits September 26, 2023 17:24

Merge branch 'main' into ada-ci-debug

0e55cd0

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into ada-ci-debug

aa4add3

timmoon10 requested a review from cyanguwa October 4, 2023 00:19

Enable Paddle fused attention on Ada

0a864fc

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 force-pushed the ada-ci-debug branch from b06201d to 0a864fc October 4, 2023 00:48

cyanguwa approved these changes Oct 5, 2023

tests/pytorch/test_fused_attn.py (resolved)

Merge branch 'main' into ada-ci-debug

8384f75

timmoon10 added 3 commits October 6, 2023 17:01

Merge branch 'main' into ada-ci-debug

f499cba

Update reference scale calculation in TensorFlow test

74e28ad

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'tf-scale-bugfix' into ada-ci-debug

c3c82e2

timmoon10 mentioned this pull request Oct 7, 2023

Update reference scale calculation in TensorFlow test #463

Merged

timmoon10 added 6 commits October 6, 2023 18:36

Restore backend support to reference FP8 attention impl in PyT test

24c8814

Review suggestion from @cyanguwa Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into ada-ci-debug

2d9cd6b

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Fix merge conflicts

44ac1f8

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Debug Paddle tests on Ada

ae8af09

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into ada-ci-debug

6726c30

Loosen tolerances for Paddle attention tests

777e45b

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 commented Oct 11, 2023

tests/paddle/test_operators.py (outdated, resolved)

timmoon10 added 2 commits October 11, 2023 14:58

Assume causal mask implies equal seqlens in Paddle attention tests

bcab36d

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into ada-ci-debug

6448c1d

timmoon10 changed the title ~~Debug PyTorch and Paddle tests on Ada~~ Debug CI tests on Ada Oct 11, 2023

ksivaman approved these changes Oct 12, 2023

timmoon10 merged commit 4ae3476 into NVIDIA:main Oct 12, 2023

timmoon10 deleted the ada-ci-debug branch October 12, 2023 19:30

timmoon10 merged 27 commits into NVIDIA:main from timmoon10:ada-ci-debug

4 participants
