
Error handle for non-sm80/sm90 GPUs when using fused attention #393

Merged
ksivaman merged 6 commits into NVIDIA:main from zlsh80826:rewang/restrict-fused-attn-to-cc8090 on Aug 25, 2023

Conversation

@zlsh80826 (Collaborator) commented Aug 21, 2023

The cuDNN max-512-seqlen fused attention kernel only supports sm80 and sm90, and the arbitrary-seqlen kernel requires cuDNN 8.9.3 or later to support all compute capabilities >= 8.0.

  • Disable the max-512-seqlen fused kernel when the compute capability is not 8.0 or 9.0
  • Update the is_fused_attn_available API for different attention setups (see the sketch below)
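
For reference, a minimal sketch of the gating logic described above. The function and enum names (select_fused_attn_backend, FusedAttnBackend) are hypothetical, not the actual TransformerEngine API, and the cuDNN version is assumed to be encoded as an integer (e.g. 8903 for 8.9.3):

```cpp
enum class FusedAttnBackend { kNone, kMax512Seqlen, kArbitrarySeqlen };

// sm_arch: compute capability as an integer, e.g. 80 for sm80, 86 for sm86.
// cudnn_version: cuDNN version as an integer, e.g. 8903 for 8.9.3.
FusedAttnBackend select_fused_attn_backend(int sm_arch, int cudnn_version,
                                           int max_seqlen) {
  // The max-512-seqlen fused kernel is only available on sm80 and sm90.
  if (max_seqlen <= 512 && (sm_arch == 80 || sm_arch == 90)) {
    return FusedAttnBackend::kMax512Seqlen;
  }
  // The arbitrary-seqlen kernel covers sm80/sm90; with cuDNN >= 8.9.3 it
  // extends to every compute capability >= 8.0 (e.g. sm86, sm89).
  const bool arch_ok = (sm_arch == 80 || sm_arch == 90) ||
                       (sm_arch >= 80 && cudnn_version >= 8903);
  if (arch_ok) {
    return FusedAttnBackend::kArbitrarySeqlen;
  }
  // Anything else (e.g. sm70) falls back to the unfused path.
  return FusedAttnBackend::kNone;
}
```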

@zlsh80826 force-pushed the rewang/restrict-fused-attn-to-cc8090 branch from 8b81606 to 1cc7ee8 on August 21, 2023 at 15:46
@zlsh80826 (Collaborator Author)

/te-ci

Signed-off-by: Reese Wang <rewang@nvidia.com>
@zlsh80826 force-pushed the rewang/restrict-fused-attn-to-cc8090 branch from 1cc7ee8 to 1a3f728 on August 21, 2023 at 15:49
@zlsh80826 changed the title from "Error handle for non-sm80/sm90 when using fused attention" to "Error handle for non-sm80/sm90 GPUs when using fused attention" on Aug 21, 2023
@zlsh80826 (Collaborator Author)

/te-ci

@zlsh80826 (Collaborator Author)

@timmoon10 @ksivaman The CI reports "no space left on device" when initializing the container. Could you take a look? Thanks!

@timmoon10 (Collaborator) left a comment

LGTM

@ksivaman @ptrendx This is relevant to our discussion on the common headers at #382 (comment).

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@timmoon10 (Collaborator)

BTW, no reason to worry about the GitHub CI failures as long as the GitLab tests all pass. The provided nodes don't seem to be beefy enough to handle the current PyTorch container and I'm still thinking about workarounds.

Signed-off-by: Reese Wang <rewang@nvidia.com>
@zlsh80826 force-pushed the rewang/restrict-fused-attn-to-cc8090 branch from 4acce81 to f195aa4 on August 23, 2023 at 12:25
@zlsh80826 (Collaborator Author)

/te-ci

Signed-off-by: Reese Wang <rewang@nvidia.com>
@zlsh80826 (Collaborator Author)

/te-ci

@timmoon10 timmoon10 mentioned this pull request Aug 23, 2023
Signed-off-by: Reese Wang <rewang@nvidia.com>
Signed-off-by: Reese Wang <rewang@nvidia.com>
@zlsh80826 (Collaborator Author)

/te-ci

@timmoon10 (Collaborator) left a comment

LGTM

@ksivaman merged commit 94c57e4 into NVIDIA:main on Aug 25, 2023
janekb04 pushed a commit to janekb04/TransformerEngine that referenced this pull request Sep 1, 2023
Error handle for non-sm80/sm90 GPUs when using fused attention (NVIDIA#393)

* Fused attention kernel only supports sm80 and sm90

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Update transformer_engine/jax/csrc/modules.cpp

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* arbitrary fused kernel supports sm86/sm89 after 8.9.3

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Skip sm70

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Forward is_fused_attn_kernel_available to cpp backend

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Remove cpp is_fused_attn_available API

Signed-off-by: Reese Wang <rewang@nvidia.com>

---------

Signed-off-by: Reese Wang <rewang@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
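
The commits above also forward the availability check to the C++ backend ("Forward is_fused_attn_kernel_available to cpp backend") and drop the separate C++ is_fused_attn_available API. A rough sketch of what such a pybind11 forwarding could look like, reusing the hypothetical gating rules from the earlier sketch; the module name fused_attn_sketch and the helper body are assumptions, not the actual TransformerEngine bindings:

```cpp
#include <pybind11/pybind11.h>

namespace {

// Hypothetical helper: true if any fused attention kernel can serve the given
// configuration on this device (same rules as the earlier sketch).
bool is_fused_attn_kernel_available(int sm_arch, int cudnn_version,
                                    int max_seqlen) {
  const bool max512_ok = max_seqlen <= 512 && (sm_arch == 80 || sm_arch == 90);
  const bool arbitrary_ok = (sm_arch == 80 || sm_arch == 90) ||
                            (sm_arch >= 80 && cudnn_version >= 8903);
  return max512_ok || arbitrary_ok;
}

}  // namespace

PYBIND11_MODULE(fused_attn_sketch, m) {
  // The Python/JAX layer calls into this binding instead of keeping its own
  // copy of the rules, so the gating logic lives in one place in the backend.
  m.def("is_fused_attn_kernel_available", &is_fused_attn_kernel_available,
        "Return True if a fused attention kernel supports this configuration.");
}
```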
RuiWang1998 pushed a commit to RuiWang1998/TransformerEngine that referenced this pull request Sep 11, 2023
Error handle for non-sm80/sm90 GPUs when using fused attention (NVIDIA#393)

* Fused attention kernel only supports sm80 and sm90

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Update transformer_engine/jax/csrc/modules.cpp

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* arbitrary fused kernel supports sm86/sm89 after 8.9.3

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Skip sm70

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Forward is_fused_attn_kernel_available to cpp backend

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Remove cpp is_fused_attn_available API

Signed-off-by: Reese Wang <rewang@nvidia.com>

---------

Signed-off-by: Reese Wang <rewang@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Rui Wang <rui@helixon.com>