
Error handle for non-sm80/sm90 GPUs when using fused attention #393

Merged
ksivaman merged 6 commits into NVIDIA:main from zlsh80826:rewang/restrict-fused-attn-to-cc8090 on Aug 25, 2023

Conversation

@zlsh80826 (Collaborator) commented Aug 21, 2023

The cuDNN max-512-seqlen fused attention kernel only supports sm80 and sm90, and the arbitrary-seqlen kernel requires cuDNN 8.9.3 or later to support all compute capabilities >= 8.0.

  • Disable the max-512-seqlen fused kernel when the compute capability is not 8.0 or 9.0
  • Update the is_fused_attn_available API for different attention setups (see the sketch below)
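
For reference, a minimal sketch of the gating logic described above. The function and enum names (select_fused_attn_backend, FusedAttnBackend) are hypothetical, not the actual TransformerEngine API, and the cuDNN version is assumed to be encoded as an integer (e.g. 8903 for 8.9.3):

```cpp
enum class FusedAttnBackend { kNone, kMax512Seqlen, kArbitrarySeqlen };

// sm_arch: compute capability as an integer, e.g. 80 for sm80, 86 for sm86.
// cudnn_version: cuDNN version as an integer, e.g. 8903 for 8.9.3.
FusedAttnBackend select_fused_attn_backend(int sm_arch, int cudnn_version,
                                           int max_seqlen) {
  // The max-512-seqlen fused kernel is only available on sm80 and sm90.
  if (max_seqlen <= 512 && (sm_arch == 80 || sm_arch == 90)) {
    return FusedAttnBackend::kMax512Seqlen;
  }
  // The arbitrary-seqlen kernel covers sm80/sm90; with cuDNN >= 8.9.3 it
  // extends to every compute capability >= 8.0 (e.g. sm86, sm89).
  const bool arch_ok = (sm_arch == 80 || sm_arch == 90) ||
                       (sm_arch >= 80 && cudnn_version >= 8903);
  if (arch_ok) {
    return FusedAttnBackend::kArbitrarySeqlen;
  }
  // Anything else (e.g. sm70) falls back to the unfused path.
  return FusedAttnBackend::kNone;
}
```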

@zlsh80826 force-pushed the rewang/restrict-fused-attn-to-cc8090 branch from 8b81606 to 1cc7ee8 on August 21, 2023 at 15:46
@zlsh80826 (Collaborator Author)

/te-ci

Signed-off-by: Reese Wang <rewang@nvidia.com>
@zlsh80826 force-pushed the rewang/restrict-fused-attn-to-cc8090 branch from 1cc7ee8 to 1a3f728 on August 21, 2023 at 15:49
@zlsh80826 changed the title from "Error handle for non-sm80/sm90 when using fused attention" to "Error handle for non-sm80/sm90 GPUs when using fused attention" on Aug 21, 2023
@zlsh80826 (Collaborator Author)

/te-ci

@zlsh80826 (Collaborator Author)

@timmoon10 @ksivaman The CI reports "no space left on device" when initializing the container. Could you take a look? Thanks!

@timmoon10 (Collaborator) left a comment

LGTM

@ksivaman @ptrendx This is relevant to our discussion on the common headers at #382 (comment).

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@timmoon10 (Collaborator)

BTW, no reason to worry about the GitHub CI failures as long as the GitLab tests all pass. The provided nodes don't seem to be beefy enough to handle the current PyTorch container and I'm still thinking about workarounds.

Signed-off-by: Reese Wang <rewang@nvidia.com>
@zlsh80826 force-pushed the rewang/restrict-fused-attn-to-cc8090 branch from 4acce81 to f195aa4 on August 23, 2023 at 12:25
@zlsh80826 (Collaborator Author)

/te-ci

Signed-off-by: Reese Wang <rewang@nvidia.com>
@zlsh80826 (Collaborator Author)

/te-ci

@timmoon10 timmoon10 mentioned this pull request Aug 23, 2023
Signed-off-by: Reese Wang <rewang@nvidia.com>
Signed-off-by: Reese Wang <rewang@nvidia.com>
@zlsh80826 (Collaborator Author)

/te-ci

@timmoon10 (Collaborator) left a comment

LGTM

@ksivaman merged commit 94c57e4 into NVIDIA:main on Aug 25, 2023
janekb04 pushed a commit to janekb04/TransformerEngine that referenced this pull request Sep 1, 2023
Error handle for non-sm80/sm90 GPUs when using fused attention (NVIDIA#393)

* Fused attention kernel only supports sm80 and sm90

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Update transformer_engine/jax/csrc/modules.cpp

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* arbitrary fused kernel supports sm86/sm89 after 8.9.3

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Skip sm70

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Forward is_fused_attn_kernel_available to cpp backend

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Remove cpp is_fused_attn_available API

Signed-off-by: Reese Wang <rewang@nvidia.com>

---------

Signed-off-by: Reese Wang <rewang@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
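
The commits above also forward the availability check to the C++ backend ("Forward is_fused_attn_kernel_available to cpp backend") and drop the separate C++ is_fused_attn_available API. A rough sketch of what such a pybind11 forwarding could look like, reusing the hypothetical gating rules from the earlier sketch; the module name fused_attn_sketch and the helper body are assumptions, not the actual TransformerEngine bindings:

```cpp
#include <pybind11/pybind11.h>

namespace {

// Hypothetical helper: true if any fused attention kernel can serve the given
// configuration on this device (same rules as the earlier sketch).
bool is_fused_attn_kernel_available(int sm_arch, int cudnn_version,
                                    int max_seqlen) {
  const bool max512_ok = max_seqlen <= 512 && (sm_arch == 80 || sm_arch == 90);
  const bool arbitrary_ok = (sm_arch == 80 || sm_arch == 90) ||
                            (sm_arch >= 80 && cudnn_version >= 8903);
  return max512_ok || arbitrary_ok;
}

}  // namespace

PYBIND11_MODULE(fused_attn_sketch, m) {
  // The Python/JAX layer calls into this binding instead of keeping its own
  // copy of the rules, so the gating logic lives in one place in the backend.
  m.def("is_fused_attn_kernel_available", &is_fused_attn_kernel_available,
        "Return True if a fused attention kernel supports this configuration.");
}
```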
RuiWang1998 pushed a commit to RuiWang1998/TransformerEngine that referenced this pull request Sep 11, 2023
Error handle for non-sm80/sm90 GPUs when using fused attention (NVIDIA#393)

* Fused attention kernel only supports sm80 and sm90

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Update transformer_engine/jax/csrc/modules.cpp

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* arbitrary fused kernel supports sm86/sm89 after 8.9.3

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Skip sm70

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Forward is_fused_attn_kernel_available to cpp backend

Signed-off-by: Reese Wang <rewang@nvidia.com>

* Remove cpp is_fused_attn_available API

Signed-off-by: Reese Wang <rewang@nvidia.com>

---------

Signed-off-by: Reese Wang <rewang@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Rui Wang <rui@helixon.com>