[build] fix compute capability arch flags, add PTX, handle PTX #591
jeffra merged 4 commits into deepspeedai:master
Conversation
```diff
-args.append(f'-gencode=arch=compute_{cc},code=compute_{cc}')
+args.append(f'-gencode=arch=compute_{cc},code=sm_{cc}')
+if cc.endswith('+PTX'):
+    args.append(f'-gencode=arch=compute_{cc},code=compute_{cc}')
```
Shouldn't we use the compute-capability (cc) number alone, rather than concatenated with '+PTX'?
edit: I misunderstood what you said, you were pointing at a bug - fixed now. thank you!
But I will leave what I originally wrote below if it helps to explain things in general.
There are 2 unrelated things:

1. The code you quoted above does the right thing (edit: after I fixed the bug below) if a user passes a `TORCH_CUDA_ARCH_LIST` that contains `+PTX`, as my previous PR missed that part (I had no clue how `+PTX` was converted to compile flags at that time, but now I do).
2. The question is whether to add `+PTX` for jit_mode or not here:
   https://github.com/microsoft/DeepSpeed/blob/03111cef09b33a25792246785ab0ded68be1733c/op_builder/builder.py#L246
I added it to sync with pytorch CUDAExtension implementation, which is how it'll do it once pytorch-1.8 is released. But you can choose to not include it. I have no idea how and when jit_mode is used to tell whether it's needed or not.
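Roughly, what that jit-mode behavior amounts to can be sketched like this (a hypothetical helper for illustration only, not the actual builder code; the function name is made up):

```python
def add_ptx_to_highest(ccs):
    """Append '+PTX' to the highest compute capability, mirroring what
    pytorch's CUDAExtension does for jit builds (sketch only)."""
    # sort numerically, e.g. ['6.1', '8.0', '7.5'] -> ['6.1', '7.5', '8.0']
    ccs = sorted(ccs, key=float)
    # the newest arch additionally ships PTX for forward compatibility
    ccs[-1] += '+PTX'
    return ccs
```

For example, `add_ptx_to_highest(['6.1', '8.0', '7.5'])` returns `['6.1', '7.5', '8.0+PTX']`.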
So to conclude: the code you quoted above does the right thing if any of the ccs contains `+PTX`, which may come from a user-defined `TORCH_CUDA_ARCH_LIST` - so (1) is a must. But wrt (2) you can choose not to include it.
If you do, please let me know and I will remove https://github.com/microsoft/DeepSpeed/blob/03111cef09b33a25792246785ab0ded68be1733c/op_builder/builder.py#L246
Let me know.
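To make the `+PTX` handling concrete, here is a hedged sketch (a hypothetical function, not the actual `op_builder` code) of how arch entries that may carry a `+PTX` suffix could be turned into `-gencode` flags, with the suffix stripped before it reaches the flag string - the bug flagged above was that the raw `cc` string, suffix included, leaked into the flag:

```python
def compute_capability_args(arch_list):
    """Turn TORCH_CUDA_ARCH_LIST-style entries (e.g. '6.1 8.0+PTX')
    into nvcc -gencode flags (sketch only)."""
    args = []
    for arch in arch_list.split():
        has_ptx = arch.endswith('+PTX')
        # strip the '+PTX' marker and the dot: '8.0+PTX' -> '80'
        cc = arch.replace('+PTX', '').replace('.', '')
        # code=sm_XX: real machine (SASS) code for that exact arch
        args.append(f'-gencode=arch=compute_{cc},code=sm_{cc}')
        if has_ptx:
            # code=compute_XX: embed PTX so the driver can JIT-compile
            # the kernel for newer archs
            args.append(f'-gencode=arch=compute_{cc},code=compute_{cc}')
    return args
```

For example, `compute_capability_args('8.0+PTX')` produces both the `code=sm_80` and the `code=compute_80` flags, while a plain `'8.0'` produces only the former.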
oh, I see what you meant, there is a bug in my code - sorry - I forgot to check the result of test runs - thank you for flagging that, @RezaYazdaniAminabadi.
Thanks @stas00 for the fix. 👍
Now that pytorch/pytorch#48891 has been merged I have a better understanding of how the arch code is generated.
It looks like your original code wasn't producing it correctly, as `-gencode=arch=compute_{cc},code=compute_{cc}` generates a PTX code and not the normal compute code. Looking at https://github.com/pytorch/pytorch/blob/4434c07a2c0ba4debc6330063546f600aee8deb3/torch/utils/cpp_extension.py#L1556-L1563 it needed to be `-gencode=arch=compute_{cc},code=sm_{cc}` in your code in the first place. Notice that the last bit is `code=sm_` and not `code=compute_`.

So as a follow up to #578, this PR:

- s/code=compute_/code=sm_/ for normal archs
- adds +PTX for the highest arch in the jit-mode
- handles +PTX (especially in the case of `TORCH_CUDA_ARCH_LIST`, which was missing from my previous PR)

Do note, I'm new to all this so please kindly verify that I'm doing the right thing. I did compile and test the resulting binary on my code only.
The weird syntax for PTX I copied from pytorch (link above), https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#just-in-time-compilation explains that this is how PTX is encoded in nvcc flags. Quote:
So `-gencode=arch=compute_{cc},code=compute_{cc}` tells nvcc to enable PTX for that arch. Very confusing.
Thanks.