Make RTC compatible with CUDA enhanced compatibility #19364

ptrendx · 2020-10-16T22:41:12Z

Description

Starting with CUDA 11.1, programs compiled with a newer CUDA toolkit can run on an older driver without the compat library, as long as the major version is the same (e.g. a CUDA 11.1 build works with a CUDA 11.0 driver). However, this requires a few changes to how the NVRTC API is used, which this PR addresses.

Checklist

Essentials

Changes are complete (i.e. I finished coding on this PR)
Code is well-documented

Comments

The change was tested in internal CI using the CUDA 11.1 toolkit and a Titan RTX with the 450.80.02 driver.

mxnet-bot · 2020-10-16T22:41:14Z

Hey @ptrendx , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, windows-cpu, centos-gpu, miscellaneous, website, unix-gpu, edge, windows-gpu, centos-cpu, clang, unix-cpu]

Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

ptrendx · 2020-10-16T22:42:42Z

Note: this does not touch the legacy RTC part (https://mxnet.apache.org/versions/1.6/api/python/docs/api/mxnet/rtc/index.html) - what is the plan for it @szha?

szha · 2020-10-17T20:45:48Z

I think we need to continue to support mx.rtc

ptrendx · 2020-10-18T06:10:19Z

Are there any people using it? The interface is not great: the CudaModule from there is not even an operator, so it can't be used in a model. We could reuse the recent RTC work to make it a much better experience (and potentially pretty useful).

That said, this PR does not touch that functionality (because the compilation options there are set by the user). I could make it so that if you specify the proper option yourself (--gpu-architecture=sm_XX instead of compute_XX), it gets the cubin instead of PTX, so it works with enhanced compatibility.
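To make the idea above concrete, here is a minimal sketch of how user-supplied options could be inspected to decide between cubin and PTX. The helper name `RequestsCubin` is hypothetical and not taken from the PR; it only looks at the `--gpu-architecture=` form mentioned above and assumes that the last occurrence of the option wins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): returns true when the user-supplied
// compile options request a real architecture (sm_XX), in which case a cubin
// rather than PTX would be fetched after compilation.
// Assumption: if the option appears more than once, the last one wins.
bool RequestsCubin(const std::vector<std::string>& options) {
  const std::string flag = "--gpu-architecture=";
  bool use_cubin = false;
  for (const std::string& opt : options) {
    // Skip options that do not start with the architecture flag.
    if (opt.compare(0, flag.size(), flag) != 0) continue;
    const std::string arch = opt.substr(flag.size());
    // sm_XX is a real architecture; compute_XX is a virtual one (PTX).
    use_cubin = arch.compare(0, 3, "sm_") == 0;
  }
  return use_cubin;
}
```

With this sketch, `--gpu-architecture=sm_70` would select the cubin path, while `--gpu-architecture=compute_70` (or no architecture option at all) would keep the PTX path.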

szha · 2020-10-18T13:41:48Z

Agreed. In 2.0 we can change the interface.

ptrendx · 2020-10-20T17:33:44Z

@mxnet-bot run ci [centos-cpu, centos-gpu, edge, miscellaneous]

mxnet-bot · 2020-10-20T17:33:52Z

Jenkins CI successfully triggered : [edge, centos-cpu, miscellaneous, centos-gpu]

ptrendx · 2020-10-23T18:41:46Z

@mxnet-bot run ci [centos-cpu, unix-gpu, edge, website]

mxnet-bot · 2020-10-23T18:41:54Z

Jenkins CI successfully triggered : [edge, website, unix-gpu, centos-cpu]

DickJC123

For others interested in better understanding the motivation behind this PR, I suggest https://docs.nvidia.com/deploy/pdf/CUDA_Compatibility.pdf. One paragraph worth repeating from that doc is:

To use other CUDA APIs introduced in a minor release (that require a new
driver), one would have to implement fallbacks or fail gracefully. This situation
is not different from what is available today where developers use macros to
compile out features based on CUDA versions. Users should refer to the CUDA
headers and documentation for new CUDA APIs introduced in a release.

Thus, it's fair to use an 11.1 feature that is supported by both 11.1 and 11.0 kernel-mode drivers. Before using an 11.1 feature that requires an 11.1 kernel-mode driver, one should check dynamically for that feature's presence at runtime, as suggested in section 3.2 of the document, "Handling New CUDA Features." This is particularly important while the upstream CI testing has no enhanced-compatibility build.
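As a rough illustration of such a runtime check, the sketch below decodes a CUDA version number and compares it against a required minimum. CUDA encodes versions as 1000 * major + 10 * minor (so 11.1 is 11010). The names `DecodeCudaVersion` and `DriverAtLeast` are hypothetical; in real code the encoded value would come from a call such as `cudaDriverGetVersion`, which is left out here so the example compiles without the CUDA toolkit.

```cpp
#include <cassert>

// Decoded CUDA version. CUDA encodes versions as 1000 * major + 10 * minor.
struct CudaVersion {
  int major_ver;
  int minor_ver;
};

// Hypothetical helper: split an encoded version such as 11010 into 11 and 1.
CudaVersion DecodeCudaVersion(int encoded) {
  assert(encoded >= 0);
  return {encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical helper: true if the driver version (as reported at runtime,
// e.g. by cudaDriverGetVersion) is at least major.minor.
bool DriverAtLeast(int encoded_driver_version, int major_ver, int minor_ver) {
  const CudaVersion v = DecodeCudaVersion(encoded_driver_version);
  if (v.major_ver != major_ver) return v.major_ver > major_ver;
  return v.minor_ver >= minor_ver;
}
```

A caller could then guard an 11.1-only code path with `DriverAtLeast(version, 11, 1)` and fall back (or fail gracefully) otherwise.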

DickJC123 · 2020-10-30T00:24:27Z

src/common/cuda/rtc.cc

+  const auto getSize = use_cubin ? nvrtcGetCUBINSize : nvrtcGetPTXSize;
+  const auto getFunc = use_cubin ? nvrtcGetCUBIN : nvrtcGetPTX;


FWIW, while nvrtcGetCUBINSize() and nvrtcGetCUBIN() are not yet in the nvrtc docs, their use is described in https://docs.nvidia.com/deploy/cuda-compatibility/
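For readers unfamiliar with the pattern in the diff above, here is a minimal sketch of selecting a pair of getters at runtime and then doing the usual size-then-fetch sequence. The `Fake*` functions and `FakeProgram` are stand-ins invented for this example so it compiles without CUDA; they are not the NVRTC API and only mirror the (handle, out-pointer) shape of the calls in the diff. `GetCompiledCode` is likewise a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Stand-ins (NOT the real NVRTC API) so the sketch builds without CUDA.
using FakeProgram = int;
static const char kFakePtx[] = {'p', 't', 'x'};
static const char kFakeCubin[] = {'c', 'u', 'b', 'i', 'n'};

int FakeGetPTXSize(FakeProgram, std::size_t* size) { *size = sizeof(kFakePtx); return 0; }
int FakeGetPTX(FakeProgram, char* out) { std::memcpy(out, kFakePtx, sizeof(kFakePtx)); return 0; }
int FakeGetCUBINSize(FakeProgram, std::size_t* size) { *size = sizeof(kFakeCubin); return 0; }
int FakeGetCUBIN(FakeProgram, char* out) { std::memcpy(out, kFakeCubin, sizeof(kFakeCubin)); return 0; }

// Both getter pairs share a signature, so a ternary can pick between them and
// the rest of the code stays identical for PTX and cubin.
std::string GetCompiledCode(FakeProgram prog, bool use_cubin) {
  const auto getSize = use_cubin ? FakeGetCUBINSize : FakeGetPTXSize;
  const auto getFunc = use_cubin ? FakeGetCUBIN : FakeGetPTX;
  std::size_t size = 0;
  if (getSize(prog, &size) != 0) return {};          // query the size first
  std::vector<char> buffer(size);
  if (getFunc(prog, buffer.data()) != 0) return {};  // then fetch the bytes
  return std::string(buffer.data(), size);
}
```

The point of the design is that only the two function pointers differ between the PTX and cubin paths; allocation and error handling are shared.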

src/common/cuda/rtc.cc

src/common/rtc.cc

ptrendx · 2020-11-03T17:48:51Z

@mxnet-bot run ci [unix-cpu]

mxnet-bot · 2020-11-03T17:48:58Z

Jenkins CI successfully triggered : [unix-cpu]

* Guard RTC better
* Use nvrtcGetCUBIN
* Fix lint
* Enable cubin loading in legacy rtc path
* Fixes from review

ptrendx added 3 commits October 16, 2020 15:14

Guard RTC better

4697414

Use nvrtcGetCUBIN

8ff0b9e

Fix lint

869aaa5

ptrendx requested a review from DickJC123 October 16, 2020 22:41

lanking520 added the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) Oct 16, 2020

lanking520 added the pr-work-in-progress label (PR is still work in progress) and removed the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) Oct 16, 2020

Enable cubin loading in legacy rtc path

e882040

ptrendx requested a review from eric-haibin-lin as a code owner October 19, 2020 16:51

lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 19, 2020

lanking520 added the pr-awaiting-testing and pr-awaiting-review (PR is waiting for code review) labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 23, 2020

DickJC123 reviewed Oct 30, 2020

Fixes from review

66df3b7

lanking520 added the pr-awaiting-testing label and removed the pr-awaiting-review label Nov 2, 2020

lanking520 added the pr-work-in-progress label and removed the pr-awaiting-testing label Nov 2, 2020

lanking520 added the pr-awaiting-testing and pr-awaiting-review labels and removed the pr-work-in-progress and pr-awaiting-testing labels Nov 3, 2020

DickJC123 merged commit b33fbd1 into apache:master Nov 3, 2020

DickJC123 mentioned this pull request Jul 12, 2021

[FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility #20443

Merged

