[CUDA] Add an option for profiling cuda kernels #16061
Merged

Several NVIDIA tools, such as Nsight Systems and Nsight Compute, can be used to profile CUDA kernels. NVIDIA Nsight Systems collects system-wide information about your program and GPU events, and can help you find possible bottlenecks in your topology. To profile a specific CUDA kernel, use NVIDIA Nsight Compute.
If you profile a CUDA kernel generated by TVM with Nsight Compute without this patch, you see only the SASS instructions, not the source code. That is useful, but sometimes it is easier to analyze the generated CUDA code than the instructions.

This patch adds a new pass config option, `cuda.kernels_output_dir`, which specifies the directory where the CUDA source code is stored after the build. When this option is set, the CUDA kernels are also compiled with `-lineinfo`, which is roughly analogous to GCC's `-g` option: it embeds line-number information in the compiled kernels. With that information, Nsight Compute can map profiling data back to the source code. Note that to see the source code in Nsight Compute, you have to set the `Import Source` parameter to `Yes` when configuring the profiling session.
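Because `cuda.kernels_output_dir` is a pass config option, it would presumably be set through the pass context around the build call, e.g. `with tvm.transform.PassContext(opt_level=3, config={"cuda.kernels_output_dir": "./cuda_kernels"}):`. The following sketch does not use TVM. It only illustrates, with hypothetical helpers `save_kernel_source` and `build_nvcc_command`, the two effects the option is described as having: keeping the generated `.cu` file in a chosen directory and adding `-lineinfo` to the `nvcc` command line. The file name `kernel.cu`, the `--cubin`/`-O3` flags and the default `sm_80` architecture are assumptions for illustration, not TVM's actual build flags.

```python
import os
import tempfile

# Hypothetical helpers illustrating the behavior described above; not TVM's API.


def save_kernel_source(cuda_code, kernels_output_dir=None):
    """Write generated CUDA code to disk and return the file path.

    If kernels_output_dir is given, the source is kept there after the build
    so that Nsight Compute can display it; otherwise a throwaway temporary
    directory is used.
    """
    target_dir = kernels_output_dir or tempfile.mkdtemp()
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, "kernel.cu")  # hypothetical file name
    with open(path, "w") as f:
        f.write(cuda_code)
    return path


def build_nvcc_command(source_path, output_path, arch="sm_80", kernels_output_dir=None):
    """Assemble an nvcc command line as a list of arguments (assumed flags)."""
    cmd = ["nvcc", "--cubin", "-O3", f"-arch={arch}"]
    if kernels_output_dir:
        # -lineinfo embeds source line information without a full debug build,
        # letting Nsight Compute map SASS back to CUDA source lines.
        cmd.append("-lineinfo")
    cmd += ["-o", output_path, source_path]
    return cmd


if __name__ == "__main__":
    out_dir = os.path.join(tempfile.mkdtemp(), "cuda_kernels")
    src = save_kernel_source('extern "C" __global__ void noop() {}\n', out_dir)
    print(" ".join(build_nvcc_command(src, "kernel.cubin", kernels_output_dir=out_dir)))
```

The sketch only assembles the command list; actually running it with `subprocess.run` would require `nvcc` on the `PATH`.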