Skip to content

Accomodate name change of printed kernel files#3778

Merged
naoyam merged 3 commits intomainfrom
codediff_kernel_filename
Jan 29, 2025
Merged

Accomodate name change of printed kernel files#3778
naoyam merged 3 commits intomainfrom
codediff_kernel_filename

Conversation

@jacobhinkle
Copy link
Collaborator

@jacobhinkle jacobhinkle commented Jan 29, 2025

Generated CUDA files were previously named like __tmp_kernel_32.cu and compare_codegen.sh matched that pattern when copying kernels. That broke codediff since these filenames were changed in #3468. This PR fixes compare_codegen.sh to match __tmp_*.cu instead. It also fixes the outputs of printed PTX and cubin files so that they use the same base filenames: currently that is the previous naming scheme __tmp_kernel_32.cu (if using NVFUSER_ENABLE=static_fusion_count).

This should fix the problems seen recently in codediff CI jobs.

Generated CUDA files were previously named like `__tmp_kernel_32.cu` and
`compare_codegen.sh` matched that pattern when copying kernels. This PR
changes that pattern to `__tmp_nvfuser_*.cu` instead, since these
filenames were changed in #3468.

This should fix the problems seen recently in codediff CI jobs.
@jacobhinkle
Copy link
Collaborator Author

!test --diff

@github-actions
Copy link

github-actions bot commented Jan 29, 2025

PR Reviewer Guide 🔍

(Review updated until commit 6a171b4)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
⚡ Recommended focus areas for review

Kernel Name Change

The PR changes the pattern of generated CUDA file names from __tmp_kernel_*.cu to __tmp_nvfuser_*.cu. This change should be verified to ensure it doesn't break any existing functionality.

  const std::string& kernel_name,
  const std::string& full_src_code) {
std::stringstream ss;
ss << "__tmp_" << kernel_name << ".cu";
std::string name = ss.str();
Script Update

The compare_codegen.sh script has been updated to reflect the new kernel name pattern. The changes should be reviewed to ensure they are correct and don't introduce any new issues.

    find . -maxdepth 1 \( -name '__tmp_*.cu' -o -name '__tmp_*.ptx' \) -exec mv '{}' "$1" \;
}

@jacobhinkle
Copy link
Collaborator Author

!test --diff

@jacobhinkle
Copy link
Collaborator Author

I believe CI failed in this instance because the commit on main still produces the mismatched CUDA/PTX filenames. I am reasonably confident that this will fix the CI codediff issues.

@jacobhinkle jacobhinkle marked this pull request as ready for review January 29, 2025 02:42
@naoyam
Copy link
Collaborator

naoyam commented Jan 29, 2025

This is a bit concerning. Will see if it persists.

00:14:12 [ RUN      ] WelfordReduction.Test/dtype_std__complex_double__redu_1048576_iter_320_axis_0
00:14:12 To reproduce: NVFUSER_TEST_RANDOM_SEED=1738112846 NVFUSER_TEST_ATEN_RANDOM_SEED=0 nvfuser_tests --gtest_filter='WelfordReduction.Test/dtype_std__complex_double__redu_1048576_iter_320_axis_0'
00:14:12 unknown file: Failure
00:14:12 C++ exception with description "CUDA out of memory. Tried to allocate 5.00 GiB. GPU 0 has a total capacity of 79.14 GiB of which 4.56 GiB is free. Process 1677709 has 60.14 GiB memory in use. Process 1681995 has 14.43 GiB memory in use. Of the allocated memory 5.01 GiB is allocated by PyTorch, and 57.86 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
00:14:12 Exception raised from malloc at /opt/pytorch/pytorch/c10/cuda/CUDACachingAllocator.cpp:1333 (most recent call first):
00:14:12 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f99c18a1818 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)

@naoyam naoyam merged commit 76fec33 into main Jan 29, 2025
44 of 52 checks passed
@naoyam naoyam deleted the codediff_kernel_filename branch January 29, 2025 04:21
jacobhinkle added a commit that referenced this pull request Jan 29, 2025
This uses the function name as the filename. Since #3778 we have been
generating filenames like `__tmp_.ptx` since the `kernel_name` is not
yet filled when cubin and ptx files are output. Now we will get
`__tmp_nvfuser_none_f0_c0_r0_g0.{cu,cubin,ptx}` or if
`NVFUSER_ENABLE=static_fusion_count` is provided
`__tmp_nvfuser_1.{cu,cubin,ptx}`.
naoyam pushed a commit that referenced this pull request Jan 29, 2025
This uses the function name as the filename. Since #3778 we have been
generating filenames like `__tmp_.ptx` since the `kernel_name` is not
yet filled when cubin and ptx files are output. Now we will get
`__tmp_nvfuser_none_f0_c0_r0_g0.{cu,cubin,ptx}` or if
`NVFUSER_ENABLE=static_fusion_count` is provided
`__tmp_nvfuser_1.{cu,cubin,ptx}`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants