Accomodate name change of printed kernel files by jacobhinkle · Pull Request #3778 · NVIDIA/Fuser

jacobhinkle · 2025-01-29T00:08:42Z

Generated CUDA files were previously named like __tmp_kernel_32.cu and compare_codegen.sh matched that pattern when copying kernels. That broke codediff since these filenames were changed in #3468. This PR fixes compare_codegen.sh to match __tmp_*.cu instead. It also fixes the outputs of printed PTX and cubin files so that they use the same base filenames: currently that is the previous naming scheme __tmp_kernel_32.cu (if using NVFUSER_ENABLE=static_fusion_count).

This should fix the problems seen recently in codediff CI jobs.

Generated CUDA files were previously named like `__tmp_kernel_32.cu` and `compare_codegen.sh` matched that pattern when copying kernels. This PR changes that pattern to `__tmp_nvfuser_*.cu` instead, since these filenames were changed in #3468. This should fix the problems seen recently in codediff CI jobs.

jacobhinkle · 2025-01-29T00:09:03Z

!test --diff

github-actions · 2025-01-29T00:10:24Z

PR Reviewer Guide 🔍

(Review updated until commit `6a171b4`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
⚡ Recommended focus areas for review Kernel Name Change The PR changes the pattern of generated CUDA file names from `__tmp_kernel_.cu` to `__tmp_nvfuser_.cu`. This change should be verified to ensure it doesn't break any existing functionality. const std::string& kernel_name, const std::string& full_src_code) { std::stringstream ss; ss << "__tmp_" << kernel_name << ".cu"; std::string name = ss.str(); Script Update The `compare_codegen.sh` script has been updated to reflect the new kernel name pattern. The changes should be reviewed to ensure they are correct and don't introduce any new issues. find . -maxdepth 1 $ -name '__tmp_.cu' -o -name '__tmp_.ptx' $ -exec mv '{}' "$1" \; }

jacobhinkle · 2025-01-29T00:43:48Z

!test --diff

jacobhinkle · 2025-01-29T02:42:02Z

I believe CI failed in this instance because the commit on main still produces the mismatched CUDA/PTX filenames. I am reasonably confident that this will fix the CI codediff issues.

naoyam · 2025-01-29T04:21:00Z

This is a bit concerning. Will see if it persists.

00:14:12 [ RUN      ] WelfordReduction.Test/dtype_std__complex_double__redu_1048576_iter_320_axis_0
00:14:12 To reproduce: NVFUSER_TEST_RANDOM_SEED=1738112846 NVFUSER_TEST_ATEN_RANDOM_SEED=0 nvfuser_tests --gtest_filter='WelfordReduction.Test/dtype_std__complex_double__redu_1048576_iter_320_axis_0'
00:14:12 unknown file: Failure
00:14:12 C++ exception with description "CUDA out of memory. Tried to allocate 5.00 GiB. GPU 0 has a total capacity of 79.14 GiB of which 4.56 GiB is free. Process 1677709 has 60.14 GiB memory in use. Process 1681995 has 14.43 GiB memory in use. Of the allocated memory 5.01 GiB is allocated by PyTorch, and 57.86 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
00:14:12 Exception raised from malloc at /opt/pytorch/pytorch/c10/cuda/CUDACachingAllocator.cpp:1333 (most recent call first):
00:14:12 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f99c18a1818 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)

This uses the function name as the filename. Since #3778 we have been generating filenames like `__tmp_.ptx` since the `kernel_name` is not yet filled when cubin and ptx files are output. Now we will get `__tmp_nvfuser_none_f0_c0_r0_g0.{cu,cubin,ptx}` or if `NVFUSER_ENABLE=static_fusion_count` is provided `__tmp_nvfuser_1.{cu,cubin,ptx}`.

jacobhinkle requested review from naoyam and xwang233 January 29, 2025 00:08

jacobhinkle added 2 commits January 28, 2025 19:32

Match __tmp_*.cu

bedaa25

Unify .ptx, .cubin, and .cu filenames

6a171b4

jacobhinkle marked this pull request as ready for review January 29, 2025 02:42

naoyam approved these changes Jan 29, 2025

View reviewed changes

naoyam merged commit 76fec33 into main Jan 29, 2025
44 of 52 checks passed

naoyam deleted the codediff_kernel_filename branch January 29, 2025 04:21

jacobhinkle mentioned this pull request Jan 29, 2025

Fix names of printed ptx and cubin files #3788

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accomodate name change of printed kernel files#3778

Accomodate name change of printed kernel files#3778
naoyam merged 3 commits intomainfrom
codediff_kernel_filename

jacobhinkle commented Jan 29, 2025 •

edited

Loading

Uh oh!

jacobhinkle commented Jan 29, 2025

Uh oh!

github-actions bot commented Jan 29, 2025 •

edited

Loading

Uh oh!

jacobhinkle commented Jan 29, 2025

Uh oh!

jacobhinkle commented Jan 29, 2025

Uh oh!

naoyam commented Jan 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jacobhinkle commented Jan 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacobhinkle commented Jan 29, 2025

Uh oh!

github-actions bot commented Jan 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 6a171b4)

Uh oh!

jacobhinkle commented Jan 29, 2025

Uh oh!

jacobhinkle commented Jan 29, 2025

Uh oh!

naoyam commented Jan 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jacobhinkle commented Jan 29, 2025 •

edited

Loading

github-actions bot commented Jan 29, 2025 •

edited

Loading

(Review updated until commit `6a171b4`)