-
Notifications
You must be signed in to change notification settings - Fork 380
[Bugfix] Mangle Single Precision Mathematical Functions of cuda math api #1602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughAdds a protected helper Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-12-15T08:56:23.827ZApplied to files:
📚 Learning: 2025-11-14T07:56:11.098ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/target/codegen_cuda.h (1)
68-68: Add documentation for the new method.The new
ReserveKeywordsAsUnique_()method lacks a doc comment explaining its purpose. Consider adding a brief description of why this wrapper exists and what additional keywords it reserves beyond the base class implementation.📝 Suggested documentation
protected: + /// \brief Reserve C++ and CUDA-specific keywords as unique identifiers. + /// Calls the base class ReserveKeywordsAsUnique() and additionally reserves + /// CUDA math API function names (e.g., sinf, cosf, max, min) to prevent + /// naming conflicts with user-defined identifiers. void ReserveKeywordsAsUnique_();src/target/codegen_cuda.cc (1)
221-310: Consider a data-driven approach for better maintainability (optional).The current implementation with 90+ sequential
ReserveNamecalls is clear but verbose. For improved maintainability, consider using an array of function names with a loop:💡 Optional refactor example
void CodeGenTileLangCUDA::ReserveKeywordsAsUnique_() { CodeGenC::ReserveKeywordsAsUnique(); // Single-precision CUDA math functions static const char* cuda_math_functions[] = { "acosf", "acoshf", "asinf", "asinhf", "atan2f", "atanf", "atanhf", "cbrtf", "ceilf", "copysignf", "cosf", "coshf", "cospif", // ... rest of the list "max", "min" }; for (const char* name : cuda_math_functions) { name_supply_->ReserveName(name); } }This would make it easier to maintain and update the list of reserved names.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/target/codegen_cuda.ccsrc/target/codegen_cuda.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.
Applied to files:
src/target/codegen_cuda.cc
🧬 Code graph analysis (1)
src/target/codegen_cuda.h (1)
src/target/codegen_cuda.cc (2)
ReserveKeywordsAsUnique_(219-311)ReserveKeywordsAsUnique_(219-219)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (2)
src/target/codegen_cuda.cc (2)
3528-3528: LGTM! Correctly routes keyword reservation through the CUDA-specific wrapper.The change from
ReserveKeywordsAsUnique()toReserveKeywordsAsUnique_()ensures that CUDA math API function names are properly reserved when adding functions, preventing naming conflicts with user-defined identifiers.
298-299: These are valid CUDA intrinsic names — no action needed.The CUDA math API includes
voidsincosfandvoidsincospifas legitimate device intrinsics with the exact names shown. These void-returning functions compute sine and cosine together with signatures likevoid __nv_sincosf(float x, float *sptr, float *cptr), storing results through pointers. The "void" prefix is intentional and part of the official CUDA naming convention, not a typo.Likely an incorrect or invalid review comment.
| void CodeGenTileLangCUDA::ReserveKeywordsAsUnique_() { | ||
| CodeGenC::ReserveKeywordsAsUnique(); | ||
| name_supply_->ReserveName("acosf"); | ||
| name_supply_->ReserveName("acoshf"); | ||
| name_supply_->ReserveName("asinf"); | ||
| name_supply_->ReserveName("asinhf"); | ||
| name_supply_->ReserveName("atan2f"); | ||
| name_supply_->ReserveName("atanf"); | ||
| name_supply_->ReserveName("atanhf"); | ||
| name_supply_->ReserveName("cbrtf"); | ||
| name_supply_->ReserveName("ceilf"); | ||
| name_supply_->ReserveName("copysignf"); | ||
| name_supply_->ReserveName("cosf"); | ||
| name_supply_->ReserveName("coshf"); | ||
| name_supply_->ReserveName("cospif"); | ||
| name_supply_->ReserveName("cyl_bessel_i0f"); | ||
| name_supply_->ReserveName("cyl_bessel_i1f"); | ||
| name_supply_->ReserveName("erfcf"); | ||
| name_supply_->ReserveName("erfcinvf"); | ||
| name_supply_->ReserveName("erfcxf"); | ||
| name_supply_->ReserveName("erff"); | ||
| name_supply_->ReserveName("erfinvf"); | ||
| name_supply_->ReserveName("exp10f"); | ||
| name_supply_->ReserveName("exp2f"); | ||
| name_supply_->ReserveName("expf"); | ||
| name_supply_->ReserveName("expm1f"); | ||
| name_supply_->ReserveName("fabsf"); | ||
| name_supply_->ReserveName("fdimf"); | ||
| name_supply_->ReserveName("fdividef"); | ||
| name_supply_->ReserveName("floorf"); | ||
| name_supply_->ReserveName("fmaf"); | ||
| name_supply_->ReserveName("fmaxf"); | ||
| name_supply_->ReserveName("fminf"); | ||
| name_supply_->ReserveName("fmodf"); | ||
| name_supply_->ReserveName("frexpf"); | ||
| name_supply_->ReserveName("hypotf"); | ||
| name_supply_->ReserveName("ilogbf"); | ||
| name_supply_->ReserveName("isfinite"); | ||
| name_supply_->ReserveName("isinf"); | ||
| name_supply_->ReserveName("isnan"); | ||
| name_supply_->ReserveName("j0f"); | ||
| name_supply_->ReserveName("j1f"); | ||
| name_supply_->ReserveName("jnf"); | ||
| name_supply_->ReserveName("ldexpf"); | ||
| name_supply_->ReserveName("lgammaf"); | ||
| name_supply_->ReserveName("llrintf"); | ||
| name_supply_->ReserveName("llroundf"); | ||
| name_supply_->ReserveName("log10f"); | ||
| name_supply_->ReserveName("log1pf"); | ||
| name_supply_->ReserveName("log2f"); | ||
| name_supply_->ReserveName("logbf"); | ||
| name_supply_->ReserveName("logf"); | ||
| name_supply_->ReserveName("lrintf"); | ||
| name_supply_->ReserveName("lroundf"); | ||
| name_supply_->ReserveName("max"); | ||
| name_supply_->ReserveName("min"); | ||
| name_supply_->ReserveName("modff"); | ||
| name_supply_->ReserveName("nanf"); | ||
| name_supply_->ReserveName("nearbyintf"); | ||
| name_supply_->ReserveName("nextafterf"); | ||
| name_supply_->ReserveName("norm3df"); | ||
| name_supply_->ReserveName("norm4df"); | ||
| name_supply_->ReserveName("normcdff"); | ||
| name_supply_->ReserveName("normcdfinvf"); | ||
| name_supply_->ReserveName("normf"); | ||
| name_supply_->ReserveName("powf"); | ||
| name_supply_->ReserveName("rcbrtf"); | ||
| name_supply_->ReserveName("remainderf"); | ||
| name_supply_->ReserveName("remquof"); | ||
| name_supply_->ReserveName("rhypotf"); | ||
| name_supply_->ReserveName("rintf"); | ||
| name_supply_->ReserveName("rnorm3df"); | ||
| name_supply_->ReserveName("rnorm4df"); | ||
| name_supply_->ReserveName("rnormf"); | ||
| name_supply_->ReserveName("roundf"); | ||
| name_supply_->ReserveName("rsqrtf"); | ||
| name_supply_->ReserveName("scalblnf"); | ||
| name_supply_->ReserveName("scalbnf"); | ||
| name_supply_->ReserveName("signbit"); | ||
| name_supply_->ReserveName("voidsincosf"); | ||
| name_supply_->ReserveName("voidsincospif"); | ||
| name_supply_->ReserveName("sinf"); | ||
| name_supply_->ReserveName("sinhf"); | ||
| name_supply_->ReserveName("sinpif"); | ||
| name_supply_->ReserveName("sqrtf"); | ||
| name_supply_->ReserveName("tanf"); | ||
| name_supply_->ReserveName("tanhf"); | ||
| name_supply_->ReserveName("tgammaf"); | ||
| name_supply_->ReserveName("truncf"); | ||
| name_supply_->ReserveName("y0f"); | ||
| name_supply_->ReserveName("y1f"); | ||
| name_supply_->ReserveName("ynf"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's verify the file exists and check the exact lines mentioned
cd /tmp/repo && wc -l src/target/codegen_cuda.cc 2>/dev/null || echo "File not found, searching for it..."
find . -name "codegen_cuda.cc" -type f 2>/dev/null | head -5Repository: tile-ai/tilelang
Length of output: 201
🏁 Script executed:
# Once we locate the file, examine lines 219-311
sed -n '219,311p' src/target/codegen_cuda.cc 2>/dev/null | cat -nRepository: tile-ai/tilelang
Length of output: 4326
🏁 Script executed:
# Search for context around this function and any related comments
rg -A 5 "ReserveKeywordsAsUnique_" src/target/codegen_cuda.cc 2>/dev/null || echo "Pattern not found"Repository: tile-ai/tilelang
Length of output: 566
🏁 Script executed:
# Look for issue #1561 or related comments about naming conflicts
rg "1561|naming.*conflict|ReserveName" src/target/codegen_cuda.cc | head -20Repository: tile-ai/tilelang
Length of output: 850
🏁 Script executed:
# Search for issue #1561 or related discussions in the codebase
rg -i "1561|sincosf|naming conflict|reserve" --type cc --type h src/target/ | grep -i "issue\|1561\|sincosf\|reserve" | head -20Repository: tile-ai/tilelang
Length of output: 86
🏁 Script executed:
# Check if there are any comments or documentation about these reserved names
rg -B 5 -A 2 "ReserveKeywordsAsUnique" src/target/codegen_cuda.cc | head -40Repository: tile-ai/tilelang
Length of output: 642
🏁 Script executed:
# Look for related CUDA function reservations in CodeGenC
find . -name "*.cc" -o -name "*.h" | xargs grep -l "ReserveKeywordsAsUnique" | head -5Repository: tile-ai/tilelang
Length of output: 197
🏁 Script executed:
# Check the parent class CodeGenC to understand what's already reserved
rg -A 50 "void.*CodeGenC::ReserveKeywordsAsUnique" src/target/codegen_c.cc 2>/dev/null | head -80Repository: tile-ai/tilelang
Length of output: 42
🌐 Web query:
CUDA math API sincosf sincospif voidsincosf function names documentation
💡 Result:
Summary of CUDA sin/cos combined functions and docs
-
Prototypes (single-precision, device):
- void sincosf(float x, float *sptr, float *cptr) — computes sin(x) and cos(x) into *sptr and *cptr. [1]
- void sincospif(float x, float sptr, float cptr) — computes sin(pix) and cos(pix) into *sptr and *cptr. [1][2]
-
libdevice/internal name:
- __nv_sincosf(float x, float* sptr, float* cptr) — same functionality; documented in the libdevice user guide. [3]
-
Notes:
- These are device functions (for device code). [1]
- Results/accuracy and behavior can be affected by the --use_fast_math compiler flag; see the CUDA C++ Programming Guide for accuracy tables. [1][2]
- See related single-precision functions sinf(), sinpif(), cospif() in the CUDA Math API docs. [1][2]
Sources: NVIDIA CUDA Math API and libdevice docs. [1][2][3]
References
[1] NVIDIA CUDA Math API (single-precision functions).
[2] NVIDIA CUDA Math API (sincospif / sinpif entries).
[3] NVIDIA libdevice User’s Guide: __nv_sincosf.
🏁 Script executed:
# Check if there are TODOs or issues related to this in the code
rg -i "todo|fixme|issue|precision|sincosf" src/target/codegen_cuda.ccRepository: tile-ai/tilelang
Length of output: 1135
🏁 Script executed:
# Look at HIP codegen to see if it reserves similar functions
rg "ReserveKeywordsAsUnique_|ReserveName" src/target/codegen_hip.cc | head -20Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Check git history or comments about why these specific functions were chosen
git log --oneline -n 20 -- src/target/codegen_cuda.cc 2>/dev/null || echo "Git not available"Repository: tile-ai/tilelang
Length of output: 131
🌐 Web query:
CUDA math API double-precision functions sin cos sqrt exp log asin acos atan
💡 Result:
Short answer: CUDA provides full double-precision math functions for sin, cos, sqrt, exp, log, asin, acos, atan (and many others) as device-callable functions (e.g. device double sin(double), cos(double), sqrt(double), exp(double), log(double), asin(double), acos(double), atan(double)). sincos/sincospi and sinpi/cospi variants are also available. You do not need extra headers to call them from device code; see NVIDIA’s CUDA Math API (double-precision section) for details and accuracy notes. [1][2][3]
References:
- NVIDIA CUDA Math API — Double Precision Mathematical Functions. [1]
- NVIDIA CUDA Math API (log, sqrt, etc., entries). [2]
- CUDA C++ Best Practices note on sincos / fast-math mapping. [3]
🌐 Web query:
CUDA half-precision math functions hsin hcos hsqrt half precision device functions
💡 Result:
Short answer: CUDA provides half-precision device math functions such as hsin, hcos, hsqrt (and many others) as device intrinsics returning __half; they are declared in cuda_fp16.h and operate in round-to-nearest-even mode. See the CUDA Math API reference for full list and semantics. [1][2]
Key points / prototypes
- Header: #include <cuda_fp16.h>. [1][2]
- Example prototypes (device-only):
- device __half hsin(const __half a);
- device __half hcos(const __half a);
- device __half hsqrt(const __half a);
- device __half hrsqrt(const __half a);
- (plus many others: hexp, hlog, hrcp, hceil, hfloor, hrint, htrunc, atomicAdd for __half, etc.). [1][2]
Behavior notes
- Operations return __half and use round-to-nearest-even rounding. [1][2]
- NaN/Inf and special-case behavior (e.g., hsin(±∞) → NaN, hsqrt(x<0) → NaN, hrsqrt(±0) → ±∞) are documented per-function in the CUDA Math API. [2]
If you want, I can:
- paste a minimal device-kernel example using hsin/hsqrt, or
- list the full set of half math intrinsics and their per-function special-case rules.
Fix incorrect CUDA function names and consider double/half-precision functions.
Lines 80-81 contain incorrect function names: voidsincosf and voidsincospif should be sincosf and sincospif. The current typos prevent actual naming conflicts with the real CUDA functions.
Additionally, the implementation reserves only single-precision math functions. Consider also reserving:
- Double-precision equivalents (e.g.,
sin,cos,sqrt,asin,acos,atan) which exist as CUDA device functions - Half-precision functions (e.g.,
hsin,hcos,hsqrt,hexp) fromcuda_fp16.h
These would provide comprehensive coverage of CUDA math API intrinsics that could collide with user-defined identifiers.
🤖 Prompt for AI Agents
In src/target/codegen_cuda.cc around lines 219 to 311, the reserved keyword list
contains typos "voidsincosf" and "voidsincospif" (should be "sincosf" and
"sincospif"); replace those entries with the correct CUDA function names and add
reservations for common double-precision and half-precision CUDA math intrinsics
(e.g., sin, cos, sqrt, asin, acos, atan, exp, log and their f- and half-
counterparts where applicable, including names from cuda_fp16.h like hsin, hcos,
hsqt/hsqrt-style names) by calling name_supply_->ReserveName(...) for each new
identifier so the name supply prevents collisions with device math functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/target/codegen_cuda.cc (1)
219-407: Consider adding half-precision math function names.For comprehensive coverage of CUDA math API intrinsics, consider also reserving half-precision function names from
cuda_fp16.hsuch ashsin,hcos,hsqrt,hexp,hlog,hrsqrt, etc. These could also collide with user-defined identifiers.Example additions
After line 311, you could add:
+ // reserve half precision mathematical functions + name_supply_->ReserveName("hsin"); + name_supply_->ReserveName("hcos"); + name_supply_->ReserveName("hsqrt"); + name_supply_->ReserveName("hrsqrt"); + name_supply_->ReserveName("hexp"); + name_supply_->ReserveName("hlog"); + name_supply_->ReserveName("hceil"); + name_supply_->ReserveName("hfloor"); + name_supply_->ReserveName("htrunc"); + name_supply_->ReserveName("hrint"); + // ... other half-precision functions as needed +
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/target/codegen_cuda.cc
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.
Applied to files:
src/target/codegen_cuda.cc
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
src/target/codegen_cuda.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
|
|
||
| void CodeGenTileLangCUDA::ReserveKeywordsAsUnique_() { | ||
| CodeGenC::ReserveKeywordsAsUnique(); | ||
| // skip single precision mathematical functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix misleading comments.
Lines 221 and 313 say "skip" but the code actually reserves these function names to prevent collisions. The comments should read something like "reserve single/double precision mathematical functions" instead.
🔎 Proposed fix
- // skip single precision mathematical functions
+ // reserve single precision mathematical functions
name_supply_->ReserveName("acosf");- // skip double precision mathematical functions
+ // reserve double precision mathematical functions
name_supply_->ReserveName("acos");Also applies to: 313-313
🤖 Prompt for AI Agents
In src/target/codegen_cuda.cc around lines 221 and 313, the comments currently
say "skip single/double precision mathematical functions" but the code actually
reserves those function names to avoid collisions; update the comments to read
something like "reserve single-precision mathematical function names to prevent
collisions" at line 221 and "reserve double-precision mathematical function
names to prevent collisions" at line 313 (or equivalent wording clarifying
reservation rather than skipping).
Fix #1561
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.