Description
Related to #2075. Possibly related to #1598, if Clang-CUDA requires a recent NVIDIA CUDA Toolkit to be installed.
Test coverage
Currently, we have no test coverage for Clang-CUDA. If this scenario is important to some users (as it appears to be), and it is likely to break as we modify the preprocessor logic for Clang and CUDA separately, we should add test coverage to prevent major or obvious regressions.
__CUDACC__ preprocessor logic
Also, if this is important, we should audit the codebase for places where we're testing __CUDACC__ but Clang-CUDA could handle the normal codepath instead of needing the workaround codepath. (I suspect that Clang-CUDA can handle the normal codepath when "front-end stuff" is involved, but that we need the workaround codepath when "codegen intrinsic stuff" is involved.)
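To make the "front-end versus codegen" split concrete, here is a minimal sketch of the two guard shapes. It assumes, as the `!defined(__clang__)` checks in the examples in this issue do, that NVCC defines `__CUDACC__` without `__clang__`, while Clang-CUDA defines both. All `HYP_`/`hyp_` names are hypothetical and are not STL macros.

```cpp
// Hypothetical classification macros (HYP_ prefix), not part of the STL.
// Assumption: NVCC defines __CUDACC__ but not __clang__;
// Clang-CUDA defines both __CUDACC__ and __clang__.
#if defined(__CUDACC__) && !defined(__clang__)
#define HYP_IS_NVCC 1
#else
#define HYP_IS_NVCC 0
#endif

#if defined(__CUDACC__) && defined(__clang__)
#define HYP_IS_CLANG_CUDA 1
#else
#define HYP_IS_CLANG_CUDA 0
#endif

// Front-end workaround: only NVCC's front end needs it, so Clang-CUDA
// takes the normal path (same shape as the _NODISCARD_FRIEND guard).
#if HYP_IS_NVCC
#define HYP_USE_FRONT_END_WORKAROUND 1
#else
#define HYP_USE_FRONT_END_WORKAROUND 0
#endif

// Codegen intrinsics: excluded for every CUDA compiler, Clang-CUDA included
// (same shape as the _HAS_*_INTRINSICS guards).
#if !defined(__CUDACC__)
#define HYP_USE_HOST_INTRINSICS 1
#else
#define HYP_USE_HOST_INTRINSICS 0
#endif

constexpr bool hyp_is_nvcc       = HYP_IS_NVCC != 0;
constexpr bool hyp_is_clang_cuda = HYP_IS_CLANG_CUDA != 0;
```

When this is compiled as ordinary host C++ (no CUDA), both classification macros are 0. The front-end workaround is therefore off and the host-intrinsics path is on. Under Clang-CUDA, only the codegen guard would switch to the workaround.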
Current examples:
Already patched
Lines 428 to 432 in 303df3d
#if defined(__CUDACC__) && !defined(__clang__) // TRANSITION, VSO-568006
#define _NODISCARD_FRIEND friend
#else // ^^^ workaround ^^^ / vvv no workaround vvv
#define _NODISCARD_FRIEND _NODISCARD friend
#endif // TRANSITION, VSO-568006
Already patched by @CaseyCarter in "<random>: Implement LWG-3519" #2208! 🎉 (This is a good example of "front-end stuff".)

Lines 586 to 592 in 303df3d
#ifndef _ALLOW_COMPILER_AND_STL_VERSION_MISMATCH
#ifdef __CUDACC__
#if __CUDACC_VER_MAJOR__ < 10 \
    || (__CUDACC_VER_MAJOR__ == 10 \
        && (__CUDACC_VER_MINOR__ < 1 || (__CUDACC_VER_MINOR__ == 1 && __CUDACC_VER_BUILD__ < 243)))
#error STL1002: Unexpected compiler version, expected CUDA 10.1 Update 2 or newer.
#endif // ^^^ old CUDA ^^^
This is what "handle Clang-CUDA" #2075 is patching.
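Below is one possible shape for exempting Clang-CUDA from the NVCC version check above. It is a hypothetical sketch and not necessarily what #2075 does. It assumes NVCC defines `__CUDACC__` without `__clang__`, and that Clang-CUDA (which defines both) should not be held to NVCC's version numbers. `HYP_NVCC_VERSION_CHECKED` is an illustrative name.

```cpp
// Hypothetical sketch: apply the minimum-version check to NVCC only.
// Assumption: NVCC defines __CUDACC__ without __clang__, while Clang-CUDA
// defines both and should skip NVCC's __CUDACC_VER_*__ requirements.
#if defined(__CUDACC__) && !defined(__clang__)
#if __CUDACC_VER_MAJOR__ < 10 \
    || (__CUDACC_VER_MAJOR__ == 10 \
        && (__CUDACC_VER_MINOR__ < 1 || (__CUDACC_VER_MINOR__ == 1 && __CUDACC_VER_BUILD__ < 243)))
#error Unexpected compiler version, expected CUDA 10.1 Update 2 or newer.
#endif // ^^^ old NVCC ^^^
#define HYP_NVCC_VERSION_CHECKED 1
#else
// Clang-CUDA and non-CUDA compilers are not checked against NVCC versions.
#define HYP_NVCC_VERSION_CHECKED 0
#endif

constexpr bool hyp_nvcc_version_checked = HYP_NVCC_VERSION_CHECKED != 0;
```

In a skipped preprocessor group, the nested `#if`/`#error` is never evaluated. Clang-CUDA therefore never trips over NVCC-specific version macros, whether or not it defines them.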
No action necessary
Lines 15 to 16 in 303df3d
#if !defined(_M_CEE) && !defined(__clang__) && !defined(__CUDACC__) && !defined(__INTEL_COMPILER)
#define _HAS_CMATH_INTRINSICS 1
Codegen intrinsics; this already excludes both Clang and CUDA, so no action is necessary.

Lines 19 to 21 in 303df3d
#if (defined(_M_ARM64) || defined(_M_ARM64EC)) && !defined(_M_CEE_PURE) && !defined(__CUDACC__) \
    && !defined(__INTEL_COMPILER) && !defined(__clang__) // TRANSITION, LLVM-51488
#define _HAS_NEON_INTRINSICS 1

Lines 1054 to 1056 in 303df3d
#if (defined(_M_IX86) || (defined(_M_X64) && !defined(_M_ARM64EC))) && !defined(_M_CEE_PURE) && !defined(__CUDACC__) \
    && !defined(__INTEL_COMPILER)
#define _HAS_TZCNT_BSF_INTRINSICS 1

Lines 1157 to 1159 in 303df3d
#if (defined(_M_IX86) || (defined(_M_X64) && !defined(_M_ARM64EC))) && !defined(_M_CEE_PURE) && !defined(__CUDACC__) \
    && !defined(__INTEL_COMPILER)
#define _HAS_POPCNT_INTRINSICS 1
These are all codegen intrinsics.

Lines 636 to 639 in 303df3d
#if defined(_IS_ASSIGNABLE_NOCHECK_SUPPORTED) && !defined(__CUDACC__)
template <class _Ty>
struct _Is_copy_assignable_no_precondition_check
    : bool_constant<__is_assignable_no_precondition_check(

Lines 661 to 664 in 303df3d
#if defined(_IS_ASSIGNABLE_NOCHECK_SUPPORTED) && !defined(__CUDACC__)
template <class _Ty>
struct _Is_move_assignable_no_precondition_check
    : bool_constant<__is_assignable_no_precondition_check(add_lvalue_reference_t<_Ty>, _Ty)> {};
This front-end builtin, __is_assignable_no_precondition_check, makes MSVC behave like Clang, so I believe there's no need to investigate making Clang-CUDA take this path.

Lines 438 to 441 in 303df3d
#elif defined(__CUDACC__) // TRANSITION, CUDA - warning: attribute namespace "msvc" is unrecognized
#define _MSVC_KNOWN_SEMANTICS
#elif __has_cpp_attribute(msvc::known_semantics)
#define _MSVC_KNOWN_SEMANTICS [[msvc::known_semantics]]
This is for MSVC-specific type trait optimizations, so there's no reason to make Clang-CUDA use it.

Lines 556 to 563 in 303df3d
#ifdef __clang__
#define _STL_DISABLE_DEPRECATED_WARNING \
    _Pragma("clang diagnostic push")    \
    _Pragma("clang diagnostic ignored \"-Wdeprecated-declarations\"")
#elif defined(__CUDACC__) || defined(__INTEL_COMPILER)
#define _STL_DISABLE_DEPRECATED_WARNING \
    __pragma(warning(push))             \
    __pragma(warning(disable : 4996)) // was declared deprecated
We already test for Clang before CUDA here, no action necessary. (Ditto for the restore macro below.)
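Because Clang-CUDA defines both `__clang__` and `__CUDACC__` (an assumption consistent with the other guards in this issue), whichever check comes first decides which pragmas it gets. The sketch below mirrors the ordering in the excerpt above using hypothetical `HYP_` macros. The restore macros, the `_MSC_VER` and GCC branches, and the `hyp_old_api` usage are additions so that the sketch builds outside the STL. The `__pragma` branch assumes a Windows host, as the STL does.

```cpp
// Hypothetical push/pop macros mirroring the _STL_DISABLE_DEPRECATED_WARNING
// ordering: __clang__ is tested first, so Clang-CUDA (which also defines
// __CUDACC__) gets Clang's pragmas instead of the MSVC-style __pragma branch.
#if defined(__clang__)
#define HYP_DISABLE_DEPRECATED_WARNING \
    _Pragma("clang diagnostic push")   \
    _Pragma("clang diagnostic ignored \"-Wdeprecated-declarations\"")
#define HYP_RESTORE_DEPRECATED_WARNING _Pragma("clang diagnostic pop")
#elif defined(__CUDACC__) || defined(__INTEL_COMPILER) || defined(_MSC_VER)
// Assumes a Windows host, where these front ends accept __pragma.
#define HYP_DISABLE_DEPRECATED_WARNING \
    __pragma(warning(push))            \
    __pragma(warning(disable : 4996)) // was declared deprecated
#define HYP_RESTORE_DEPRECATED_WARNING __pragma(warning(pop))
#elif defined(__GNUC__)
// Added only so this sketch also builds with GCC.
#define HYP_DISABLE_DEPRECATED_WARNING \
    _Pragma("GCC diagnostic push")     \
    _Pragma("GCC diagnostic ignored \"-Wdeprecated-declarations\"")
#define HYP_RESTORE_DEPRECATED_WARNING _Pragma("GCC diagnostic pop")
#else
#define HYP_DISABLE_DEPRECATED_WARNING
#define HYP_RESTORE_DEPRECATED_WARNING
#endif

// Hypothetical deprecated API used to exercise the macros.
[[deprecated("hypothetical: use a newer API")]] constexpr int hyp_old_api() {
    return 42;
}

constexpr int hyp_call_old_api() {
    HYP_DISABLE_DEPRECATED_WARNING
    const int value = hyp_old_api(); // deprecation warning suppressed here
    HYP_RESTORE_DEPRECATED_WARNING
    return value;
}
```

Swapping the first two branches would send Clang-CUDA down the `__pragma(warning(...))` path, which Clang only understands in MSVC-compatibility mode. That is why testing `__clang__` first needs no further action.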
Possible enhancements
Lines 858 to 859 in 303df3d
#ifdef __CUDACC__ // TRANSITION, CUDA
#define _USE_FUNCTION_INT_0_SFINAE 0
Front-end SFINAE: I suspect that Clang-CUDA doesn't need this workaround. (This also applies to "Use int = 0 SFINAE in <memory> to improve compiler throughput" #2124.)

Lines 36 to 40 in 303df3d
#ifdef __CUDACC__
#define _CONSTEXPR_BIT_CAST inline
#else // ^^^ workaround ^^^ / vvv no workaround vvv
#define _CONSTEXPR_BIT_CAST constexpr
#endif // ^^^ no workaround ^^^

Lines 66 to 73 in 303df3d
_NODISCARD _CONSTEXPR_BIT_CAST _To _Bit_cast(const _From& _Val) noexcept {
#ifdef __CUDACC__
    _To _To_obj; // assumes default-init
    _CSTD memcpy(_STD addressof(_To_obj), _STD addressof(_Val), sizeof(_To));
    return _To_obj;
#else // ^^^ workaround ^^^ / vvv no workaround vvv
    return __builtin_bit_cast(_To, _Val);
#endif // ^^^ no workaround ^^^
Clang-CUDA might be capable of using __builtin_bit_cast.

Lines 450 to 451 in 303df3d
#elif defined(__CUDACC__) || defined(__INTEL_COMPILER)
#define _HAS_CONDITIONAL_EXPLICIT 0 // TRANSITION, CUDA/ICC
Front-end stuff: Clang-CUDA likely supports "conditional explicit" in all Standard modes, so making it use the modern path would be good (as we already do for vanilla Clang).
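For reference, the modern and fallback paths differ roughly as in this sketch of a hypothetical wrapper, `hyp_box<T>`. The sketch keys off the standard feature-test macro `__cpp_conditional_explicit` (201806L) rather than the STL's internal `_HAS_CONDITIONAL_EXPLICIT`, and that substitution is an assumption made so it builds standalone. The modern path is a single constructor with `explicit(cond)`. The fallback needs two constructors with mutually exclusive constraints.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical wrapper: converting from U is implicit exactly when U is
// implicitly convertible to T (the same rule std::pair and std::optional use).
template <class T>
struct hyp_box {
    T value;

#if defined(__cpp_conditional_explicit) && __cpp_conditional_explicit >= 201806L
    // Modern path: one constructor; explicitness is computed from the condition.
    template <class U, std::enable_if_t<std::is_constructible<T, U&&>::value, int> = 0>
    constexpr explicit(!std::is_convertible<U&&, T>::value) hyp_box(U&& u) : value(std::forward<U>(u)) {}
#else
    // Fallback path: two overloads, one implicit and one explicit.
    template <class U,
        std::enable_if_t<std::is_constructible<T, U&&>::value && std::is_convertible<U&&, T>::value, int> = 0>
    constexpr hyp_box(U&& u) : value(std::forward<U>(u)) {}

    template <class U,
        std::enable_if_t<std::is_constructible<T, U&&>::value && !std::is_convertible<U&&, T>::value, int> = 0>
    constexpr explicit hyp_box(U&& u) : value(std::forward<U>(u)) {}
#endif
};

// Hypothetical type that is only explicitly constructible from int.
struct hyp_explicit_from_int {
    explicit hyp_explicit_from_int(int) {}
};

inline void hyp_box_usage() {
    hyp_box<long> a = 42;                 // copy-init OK: int -> long is implicit
    hyp_box<hyp_explicit_from_int> b{42}; // direct-init OK
    // hyp_box<hyp_explicit_from_int> c = 42; // ill-formed: constructor is explicit
    (void) a;
    (void) b;
}
```

Both branches give the same observable behavior. The modern path halves the number of constructor templates the front end has to consider, which is why routing Clang-CUDA to it looks attractive.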