cuda::std::complex specializations for half and bfloat #1140
Merged
miscco merged 37 commits into NVIDIA:main on Mar 12, 2024
jrhemstad
reviewed
Nov 22, 2023
miscco
reviewed
Nov 23, 2023
Contributor
miscco
left a comment
That is a great job working around the quirks of those types 👏
I would love to move some of the traits around (e.g. into is_floating_point.h) and importantly add a proper named define that one can grep for.
miscco
requested changes
Jan 30, 2024
Contributor
miscco
left a comment
I am wondering whether we should just keep all the _LIBCUDACXX_HAS_NO_NVFP16 in place and define it conditionally for host
Specifically:
* disable BF16 when FP16 is disabled, since the former includes the latter;
* disable both when the toolkit version is lower than 12.2, since 12.2 is when both types got the host versions of a lot of functions we need to make useful heterogeneous things with them;
* disable both in host-only TUs, as there's no easy way I could find to detect the condition above.

I've included an opt-in macro for asserting that the headers (if available) are from a sufficiently new CTK; I will add that to the docs in a later commit.
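The three gating rules can be sketched with the preprocessor as follows. This is only an illustration of the logic described in the comment, not the library's actual configuration code: every `EXAMPLE_*` name is hypothetical, and the behavior of the opt-in macro (re-enabling the types in a host-only TU) is an assumption. `__CUDACC__`, `__CUDACC_VER_MAJOR__`, and `__CUDACC_VER_MINOR__` are the macros that nvcc defines.

```cpp
// Illustrative gating logic only; all EXAMPLE_* names are hypothetical.

#if !defined(__CUDACC__)
// Host-only TU: the toolkit version is not easily detectable here, so FP16
// is disabled unless the user asserts that the headers are new enough
// (assumed meaning of the opt-in macro).
#  if !defined(EXAMPLE_ASSUME_CTK_12_2_OR_NEWER)
#    define EXAMPLE_HAS_NO_FP16
#  endif
#elif (__CUDACC_VER_MAJOR__ < 12) || \
      (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ < 2)
// Toolkit older than 12.2: the required host functions are missing.
#  define EXAMPLE_HAS_NO_FP16
#endif

// BF16 builds on FP16, so disabling FP16 disables BF16 as well.
#if defined(EXAMPLE_HAS_NO_FP16) && !defined(EXAMPLE_HAS_NO_BF16)
#  define EXAMPLE_HAS_NO_BF16
#endif

// Expose the result as constants so the outcome can be inspected.
#if defined(EXAMPLE_HAS_NO_FP16)
constexpr bool example_fp16_enabled = false;
#else
constexpr bool example_fp16_enabled = true;
#endif

#if defined(EXAMPLE_HAS_NO_BF16)
constexpr bool example_bf16_enabled = false;
#else
constexpr bool example_bf16_enabled = true;
#endif
```

Compiled with a plain host compiler and without the hypothetical opt-in macro, both constants come out `false`, which corresponds to the third rule.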
NVCC is spewing code that makes various versions of clang unhappy about a deprecated implicit copy constructor of a lambda wrapper, so just work around that by not using one.
miscco
approved these changes
Feb 26, 2024
griwes
commented
Feb 29, 2024
wmaxey
reviewed
Mar 7, 2024
wmaxey
approved these changes
Mar 7, 2024
Co-authored-by: Wesley Maxey <71408887+wmaxey@users.noreply.github.com>
Member
Note: As discussed offline, local tests show that at least on sm86/89 we need this patch for performance reasons. I haven't had a chance to test on sm70/80/90, though.
diff --git a/libcudacxx/include/cuda/std/detail/libcxx/include/complex b/libcudacxx/include/cuda/std/detail/libcxx/include/complex
index 3ba249779..416c0e71d 100644
--- a/libcudacxx/include/cuda/std/detail/libcxx/include/complex
+++ b/libcudacxx/include/cuda/std/detail/libcxx/include/complex
@@ -1702,6 +1702,16 @@ atanh(const complex<_Tp>& __x)
return complex<_Tp>(__constexpr_copysign(__z.real(), __x.real()), __constexpr_copysign(__z.imag(), __x.imag()));
}
+// we add a specialization for fp16 atanh because of performance issues
+template<>
+_LIBCUDACXX_INLINE_VISIBILITY complex<__half>
+atanh(const complex<__half>& __x)
+{
+ complex<float> __temp(__x);
+ __temp = _CUDA_VSTD::atanh(__temp);
+ return complex<__half>(__temp.real(), __temp.imag());
+}
+
// sinh
template<class _Tp>
@@ -1815,6 +1825,16 @@ atan(const complex<_Tp>& __x)
return complex<_Tp>(__z.imag(), -__z.real());
}
+// we add a specialization for fp16 atan because of performance issues
+template<>
+_LIBCUDACXX_INLINE_VISIBILITY complex<__half>
+atan(const complex<__half>& __x)
+{
+ complex<float> __temp(__x);
+ __temp = _CUDA_VSTD::atan(__temp);
+ return complex<__half>(__temp.real(), __temp.imag());
+}
+
// sin
template<class _Tp>
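The pattern in the patch above is: widen the value to complex<float>, evaluate the function in single precision, then narrow the result back. A minimal host-only sketch of that pattern is shown below. It uses std::complex and a hypothetical helper name, atanh_via_float, with T standing in for the 16-bit type, because __half is not available without the CUDA headers. It is not the code that was merged.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper illustrating the widen/compute/narrow pattern.
// T stands in for a low-precision type such as __half.
template <class T>
std::complex<T> atanh_via_float(const std::complex<T>& x)
{
    // Widen both components to float.
    std::complex<float> tmp(static_cast<float>(x.real()),
                            static_cast<float>(x.imag()));
    // Evaluate in single precision.
    tmp = std::atanh(tmp);
    // Narrow the result back to the original component type.
    return std::complex<T>(static_cast<T>(tmp.real()),
                           static_cast<T>(tmp.imag()));
}
```

The cost is two conversions per call. Whether that is a net win depends on the architecture, which is consistent with the note that only sm86/89 had been measured.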
Contributor
@leofang I added some workarounds for |
Description
Resolves #1139
Introduce specializations of complex<T> for half and bfloat.