c11_atomics: unify host half representation and conversion with wrapper class #2503
Hi, I'm having a hard time connecting the dots together. Can you please provide a test command line that I can run that demonstrates the problem? Thanks! |
Sorry for the late response - I was distracted by other work. The issue can be reproduced by running: with Intel OpenCL CPU implementation (which has fp16 support enabled).
Root cause
In the OpenCL kernel:
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void test_atomic_kernel(uint threadCount, uint numDestItems, volatile __global atomic_half *destMemory, __global half *oldValues)
{
    uint tid = get_global_id(0);
    atomic_store_explicit(&destMemory[tid], tid, memory_order_relaxed, memory_scope_all_devices);
}
When host_atomic_store(&destMemory[tid], (HostDataType)tid, MemoryOrder());
|
|
Ah, gotcha, thank you for the explanation! FWIW, it was the static cast to a
// For half types, convert from float to proper half-precision bit
// pattern
return cl_half_from_float(static_cast<float>(v), gHalfRoundingMode);
But, it looks like this is matching what the kernel is doing (convert the
Interestingly, there does not appear to be a problem with our GPU device, which does not support SVM atomics but does support fp16 atomic load and store. I am running:
./test_conformance/c11_atomics/test_c11_atomics atomic_store
I'll add "focused review" and we'll see if we can get this merged next week. |
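The mismatch discussed above can be illustrated with a small host-side sketch. It assumes the host half type is a raw 16-bit integer (as `cl_half` is), so a plain integer cast stores `tid` as a bit pattern rather than as a numeric value. `decode_half_bits` and `cast_tid_to_bits` are hypothetical helpers written for this illustration; they are not CTS or OpenCL functions.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode an IEEE 754 binary16 bit pattern into a float.
float decode_half_bits(std::uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: mantissa counts units of 2^-24.
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // All-ones exponent encodes infinity (mantissa 0) or NaN.
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    } else {
        // Normal number: implicit leading 1, exponent bias 15.
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}

// Hypothetical helper: what a plain integer cast to a 16-bit host type stores.
std::uint16_t cast_tid_to_bits(unsigned tid) {
    return static_cast<std::uint16_t>(tid);
}
```

For `tid = 1`, the cast produces the bits `0x0001`, which decode to the smallest half subnormal (2^-24), whereas the half encoding of the value 1.0 is `0x3C00`. A value conversion such as `cl_half_from_float` is what produces the latter.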
|
Discussed in the November 4th teleconference. Will merge after @shajder's review. |
@bashbaug Thanks for pointing this out.
OpenCL-CTS/test_conformance/c11_atomics/common.h Lines 327 to 333 in e641de9
OpenCL-CTS/test_conformance/c11_atomics/common.h Line 1141 in e641de9
OpenCL-CTS/test_conformance/c11_atomics/common.h Lines 1426 to 1428 in e641de9
OpenCL-CTS/test_conformance/c11_atomics/common.h Lines 146 to 155 in e641de9
OpenCL-CTS/test_conformance/c11_atomics/test_atomics.cpp Lines 87 to 92 in e641de9
The same issue can be exposed on CPU if we force using host threads for verification: |
93e0b57 to
e861104
|
I've refactored the host half handling with a new wrapper class |
|
@Nuullll would you mind resolving the merge conflicts? I took a look and they don't look too bad, but they aren't quite something I'm comfortable resolving via the web UI. Thanks! |
…er class

Introduce HostHalf wrapper class to eliminate explicit cl_half_from_float and cl_half_to_float conversions throughout the test code. The wrapper provides semantic value constructors/operators and automatic conversions, simplifying half-precision arithmetic operations.

Key improvements:
- HostHalf class with operator overloading for arithmetic and comparisons
- Type traits is_host_atomic_fp_v and is_host_fp_v for generic FP handling
- Unified floating-point atomic operations (add/sub/min/max/exchange)
- Removed 300+ lines of half-specific conditional branches
- Consistent error tolerance calculation for all FP types

fix windows build
fix format
fix format
fix format
4cc3244 to
f6d3199
return std::isfinite(
static_cast<double>(val));
})
&& "Infinite subtraction value detected!");
This assertion indeed helped capture the kahan_sub implementation bug in #2368. See the fix here: https://github.com/KhronosGroup/OpenCL-CTS/pull/2503/files#diff-aac1c2081764e51ba19f3e93970d6de6c44ca0b530316e76acddfa4bb4a1829cR1533
{
double y = num - compensation;
double t = sum - y;
compensation = (t - sum) - y;
(t-sum) == -y, so compensation == -2y here, causing sum to explode and overflow to infinity.
Thanks, good catch but this is still not quite correct. It should be:
double y = - num - compensation;
double t = sum + y;
compensation = (t - sum) - y;
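The review exchange above can be checked with a short sketch that puts the two loop bodies side by side. The function names and the surrounding loop are assumptions made for illustration; only the three-line bodies come from the thread. The corrected body is ordinary Kahan summation applied to the negated term.

```cpp
#include <cmath>
#include <vector>

// Loop body as quoted in the diff: (t - sum) is about -y, so compensation
// becomes about -2y and y roughly doubles on every iteration.
double kahan_sub_as_in_diff(double init, const std::vector<double>& nums) {
    double sum = init;
    double compensation = 0.0;
    for (double num : nums) {
        double y = num - compensation;
        double t = sum - y;
        compensation = (t - sum) - y;
        sum = t;
    }
    return sum;
}

// Loop body suggested in the review: add the negated term with the usual
// Kahan compensation, which captures only the rounding error of sum + y.
double kahan_sub_suggested(double init, const std::vector<double>& nums) {
    double sum = init;
    double compensation = 0.0;
    for (double num : nums) {
        double y = -num - compensation;
        double t = sum + y;
        compensation = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

With an initial value of 10 and the terms 1, 2, 3, the suggested form returns 4. The form from the diff grows geometrically and becomes non-finite after roughly a thousand iterations, which is consistent with the assertion on infinite values firing.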
|
@Nuullll We would like to merge this PR shortly. I’d appreciate it if you could apply any upcoming corrections in separate PRs. Thanks! |
Introduce HostHalf wrapper class to eliminate explicit cl_half_from_float and cl_half_to_float conversions throughout the test code. The wrapper provides semantic value constructors/operators and automatic conversions, simplifying half-precision arithmetic operations.
Key improvements:
- HostHalf class with operator overloading for arithmetic and comparisons
- Type traits is_host_atomic_fp_v and is_host_fp_v for generic FP handling
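As a rough illustration of the wrapper idea described above, here is a minimal sketch. It is not the class from this PR: `HostHalfSketch`, `is_host_fp_sketch_v`, and the two conversion helpers are hypothetical names, and the helpers are simplified stand-ins for `cl_half_to_float` and `cl_half_from_float` that assume the default round-to-nearest floating-point environment.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Stand-in for cl_half_to_float: decode IEEE 754 binary16 bits.
inline float sketch_half_to_float(std::uint16_t h) {
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    float m;
    if (exponent == 0) m = std::ldexp(static_cast<float>(mantissa), -24);
    else if (exponent == 31)
        m = mantissa ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
    else m = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    return (h & 0x8000) ? -m : m;
}

// Stand-in for cl_half_from_float (round-to-nearest-even only).
inline std::uint16_t sketch_half_from_float(float f) {
    const unsigned sign = std::signbit(f) ? 0x8000u : 0u;
    const float a = std::fabs(f);
    if (std::isnan(a)) return static_cast<std::uint16_t>(sign | 0x7E00u);
    if (a >= 65520.0f) return static_cast<std::uint16_t>(sign | 0x7C00u);
    if (a < std::ldexp(1.0f, -14))  // subnormal range: units of 2^-24
        return static_cast<std::uint16_t>(
            sign | static_cast<unsigned>(std::nearbyint(std::ldexp(a, 24))));
    int e = 0;
    const float frac = std::frexp(a, &e);  // a = frac * 2^e, frac in [0.5, 1)
    const unsigned mant =
        static_cast<unsigned>(std::nearbyint(std::ldexp(frac, 11))) - 1024u;
    const unsigned biased = static_cast<unsigned>(e - 1 + 15);
    // A mantissa that rounds up to 1024 carries into the exponent field.
    return static_cast<std::uint16_t>(sign | ((biased << 10) + mant));
}

// Hypothetical wrapper: stores half bits, behaves like a numeric value.
class HostHalfSketch {
public:
    HostHalfSketch() = default;
    explicit HostHalfSketch(float v) : bits_(sketch_half_from_float(v)) {}
    operator float() const { return sketch_half_to_float(bits_); }
    std::uint16_t bits() const { return bits_; }
    friend HostHalfSketch operator+(HostHalfSketch a, HostHalfSketch b) {
        return HostHalfSketch(static_cast<float>(a) + static_cast<float>(b));
    }
    friend HostHalfSketch operator-(HostHalfSketch a, HostHalfSketch b) {
        return HostHalfSketch(static_cast<float>(a) - static_cast<float>(b));
    }
    friend bool operator<(HostHalfSketch a, HostHalfSketch b) {
        return static_cast<float>(a) < static_cast<float>(b);
    }
private:
    std::uint16_t bits_ = 0;
};

// Hypothetical trait in the spirit of is_host_fp_v: true for float, double,
// and the half wrapper, so generic code can take a single FP branch.
template <typename T>
inline constexpr bool is_host_fp_sketch_v =
    std::is_floating_point_v<T> || std::is_same_v<T, HostHalfSketch>;
```

In this sketch the constructor is `explicit` to keep mixed expressions such as wrapper plus `float` unambiguous; the class in the PR may make a different choice. The trait requires C++17 for inline variables.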