c11_atomics: unify host half representation and conversion with wrapper class #2503

Nuullll · 2025-08-26T08:05:23Z

Introduce HostHalf wrapper class to eliminate explicit cl_half_from_float
and cl_half_to_float conversions throughout the test code. The wrapper
provides semantic value constructors/operators and automatic conversions,
simplifying half-precision arithmetic operations.

Key improvements:

HostHalf class with operator overloading for arithmetic and comparisons
Type traits is_host_atomic_fp_v and is_host_fp_v for generic FP handling
Unified floating-point atomic operations (add/sub/min/max/exchange)
Removed 300+ lines of half-specific conditional branches
Consistent calculation for all FP types

bashbaug · 2025-09-16T20:15:17Z

Hi, I'm having a hard time connecting the dots together. Can you please provide a test command line that I can run that demonstrates the problem? Thanks!

Nuullll · 2025-10-30T05:57:03Z

Hi, I'm having a hard time connecting the dots together. Can you please provide a test command line that I can run that demonstrates the problem? Thanks!

Sorry for the late response - I was distracted by other work.

The issue can be reproduced by running:

./OpenCL-CTS/build/test_conformance/c11_atomics/test_c11_atomics svm_atomic_store

with Intel OpenCL CPU implementation (which has fp16 support enabled).

Root cause

In the OpenCL kernel:

#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void test_atomic_kernel(uint threadCount, uint numDestItems, volatile __global atomic_half *destMemory, __global half *oldValues)
{
  uint  tid = get_global_id(0);
  atomic_store_explicit(&destMemory[tid], tid, memory_order_relaxed, memory_scope_all_devices);
}

When uint tid is stored as a half value, the OpenCL device properly converts the integer to IEEE 754 half-precision floating point format. However, in the host reference code, the original implementation simply cast cl_uint directly to cl_half:

host_atomic_store(&destMemory[tid], (HostDataType)tid, MemoryOrder());

cl_half is defined as uint16_t in OpenCL Header, a direct cast from cl_uint to cl_half just truncates the integer bits, rather than performing proper float-to-half conversion.

bashbaug · 2025-10-30T15:27:16Z

Ah, gotcha, thank you for the explanation!

FWIW, it was the static cast to a float in the snippet below that looked questionable (note: v is a uint)::

            // For half types, convert from float to proper half-precision bit
            // pattern
            return cl_half_from_float(static_cast<float>(v), gHalfRoundingMode);

But, it looks like this is matching what the kernel is doing (convert the uint to a half), so it's correct.

Interestingly, there does not appear to be a problem with our GPU device, which does not support SVM atomics but does support fp16 atomic load and store. I am running:

./test_conformance/c11_atomics/test_c11_atomics atomic_store

I'll add "focused review" and we'll see if we can get this merged next week.

bashbaug · 2025-11-04T16:43:24Z

Discussed in the November 4th teleconference. Will merge after @shajder 's review.

test_conformance/c11_atomics/common.h

Nuullll · 2025-11-06T07:50:08Z

Interestingly, there does not appear to be a problem with our GPU device, which does not support SVM atomics but does support fp16 atomic load and store. I am running:
./test_conformance/c11_atomics/test_c11_atomics atomic_store

@bashbaug Thanks for pointing this out. atomic_store passes for both CPU and GPU, because by default it does not use HostFunction for verification:

OpenCL-CTS/test_conformance/c11_atomics/common.h

Lines 327 to 333 in e641de9

    
           virtual cl_uint MaxHostThreads() 
        
           { 
        
               if (UseSVM() || gHost) 
        
                   return MAX_HOST_THREADS; 
        
               else 
        
                   return 0; 
        
           }

OpenCL-CTS/test_conformance/c11_atomics/common.h

Line 1141 in e641de9

hostThreadCount = MaxHostThreads();

OpenCL-CTS/test_conformance/c11_atomics/common.h

Lines 1426 to 1428 in e641de9

    
           if (hostThreadCount > 0) 
        
               ThreadPool_Do(HostThreadFunction, hostThreadCount, 
        
                             &hostThreadContexts[0]);

OpenCL-CTS/test_conformance/c11_atomics/common.h

Lines 146 to 155 in e641de9

    
           static cl_int HostThreadFunction(cl_uint job_id, cl_uint thread_id, 
        
                                            void *userInfo) 
        
           { 
        
               THostThreadContext *threadContext = 
        
                   ((THostThreadContext *)userInfo) + job_id; 
        
               threadContext->test->HostFunction( 
        
                   threadContext->tid, threadContext->threadCount, 
        
                   threadContext->destMemory, threadContext->oldValues); 
        
               return 0; 
        
           }

OpenCL-CTS/test_conformance/c11_atomics/test_atomics.cpp

Lines 87 to 92 in e641de9

    
           virtual void HostFunction(cl_uint tid, cl_uint threadCount, 
        
                                     volatile HostAtomicType *destMemory, 
        
                                     HostDataType *oldValues) 
        
           { 
        
               host_atomic_store(&destMemory[tid], (HostDataType)tid, MemoryOrder()); 
        
           }

The same issue can be exposed on CPU if we force using host threads for verification:

./test_conformance/c11_atomics/test_c11_atomics atomic_store -host

test_conformance/c11_atomics/common.h

Nuullll · 2025-11-24T08:34:59Z

I've refactored the host half handling with a new wrapper class HostHalf so that we don't need much extra caution for the half conversions everywhere.

test_conformance/c11_atomics/host_atomics.h

bashbaug · 2025-12-09T16:04:55Z

@Nuullll would you mind resolving the merge conflicts? I took a look and they don't look too bad, but they aren't quite something I'm comfortable resolving via the web UI. Thanks!

…er class Introduce HostHalf wrapper class to eliminate explicit cl_half_from_float and cl_half_to_float conversions throughout the test code. The wrapper provides semantic value constructors/operators and automatic conversions, simplifying half-precision arithmetic operations. Key improvements: - HostHalf class with operator overloading for arithmetic and comparisons - Type traits is_host_atomic_fp_v and is_host_fp_v for generic FP handling - Unified floating-point atomic operations (add/sub/min/max/exchange) - Removed 300+ lines of half-specific conditional branches - Consistent error tolerance calculation for all FP types fix windows build fix format fix format fix format

Nuullll · 2025-12-10T03:20:27Z

test_conformance/c11_atomics/test_atomics.cpp

+                                       return std::isfinite(
+                                           static_cast<double>(val));
+                                   })
+                       && "Infinite subtraction value detected!");


This assertion indeed helped capture the kahan_sub implementation bug in #2368. See the fix here: https://github.com/KhronosGroup/OpenCL-CTS/pull/2503/files#diff-aac1c2081764e51ba19f3e93970d6de6c44ca0b530316e76acddfa4bb4a1829cR1533

Nuullll · 2025-12-10T03:23:23Z

test_conformance/c11_atomics/test_atomics.cpp

    {
        double y = num - compensation;
        double t = sum - y;
-        compensation = (t - sum) - y;


(t-sum) == -y, so compensation == -2y here, causing sum to explode and overflow to infinity.

Thanks, good catch but this is still not quite correct. It should be:

double y = - num - compensation; double t = sum + y; compensation = (t - sum) - y;

Nuullll · 2025-12-10T03:25:30Z

@shajder @bashbaug I've resolved the conflict and fixed a minor issue of #2368 , please take another look. Thanks!

shajder · 2025-12-12T10:08:58Z

test_conformance/c11_atomics/test_atomics.cpp

    {
        double y = num - compensation;
        double t = sum - y;
-        compensation = (t - sum) - y;


Thanks, good catch but this is still not quite correct. It should be:

double y = - num - compensation; double t = sum + y; compensation = (t - sum) - y;

shajder · 2025-12-12T10:12:05Z

@Nuullll We would like to merge this PR shortly. I’d appreciate it if you could apply any upcoming corrections in separate PRs. Thanks!

bashbaug previously approved these changes Oct 30, 2025

View reviewed changes

bashbaug requested a review from shajder October 30, 2025 15:27

bashbaug added the focused review label Oct 30, 2025

shajder requested changes Nov 5, 2025

View reviewed changes

test_conformance/c11_atomics/common.h Outdated Show resolved Hide resolved

lakshmih previously approved these changes Nov 7, 2025

View reviewed changes

Nuullll requested a review from shajder November 12, 2025 05:32

silverclaw reviewed Nov 18, 2025

View reviewed changes

test_conformance/c11_atomics/common.h Outdated Show resolved Hide resolved

Nuullll dismissed stale reviews from lakshmih and bashbaug via e861104 November 24, 2025 08:25

Nuullll force-pushed the review/Nuullll/fix-host-half-conversion branch from 93e0b57 to e861104 Compare November 24, 2025 08:25

Nuullll changed the title ~~c11_atomics: Fix cl_uint --> cl_half conversion on host~~ c11_atomics: unify host half representation and conversion with wrapper class Nov 24, 2025

Nuullll requested review from bashbaug, lakshmih and silverclaw November 24, 2025 08:30

Nuullll marked this pull request as draft November 25, 2025 05:45

shajder requested changes Nov 25, 2025

View reviewed changes

test_conformance/c11_atomics/host_atomics.h Outdated Show resolved Hide resolved

Nuullll requested a review from shajder November 26, 2025 02:42

shajder previously approved these changes Nov 26, 2025

View reviewed changes

Nuullll marked this pull request as ready for review November 27, 2025 03:47

Nuullll dismissed shajder’s stale review via f6d3199 December 10, 2025 02:18

Nuullll force-pushed the review/Nuullll/fix-host-half-conversion branch from 4cc3244 to f6d3199 Compare December 10, 2025 02:18

fix format

befc6ed

Nuullll requested a review from shajder December 10, 2025 02:25

fix kahan_sub

f0430cc

Nuullll commented Dec 10, 2025

View reviewed changes

lakshmih previously approved these changes Dec 10, 2025

View reviewed changes

shajder requested changes Dec 12, 2025

View reviewed changes

apply suggestion

a052fca

Nuullll dismissed lakshmih’s stale review via a052fca December 16, 2025 01:28

Nuullll requested a review from shajder December 16, 2025 01:28

fix format

8f02e0e

shajder approved these changes Dec 16, 2025

View reviewed changes

neiltrevett merged commit 119af24 into KhronosGroup:main Dec 16, 2025
8 checks passed

c11_atomics: unify host half representation and conversion with wrapper class #2503

c11_atomics: unify host half representation and conversion with wrapper class #2503

Conversation

Nuullll commented Aug 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bashbaug commented Sep 16, 2025

Uh oh!

Nuullll commented Oct 30, 2025

Root cause

Uh oh!

bashbaug commented Oct 30, 2025

Uh oh!

bashbaug commented Nov 4, 2025

Uh oh!

Uh oh!

Nuullll commented Nov 6, 2025

Uh oh!

Uh oh!

Nuullll commented Nov 24, 2025

Uh oh!

Uh oh!

bashbaug commented Dec 9, 2025

Uh oh!

Nuullll Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

Nuullll Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

shajder Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Nuullll commented Dec 10, 2025

Uh oh!

shajder Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

shajder commented Dec 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Nuullll commented Aug 26, 2025 •

edited

Loading