nvptx: Incorrect use of LLVM intrinsics for f16x2_min/max(_nan)

The nvptx intrinsics `f16x2_min`/`f16x2_max`/`f16x2_min_nan`/`f16x2_max_nan` are currently mapped to the LLVM intrinsics `minnum`/`maxnum`/`minimum`/`maximum`, respectively. In some cases the mapping goes through `simd_fmin`/`simd_fmax`, which are documented to correspond to `minnum nsz`/`maxnum nsz`, although we don't currently emit the `nsz` flag. See [here](https://llvm.org/docs/LangRef.html#floating-point-min-max-intrinsics-comparison) for an overview of the LLVM float min/max operations.

This is incorrect:
- According to the [docs](https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-min), the behavior for signed zeros is defined by `(a < b) ? a : b`, i.e., when both operands compare equal, the 2nd operand is returned. That's not what *any* of the LLVM intrinsics do: they either treat `-0.0` as smaller than `+0.0` (the default), or return either value non-deterministically (when the `nsz` flag is present). [This means it is actually a bug that LLVM uses the `min.f16x2` nvptx operation for lowering `minnum`...]
- According to the docs, assuming that `isNaN` checks for both QNaN and SNaN, if exactly one input is *any* NaN, the other input is returned for `f16x2_min`/`f16x2_max`. In contrast, `minnum`/`maxnum` say that when an input is SNaN, the return value is a NaN or the other input. The LLVM variant with the correct NaN semantics is `minimumnum`/`maximumnum`.
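To make the signed-zero mismatch concrete, here is a minimal sketch in plain Rust. It is only a model: `ptx_min_model` and `signed_zero_ordered_min` are hypothetical helper names, `f32` stands in for a single `f16` lane, and the PTX function follows the pseudocode as read above (non-`.NaN` variant), not a verified hardware result.

```rust
/// Model of the PTX pseudocode as read in this issue (non-`.NaN` variant).
/// Assumption: `isNaN` covers both QNaN and SNaN.
fn ptx_min_model(a: f32, b: f32) -> f32 {
    if a.is_nan() && b.is_nan() {
        f32::NAN
    } else if a.is_nan() {
        b
    } else if b.is_nan() {
        a
    } else if a < b {
        a
    } else {
        // Also taken when a == b, so the 2nd operand wins for -0.0 vs +0.0.
        b
    }
}

/// Model of a min that treats -0.0 as smaller than +0.0, i.e. the LLVM
/// default without `nsz`. Only meant for non-NaN inputs.
fn signed_zero_ordered_min(a: f32, b: f32) -> f32 {
    if a == b {
        // Equal values: prefer the operand with the sign bit set.
        if a.is_sign_negative() { a } else { b }
    } else if a < b {
        a
    } else {
        b
    }
}

fn main() {
    let ptx = ptx_min_model(-0.0, 0.0);
    let llvm = signed_zero_ordered_min(-0.0, 0.0);
    println!("ptx model:  sign negative = {}", ptx.is_sign_negative());
    println!("llvm model: sign negative = {}", llvm.is_sign_negative());
}
```

Under this model the two functions disagree on `min(-0.0, +0.0)`: the PTX-style select returns the 2nd operand (`+0.0`), while the signed-zero-ordered variant returns `-0.0`.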

Cc @kjetilkjeka @folkertdev 

nvptx: Incorrect use of LLVM intrinsics for f16x2_min/max(_nan) #2056
