Skip to content

JIT: fix Vector{128,256}.ConvertToInt32 use-before-def in gtNewSimdCvtNode#127524

Merged
EgorBo merged 1 commit intodotnet:mainfrom
EgorBo:fix-127440-cvtnode-evalorder
Apr 29, 2026
Merged

JIT: fix Vector{128,256}.ConvertToInt32 use-before-def in gtNewSimdCvtNode#127524
EgorBo merged 1 commit intodotnet:mainfrom
EgorBo:fix-127440-cvtnode-evalorder

Conversation

@EgorBo
Copy link
Copy Markdown
Member

@EgorBo EgorBo commented Apr 28, 2026

Fixes #127440

Fixes a use-before-def in the non-AVX-512 saturation path of Compiler::gtNewSimdCvtNode that produces wrong results for Vector{128,256,512}.ConvertToInt32(Vector*<float>) when the input is not invariant or a local.

Repro (DOTNET_EnableAVX512=0)

[MethodImpl(MethodImplOptions.NoInlining)]
static Vector512<int> Test() => Vector512.ConvertToInt32(Vector512.Create(float.MinValue));

Before this PR:

<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2147483647, 0, 0, 0>

After this PR:

<-2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,
 -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648>

Root cause

GenTree* op1Clone1 = fgMakeMultiUse(&op1);   // op1 := COMMA(STORE temp = orig, LCL_VAR temp); op1Clone1 := fresh LCL_VAR temp
GenTree* mask1     = gtNewSimdIsNaNNode(type, op1, ...);                  // uses op1 (the COMMA)
fixupVal           = gtNewSimdBinOpNode(GT_AND_NOT, op1Clone1, mask1, …); // AND_NOT(LCL_VAR temp, IsNaN(COMMA(STORE temp, ...)))

AND_NOT(a, b) decomposes into AND(a, NOT(b)), so the left operand a evaluates first. The LCL_VAR temp read happens before the STORE temp that lives inside the COMMA on the right, so the AND consumes whatever was on the stack.

Disasm on main with AVX-512 disabled (only the relevant block, V05 / V08 are the temps in question):

vpcmpeqd ymm0, ymm0, ymm0
vandps   ymm0, ymm0, ymmword ptr [rsp+0x20]   ; reads V05 - never written
vbroadcastss ymm1, dword ptr [reloc @RWD00]
vcmpgeps ymm2, ymm0, ymm1
vbroadcastss ymm3, dword ptr [reloc @RWD04]
vcvttps2dq ymm0, ymm0
vpblendvb ymm0, ymm0, ymm3, ymm2
vmovups  ymm2, ymmword ptr [rsp]              ; reads V08 - never written
...

The bug has existed since gtNewSimdCvtNode was first introduced; it stayed latent because pre-#127124 / #127402 the inner IsNaN(op1) expanded into per-element compares that kept enough materialization around to mask the bad ordering. With SIMD32/64 constant propagation, CompareNotEqual(temp, temp) value-numbers as AllBitsSet and the whole right subtree collapses to constants, leaving only the broken left-side read - which is exactly what Vector512Tests.ConvertToInt32Test started catching on non-AVX-512 hosts.

Fix

Two-line swap: pass op1 (the COMMA, evaluated first) as AND_NOT's left arg; use the clone for the IsNaN check.

GenTree* op1Clone1 = fgMakeMultiUse(&op1);
GenTree* mask1     = gtNewSimdIsNaNNode(type, op1Clone1, simdSourceBaseType, simdSize);
fixupVal           = gtNewSimdBinOpNode(GT_AND_NOT, type, op1, mask1, simdSourceBaseType, simdSize);

Now the COMMA evaluates first, the STORE happens, both subsequent reads of the temp get the correct value.

Validation

  • Repro produces correct output on AVX-512 disabled and AVX-512 enabled.
  • Vector512Tests.ConvertToInt32Test / ConvertToInt32NativeTest pass with DOTNET_EnableAVX512=0.
  • SuperPMI replay against benchmarks.run clean (38409 contexts, 0 failures, 0 asserts).

Note on #127499

That PR adds an AVX-512 gate in impSpecialIntrinsic's NI_Vector512_ConvertToInt32 case. As @tannergooding pointed out in #127499 (comment) / #127499 (comment), that case is already unreachable on non-AVX-512 hosts because Compiler::lookupId returns NI_Illegal for Vector512 ISA when AVX-512 is not opportunistically supported. I verified that applying #127499 alone leaves the test failing - the gate it adds is dead code. This PR addresses the actual bug; #127499 can be closed.

Fixes #127440.

Copilot AI review requested due to automatic review settings April 28, 2026 18:42
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 28, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something we could have a pass to help catch likely errors around. In particular, this creates a new local and so is functionally "use before initialization of a JIT allocated temp"

…tNode

When `op1` is not invariant or a local, `fgMakeMultiUse(&op1)` rewrites
`op1` into `COMMA(STORE temp, LCL_VAR temp)` and returns a fresh load of
`temp`. The STORE is the only place `temp` is written, so it must
evaluate before any later read of `temp`.

The non-AVX-512 branch passed the clean clone as the first argument of
`AND_NOT` and the COMMA-wrapped tree as the second. `AND_NOT(a, b)`
decomposes into `AND(a, NOT(b))`, so `a` evaluates first - meaning the
`LCL_VAR temp` read happened before the STORE inside `b`, producing
garbage for the non-NaN'd input.

The bug was latent: prior to dotnet#127124 / dotnet#127402 the inner `IsNaN(op1)`
expanded into real per-element compares that kept enough materialization
around to mask the bad ordering. With SIMD32/64 constant propagation,
`CompareNotEqual(temp, temp)` value-numbers as AllBitsSet and the entire
right subtree collapses to constants, leaving only the broken left-side
read - which is what the `Vector512Tests.ConvertToInt32Test` failure on
non-AVX-512 hosts (libraries-jitstress-random, nativeaot-outerloop,
iossimulator) was actually exercising.

Fix: pass `op1` (the COMMA, evaluated first) as `AND_NOT`'s first
argument and use the side-effect-free `op1Clone1` for the IsNaN check.

Verified by repro on a non-AVX-512 host (DOTNET_EnableAVX512=0):
  Vector512.ConvertToInt32(Vector512.Create(float.MinValue))
now returns Vector512<int>.Create(int.MinValue) as expected. SPMI
benchmarks.run replay clean.

Fixes dotnet#127440.
Supersedes dotnet#127499 (which gated unreachable code in `impSpecialIntrinsic`
- `NI_Vector512_ConvertToInt32` is already filtered upstream by
`lookupId` on non-AVX-512 hosts, so that gate did not actually fix the
failure).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@EgorBo EgorBo force-pushed the fix-127440-cvtnode-evalorder branch from 2ffb895 to bfa8018 Compare April 28, 2026 18:54
@EgorBo
Copy link
Copy Markdown
Member Author

EgorBo commented Apr 28, 2026

This seems like something we could have a pass to help catch likely errors around. In particular, this creates a new local and so is functionally "use before initialization of a JIT allocated temp"

yeah, unless it's actually possible to produce that with IL (e.g. skip locals init, etc)

@EgorBo EgorBo enabled auto-merge (squash) April 28, 2026 22:09
@EgorBo
Copy link
Copy Markdown
Member Author

EgorBo commented Apr 29, 2026

/ba-g timeouts

@EgorBo EgorBo merged commit 9a6034a into dotnet:main Apr 29, 2026
132 of 137 checks passed
@EgorBo EgorBo deleted the fix-127440-cvtnode-evalorder branch April 29, 2026 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test failure: System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests.ConvertToInt32Test

2 participants