JIT: fix Vector{128,256}.ConvertToInt32 use-before-def in gtNewSimdCvtNode #127524
Merged
EgorBo merged 1 commit into dotnet:main on Apr 29, 2026
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
tannergooding approved these changes on Apr 28, 2026
…tNode

When `op1` is not invariant or a local, `fgMakeMultiUse(&op1)` rewrites `op1` into `COMMA(STORE temp, LCL_VAR temp)` and returns a fresh load of `temp`. The STORE is the only place `temp` is written, so it must evaluate before any later read of `temp`.

The non-AVX-512 branch passed the clean clone as the first argument of `AND_NOT` and the COMMA-wrapped tree as the second. `AND_NOT(a, b)` decomposes into `AND(a, NOT(b))`, so `a` evaluates first - meaning the `LCL_VAR temp` read happened before the STORE inside `b`, producing garbage for the non-NaN'd input.

The bug was latent: prior to dotnet#127124 / dotnet#127402 the inner `IsNaN(op1)` expanded into real per-element compares that kept enough materialization around to mask the bad ordering. With SIMD32/64 constant propagation, `CompareNotEqual(temp, temp)` value-numbers as AllBitsSet and the entire right subtree collapses to constants, leaving only the broken left-side read - which is what the `Vector512Tests.ConvertToInt32Test` failure on non-AVX-512 hosts (libraries-jitstress-random, nativeaot-outerloop, iossimulator) was actually exercising.

Fix: pass `op1` (the COMMA, evaluated first) as `AND_NOT`'s first argument and use the side-effect-free `op1Clone1` for the IsNaN check.

Verified by repro on a non-AVX-512 host (`DOTNET_EnableAVX512=0`): `Vector512.ConvertToInt32(Vector512.Create(float.MinValue))` now returns `Vector512<int>.Create(int.MinValue)` as expected. SPMI benchmarks.run replay clean.

Fixes dotnet#127440. Supersedes dotnet#127499 (which gated unreachable code in `impSpecialIntrinsic` - `NI_Vector512_ConvertToInt32` is already filtered upstream by `lookupId` on non-AVX-512 hosts, so that gate did not actually fix the failure).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2ffb895 to bfa8018
yeah, unless it's actually possible to produce that with IL (e.g. skip locals init, etc.)
/ba-g timeouts
Fixes #127440
Fixes a use-before-def in the non-AVX-512 saturation path of `Compiler::gtNewSimdCvtNode` that produces wrong results for `Vector{128,256,512}.ConvertToInt32(Vector*<float>)` when the input is not invariant or a local.

Repro (DOTNET_EnableAVX512=0)
Before this PR:
After this PR:
Root cause
`AND_NOT(a, b)` decomposes into `AND(a, NOT(b))`, so the left operand `a` evaluates first. The `LCL_VAR temp` read happens before the `STORE temp` that lives inside the COMMA on the right, so the AND consumes whatever was on the stack.

Disasm on main with AVX-512 disabled (only the relevant block, V05 / V08 are the temps in question):
The bug has existed since `gtNewSimdCvtNode` was first introduced; it stayed latent because pre-#127124 / #127402 the inner `IsNaN(op1)` expanded into per-element compares that kept enough materialization around to mask the bad ordering. With SIMD32/64 constant propagation, `CompareNotEqual(temp, temp)` value-numbers as AllBitsSet and the whole right subtree collapses to constants, leaving only the broken left-side read - which is exactly what `Vector512Tests.ConvertToInt32Test` started catching on non-AVX-512 hosts.

Fix
Two-line swap: pass `op1` (the COMMA, evaluated first) as `AND_NOT`'s left arg; use the clone for the IsNaN check.

Now the COMMA evaluates first, the STORE happens, and both subsequent reads of the temp get the correct value.
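To make the ordering argument concrete, here is a minimal toy model in C++. It is only a sketch and not RyuJIT code: `Env`, `Node`, `LclVar`, `CommaStoreThenLoad`, `IsNaNMask`, `AndNot`, `EvalBuggy` and `EvalFixed` are hypothetical names, a single 32-bit lane stands in for the vector, and `0xDEADBEEF` stands in for whatever happened to be in the temp's stack slot. The one property it takes from the description above is that `AND_NOT(a, b)` evaluates `a` before `b`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <functional>

// Toy model only: hypothetical names, not RyuJIT APIs.
struct Env {
    uint32_t temp = 0xDEADBEEFu; // stale stack contents before any STORE
};

// A "tree" is a closure that may read or write the temp when evaluated.
using Node = std::function<uint32_t(Env&)>;

uint32_t BitsOf(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// LCL_VAR temp: a plain read with no side effects.
Node LclVar() {
    return [](Env& e) { return e.temp; };
}

// COMMA(STORE temp = bits, LCL_VAR temp): define the temp, then read it.
Node CommaStoreThenLoad(uint32_t bits) {
    return [bits](Env& e) { e.temp = bits; return e.temp; };
}

// IsNaN as a lane mask: all bits set for NaN, zero otherwise.
Node IsNaNMask(Node x) {
    return [x](Env& e) {
        uint32_t bits = x(e);
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return std::isnan(f) ? 0xFFFFFFFFu : 0u;
    };
}

// AND_NOT(a, b) == AND(a, NOT(b)); the left operand is evaluated first.
Node AndNot(Node a, Node b) {
    return [a, b](Env& e) {
        uint32_t lhs = a(e);
        uint32_t rhs = b(e);
        return lhs & ~rhs;
    };
}

// Old shape: the clean read is on the left, the STORE is buried on the right.
uint32_t EvalBuggy(float input) {
    Env e;
    return AndNot(LclVar(), IsNaNMask(CommaStoreThenLoad(BitsOf(input))))(e);
}

// New shape: the COMMA (with the STORE) is on the left, the clone on the right.
uint32_t EvalFixed(float input) {
    Env e;
    return AndNot(CommaStoreThenLoad(BitsOf(input)), IsNaNMask(LclVar()))(e);
}

void RunDemo() {
    std::printf("buggy(1.0f) = 0x%08X\n", static_cast<unsigned>(EvalBuggy(1.0f)));
    std::printf("fixed(1.0f) = 0x%08X\n", static_cast<unsigned>(EvalFixed(1.0f)));
    assert(EvalFixed(std::nanf("")) == 0u); // NaN lanes are masked to zero
}
```

In this model `EvalBuggy(1.0f)` returns the stale `0xDEADBEEF` pattern because the left read runs before the store, while `EvalFixed(1.0f)` returns `0x3F800000`, the bit pattern of `1.0f`. The real JIT trees carry far more state than this, so treat the sketch as an illustration of the operand swap only.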
Validation
- `Vector512Tests.ConvertToInt32Test` / `ConvertToInt32NativeTest` pass with `DOTNET_EnableAVX512=0`.
- `benchmarks.run` clean (38409 contexts, 0 failures, 0 asserts).

Note on #127499
That PR adds an AVX-512 gate in `impSpecialIntrinsic`'s `NI_Vector512_ConvertToInt32` case. As @tannergooding pointed out in #127499 (comment) / #127499 (comment), that case is already unreachable on non-AVX-512 hosts because `Compiler::lookupId` returns `NI_Illegal` for `Vector512` ISA when AVX-512 is not opportunistically supported. I verified that applying #127499 alone leaves the test failing - the gate it adds is dead code. This PR addresses the actual bug; #127499 can be closed.