Allow O2K_CONST_VEC to handle all TYP_SIMD32/64#127402
Merged
tannergooding merged 4 commits intodotnet:mainfrom Apr 25, 2026
Merged
Allow O2K_CONST_VEC to handle all TYP_SIMD32/64#127402tannergooding merged 4 commits intodotnet:mainfrom
tannergooding merged 4 commits intodotnet:mainfrom
Conversation
Extend the O2K_CONST_VEC assertion storage to handle TYP_SIMD32/64 via a heap-allocated simd_t* (m_bigSimdVal) in the existing union. SIMD8/12/16 continue to use the inline simd16_t. The active SIMD byte size is tracked in m_simdSize, unioned with m_encodedIconFlags to keep struct size unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…equality Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Contributor
There was a problem hiding this comment.
Pull request overview
Enables O2K_CONST_VEC assertions to represent all SIMD constants (including TYP_SIMD32/64) by storing wider constants out-of-line, allowing assertion propagation and related optimizations to apply to more SIMD constant cases.
Changes:
- Extend
AssertionDscSIMD-constant storage to supportTYP_SIMD32/64via heap allocation, while keepingTYP_SIMD8/12/16inline. - Update assertion equality and constant propagation logic to use a size-tagged SIMD payload (
GetSimdSize()/GetSimdConstant()). - Expand
optAssertionGenJtruehandling to recognize Vector256/Vector512 equality/inequality intrinsics on xarch.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/coreclr/jit/compiler.h | Adds size-tagged SIMD constant payload handling in AssertionDsc with inline vs heap-backed storage for wider vectors. |
| src/coreclr/jit/assertionprop.cpp | Creates and propagates SIMD constant assertions for all SIMD sizes, and recognizes additional SIMD equality/inequality intrinsics. |
EgorBo
approved these changes
Apr 25, 2026
This was referenced Apr 25, 2026
Open
EgorBo
added a commit
to EgorBo/runtime-1
that referenced
this pull request
Apr 28, 2026
…tNode When `op1` is not invariant or a local, `fgMakeMultiUse(&op1)` rewrites `op1` into `COMMA(STORE temp, LCL_VAR temp)` and returns a fresh load of `temp`. The STORE is the only place `temp` is written, so it must evaluate before any later read of `temp`. The non-AVX-512 branch passed the clean clone as the first argument of `AND_NOT` and the COMMA-wrapped tree as the second. `AND_NOT(a, b)` decomposes into `AND(a, NOT(b))`, so `a` evaluates first - meaning the `LCL_VAR temp` read happened before the STORE inside `b`, producing garbage for the non-NaN'd input. The bug was latent: prior to dotnet#127124 / dotnet#127402 the inner `IsNaN(op1)` expanded into real per-element compares that kept enough materialization around to mask the bad ordering. With SIMD32/64 constant propagation, `CompareNotEqual(temp, temp)` value-numbers as AllBitsSet and the entire right subtree collapses to constants, leaving only the broken left-side read - which is what the `Vector512Tests.ConvertToInt32Test` failure on non-AVX-512 hosts (libraries-jitstress-random, nativeaot-outerloop, iossimulator) was actually exercising. Fix: pass `op1` (the COMMA, evaluated first) as `AND_NOT`'s first argument and use the side-effect-free `op1Clone1` for the IsNaN check. Verified by repro on a non-AVX-512 host (DOTNET_EnableAVX512=0): Vector512.ConvertToInt32(Vector512.Create(float.MinValue)) now returns Vector512<int>.Create(int.MinValue) as expected. SPMI benchmarks.run replay clean. Fixes dotnet#127440. Supersedes dotnet#127499 (which gated unreachable code in `impSpecialIntrinsic` - `NI_Vector512_ConvertToInt32` is already filtered upstream by `lookupId` on non-AVX-512 hosts, so that gate did not actually fix the failure). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
EgorBo
added a commit
that referenced
this pull request
Apr 29, 2026
…tNode (#127524) Fixes a use-before-def in the non-AVX-512 saturation path of `Compiler::gtNewSimdCvtNode` that produces wrong results for `Vector{128,256,512}.ConvertToInt32(Vector*<float>)` when the input is not invariant or a local. ## Repro (DOTNET_EnableAVX512=0) ```cs [MethodImpl(MethodImplOptions.NoInlining)] static Vector512<int> Test() => Vector512.ConvertToInt32(Vector512.Create(float.MinValue)); ``` Before this PR: ``` <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2147483647, 0, 0, 0> ``` After this PR: ``` <-2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648> ``` ## Root cause ```cpp GenTree* op1Clone1 = fgMakeMultiUse(&op1); // op1 := COMMA(STORE temp = orig, LCL_VAR temp); op1Clone1 := fresh LCL_VAR temp GenTree* mask1 = gtNewSimdIsNaNNode(type, op1, ...); // uses op1 (the COMMA) fixupVal = gtNewSimdBinOpNode(GT_AND_NOT, op1Clone1, mask1, …); // AND_NOT(LCL_VAR temp, IsNaN(COMMA(STORE temp, ...))) ``` `AND_NOT(a, b)` decomposes into `AND(a, NOT(b))`, so the left operand `a` evaluates first. The `LCL_VAR temp` read happens **before** the `STORE temp` that lives inside the COMMA on the right, so the AND consumes whatever was on the stack. Disasm on main with AVX-512 disabled (only the relevant block, V05 / V08 are the temps in question): ```asm vpcmpeqd ymm0, ymm0, ymm0 vandps ymm0, ymm0, ymmword ptr [rsp+0x20] ; reads V05 - never written vbroadcastss ymm1, dword ptr [reloc @rwd00] vcmpgeps ymm2, ymm0, ymm1 vbroadcastss ymm3, dword ptr [reloc @RWD04] vcvttps2dq ymm0, ymm0 vpblendvb ymm0, ymm0, ymm3, ymm2 vmovups ymm2, ymmword ptr [rsp] ; reads V08 - never written ... ``` The bug has existed since `gtNewSimdCvtNode` was first introduced; it stayed latent because pre-#127124 / #127402 the inner `IsNaN(op1)` expanded into per-element compares that kept enough materialization around to mask the bad ordering. With SIMD32/64 constant propagation, `CompareNotEqual(temp, temp)` value-numbers as AllBitsSet and the whole right subtree collapses to constants, leaving only the broken left-side read - which is exactly what `Vector512Tests.ConvertToInt32Test` started catching on non-AVX-512 hosts. ## Fix Two-line swap: pass `op1` (the COMMA, evaluated first) as `AND_NOT`'s left arg; use the clone for the IsNaN check. ```cpp GenTree* op1Clone1 = fgMakeMultiUse(&op1); GenTree* mask1 = gtNewSimdIsNaNNode(type, op1Clone1, simdSourceBaseType, simdSize); fixupVal = gtNewSimdBinOpNode(GT_AND_NOT, type, op1, mask1, simdSourceBaseType, simdSize); ``` Now the COMMA evaluates first, the STORE happens, both subsequent reads of the temp get the correct value. ## Validation - Repro produces correct output on AVX-512 disabled and AVX-512 enabled. - `Vector512Tests.ConvertToInt32Test` / `ConvertToInt32NativeTest` pass with `DOTNET_EnableAVX512=0`. - SuperPMI replay against `benchmarks.run` clean (38409 contexts, 0 failures, 0 asserts). ## Note on #127499 That PR adds an AVX-512 gate in `impSpecialIntrinsic`'s `NI_Vector512_ConvertToInt32` case. As @tannergooding pointed out in #127499 (comment) / #127499 (comment), that case is already unreachable on non-AVX-512 hosts because `Compiler::lookupId` returns `NI_Illegal` for `Vector512` ISA when AVX-512 is not opportunistically supported. I verified that applying #127499 alone leaves the test failing - the gate it adds is dead code. This PR addresses the actual bug; #127499 can be closed. Fixes #127440. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This provides O2K_CONST_VEC support for all SIMD constants by heap allocating for values larger than TYP_SIMD16. It is an alternative to #127390.