Skip to content

Allow O2K_CONST_VEC to handle all TYP_SIMD32/64#127402

Merged
tannergooding merged 4 commits intodotnet:mainfrom
tannergooding:jit/wide-simd-assertions
Apr 25, 2026
Merged

Allow O2K_CONST_VEC to handle all TYP_SIMD32/64#127402
tannergooding merged 4 commits intodotnet:mainfrom
tannergooding:jit/wide-simd-assertions

Conversation

@tannergooding
Copy link
Copy Markdown
Member

This provides O2K_CONST_VEC support for all SIMD constants by heap allocating for values larger than TYP_SIMD16. It is an alternative to #127390.

EgorBo and others added 4 commits April 24, 2026 17:48
Extend the O2K_CONST_VEC assertion storage to handle TYP_SIMD32/64 via a
heap-allocated simd_t* (m_bigSimdVal) in the existing union. SIMD8/12/16
continue to use the inline simd16_t. The active SIMD byte size is tracked
in m_simdSize, unioned with m_encodedIconFlags to keep struct size unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…equality

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 24, 2026 21:00
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 24, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables O2K_CONST_VEC assertions to represent all SIMD constants (including TYP_SIMD32/64) by storing wider constants out-of-line, allowing assertion propagation and related optimizations to apply to more SIMD constant cases.

Changes:

  • Extend AssertionDsc SIMD-constant storage to support TYP_SIMD32/64 via heap allocation, while keeping TYP_SIMD8/12/16 inline.
  • Update assertion equality and constant propagation logic to use a size-tagged SIMD payload (GetSimdSize()/GetSimdConstant()).
  • Expand optAssertionGenJtrue handling to recognize Vector256/Vector512 equality/inequality intrinsics on xarch.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/coreclr/jit/compiler.h Adds size-tagged SIMD constant payload handling in AssertionDsc with inline vs heap-backed storage for wider vectors.
src/coreclr/jit/assertionprop.cpp Creates and propagates SIMD constant assertions for all SIMD sizes, and recognizes additional SIMD equality/inequality intrinsics.

@tannergooding tannergooding marked this pull request as ready for review April 25, 2026 00:00
@tannergooding tannergooding enabled auto-merge (squash) April 25, 2026 00:05
@tannergooding tannergooding merged commit 51cfc7a into dotnet:main Apr 25, 2026
137 of 141 checks passed
EgorBo added a commit to EgorBo/runtime-1 that referenced this pull request Apr 28, 2026
…tNode

When `op1` is not invariant or a local, `fgMakeMultiUse(&op1)` rewrites
`op1` into `COMMA(STORE temp, LCL_VAR temp)` and returns a fresh load of
`temp`. The STORE is the only place `temp` is written, so it must
evaluate before any later read of `temp`.

The non-AVX-512 branch passed the clean clone as the first argument of
`AND_NOT` and the COMMA-wrapped tree as the second. `AND_NOT(a, b)`
decomposes into `AND(a, NOT(b))`, so `a` evaluates first - meaning the
`LCL_VAR temp` read happened before the STORE inside `b`, producing
garbage for the non-NaN'd input.

The bug was latent: prior to dotnet#127124 / dotnet#127402 the inner `IsNaN(op1)`
expanded into real per-element compares that kept enough materialization
around to mask the bad ordering. With SIMD32/64 constant propagation,
`CompareNotEqual(temp, temp)` value-numbers as AllBitsSet and the entire
right subtree collapses to constants, leaving only the broken left-side
read - which is what the `Vector512Tests.ConvertToInt32Test` failure on
non-AVX-512 hosts (libraries-jitstress-random, nativeaot-outerloop,
iossimulator) was actually exercising.

Fix: pass `op1` (the COMMA, evaluated first) as `AND_NOT`'s first
argument and use the side-effect-free `op1Clone1` for the IsNaN check.

Verified by repro on a non-AVX-512 host (DOTNET_EnableAVX512=0):
  Vector512.ConvertToInt32(Vector512.Create(float.MinValue))
now returns Vector512<int>.Create(int.MinValue) as expected. SPMI
benchmarks.run replay clean.

Fixes dotnet#127440.
Supersedes dotnet#127499 (which gated unreachable code in `impSpecialIntrinsic`
- `NI_Vector512_ConvertToInt32` is already filtered upstream by
`lookupId` on non-AVX-512 hosts, so that gate did not actually fix the
failure).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
EgorBo added a commit that referenced this pull request Apr 29, 2026
…tNode (#127524)

Fixes a use-before-def in the non-AVX-512 saturation path of
`Compiler::gtNewSimdCvtNode` that produces wrong results for
`Vector{128,256,512}.ConvertToInt32(Vector*<float>)` when the input is
not invariant or a local.

## Repro (DOTNET_EnableAVX512=0)

```cs
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector512<int> Test() => Vector512.ConvertToInt32(Vector512.Create(float.MinValue));
```

Before this PR:
```
<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2147483647, 0, 0, 0>
```
After this PR:
```
<-2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,
 -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648>
```

## Root cause

```cpp
GenTree* op1Clone1 = fgMakeMultiUse(&op1);   // op1 := COMMA(STORE temp = orig, LCL_VAR temp); op1Clone1 := fresh LCL_VAR temp
GenTree* mask1     = gtNewSimdIsNaNNode(type, op1, ...);                  // uses op1 (the COMMA)
fixupVal           = gtNewSimdBinOpNode(GT_AND_NOT, op1Clone1, mask1, …); // AND_NOT(LCL_VAR temp, IsNaN(COMMA(STORE temp, ...)))
```

`AND_NOT(a, b)` decomposes into `AND(a, NOT(b))`, so the left operand
`a` evaluates first. The `LCL_VAR temp` read happens **before** the
`STORE temp` that lives inside the COMMA on the right, so the AND
consumes whatever was on the stack.

Disasm on main with AVX-512 disabled (only the relevant block, V05 / V08
are the temps in question):

```asm
vpcmpeqd ymm0, ymm0, ymm0
vandps   ymm0, ymm0, ymmword ptr [rsp+0x20]   ; reads V05 - never written
vbroadcastss ymm1, dword ptr [reloc @rwd00]
vcmpgeps ymm2, ymm0, ymm1
vbroadcastss ymm3, dword ptr [reloc @RWD04]
vcvttps2dq ymm0, ymm0
vpblendvb ymm0, ymm0, ymm3, ymm2
vmovups  ymm2, ymmword ptr [rsp]              ; reads V08 - never written
...
```

The bug has existed since `gtNewSimdCvtNode` was first introduced; it
stayed latent because pre-#127124 / #127402 the inner `IsNaN(op1)`
expanded into per-element compares that kept enough materialization
around to mask the bad ordering. With SIMD32/64 constant propagation,
`CompareNotEqual(temp, temp)` value-numbers as AllBitsSet and the whole
right subtree collapses to constants, leaving only the broken left-side
read - which is exactly what `Vector512Tests.ConvertToInt32Test` started
catching on non-AVX-512 hosts.

## Fix

Two-line swap: pass `op1` (the COMMA, evaluated first) as `AND_NOT`'s
left arg; use the clone for the IsNaN check.

```cpp
GenTree* op1Clone1 = fgMakeMultiUse(&op1);
GenTree* mask1     = gtNewSimdIsNaNNode(type, op1Clone1, simdSourceBaseType, simdSize);
fixupVal           = gtNewSimdBinOpNode(GT_AND_NOT, type, op1, mask1, simdSourceBaseType, simdSize);
```

Now the COMMA evaluates first, the STORE happens, both subsequent reads
of the temp get the correct value.

## Validation

- Repro produces correct output on AVX-512 disabled and AVX-512 enabled.
- `Vector512Tests.ConvertToInt32Test` / `ConvertToInt32NativeTest` pass
with `DOTNET_EnableAVX512=0`.
- SuperPMI replay against `benchmarks.run` clean (38409 contexts, 0
failures, 0 asserts).

## Note on #127499

That PR adds an AVX-512 gate in `impSpecialIntrinsic`'s
`NI_Vector512_ConvertToInt32` case. As @tannergooding pointed out in
#127499 (comment) /
#127499 (comment),
that case is already unreachable on non-AVX-512 hosts because
`Compiler::lookupId` returns `NI_Illegal` for `Vector512` ISA when
AVX-512 is not opportunistically supported. I verified that applying
#127499 alone leaves the test failing - the gate it adds is dead code.
This PR addresses the actual bug; #127499 can be closed.

Fixes #127440.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants