
JIT: Invalid codegen for Avx512F.VL.BlendVariable and Avx512BW.VL.BlendVariable for integral types larger than byte #127260

@saucecontrol

Description

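When Avx512F.VL.BlendVariable or Avx512BW.VL.BlendVariable is used with 128-bit or 256-bit vectors of integral types wider than byte, the JIT may incorrectly emit (v)pblendvb. That instruction interprets the mask per byte rather than per element, so the blend result is wrong.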

Reproduction Steps

using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

Console.WriteLine(BlendVariable128(Vector128.Create<long>(-1), Vector128<long>.Zero, Vector128.Create(long.MinValue)));
Console.WriteLine(BlendVariable512(Vector512.Create<long>(-1), Vector512<long>.Zero, Vector512.Create(long.MinValue)));

[MethodImpl(MethodImplOptions.NoInlining)]
static Vector128<long> BlendVariable128(Vector128<long> left, Vector128<long> right, Vector128<long> mask)
    => Avx512F.VL.BlendVariable(left, right, mask);

[MethodImpl(MethodImplOptions.NoInlining)]
static Vector512<long> BlendVariable512(Vector512<long> left, Vector512<long> right, Vector512<long> mask)
    => Avx512F.BlendVariable(left, right, mask);

Expected behavior

Both methods should output zero vectors:

<0, 0>
<0, 0, 0, 0, 0, 0, 0, 0>

Actual behavior

Actual output:

<72057594037927935, 72057594037927935>
<0, 0, 0, 0, 0, 0, 0, 0>

Regression?

No, these were new APIs in net10.0 and have had the same behavior since release.

Other information

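This is caused by RewriteHWIntrinsicBlendv rewriting BlendVariableMask to BlendVariable. For integral element types wider than byte, BlendVariable then emits the byte-granular (v)pblendvb instead of an element-granular masked blend (vpblendmq for the long elements in this repro).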

; Method Program:<<Main>$>g__BlendVariable128|0_0(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):System.Runtime.Intrinsics.Vector128`1[long] (FullOpts)
G_M56632_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M56632_IG02:  ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vmovups  xmm1, xmmword ptr [r9]
       vpblendvb xmm0, xmm0, xmmword ptr [r8], xmm1
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
						;; size=22 bbWeight=1 PerfScore 14.25

G_M56632_IG03:  ;; offset=0x0016
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 23

Compare with the correct codegen:

; Method Program:<<Main>$>g__BlendVariable128|0_0(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):System.Runtime.Intrinsics.Vector128`1[long] (FullOpts)
G_M56632_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M56632_IG02:  ;; offset=0x0000
       vmovups  xmm0, xmmword ptr [rdx]
       vmovups  xmm1, xmmword ptr [r9]
       vpmovq2m k1, xmm1
       vpblendmq xmm0 {k1}, xmm0, xmmword ptr [r8]
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
						;; size=28 bbWeight=1 PerfScore 15.25

G_M56632_IG03:  ;; offset=0x001C
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 29

Metadata

Labels

area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)