Description
When using BlendVariable with vectors smaller than 512 bits, codegen may incorrectly emit pblendvb, which causes the mask to be interpreted per byte rather than per element.
Reproduction Steps
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
Console.WriteLine(BlendVariable128(Vector128.Create<long>(-1), Vector128<long>.Zero, Vector128.Create(long.MinValue)));
Console.WriteLine(BlendVariable512(Vector512.Create<long>(-1), Vector512<long>.Zero, Vector512.Create(long.MinValue)));
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector128<long> BlendVariable128(Vector128<long> left, Vector128<long> right, Vector128<long> mask)
=> Avx512F.VL.BlendVariable(left, right, mask);
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector512<long> BlendVariable512(Vector512<long> left, Vector512<long> right, Vector512<long> mask)
=> Avx512F.BlendVariable(left, right, mask);
Expected behavior
Both methods should output zero vectors:
<0, 0>
<0, 0, 0, 0, 0, 0, 0, 0>
Actual behavior
Actual output:
<72057594037927935, 72057594037927935>
<0, 0, 0, 0, 0, 0, 0, 0>
Regression?
No, these were new APIs in net10.0 and have had the same behavior since release.
Other information
This is due to RewriteHWIntrinsicBlendv rewriting BlendVariableMask to BlendVariable, which emits pblendvb rather than the desired vpblendmq. Current codegen:
; Method Program:<<Main>$>g__BlendVariable128|0_0(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):System.Runtime.Intrinsics.Vector128`1[long] (FullOpts)
G_M56632_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M56632_IG02: ;; offset=0x0000
vmovups xmm0, xmmword ptr [rdx]
vmovups xmm1, xmmword ptr [r9]
vpblendvb xmm0, xmm0, xmmword ptr [r8], xmm1
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
;; size=22 bbWeight=1 PerfScore 14.25
G_M56632_IG03: ;; offset=0x0016
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 23
Compared with the correct codegen:
; Method Program:<<Main>$>g__BlendVariable128|0_0(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):System.Runtime.Intrinsics.Vector128`1[long] (FullOpts)
G_M56632_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M56632_IG02: ;; offset=0x0000
vmovups xmm0, xmmword ptr [rdx]
vmovups xmm1, xmmword ptr [r9]
vpmovq2m k1, xmm1
vpblendmq xmm0 {k1}, xmm0, xmmword ptr [r8]
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
;; size=28 bbWeight=1 PerfScore 15.25
G_M56632_IG03: ;; offset=0x001C
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 29
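The observed value 72057594037927935 is 0x00FFFFFFFFFFFFFF, which is what a per-byte blend would produce for this mask. The following is a minimal scalar sketch of the two selection rules, not the JIT's or the hardware's actual implementation. The helper names BlendPerByte and BlendPerElement are hypothetical and exist only for illustration; the sketch assumes a set mask bit selects from the second operand (right), as in the repro above.

```csharp
using System;

// Hypothetical scalar model of a per-byte blend (pblendvb-style): each result
// byte is taken from `right` when the top bit of the matching mask byte is set.
static ulong BlendPerByte(ulong left, ulong right, ulong mask)
{
    ulong result = 0;
    for (int i = 0; i < 8; i++)
    {
        int shift = i * 8;
        ulong maskByte = (mask >> shift) & 0xFF;
        ulong source = (maskByte & 0x80) != 0 ? right : left;
        result |= source & (0xFFUL << shift);
    }
    return result;
}

// Hypothetical scalar model of a per-element blend (vpmovq2m + vpblendmq-style):
// the whole 64-bit element is taken from `right` when its sign bit is set.
static ulong BlendPerElement(ulong left, ulong right, ulong mask)
    => (mask >> 63) != 0 ? right : left;

ulong allOnes = ulong.MaxValue;               // bit pattern of -1L
ulong zero = 0;
ulong signBitOnly = 0x8000_0000_0000_0000;    // bit pattern of long.MinValue

Console.WriteLine(BlendPerByte(allOnes, zero, signBitOnly));    // 72057594037927935
Console.WriteLine(BlendPerElement(allOnes, zero, signBitOnly)); // 0
```

Only the top byte of each mask element has its high bit set, so the per-byte rule replaces just that byte with zero and keeps the other seven bytes of -1, which matches the actual output reported above.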