According to Intel's analysis of the TechEmpower benchmarks, Kestrel.Internal.Infrastructure.MemoryPoolIterator.Seek() is one of the hot methods. The innermost loop of its doubly nested while loop contains the following code:
var data = new Vector<byte>(array, index);
// The below code is repeated 3 times in the method
// against 3 different byte vectors
var byte0Equals = Vector.Equals(data, byte0Vector);
if (!byte0Equals.Equals(Vector<byte>.Zero))
{
byte0Index = FindFirstEqualByte(ref byte0Equals);
}
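For readers who want to reproduce the pattern outside Kestrel, here is a minimal sketch of a single-byte search built around the same comparison. The class name `SeekSketch`, the method `IndexOfByte`, and the scalar body of `FindFirstEqualByte` are hypothetical stand-ins written for illustration; they are not Kestrel's actual implementation.

```csharp
using System.Numerics;

public static class SeekSketch
{
    // Hypothetical stand-in for Kestrel's helper: returns the index of the
    // first non-zero lane in the comparison mask, or -1 if no lane is set.
    private static int FindFirstEqualByte(ref Vector<byte> equals)
    {
        for (int i = 0; i < Vector<byte>.Count; i++)
        {
            if (equals[i] != 0) return i;
        }
        return -1;
    }

    // Returns the index of the first occurrence of 'value' in 'array', or -1.
    public static int IndexOfByte(byte[] array, byte value)
    {
        var byte0Vector = new Vector<byte>(value);
        int index = 0;

        // Vectorized part: one Vector<byte>.Count-sized block per iteration.
        for (; index <= array.Length - Vector<byte>.Count; index += Vector<byte>.Count)
        {
            var data = new Vector<byte>(array, index);
            var byte0Equals = Vector.Equals(data, byte0Vector);

            // This is the condition whose codegen is discussed in this issue.
            if (!byte0Equals.Equals(Vector<byte>.Zero))
            {
                return index + FindFirstEqualByte(ref byte0Equals);
            }
        }

        // Scalar tail for the bytes that do not fill a whole vector.
        for (; index < array.Length; index++)
        {
            if (array[index] == value) return index;
        }
        return -1;
    }
}
```

Calling `SeekSketch.IndexOfByte(buffer, (byte)'\n')` on a byte array exercises the same `Vector.Equals` / `Equals(Vector<byte>.Zero)` pair once per block, so the branch is on the hot path.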
The if-condition generates the following IR:
!byte0Equals.Equals(Vector<byte>.Zero)
***** BB13, stmt 61
( 10, 9) [000371] ------------ * stmtExpr void (IL 0x109...0x115)
N007 ( 10, 9) [000370] ----G------- \--* jmpTrue void
N005 ( 1, 1) [000368] ------------ | /--* const int 0 $40
N006 ( 8, 7) [000369] J---G--N---- \--* != int $21d
N003 ( 2, 2) [000366] ------------ | /--* simd simd32 int init $148
N002 ( 1, 1) [000365] ------------ | | \--* const int 0 $40
N004 ( 6, 5) [000367] ----G------- \--* simd int ubyte == $149
N001 ( 3, 2) [000363] ----G------- \--* lclVar simd32(AX) V18 loc13 $349
SIMD (in)equality produces a bool result in a register, which is then compared against 0 by the != node above. Here is the generated code:
// SIMD opEquality produces the below code
IN0058: vmovaps ymm2, ymm0
IN0059: vpcmpeqd ymm2, ymm1
IN005a: vextractf128 ymm3, ymm2, 1
IN005b: vandps ymm2, ymm3
IN005c: vpshufd ymm3, ymm2, 78
IN005d: vandps ymm2, ymm3
IN005e: vpshufd ymm3, ymm2, 1
IN005f: vpand ymm2, ymm3
IN0060: vmovd r14d, xmm2
IN0061: cmp r14d, 0xFFFFFFFF
IN0062: sete r14b
IN0063: movzx r14, r14b
// (SIMD opEquality != 0) produces the following code
test r14d, r14d
jne L_M48761_BB15
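The AND-reduction in the listing above can be modeled lane by lane. The sketch below is only a scalar model of what the instruction sequence computes, assuming eight 32-bit lanes as in the `vpcmpeqd` on `ymm` registers; the class and method names are hypothetical.

```csharp
public static class EqualityReductionModel
{
    // Scalar model of the AVX sequence: returns true when all eight
    // 32-bit lanes of 'a' and 'b' compare equal.
    public static bool AllLanesEqual(uint[] a, uint[] b)
    {
        var m = new uint[8];

        // vpcmpeqd: each lane becomes all ones if equal, all zeros otherwise.
        for (int i = 0; i < 8; i++)
        {
            m[i] = a[i] == b[i] ? 0xFFFFFFFFu : 0u;
        }

        // vextractf128 + vandps: fold the upper 128 bits into the lower 128 bits.
        for (int i = 0; i < 4; i++) m[i] &= m[i + 4];

        // vpshufd 78 + vandps: fold lanes 2..3 into lanes 0..1.
        // (The real instructions also write the other lanes, but only
        // lanes 0 and 1 feed the final result.)
        for (int i = 0; i < 2; i++) m[i] &= m[i + 2];

        // vpshufd 1 + vpand: fold lane 1 into lane 0.
        m[0] &= m[1];

        // vmovd + cmp 0xFFFFFFFF + sete: materialize the bool.
        return m[0] == 0xFFFFFFFFu;
    }
}
```

The final `return` corresponds to the `sete`/`movzx` pair; the subsequent `test`/`jne` then re-tests that bool, which is the redundancy this issue is about.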
There is no need to materialize the result of SIMD opEquality in a register here; setting the flags is sufficient, which makes the != 0 comparison redundant. We should be able to produce the following code instead:
IN0058: vmovaps ymm2, ymm0
IN0059: vpcmpeqd ymm2, ymm1
IN005a: vextractf128 ymm3, ymm2, 1
IN005b: vandps ymm2, ymm3
IN005c: vpshufd ymm3, ymm2, 78
IN005d: vandps ymm2, ymm3
IN005e: vpshufd ymm3, ymm2, 1
IN005f: vpand ymm2, ymm3
IN0060: vmovd r14d, xmm2
IN0061: cmp r14d, 0xFFFFFFFF
je L_M48761_BB15 // taken when all lanes compared equal, matching the sete/test/jne sequence it replaces
Assuming we also optimize the codegen of SIMD opEquality against Vector Zero on AVX, as proposed in #6719, the resulting code would be:
vptest ymm0, ymm0 // ZF = 1 when ymm0 is all zeros
je L_M48761_BB15 // taken when byte0Equals equals Vector Zero
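To see why testing the mask against itself is sufficient, the sketch below models the instruction's zero-flag rule in C#: the zero flag is set when the bitwise AND of the two operands has no bits set. The class `PtestModel` and its methods are hypothetical names used only for illustration; this models the semantics and is not code the JIT emits.

```csharp
using System.Numerics;

public static class PtestModel
{
    // Models the zero flag of "ptest a, b": ZF = 1 when (a AND b) has no bits set.
    public static bool ZeroFlag(Vector<byte> a, Vector<byte> b)
    {
        return Vector.BitwiseAnd(a, b).Equals(Vector<byte>.Zero);
    }

    // Testing a mask against itself therefore answers "is the mask all zero?",
    // which is the same question as mask.Equals(Vector<byte>.Zero).
    public static bool MaskIsZero(Vector<byte> mask)
    {
        return ZeroFlag(mask, mask);
    }
}
```

Because the answer lands directly in the flags, the branch can consume it without the AND-reduction, the move to a general-purpose register, or the extra compare.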