RyuJIT SIMD: optimize codegen when Op(in)Equality that produces bool result is compared against 1 or 0

According to the Intel TechEmpower benchmark analysis, Kestrel.Internal.Infrastructure.MemoryPoolIterator.Seek() is one of the hot methods. It has the following code in the innermost loop of a doubly nested while loop.

```
var data = new Vector<byte>(array, index);

// The below code is repeated 3 times in the method
// against 3 different byte vectors
var byte0Equals = Vector.Equals(data, byte0Vector);
if (!byte0Equals.Equals(Vector<byte>.Zero))
{
     byte0Index = FindFirstEqualByte(ref byte0Equals);
}

```
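For experimenting with this pattern outside Kestrel, here is a minimal self-contained sketch. The class `SeekSketch` and the method `IndexOfInWindow` are hypothetical names introduced only for illustration, and the lane scan stands in for Kestrel's `FindFirstEqualByte`, whose implementation is not shown in this issue.

```csharp
using System;
using System.Numerics;

public static class SeekSketch
{
    // Hypothetical helper (not Kestrel code): returns the index of the first
    // byte equal to `target` inside the Vector<byte>-sized window that starts
    // at `index`, or -1 if the window contains no match.
    public static int IndexOfInWindow(byte[] array, int index, byte target)
    {
        var data = new Vector<byte>(array, index);
        var targetVector = new Vector<byte>(target);

        // Lane-wise compare: each lane is 0xFF where equal, 0x00 otherwise.
        var equalsMask = Vector.Equals(data, targetVector);

        // Same shape as the if-condition in the issue: Vector<T>.Equals
        // returns a bool, which is negated and then branched on.
        if (!equalsMask.Equals(Vector<byte>.Zero))
        {
            // Simple stand-in for FindFirstEqualByte: scan the mask lanes.
            for (int i = 0; i < Vector<byte>.Count; i++)
            {
                if (equalsMask[i] != 0)
                {
                    return index + i;
                }
            }
        }
        return -1;
    }

    public static void Main()
    {
        var buffer = new byte[Vector<byte>.Count * 2];
        buffer[Vector<byte>.Count + 3] = (byte)'\n';

        // The first window has no match; the second window contains it.
        Console.WriteLine(IndexOfInWindow(buffer, 0, (byte)'\n'));
        Console.WriteLine(IndexOfInWindow(buffer, Vector<byte>.Count, (byte)'\n'));
    }
}
```

The branch on `!equalsMask.Equals(Vector<byte>.Zero)` is the part whose codegen this issue is about; the lane scan only runs once a match is known to exist.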

The if-condition **!byte0Equals.Equals(Vector<byte>.Zero)** generates the following IR:

```
***** BB13, stmt 61
     ( 10,  9) [000371] ------------             *  stmtExpr  void  (IL 0x109...0x115)
N007 ( 10,  9) [000370] ----G-------             \--*  jmpTrue   void  
N005 (  1,  1) [000368] ------------                |  /--*  const     int    0 $40
N006 (  8,  7) [000369] J---G--N----                \--*  !=        int    $21d
N003 (  2,  2) [000366] ------------                   |  /--*  simd      simd32 int init $148
N002 (  1,  1) [000365] ------------                   |  |  \--*  const     int    0 $40
N004 (  6,  5) [000367] ----G-------                   \--*  simd      int    ubyte == $149
N001 (  3,  2) [000363] ----G-------                      \--*  lclVar    simd32(AX) V18 loc13         $349
```

SIMD (in)equality produces a bool result in a register, which the != node above then compares against 0. Here is the generated code:

```
// SIMD opEquality produces the below code
IN0058:        vmovaps  ymm2, ymm0
IN0059:        vpcmpeqd ymm2, ymm1
IN005a:        vextractf128 ymm3, ymm2, 1
IN005b:        vandps   ymm2, ymm3
IN005c:        vpshufd  ymm3, ymm2, 78
IN005d:        vandps   ymm2, ymm3
IN005e:        vpshufd  ymm3, ymm2, 1
IN005f:        vpand    ymm2, ymm3
IN0060:        vmovd    r14d, xmm2
IN0061:        cmp      r14d, 0xFFFFFFFF
IN0062:        sete     r14b
IN0063:        movzx    r14, r14b

// (SIMD opEquality != 0) produces the following code
test     r14d, r14d
jne      L_M48761_BB15
```

There is no need to materialize the result of SIMD opEquality in a register here; setting the flags is sufficient. The != 0 comparison is redundant. We should be able to produce the following code:

```
IN0058:        vmovaps  ymm2, ymm0
IN0059:        vpcmpeqd ymm2, ymm1
IN005a:        vextractf128 ymm3, ymm2, 1
IN005b:        vandps   ymm2, ymm3
IN005c:        vpshufd  ymm3, ymm2, 78
IN005d:        vandps   ymm2, ymm3
IN005e:        vpshufd  ymm3, ymm2, 1
IN005f:        vpand    ymm2, ymm3
IN0060:        vmovd    r14d, xmm2
IN0061:        cmp      r14d, 0xFFFFFFFF

// sete/test/jne above is taken exactly when cmp finds equality, so branch on ZF directly
je       L_M48761_BB15
```
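The redundancy can be sanity-checked with a small scalar model of the flag logic. This is only a sketch: `FlagModel` and its methods are hypothetical names, and `reduced` stands for the 32-bit value that `vmovd` leaves in `r14d` after the AND-reduction. It models `cmp`/`sete`/`movzx`/`test`/`jne` on one side and a branch taken directly on the zero flag left by `cmp` (the condition `je` tests) on the other.

```csharp
using System;

public static class FlagModel
{
    // Models the current sequence:
    //   cmp r14d, 0xFFFFFFFF ; sete r14b ; movzx r14, r14b ; test r14d, r14d ; jne
    public static bool BranchTakenCurrent(uint reduced)
    {
        bool zfAfterCmp = reduced == 0xFFFFFFFFu; // cmp sets ZF when equal
        uint r14 = zfAfterCmp ? 1u : 0u;          // sete + movzx materialize the bool
        bool zfAfterTest = r14 == 0;              // test r14d, r14d sets ZF when r14d is 0
        return !zfAfterTest;                      // jne is taken when ZF is clear
    }

    // Models the fused sequence: cmp r14d, 0xFFFFFFFF ; je
    public static bool BranchTakenFused(uint reduced)
    {
        bool zfAfterCmp = reduced == 0xFFFFFFFFu; // cmp sets ZF when equal
        return zfAfterCmp;                        // je is taken when ZF is set
    }

    public static void Main()
    {
        uint[] samples = { 0u, 1u, 0x0000FFFFu, 0xFFFF0000u, 0xFFFFFFFEu, 0xFFFFFFFFu };
        foreach (uint s in samples)
        {
            Console.WriteLine($"0x{s:X8}: current={BranchTakenCurrent(s)}, fused={BranchTakenFused(s)}");
        }
    }
}
```

Both functions agree for every input, since the materialized bool is nonzero exactly when `cmp` set the zero flag.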

Assuming that, as per dotnet/runtime#6719, we also optimize codegen of SIMD opEquality against Vector Zero on AVX, the resulting code would be:

```
// vptest sets ZF when (ymm0 AND ymm0) == 0, i.e. when the vector equals Zero
vptest   ymm0, ymm0
je       L_M48761_BB15
```
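To see the flag semantics this relies on, here is a hedged sketch using the `System.Runtime.Intrinsics` hardware intrinsics. That API is an assumption on my part: it is not used or mentioned in this issue. `Avx.TestZ` maps to VPTEST and returns true when the bitwise AND of its operands is all zero, so testing a vector against itself answers "is this vector equal to Zero?" without a lane-by-lane reduction. `PtestSketch` and `IsAllZero` are hypothetical names.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class PtestSketch
{
    // Returns true when every bit of `v` is zero.
    public static bool IsAllZero(Vector256<byte> v)
    {
        if (Avx.IsSupported)
        {
            // VPTEST v, v sets ZF when (v AND v) == 0; TestZ returns that flag.
            return Avx.TestZ(v, v);
        }

        // Portable fallback for hardware without AVX.
        return v.Equals(Vector256<byte>.Zero);
    }

    public static void Main()
    {
        Vector256<byte> zero = Vector256<byte>.Zero;
        // One nonzero 64-bit lane, reinterpreted as bytes.
        Vector256<byte> nonZero = Vector256.Create(0L, 0L, 0L, 1L).AsByte();

        Console.WriteLine(IsAllZero(zero));
        Console.WriteLine(IsAllZero(nonZero));
    }
}
```

The `Avx.IsSupported` guard keeps the sketch runnable on machines without AVX; whether the JIT emits this shape for `Vector<T>` equality against Zero is exactly what dotnet/runtime#6719 asks for, not something this sketch demonstrates.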

