Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. by sivarv · Pull Request #7407 · dotnet/coreclr

sivarv · 2016-09-28T23:36:42Z

According to Intel's TechEmpower benchmark analysis, Kestrel.Internal.Infrastructure.MemoryPoolIterator.Seek() is one of the hot methods. It has the following code in the innermost loop of a doubly nested while loop:

var data = new Vector<byte>(array, index);

// The below code is repeated 3 times in the method
// against 3 different byte vectors
var byte0Equals = Vector.Equals(data, byte0Vector);
if (!byte0Equals.Equals(Vector<byte>.Zero))
{
     byte0Index = FindFirstEqualByte(ref byte0Equals);
}

The if-condition !byte0Equals.Equals(Vector<byte>.Zero) generates the following IR:

***** BB13, stmt 61
     ( 10,  9) [000371] ------------             *  stmtExpr  void  (IL 0x109...0x115)
N007 ( 10,  9) [000370] ----G-------             \--*  jmpTrue   void  
N005 (  1,  1) [000368] ------------                |  /--*  const     int    0 $40
N006 (  8,  7) [000369] J---G--N----                \--*  !=        int    $21d
N003 (  2,  2) [000366] ------------                   |  /--*  simd      simd32 int init $148
N002 (  1,  1) [000365] ------------                   |  |  \--*  const     int    0 $40
N004 (  6,  5) [000367] ----G-------                   \--*  simd      int    ubyte == $149
N001 (  3,  2) [000363] ----G-------                      \--*  lclVar    simd32(AX) V18 loc13         $349

SIMD (in)Equality produces a bool result in a register, which the IR above then compares against 0 with !=. Here is the generated code:

// SIMD opEquality produces the below code
IN0058:        vmovaps  ymm2, ymm0
IN0059:        vpcmpeqd ymm2, ymm1
IN005a:        vextractf128 ymm3, ymm2, 1
IN005b:        vandps   ymm2, ymm3
IN005c:        vpshufd  ymm3, ymm2, 78
IN005d:        vandps   ymm2, ymm3
IN005e:        vpshufd  ymm3, ymm2, 1
IN005f:        vpand    ymm2, ymm3
IN0060:        vmovd    r14d, xmm2
IN0061:        cmp      r14d, 0xFFFFFFFF
IN0062:        sete     r14b
IN0063:        movzx    r14, r14b

// (SIMD opEquality != 0) produces the following code
test     r14d, r14d
jne      L_M48761_BB15

There is no need to materialize the result of SIMD opEquality in a register here; setting the flags is sufficient, and the != 0 comparison is redundant. We should be able to produce the following code:

IN0058:        vmovaps  ymm2, ymm0
IN0059:        vpcmpeqd ymm2, ymm1
IN005a:        vextractf128 ymm3, ymm2, 1
IN005b:        vandps   ymm2, ymm3
IN005c:        vpshufd  ymm3, ymm2, 78
IN005d:        vandps   ymm2, ymm3
IN005e:        vpshufd  ymm3, ymm2, 1
IN005f:        vpand    ymm2, ymm3
IN0060:        vmovd    r14d, xmm2
IN0061:        cmp      r14d, 0xFFFFFFFF

jne      L_M48761_BB15
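
To make the difference between the two listings concrete, here is a minimal sketch in portable C++ rather than JIT code. Vec32, reducedCompareMask, takenViaMaterializedBool and takenViaFlags are hypothetical names used only for illustration. A C++ compiler will usually fold the two shapes into the same machine code on its own; the point is only to show what the JIT emitted before and what it could emit instead.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 32-byte Vector<byte>.
using Vec32 = std::array<std::uint8_t, 32>;

// Models the compare plus AND-reduction: the result is all ones
// only if every byte of a equals the corresponding byte of b.
std::uint32_t reducedCompareMask(const Vec32& a, const Vec32& b) {
    std::uint32_t mask = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < a.size(); i += 4) {
        std::uint32_t lane = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            std::uint32_t byteMask = (a[i + j] == b[i + j]) ? 0xFFu : 0x00u;
            lane |= byteMask << (8 * j);
        }
        mask &= lane;
    }
    return mask;
}

// Shape before the change: materialize a bool (cmp / sete / movzx),
// then compare that bool against 0 again (test / jcc).
bool takenViaMaterializedBool(const Vec32& a, const Vec32& b) {
    std::uint32_t result = (reducedCompareMask(a, b) == 0xFFFFFFFFu) ? 1u : 0u;
    return result != 0;
}

// Shape after the change: branch directly on the outcome of the cmp.
bool takenViaFlags(const Vec32& a, const Vec32& b) {
    return reducedCompareMask(a, b) == 0xFFFFFFFFu;
}
```

Both functions agree for every input; the second simply skips the intermediate 0/1 value, which is the work the sete/movzx/test sequence performs.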

Assuming we also optimize codegen of SIMD opEquality against Vector Zero on AVX, as per #7358, the resulting code would be:

ptest ymm0, ymm0
jne      L_M48761_BB15

In general, this fix will benefit both SSE2 and AVX codegen of SIMD (in)Equality, even when not comparing against Vector Zero.

Summary of code changes:
Lower will recognize the above IR and clear the operand counts on the GT_EQ/NE node and the dst count on the SIMD (in)Equality node. SIMD codegen, similar to genCompareInt(), will materialize the result into a register only when targetReg != REG_NA.

When not comparing against Zero Vector on AVX, SIMD (in)Equality codegen needs 2 XMM regs and one int type reg. Since targetReg is an int type reg, it is normally used for that purpose. With the dst count cleared on the SIMD node, an int type internal register needs to be reserved instead.
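
The lowering step described above can be sketched as a toy model. This is not the JIT's code: Node, Oper, dstCount, internalIntRegs and lowerJTrue are hypothetical, simplified stand-ins, and the avxCompareAgainstZero parameter is an assumption that replaces the real checks on the target ISA and on the SIMD node's second operand.

```cpp
// Toy IR, loosely modeled on the node kinds named in this PR (hypothetical types).
enum class Oper { JTrue, Eq, Ne, SimdEquality, SimdInequality, IntConst, Other };

struct Node {
    Oper oper = Oper::Other;
    Node* op1 = nullptr;
    Node* op2 = nullptr;
    long long value = 0;      // meaningful for IntConst only
    int dstCount = 1;         // 1: result is materialized in a register
    int internalIntRegs = 0;  // scratch int registers reserved for codegen
};

bool isSimdEqualityOrInequality(const Node* n) {
    return n->oper == Oper::SimdEquality || n->oper == Oper::SimdInequality;
}

// Recognizes JTRUE(EQ/NE(SIMD (in)Equality, 0 or 1)). On a match, neither the
// relop nor the SIMD node produces a register result; the SIMD node only sets flags.
// Returns true if the pattern matched.
bool lowerJTrue(Node& jtrue, bool avxCompareAgainstZero) {
    if (jtrue.oper != Oper::JTrue || jtrue.op1 == nullptr) return false;
    Node* relop = jtrue.op1;
    if (relop->oper != Oper::Eq && relop->oper != Oper::Ne) return false;
    Node* cmpOp1 = relop->op1;
    Node* cmpOp2 = relop->op2;
    if (cmpOp1 == nullptr || cmpOp2 == nullptr) return false;
    if (!isSimdEqualityOrInequality(cmpOp1)) return false;
    if (cmpOp2->oper != Oper::IntConst || (cmpOp2->value != 0 && cmpOp2->value != 1)) return false;

    relop->dstCount = 0;   // no code generated for the integer compare
    cmpOp2->dstCount = 0;  // the constant is never loaded
    cmpOp1->dstCount = 0;  // SIMD result is not materialized
    if (!avxCompareAgainstZero) {
        // With no target register, codegen still needs one int scratch register.
        cmpOp1->internalIntRegs = 1;
    }
    return true;
}
```

In this model, codegen for the SIMD node would then check whether it has a destination register before emitting the sete/movzx pair, in the same spirit as the targetReg != REG_NA check mentioned above.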

Fix #7382

sivarv · 2016-09-28T23:43:42Z

@dotnet-bot test Windows_NT jitstressregs1
@dotnet-bot test Windows_NT jitstressregs2
@dotnet-bot test Windows_NT jitstressregs3
@dotnet-bot test Windows_NT jitstressregs4
@dotnet-bot test Windows_NT jitstressregs8
@dotnet-bot test Windows_NT jitstressregs0x10
@dotnet-bot test Windows_NT jitstressregs0x80
@dotnet-bot test Windows_NT jitstress2
@dotnet-bot test Windows_NT corefx_baseline
@dotnet-bot test Ubuntu jitstressregs1
@dotnet-bot test Ubuntu jitstressregs2
@dotnet-bot test Ubuntu jitstressregs3
@dotnet-bot test Ubuntu jitstressregs4
@dotnet-bot test Ubuntu jitstressregs8
@dotnet-bot test Ubuntu jitstressregs0x10
@dotnet-bot test Ubuntu jitstressregs0x80
@dotnet-bot test Ubuntu jitstress2
@dotnet-bot test Ubuntu corefx_baseline

sivarv · 2016-09-29T03:45:17Z

@CarolEidt - Please review this.
CC @dotnet/jit-contrib

sivarv · 2016-09-29T22:58:28Z

ping.

CarolEidt · 2016-09-29T23:13:18Z

+#ifdef FEATURE_SIMD
+    // If we have GT_JTRUE(GT_EQ/NE(GT_SIMD((in)Equality, v1, v2), true/false)),
+    // then we don't need to generate code for GT_EQ/GT_NE, since SIMD (in)Equality intrinsic
+    // would set or clear Zero flag.


This is just a little confusing, because this is a case where both treeNode and op1 do not require a register. It might be worth breaking out the example, e.g.
simdCompareResult = GT_SIMD((In)Equality, v1, v2)
integerCompareResult = GT_EQ/NE(simdCompareResult, true/false)
GT_JTRUE(integerCompareResult)
And mention that for this case we don't need to generate either CompareResult into a register. #Resolved

I have added this comment in lowerxarch.cpp


CarolEidt · 2016-09-29T23:14:12Z

+#ifdef FEATURE_SIMD
+            // If we have GT_JTRUE(GT_EQ/NE(GT_SIMD((in)Equality, v1, v2), true/false)),
+            // then we don't need to generate code for GT_EQ/GT_NE, since SIMD (in)Equality intrinsic
+            // would set or clear Zero flag.


Or you could put the more detailed explanation here ... #Resolved

CarolEidt · 2016-09-29T23:14:57Z

+                if (cmpOp1->IsSIMDEqualityOrInequality() && (cmpOp2->IsIntegralConst(0) || cmpOp2->IsIntegralConst(1)))
+                {
+                    // clear dstCount on SIMD node to indicate that
+                    // result doesn't need to materialized into a register.


missing "be" (need to be materialized) #Resolved

CarolEidt · 2016-09-29T23:15:28Z

+                    l->clearOperandCounts(cmpOp2);
+
+                    // Codegen of SIMD (in)Equality uses target integer reg
+                    // on for setting flags.  The same is not needed on AVX


Should this be "only for setting flags"? #Resolved

CarolEidt · 2016-09-29T23:17:00Z

+                    // when comparing against Vector Zero.  Since we have
+                    // cleared dstCount, we need to reserve an int type internal
+                    // register.
+                    if (compiler->canUseAVX() && cmpOp1->gtGetOp2()->IsIntegralConstVector(0))


Suggestion: consider reversing the condition - I was a little confused at first because the condition is the opposite of the one you are describing above. #Resolved

I have updated comment to match the condition.


CarolEidt · 2016-09-29T23:19:02Z

+
+                    // We would have to reverse compare oper in the following cases:
+                    // 1) SIMD Equality: Sets Zero flag on equal otherwise clears it.
+                    //    Therefore, if compare oper is == or != against false, we will


I would say "against false (0)" here and "against true (1)" below to make it easier to match the code and the text. #Resolved

CarolEidt · 2016-09-29T23:25:39Z

+                    //    Therefore, if compare oper is == or != against false, we will
+                    //    be checking opposite of what is required.
+                    //
+                    // 2) SIMD inEquality: Clears Zero flag on true otherwise clears it.


I think this should be "Clears Zero flag on unequal, otherwise sets it."? #Resolved

CarolEidt · 2016-09-29T23:28:36Z

LGTM with some comment suggestions.


CarolEidt · 2016-09-29T23:28:39Z


Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. Commit migrated from dotnet/coreclr@b3f150d

sivarv added optimization area-CodeGen labels Sep 28, 2016

sivarv self-assigned this Sep 28, 2016

dnfclas added the cla-already-signed label Sep 28, 2016

sivarv force-pushed the simdOpt branch 2 times, most recently from d91fd50 to 9e8ec11 Compare September 28, 2016 23:40

sivarv changed the title ~~Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false.~~ [WIP]Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. Sep 28, 2016

benaadams mentioned this pull request Sep 28, 2016

Consider optimizing the use of Vector in MemoryPoolIterator aspnet/KestrelHttpServer#1129

Closed

sivarv changed the title ~~[WIP]Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false.~~ Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. Sep 29, 2016

sivarv force-pushed the simdOpt branch from 9e8ec11 to ac5d188 Compare September 29, 2016 17:28

CarolEidt reviewed Sep 29, 2016


Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. (9efcc72)

sivarv force-pushed the simdOpt branch from ac5d188 to 9efcc72 Compare September 29, 2016 23:36

sivarv merged commit b3f150d into dotnet:master Sep 30, 2016

benaadams mentioned this pull request Oct 1, 2016

Use Ben's Magic Number for FindFirstEqualByte aspnet/KestrelHttpServer#1136

Closed

benaadams mentioned this pull request Mar 20, 2017

Vectorize SpanHelpers.IndexOf for byte dotnet/corefx#17143

Merged
