This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. #7407

Merged
sivarv merged 1 commit into dotnet:master from sivarv:simdOpt
Sep 30, 2016

@sivarv sivarv commented Sep 28, 2016

As per Intel's TechEmpower benchmark analysis, Kestrel.Internal.Infrastructure.MemoryPoolIterator.Seek() is one of the hot methods. It has the following code in the innermost loop of a doubly nested while loop.

var data = new Vector<byte>(array, index);

// The below code is repeated 3 times in the method
// against 3 different byte vectors
var byte0Equals = Vector.Equals(data, byte0Vector);
if (!byte0Equals.Equals(Vector<byte>.Zero))
{
     byte0Index = FindFirstEqualByte(ref byte0Equals);
}

The if-condition !byte0Equals.Equals(Vector<byte>.Zero) generates the following IR:

***** BB13, stmt 61
     ( 10,  9) [000371] ------------             *  stmtExpr  void  (IL 0x109...0x115)
N007 ( 10,  9) [000370] ----G-------             \--*  jmpTrue   void  
N005 (  1,  1) [000368] ------------                |  /--*  const     int    0 $40
N006 (  8,  7) [000369] J---G--N----                \--*  !=        int    $21d
N003 (  2,  2) [000366] ------------                   |  /--*  simd      simd32 int init $148
N002 (  1,  1) [000365] ------------                   |  |  \--*  const     int    0 $40
N004 (  6,  5) [000367] ----G-------                   \--*  simd      int    ubyte == $149
N001 (  3,  2) [000363] ----G-------                      \--*  lclVar    simd32(AX) V18 loc13         $349

SIMD (in)equality produces a bool result in a register, which the IR above then compares against 0 (!= 0). Here is the generated code:

// SIMD opEquality produces the below code
IN0058:        vmovaps  ymm2, ymm0
IN0059:        vpcmpeqd ymm2, ymm1
IN005a:        vextractf128 ymm3, ymm2, 1
IN005b:        vandps   ymm2, ymm3
IN005c:        vpshufd  ymm3, ymm2, 78
IN005d:        vandps   ymm2, ymm3
IN005e:        vpshufd  ymm3, ymm2, 1
IN005f:        vpand    ymm2, ymm3
IN0060:        vmovd    r14d, xmm2
IN0061:        cmp      r14d, 0xFFFFFFFF
IN0062:        sete     r14b
IN0063:        movzx    r14, r14b

// (SIMD opEquality != 0) produces the following code
test     r14d, r14d
jne      L_M48761_BB15

Here there is no need to materialize the result of SIMD opEquality into a register; setting the flags suffices, and the != 0 comparison is redundant. We should be able to produce the following code:

IN0058:        vmovaps  ymm2, ymm0
IN0059:        vpcmpeqd ymm2, ymm1
IN005a:        vextractf128 ymm3, ymm2, 1
IN005b:        vandps   ymm2, ymm3
IN005c:        vpshufd  ymm3, ymm2, 78
IN005d:        vandps   ymm2, ymm3
IN005e:        vpshufd  ymm3, ymm2, 1
IN005f:        vpand    ymm2, ymm3
IN0060:        vmovd    r14d, xmm2
IN0061:        cmp      r14d, 0xFFFFFFFF

je       L_M48761_BB15
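
The equivalence this relies on can be sketched in plain C++. This is a scalar model under stated assumptions, not JIT code: `Vec8`, `reduceEqualMask`, `branchViaBool` and `branchViaFlags` are hypothetical names, and eight 32-bit lanes stand in for one ymm register. It illustrates that branching on the materialized bool and branching directly on the outcome of the compare against 0xFFFFFFFF take the same decision.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 256-bit register viewed as 8 x 32-bit lanes.
using Vec8 = std::array<std::uint32_t, 8>;

// Models vpcmpeqd followed by the AND-reduction: each lane becomes all-ones
// when equal, and all lanes are ANDed down into a single dword.
constexpr std::uint32_t reduceEqualMask(const Vec8& a, const Vec8& b)
{
    std::uint32_t reduced = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const std::uint32_t laneMask = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
        reduced &= laneMask;
    }
    return reduced;
}

// Unoptimized shape: cmp + sete + movzx materialize a bool, then it is tested against 0.
constexpr bool branchViaBool(const Vec8& a, const Vec8& b)
{
    const std::uint32_t boolResult = (reduceEqualMask(a, b) == 0xFFFFFFFFu) ? 1u : 0u;
    return boolResult != 0u;
}

// Optimized shape: branch directly on the outcome of the cmp against 0xFFFFFFFF.
constexpr bool branchViaFlags(const Vec8& a, const Vec8& b)
{
    return reduceEqualMask(a, b) == 0xFFFFFFFFu;
}
```

Both functions return the same value for every input, which is why the sete/movzx/test sequence can be dropped when the only consumer is a conditional jump.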

Assuming we also optimize codegen of SIMD opEquality against Vector Zero on AVX, as per #7358, the resulting code would be:

ptest ymm0, ymm0
je       L_M48761_BB15
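
The flag behaviour of ptest in the compare-against-zero case can be modelled in the same scalar style. A minimal sketch, assuming the documented semantics that ptest sets the Zero flag when the bitwise AND of its two operands is all zeros; `Lanes`, `ptestZeroFlag` and `isZeroVector` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 256-bit register viewed as 8 x 32-bit lanes.
using Lanes = std::array<std::uint32_t, 8>;

// Models the Zero flag after `ptest a, b`: ZF = 1 when (a AND b) has no bit set.
constexpr bool ptestZeroFlag(const Lanes& a, const Lanes& b)
{
    std::uint32_t anyBit = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        anyBit |= (a[i] & b[i]);
    }
    return anyBit == 0;
}

// `ptest v, v` therefore answers "is v equal to the zero vector?" with flags only.
constexpr bool isZeroVector(const Lanes& v)
{
    return ptestZeroFlag(v, v);
}
```

Because v AND v is just v, testing a register against itself needs neither a second vector register nor an integer register.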

In general, this fix will benefit both SSE2 and AVX codegen of SIMD (in)Equality, even when not comparing against Vector Zero.

Summary of code changes:
Lower will recognize the above IR and clear the operand counts on the GT_EQ/NE node and the dst count on the SIMD (in)Equality node. SIMD codegen, similar to genCompareInt(), will materialize the result into a register only when targetReg != REG_NA.

Except when comparing against Zero Vector on AVX, SIMD (in)Equality codegen needs 2 XMM regs and one int type reg. Normally targetReg, being an int type reg, serves as the latter. With the dst count cleared on the SIMD node there is no targetReg, so an int type internal register needs to be reserved instead.
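
The lowering decision described in this summary can be sketched as a small pure function. This is an illustrative model, not the JIT's code: `CompareShape`, `LoweringDecision` and `lowerSimdCompare` are hypothetical, and the real Lowering phase works on GenTree nodes and their register-requirement info rather than on flat structs.

```cpp
// Hypothetical summary of the JTRUE(EQ/NE(simdOp, const)) shape seen by lowering.
struct CompareShape
{
    bool op1IsSimdEqualityOrInequality; // op1 of the relop is a SIMD (in)Equality node
    long relopConst;                    // integer constant the relop compares against
    bool canUseAVX;                     // target supports AVX
    bool simdOp2IsZeroVector;           // SIMD node compares against Vector Zero
};

// Hypothetical result: what lowering decides for the relop and the SIMD node.
struct LoweringDecision
{
    bool relopElided;         // operand counts cleared: no code for GT_EQ/GT_NE
    int  simdDstCount;        // 0 means the SIMD result is not put into a register
    int  simdInternalIntRegs; // int registers reserved for the SIMD codegen itself
};

constexpr LoweringDecision lowerSimdCompare(const CompareShape& s)
{
    LoweringDecision d{false, 1, 0};
    const bool constIsBool = (s.relopConst == 0) || (s.relopConst == 1);
    if (s.op1IsSimdEqualityOrInequality && constIsBool)
    {
        d.relopElided  = true;
        d.simdDstCount = 0;
        // On AVX against Vector Zero the flags come straight from the vector test;
        // otherwise the reduction still needs one int register as scratch.
        const bool flagsOnly  = s.canUseAVX && s.simdOp2IsZeroVector;
        d.simdInternalIntRegs = flagsOnly ? 0 : 1;
    }
    return d;
}
```

For example, `lowerSimdCompare({true, 0, false, false})` models the SSE2 path: the relop is elided, the SIMD node has no destination register, and one internal int register is reserved.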

Fix #7382

sivarv commented Sep 28, 2016

@dotnet-bot test Windows_NT jitstressregs1
@dotnet-bot test Windows_NT jitstressregs2
@dotnet-bot test Windows_NT jitstressregs3
@dotnet-bot test Windows_NT jitstressregs4
@dotnet-bot test Windows_NT jitstressregs8
@dotnet-bot test Windows_NT jitstressregs0x10
@dotnet-bot test Windows_NT jitstressregs0x80
@dotnet-bot test Windows_NT jitstress2
@dotnet-bot test Windows_NT corefx_baseline
@dotnet-bot test Ubuntu jitstressregs1
@dotnet-bot test Ubuntu jitstressregs2
@dotnet-bot test Ubuntu jitstressregs3
@dotnet-bot test Ubuntu jitstressregs4
@dotnet-bot test Ubuntu jitstressregs8
@dotnet-bot test Ubuntu jitstressregs0x10
@dotnet-bot test Ubuntu jitstressregs0x80
@dotnet-bot test Ubuntu jitstress2
@dotnet-bot test Ubuntu corefx_baseline

@sivarv sivarv changed the title Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. [WIP]Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. Sep 28, 2016
@sivarv sivarv changed the title [WIP]Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. Sep 29, 2016

sivarv commented Sep 29, 2016

@CarolEidt - Please review this.
CC @dotnet/jit-contrib

sivarv commented Sep 29, 2016

ping.

Comment thread src/jit/codegenxarch.cpp
#ifdef FEATURE_SIMD
// If we have GT_JTRUE(GT_EQ/NE(GT_SIMD((in)Equality, v1, v2), true/false)),
// then we don't need to generate code for GT_EQ/GT_NE, since SIMD (in)Equality intrinsic
// would set or clear Zero flag.

@CarolEidt CarolEidt Sep 29, 2016

This is just a little confusing, because this is a case where both treeNode and op1 do not require a register. It might be worth breaking out the example, e.g.
simdCompareResult = GT_SIMD((In)Equality, v1, v2)
integerCompareResult = GT_EQ/NE(simdCompareResult, true/false)
GT_JTRUE(integerCompareResult)
And mention that for this case we don't need to generate either CompareResult into a register. #Resolved

Member Author

I have added this comment in lowerxarch.cpp



Comment thread src/jit/lowerxarch.cpp Outdated
#ifdef FEATURE_SIMD
// If we have GT_JTRUE(GT_EQ/NE(GT_SIMD((in)Equality, v1, v2), true/false)),
// then we don't need to generate code for GT_EQ/GT_NE, since SIMD (in)Equality intrinsic
// would set or clear Zero flag.

@CarolEidt CarolEidt Sep 29, 2016

Or you could put the more detailed explanation here ... #Resolved

Comment thread src/jit/lowerxarch.cpp Outdated
if (cmpOp1->IsSIMDEqualityOrInequality() && (cmpOp2->IsIntegralConst(0) || cmpOp2->IsIntegralConst(1)))
{
// clear dstCount on SIMD node to indicate that
// result doesn't need to materialized into a register.

@CarolEidt CarolEidt Sep 29, 2016

missing "be" (need to be materialized) #Resolved

Comment thread src/jit/lowerxarch.cpp Outdated
l->clearOperandCounts(cmpOp2);

// Codegen of SIMD (in)Equality uses target integer reg
// on for setting flags. The same is not needed on AVX

@CarolEidt CarolEidt Sep 29, 2016

Should this be "only for setting flags"? #Resolved

Comment thread src/jit/lowerxarch.cpp
// when comparing against Vector Zero. Since we have
// cleared dstCount, we need to reserve an int type internal
// register.
if (compiler->canUseAVX() && cmpOp1->gtGetOp2()->IsIntegralConstVector(0))

@CarolEidt CarolEidt Sep 29, 2016

Suggestion: consider reversing the condition - I was a little confused at first because the condition is the opposite of the one you are describing above. #Resolved

Member Author

I have updated the comment to match the condition.



Comment thread src/jit/lowerxarch.cpp Outdated

// We would have to reverse compare oper in the following cases:
// 1) SIMD Equality: Sets Zero flag on equal otherwise clears it.
// Therefore, if compare oper is == or != against false, we will

@CarolEidt CarolEidt Sep 29, 2016

I would say "against false (0)" here and "against true (1)" below to make it easier to match the code and the text. #Resolved

Comment thread src/jit/lowerxarch.cpp Outdated
// Therefore, if compare oper is == or != against false, we will
// be checking opposite of what is required.
//
// 2) SIMD inEquality: Clears Zero flag on true otherwise clears it.

@CarolEidt CarolEidt Sep 29, 2016

I think this should be "Clears Zero flag on unequal, otherwise sets it."? #Resolved
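
The reversal rule discussed in this thread can be checked exhaustively with a small model. This is a sketch under assumptions, not the JIT's implementation: all names are hypothetical, the model assumes the relop is turned into je for == and jne for != unless reversed, and the rule that inequality reverses against true (1) is inferred from the review comments rather than quoted from the source file.

```cpp
enum class SimdOp { Equality, Inequality };
enum class RelOp { Eq, Ne };

// Bool value the SIMD intrinsic would have produced had it been materialized.
constexpr bool simdBoolResult(SimdOp op, bool vectorsEqual)
{
    return (op == SimdOp::Equality) ? vectorsEqual : !vectorsEqual;
}

// What the original IR asks for: relop(simdBoolResult, constVal).
constexpr bool wantedBranch(SimdOp op, RelOp rel, int constVal, bool vectorsEqual)
{
    const int value = simdBoolResult(op, vectorsEqual) ? 1 : 0;
    return (rel == RelOp::Eq) ? (value == constVal) : (value != constVal);
}

// Assumed un-reversed mapping: Eq -> je (taken when ZF set), Ne -> jne.
// In both SIMD cases ZF is set exactly when the vectors are equal.
constexpr bool naiveBranch(RelOp rel, bool vectorsEqual)
{
    return (rel == RelOp::Eq) ? vectorsEqual : !vectorsEqual;
}

// Rule under discussion: reverse for Equality against false (0)
// and for Inequality against true (1).
constexpr bool mustReverse(SimdOp op, int constVal)
{
    return (op == SimdOp::Equality) ? (constVal == 0) : (constVal == 1);
}

constexpr bool emittedBranch(SimdOp op, RelOp rel, int constVal, bool vectorsEqual)
{
    return mustReverse(op, constVal) ? !naiveBranch(rel, vectorsEqual)
                                     : naiveBranch(rel, vectorsEqual);
}

// Checks all 16 combinations of SIMD op, relop, constant and vector equality.
constexpr bool ruleHoldsForAllCases()
{
    const SimdOp ops[]  = {SimdOp::Equality, SimdOp::Inequality};
    const RelOp  rels[] = {RelOp::Eq, RelOp::Ne};
    for (int o = 0; o < 2; ++o)
        for (int r = 0; r < 2; ++r)
            for (int c = 0; c < 2; ++c)
                for (int e = 0; e < 2; ++e)
                    if (emittedBranch(ops[o], rels[r], c, e != 0) !=
                        wantedBranch(ops[o], rels[r], c, e != 0))
                        return false;
    return true;
}
```

Under these assumptions the emitted branch matches the requested one in every combination, which supports the two reversal cases listed in the code comment.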

@CarolEidt

LGTM with some comment suggestions.



@CarolEidt

:shipit:

@sivarv sivarv merged commit b3f150d into dotnet:master Sep 30, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false.

Commit migrated from dotnet/coreclr@b3f150d

Development

Successfully merging this pull request may close these issues.

RyuJIT SIMD: optimize codegen when Op(in)Equality that produces bool result is compared against 1 or 0
