Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false.#7407
Conversation
d91fd50 to
9e8ec11
Compare
|
@dotnet-bot test Windows_NT jitstressregs1 |
|
@CarolEidt - Please review this. |
|
ping. |
| #ifdef FEATURE_SIMD | ||
| // If we have GT_JTRUE(GT_EQ/NE(GT_SIMD((in)Equality, v1, v2), true/false)), | ||
| // then we don't need to generate code for GT_EQ/GT_NE, since SIMD (in)Equality intrinsic | ||
| // would set or clear Zero flag. |
There was a problem hiding this comment.
This is just a little confusing, because this is a case where both treeNode and op1 do not require a register. It might be worth breaking out the example, e.g.
simdCompareResult = GT_SIMD((In)Equality, v1, v2)
integerCompareResult = GT_EQ/NE(simdCompareResult, true/false)
GT_JTRUE(integerCompareResult)
And mention that for this case we don't need to generate either CompareResult into a register. #Resolved
There was a problem hiding this comment.
| #ifdef FEATURE_SIMD | ||
| // If we have GT_JTRUE(GT_EQ/NE(GT_SIMD((in)Equality, v1, v2), true/false)), | ||
| // then we don't need to generate code for GT_EQ/GT_NE, since SIMD (in)Equality intrinsic | ||
| // would set or clear Zero flag. |
There was a problem hiding this comment.
Or you could put the more detailed explanation here ... #Resolved
| if (cmpOp1->IsSIMDEqualityOrInequality() && (cmpOp2->IsIntegralConst(0) || cmpOp2->IsIntegralConst(1))) | ||
| { | ||
| // clear dstCount on SIMD node to indicate that | ||
| // result doesn't need to materialized into a register. |
There was a problem hiding this comment.
missing "be" (need to be materialized) #Resolved
| l->clearOperandCounts(cmpOp2); | ||
|
|
||
| // Codegen of SIMD (in)Equality uses target integer reg | ||
| // on for setting flags. The same is not needed on AVX |
There was a problem hiding this comment.
Should this be "only for setting flags"? #Resolved
| // when comparing against Vector Zero. Since we have | ||
| // cleared dstCount, we need to reserve an int type internal | ||
| // register. | ||
| if (compiler->canUseAVX() && cmpOp1->gtGetOp2()->IsIntegralConstVector(0)) |
There was a problem hiding this comment.
Suggestion: consider reversing the condition - I was a little confused at first because the condition is the opposite of the one you are describing above. #Resolved
There was a problem hiding this comment.
|
|
||
| // We would have to reverse compare oper in the following cases: | ||
| // 1) SIMD Equality: Sets Zero flag on equal otherwise clears it. | ||
| // Therefore, if compare oper is == or != against false, we will |
There was a problem hiding this comment.
I would say "against false (0)" here and "against true (1)" below to make it easier to match the code and the text. #Resolved
| // Therefore, if compare oper is == or != against false, we will | ||
| // be checking opposite of what is required. | ||
| // | ||
| // 2) SIMD inEquality: Clears Zero flag on true otherwise clears it. |
There was a problem hiding this comment.
I think this should be "Clears Zero flag on unequal, otherwise sets it."? #Resolved
|
LGTM with some comment suggestions. In reply to: 250616016 [](ancestors = 250616016) |
|
|
…compared against true/false.
Optimize codegen when SIMD (in)Equality that produces bool result is compared against true/false. Commit migrated from dotnet/coreclr@b3f150d
As per Intel TechEmPower benchmark analysis, Kestrel.Internal.Infrastructure.MemoryPoolIterator.Seek() is one of the hot methods. It has the following code in the inner-most loop of a doubly nested while loop.
The if-condition generates the following IR
!byte0Equals.Equals(Vector.Zero)
SIMD (in)equality produces a bool result in a register, which is checked to see != 0 above. Here is the code generated
Here there is no need to produce the result of SIMD opEquality into a register. It would just suffice to set flags. Comparison operation !=0 is a redundant. We should be able to produce the following code
Assuming as per #7358, we optimize codegen of SIMD opEquality against Vector Zero on AVX, resulting code would be
In general, this fix will benefit both SSE2 and AVX codegen of SIMD (in)Equality even while not comparing against Vector Zero.
Summary of code changes:
Lower will recognize the above IR and will clear operand counts on GT_EQ/NE node and dst count on SIMD (in)Equality node. SIMD codegen will similar to genCompareInt() will materialize result in to a reg only when targetReg != REG_NA.
When not comparing against Zero Vector on AVX, SIMD (in)Equality codegen would need 2 XMM regs and one int type reg. Since targetReg is an int type reg, it is used. With dst count cleared on SIMD node, an int type internal register needs to be reserved.
Fix #7382