Avoid gtCloneExpr in HW helper-intrinsics #16766
No Merge, just for discussing.
  op1 = impPopStack().val;
- retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, gtCloneExpr(op1), gtNewIconNode(0), NI_SSE_Shuffle,
-                                    TYP_FLOAT, simdSize);
+ retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, gtNewIconNode(0), NI_SSE_Shuffle, TYP_FLOAT, simdSize);
Wouldn't this require more changes in codegen to ensure that the appropriate overload of emitIns_SIMD is called?
Probably not; as far as I can see, the Vector<T> code always uses shuffle in its 2-operand form. Let me investigate more.
It is not necessary to add a new internal overload for SSE_Shuffle; see comment: #16758 (comment)
You can substitute:
retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, gtNewIconNode(0), NI_SSE2_Shuffle,
                                   TYP_INT, simdSize);
I think it's safe, as I do not know of any processor from the last 10 years that supports SSE but not SSE2.
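To illustrate why the single-source shuffle can stand in for the two-source one here, the following is a minimal scalar sketch of the lane selection performed by SHUFPS (behind SSE_Shuffle) and PSHUFD (behind SSE2_Shuffle). The names `Vec4`, `shufps_model`, `pshufd_model`, and `broadcast_forms_agree` are made up for this example and are not part of the JIT. PSHUFD operates on 32-bit integer lanes; floats are used in both models only to keep the comparison simple, since both instructions just move 32-bit lanes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-lane vector type used only for this illustration.
using Vec4 = std::array<float, 4>;

// Scalar model of SHUFPS: the low two result lanes are selected from `a`,
// the high two from `b`, each by a 2-bit field of `imm`.
Vec4 shufps_model(const Vec4& a, const Vec4& b, std::uint8_t imm)
{
    return {a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]};
}

// Scalar model of PSHUFD: all four result lanes are selected from `a`.
Vec4 pshufd_model(const Vec4& a, std::uint8_t imm)
{
    return {a[imm & 3], a[(imm >> 2) & 3], a[(imm >> 4) & 3], a[(imm >> 6) & 3]};
}

// When both SHUFPS sources are the same vector, the two models agree
// for every control byte, including the broadcast case imm == 0.
bool broadcast_forms_agree(const Vec4& v, std::uint8_t imm)
{
    return shufps_model(v, v, imm) == pshufd_model(v, imm);
}
```

In this model the single-source form needs only one use of the operand, which is why it would avoid the clone in the importer. It says nothing about how each encoding allocates registers, which is the concern raised later in the thread.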
Otherwise, what I have seen in #16758 with Compiler::fgMakeMultiUse works really well with HW intrinsics.
> Otherwise, what I have seen in #16758 with Compiler::fgMakeMultiUse works really well with HW intrinsics.
Why not adopt a simpler solution?
I would just use SSE2_Shuffle; we already have it.
I just found that this solution does not work because VEX-encoding always duplicates dst rather than src for this instruction.
This PR changes SSE_Shuffle to internally accept 2 or 3 operands to avoid gtCloneExpr in helper-intrinsics. Similar techniques can solve the gtCloneExpr problems that we are discussing in #16758.
@CarolEidt @AndyAyersMS @tannergooding @4creators @mikedn
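The idea in the description, letting the node carry either 2 or 3 operands and resolving the duplicate source only when the instruction is emitted, can be sketched roughly as follows. This is a toy model under stated assumptions, not the JIT's code: `ShuffleNode` and `emitShuffle` are hypothetical names, operands are plain strings rather than GenTree nodes, and the output is assembly-like text rather than encoded bytes.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for an intrinsic node; not the JIT's GenTree type.
struct ShuffleNode
{
    std::string                op1; // first source operand
    std::optional<std::string> op2; // second source; absent in the 2-operand form
    int                        imm; // shuffle control byte
};

// Hypothetical emitter: when op2 is absent, op1 is named twice in the
// emitted text, so the tree itself never needs a cloned copy of op1.
std::string emitShuffle(const ShuffleNode& node)
{
    const std::string& src2 = node.op2 ? *node.op2 : node.op1;
    return "shufps " + node.op1 + ", " + src2 + ", " + std::to_string(node.imm);
}
```

For example, `emitShuffle(ShuffleNode{"xmm0", std::nullopt, 0})` yields `shufps xmm0, xmm0, 0`, while supplying a distinct `op2` keeps the ordinary two-source form.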