Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Avoid gtCloneExpr in HW helper-intrinsics#16766

Closed
fiigii wants to merge 1 commit into
dotnet:masterfrom
fiigii:cloneexp
Closed

Avoid gtCloneExpr in HW helper-intrinsics#16766
fiigii wants to merge 1 commit into
dotnet:masterfrom
fiigii:cloneexp

Conversation

@fiigii
Copy link
Copy Markdown

@fiigii fiigii commented Mar 5, 2018

This PR changes SSE_Shuffle to internally accept 2 or 3 operands to avoid gtCloneExpr in helper-intrinsics.

The similar techniques can simply solve the gtCloneExpr problems that we are discussing in #16758

@CarolEidt @AndyAyersMS @tannergooding @4creators @mikedn

@fiigii
Copy link
Copy Markdown
Author

fiigii commented Mar 5, 2018

No Merge, just for discussing.

@tannergooding tannergooding added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 5, 2018
op1 = impPopStack().val;
retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, gtCloneExpr(op1), gtNewIconNode(0), NI_SSE_Shuffle,
TYP_FLOAT, simdSize);
retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, gtNewIconNode(0), NI_SSE_Shuffle, TYP_FLOAT, simdSize);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this require more changes in codegen to ensure that the appropriate overload of emitIns_SIMD is called?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, I saw the Vector<T> code always uses shuffle as 2-op form. Let me investigate more.

Copy link
Copy Markdown

@4creators 4creators Mar 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary to get new internal overload for SSE_Shuffle see comment: #16758 (comment)

You can substitute:

retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, gtNewIconNode(0), NI_SSE2_Shuffle,
                                               TYP_INT, simdSize);

I think it's safe as I do not know any processors up to 10 years back which would support SSE and not support SSE2

Copy link
Copy Markdown

@4creators 4creators Mar 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, what I have seen in #16758 with Compiler::fgMakeMultiUse works really well with HW intrinsics.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, what I have seen in #16758 with Compiler::fgMakeMultiUse works really well with HW intrinsics.

Why not adopt a simpler solution?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just use SSE2_Shuffle we just have it already.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found that this solution does not work because VEX-encoding always duplicates dst rather than src for this instruction.

@fiigii fiigii closed this Mar 5, 2018
@fiigii fiigii deleted the cloneexp branch March 5, 2018 23:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

* NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants