Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Fix extra register-dependency on mem-form of vcvtsd/s2ss#17560

Merged
CarolEidt merged 1 commit into
dotnet:masterfrom
fiigii:fixcvt
Apr 17, 2018
Merged

Fix extra register-dependency on mem-form of vcvtsd/s2ss#17560
CarolEidt merged 1 commit into
dotnet:masterfrom
fiigii:fixcvt

Conversation

@fiigii
Copy link
Copy Markdown

@fiigii fiigii commented Apr 13, 2018

Fix https://github.com/dotnet/coreclr/issues/17544

Originally, #14274 attempts to optimize vcvtsd/s2ss/d by keeping the second and third register same to avoid unnecessary register-dependency.

vcvtss2sd xmm0, xmm0, xmm1 ;;; depends on xmm0 and xmm1
vcvtss2sd xmm0, xmm1, xmm1 ;;; depends on xmm1 only

However, that does not work with the containment form. When the last op is a memory address, RyuJIT cannot determine the second register and just generate the default value of VEX.vvvv field (XMM0).

vcvtss2sd xmm6, xmm0, dword ptr [rip-0x11d1f3] ;;; depends on xmm0 that does not belong to vcvtss2sd

This PR reverts the change of #14274 to fix this performance regression, but we need to find a better solution for the containment form in the future.

@fiigii
Copy link
Copy Markdown
Author

fiigii commented Apr 13, 2018

@CarolEidt @AndyAyersMS PTAL

@CarolEidt
Copy link
Copy Markdown

@fiigii - thanks!

When the last op is a memory address, RyuJIT cannot determine the second register and just generate the default value of VEX.vvvv field (XMM0).

So, could this have actually led to incorrect code, depending on the contents of XMM0? Or is it the case that, since this is effectively producing a scalar result, it won't matter because those upper bits won't be used?

We should consider this issue when addressing #14523, if not sooner.

@fiigii
Copy link
Copy Markdown
Author

fiigii commented Apr 13, 2018

Or is it the case that, since this is effectively producing a scalar result, it won't matter because those upper bits won't be used?

Right, the second register just provides the upper-bits that are never used by scalar programs.

Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fiigii
Copy link
Copy Markdown
Author

fiigii commented Apr 14, 2018

test OSX10.12 x64 Checked Innerloop Build and Test

@fiigii
Copy link
Copy Markdown
Author

fiigii commented Apr 16, 2018

Can we merge this PR?

@CarolEidt CarolEidt merged commit c127548 into dotnet:master Apr 17, 2018
@stephentoub
Copy link
Copy Markdown
Member

https://github.com/dotnet/coreclr/issues/17544 is a 2.1 issue, but it's now closed via this PR. I assume this change will be brought through ask mode for 2.1?

@CarolEidt
Copy link
Copy Markdown

I assume this change will be brought through ask mode for 2.1?

I am trying to figure out now whether it meets the bar. It would be good to see it in.

@RussKeldorph
Copy link
Copy Markdown

I would think that if it indeed also fixes #17603, then it's a good candidate for 2.1.

@AndyAyersMS @maryamariyan

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Perf] Benchstone/BenchF/BenchMrk regression of ~15% on x86

6 participants