JIT: Accelerate long -> floating casts on x86 #113930

Merged
BruceForstall merged 7 commits into dotnet:main from saucecontrol:x86convert on May 2, 2025

@saucecontrol
Member

@saucecontrol saucecontrol commented Mar 26, 2025

This adds support for using EVEX SIMD conversion instructions to handle long/ulong to float/double casts on x86 rather than going through helper calls. With this, all integral -> floating casts are accelerated on x86 with AVX-512.

Typical diff:

 G_M57389_IG01:        ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
-       sub      esp, 8
-       vzeroupper 
-						;; size=6 bbWeight=1 PerfScore 1.25
+						;; size=0 bbWeight=1 PerfScore 0.00
 G_M57389_IG02:        ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
-       push     dword ptr [esp+0x10+0x04]
-       ; npt arg push 0
-       push     dword ptr [esp+0x0C+0x08]
-       ; npt arg push 1
-       call     CORINFO_HELP_LNG2DBL
-       ; gcr arg pop 2
-       fstp     qword ptr [esp]
-       vmovsd   xmm0, qword ptr [esp]
-       vcvtsd2ss xmm0, xmm0, xmm0
+       vmovq    xmm0, qword ptr [esp+0x04]
+       vcvtqq2ps xmm0, xmm0
        vmovd    eax, xmm0
-						;; size=29 bbWeight=1 PerfScore 12.50
+						;; size=16 bbWeight=1 PerfScore 9.00
 G_M57389_IG03:        ; bbWeight=1, epilog, nogc, extend
-       add      esp, 8
        ret      8
-						;; size=6 bbWeight=1 PerfScore 2.25
+						;; size=3 bbWeight=1 PerfScore 2.00

Full diffs

@ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Mar 26, 2025
@dotnet-policy-service (bot) added the community-contribution label (indicates that the PR has been added by a community member) on Mar 26, 2025
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@saucecontrol changed the title from "JIT: Accelerated long -> floating casts on x86" to "JIT: Accelerate long -> floating casts on x86" on Mar 26, 2025
Comment thread src/tests/JIT/Regression/JitBlue/Runtime_106338/Runtime_106338.cs Outdated
Comment thread src/coreclr/jit/morph.cpp Outdated
@saucecontrol
Member Author

saucecontrol commented Mar 28, 2025

This is ready for review. To summarize, it simply replaces helper calls with AVX-512 conversion instructions, as follows:

  • long->double: vcvtqq2pd
  • long->float: vcvtqq2ps
  • ulong->double: vcvtuqq2pd
  • ulong->float: vcvtuqq2pd+vcvtsd2ss (double conversion preserves existing behavior)

Latest diffs show no regressions other than those expected from inlining.
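The ulong->float entry is the only one that takes two steps. A minimal portable C++ sketch of why the route matters, assuming the parenthetical above refers to converting through double first, and assuming default round-to-nearest-even on the host. The function and constant names are hypothetical, and the casts only model the arithmetic, not the AVX-512 instructions themselves:

```cpp
#include <cstdint>

// One rounding step: the 64-bit value is rounded straight to float precision.
inline float ulongToFloatDirect(std::uint64_t x) {
    return static_cast<float>(x);
}

// Two rounding steps: first to double (53-bit significand),
// then to float (24-bit significand). This models the
// vcvtuqq2pd + vcvtsd2ss pairing listed above.
inline float ulongToFloatViaDouble(std::uint64_t x) {
    const double wide = static_cast<double>(x);
    return static_cast<float>(wide);
}

// 2^63 + 2^39 + 1 sits just above the midpoint between the adjacent
// floats 2^63 and 2^63 + 2^40. Rounding to double drops the trailing +1,
// leaving an exact tie that round-to-nearest-even resolves downward.
const std::uint64_t kDoubleRoundingInput =
    (std::uint64_t{1} << 63) + (std::uint64_t{1} << 39) + 1;
```

For kDoubleRoundingInput the two routes land on different floats, so switching ulong->float to a single-step conversion would change results for some inputs. For values that are exactly representable as float, such as 2^40, both routes agree.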


#if defined(FEATURE_HW_INTRINSICS) && defined(TARGET_X86)
    if (!tree->TypeIs(TYP_LONG) &&
        !(tree->OperIs(GT_CAST) && varTypeIsLong(tree->AsCast()->CastOp()) && varTypeIsFloating(tree)))
Member

nit: negated conditions like this can be hard to read. A small comment explaining that we want to handle nodes that produce long, or a GT_CAST long->float, would be beneficial IMO.

Member Author

Agreed. I actually plan on extending this to handle casts in the opposite direction as well, and that will make this check even more hairy. I'll do something to simplify it then.
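One way the readability concern could be addressed is to name the positive condition and negate it once at the call site. The sketch below is only an illustration with hypothetical stand-in types (Node, Oper, Type) and hypothetical helper names. It does not use the JIT's real GenTree API and is not the change that was made in this PR:

```cpp
// Hypothetical stand-ins for the JIT's node kinds and types.
enum class Oper { Add, Cast };
enum class Type { Int, Long, Float, Double };

struct Node {
    Oper oper;      // operation the node performs
    Type type;      // type the node produces
    Type castFrom;  // operand type; only meaningful when oper == Oper::Cast
};

inline bool isFloating(Type t) {
    return t == Type::Float || t == Type::Double;
}

// True for a cast that consumes a long and produces float or double.
inline bool isLongToFloatingCast(const Node& n) {
    return n.oper == Oper::Cast && n.castFrom == Type::Long && isFloating(n.type);
}

// Positive form of the check: the node either produces a long
// or is a long -> floating cast, so it needs handling here.
inline bool needsLongHandling(const Node& n) {
    return n.type == Type::Long || isLongToFloatingCast(n);
}
```

With a predicate like this, the early-out reads as "if the node does not need long handling, skip it", and a cast in the opposite direction could later be added as one more named helper.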

#endif // FEATURE_HW_INTRINSICS && TARGET_X86
    {
        return tree->gtNext;
    }
Member

The fact that from this point onwards it can now also be GT_CAST float rather than only some NODE long seems like a tricky thing that might trip people up in the future.

Comment thread src/coreclr/jit/decomposelongs.cpp
Member

@tannergooding tannergooding left a comment

Changes LGTM. Just a couple minor bits of feedback

CC. @dotnet/jit-contrib, @EgorBo for secondary review

@tannergooding
Member

Ping @dotnet/jit-contrib, @EgorBo for secondary review

@BruceForstall
Contributor

/azp run runtime-coreclr outerloop, Fuzzlyn

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@BruceForstall BruceForstall merged commit 97ad3a7 into dotnet:main May 2, 2025
160 of 168 checks passed
@saucecontrol saucecontrol deleted the x86convert branch May 4, 2025 15:19
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 4, 2025