JIT: Accelerate floating->long casts on x86 #125180

Merged
tannergooding merged 11 commits into dotnet:main from saucecontrol:lng2flt6 on Apr 29, 2026

@saucecontrol
Member

@saucecontrol saucecontrol commented Mar 4, 2026

This adds floating->long/ulong cast codegen for AVX-512 and AVX10.2 on x86. With this, all non-overflow casts are now hardware accelerated. This is the last bit pulled from #116805.

Typical Diff (double->long AVX-512):

-       sub      esp, 8
-       vzeroupper 
-       vmovsd   xmm0, qword ptr [esp+0x0C]
-       sub      esp, 8
-       ; npt arg push 0
-       ; npt arg push 1
-       vmovsd   qword ptr [esp], xmm0
-       call     CORINFO_HELP_DBL2LNG
-       ; gcr arg pop 2
+       vmovsd   xmm0, qword ptr [esp+0x04]
+       vcmpordsd k1, xmm0, xmm0
+       vcmpge_oqsd k2, xmm0, qword ptr [@RWD00]
+       vcvttpd2qq xmm0 {k1}{z}, xmm0
+       vpblendmq xmm0 {k2}, xmm0, qword ptr [@RWD08] {1to2}
+       vmovd    eax, xmm0
+       vpextrd  edx, xmm0, 1
-       add      esp, 8
        ret      8

+RWD00  	dq	43E0000000000000h
+RWD08  	dq	7FFFFFFFFFFFFFFFh
 
-; Total bytes of code 31
+; Total bytes of code 53

Full Diffs
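
The two read-only data constants in the diff (`RWD00` and `RWD08`) can be decoded by hand. The following is a minimal sketch, not code from this PR; the helper names `bits_to_double` and `biased_exponent` and the constant names `kRwd00`/`kRwd08` are made up for illustration. It assumes a 64-bit IEEE 754 `double`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as an IEEE 754 double (bit copy, no value conversion).
inline double bits_to_double(std::uint64_t bits)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Biased exponent field (bits 62..52) of a double's bit pattern.
inline int biased_exponent(std::uint64_t bits)
{
    return static_cast<int>((bits >> 52) & 0x7FF);
}

// Values copied from the diff's read-only data; the names are hypothetical.
constexpr std::uint64_t kRwd00 = 0x43E0000000000000ULL; // compare threshold, read as a double
constexpr std::uint64_t kRwd08 = 0x7FFFFFFFFFFFFFFFULL; // blend value, read as a 64-bit integer
```

The exponent field of `kRwd00` is 0x43E (1086), and 1086 - 1023 = 63 with a zero mantissa, so the threshold is exactly 2^63. `kRwd08` is the bit pattern of `long.MaxValue`.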

Breakdown of the double->long asm:

; load the scalar double
vmovsd   xmm0, qword ptr [esp+0x04]

; set the low bit of k1 if the scalar value is not NaN
vcmpordsd k1, xmm0, xmm0

; set the low bit of k2 if the input was greater than or equal to 2^63 (nearest double greater than long.MaxValue)
vcmpge_oqsd k2, xmm0, qword ptr [@RWD00]

; convert, using k1 mask bit.  if the mask bit is not set (meaning we have a NaN), set the value to zero
vcvttpd2qq xmm0 {k1}{z}, xmm0

; if the low bit of k2 is set (meaning overflow), set the value to long.MaxValue, otherwise take the conversion result
vpblendmq xmm0 {k2}, xmm0, qword ptr [@RWD08] {1to2}

; extract the two 32-bit halves of the long result
vmovd    eax, xmm0
vpextrd  edx, xmm0, 1
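
The breakdown above can be modeled in scalar C++ as a rough sketch of the intended semantics. This is not the JIT's implementation; `saturating_double_to_int64`, `RegisterPair`, and `split_halves` are hypothetical names. The negative out-of-range branch is spelled out because an out-of-range `static_cast` is undefined in C++, whereas the hardware convert is understood to return 0x8000000000000000 (the same bits as `long.MinValue`) in that case.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Scalar model of the masked AVX-512 sequence (illustrative only).
inline std::int64_t saturating_double_to_int64(double value)
{
    constexpr double two_pow_63 = 9223372036854775808.0; // RWD00

    // k1 clear (input is NaN): the zero-masked convert produces 0.
    if (std::isnan(value))
        return 0;

    // k2 set (input >= 2^63): the blend selects long.MaxValue (RWD08).
    if (value >= two_pow_63)
        return std::numeric_limits<std::int64_t>::max();

    // Assumption: the hardware convert yields 0x8000000000000000 here,
    // which is long.MinValue. Handled explicitly to avoid undefined behavior.
    if (value < -two_pow_63)
        return std::numeric_limits<std::int64_t>::min();

    // In range: truncate toward zero, as vcvttpd2qq does.
    return static_cast<std::int64_t>(value);
}

// Models the vmovd / vpextrd pair that returns the long in edx:eax on x86.
struct RegisterPair
{
    std::uint32_t eax; // low 32 bits
    std::uint32_t edx; // high 32 bits
};

inline RegisterPair split_halves(std::int64_t value)
{
    const auto bits = static_cast<std::uint64_t>(value);
    return {static_cast<std::uint32_t>(bits), static_cast<std::uint32_t>(bits >> 32)};
}
```

For example, `saturating_double_to_int64(-2.9)` returns -2, and a NaN input returns 0.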

Copilot AI review requested due to automatic review settings March 4, 2026 15:43
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Mar 4, 2026
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 4, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR extends x86 JIT codegen to hardware-accelerate non-overflow floating→long/ulong casts using AVX-512 and AVX10.2, completing the remaining cast-acceleration work pulled from #116805.

Changes:

  • Teach cast helper selection to allow floating↔long casts to stay intrinsic-based on x86 when AVX-512 is available.
  • Add/extend x86 long decomposition logic to generate AVX-512/AVX10.2 sequences for floating→long/ulong and long→floating casts.
  • Introduce a new AVX-512 scalar compare-mask intrinsic and wire it up for immediate bounds + containment.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/coreclr/jit/lowerxarch.cpp | Refactors vector constant construction and adds containment support for the new AVX-512 scalar compare-mask intrinsic. |
| src/coreclr/jit/hwintrinsicxarch.cpp | Adds immediate upper-bound handling for the new AVX-512 scalar compare-mask intrinsic. |
| src/coreclr/jit/hwintrinsiclistxarch.h | Introduces AVX512.CompareScalarMask as a new intrinsic mapping to vcmpss/vcmpsd with IMM. |
| src/coreclr/jit/flowgraph.cpp | Updates helper-requirement logic so x86 floating↔long casts can avoid helper calls when AVX-512 is available. |
| src/coreclr/jit/decomposelongs.cpp | Implements the AVX-512/AVX10.2-based lowering/decomposition sequences for floating↔long/ulong on x86. |

Comment thread src/coreclr/jit/decomposelongs.cpp Outdated
Comment thread src/coreclr/jit/decomposelongs.cpp Outdated
Copilot AI review requested due to automatic review settings March 4, 2026 16:08
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@saucecontrol saucecontrol marked this pull request as ready for review March 4, 2026 19:31
Copilot AI review requested due to automatic review settings March 4, 2026 19:31
@saucecontrol
Member Author

saucecontrol commented Mar 4, 2026

@dotnet/jit-contrib this is ready for review

diffs

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/jit/decomposelongs.cpp
Comment thread src/coreclr/jit/decomposelongs.cpp
@JulieLeeMSFT
Member

@EgorBo, please review this community PR.

Copilot AI review requested due to automatic review settings April 16, 2026 19:48
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/coreclr/jit/flowgraph.cpp:1347

  • fgCastRequiresHelper on x86 currently only exempts long<->floating casts when InstructionSet_AVX512 is enabled. This PR adds long/floating cast acceleration that can use AVX10.2 (InstructionSet_AVX10v2) as well (e.g., DecomposeLongs::DecomposeCast checks compOpportunisticallyDependsOn(InstructionSet_AVX10v2)). If AVX10v2 is enabled while AVX512 is disabled/unavailable, morphing may still force a helper call and bypass the new codegen. Consider updating the x86 condition to treat AVX10v2 as sufficient (e.g., require helper only when neither AVX512 nor AVX10v2 is available).
#if defined(TARGET_X86) || defined(TARGET_ARM)
    if ((varTypeIsLong(fromType) && varTypeIsFloating(toType)) ||
        (varTypeIsFloating(fromType) && varTypeIsLong(toType)))
    {
#if defined(TARGET_X86)
        return !compOpportunisticallyDependsOn(InstructionSet_AVX512);
#else
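
The condition this suppressed comment proposes can be written as a toy model. It is a sketch only: `IsaSupport` and `cast_requires_helper` are hypothetical stand-ins, not the JIT's `Compiler` API, and nothing here shows what the merged code actually does.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's instruction-set queries.
struct IsaSupport
{
    bool avx512;  // models InstructionSet_AVX512 being usable
    bool avx10v2; // models InstructionSet_AVX10v2 being usable
};

// Toy version of the suggested x86 rule: a floating<->long cast needs the
// helper call only when neither instruction set is available.
inline bool cast_requires_helper(const IsaSupport& isa)
{
    return !(isa.avx512 || isa.avx10v2);
}
```

Under this model, a configuration with AVX10.2 but without AVX-512 would no longer be forced onto the helper path, which is the case the comment raises.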

Comment thread src/coreclr/jit/hwintrinsiclistxarch.h Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 16, 2026 19:55
Copilot AI review requested due to automatic review settings April 20, 2026 04:36
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread src/coreclr/jit/gentree.cpp
Comment thread src/coreclr/jit/decomposelongs.cpp Outdated
@tannergooding tannergooding added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 21, 2026
Copilot AI review requested due to automatic review settings April 22, 2026 00:47
@dotnet-policy-service dotnet-policy-service Bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 22, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Comment thread src/coreclr/jit/decomposelongs.cpp Outdated
Member

@tannergooding tannergooding left a comment

CC. @dotnet/jit-contrib, @EgorBo, @kg for secondary review on the community PR

@tannergooding tannergooding requested a review from kg April 22, 2026 17:30
@saucecontrol
Member Author

Pushed a change to simplify the IR. No change to the codegen.

@tannergooding tannergooding enabled auto-merge (squash) April 29, 2026 05:30
@tannergooding tannergooding merged commit d1163e5 into dotnet:main Apr 29, 2026
135 of 137 checks passed
@saucecontrol saucecontrol deleted the lng2flt6 branch April 29, 2026 16:51

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member

5 participants