@EgorBo EgorBo commented Jun 18, 2020

Fixes #37802

Avoid inserting GT_COMMA before the optHoistLoopCode phase, so that loop hoisting gets a chance to hoist such expressions first. A minimal repro from #37802:

static void Test(double x)
{
    for (int i = 0; i < 1000; i++)
    {
        Consume(Math.Exp(-x * 2)); // Math.Exp(-x * 2) should be hoisted from the loop
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void Consume(double x){ }
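
For readers less familiar with loop hoisting, here is a minimal sketch of the transformation the JIT is expected to perform on this repro, written by hand in C++ rather than taken from the JIT. `CountedExp`, `g_expCalls`, `g_sink`, `TestUnhoisted`, and `TestHoisted` are hypothetical names introduced only for illustration.

```cpp
#include <cmath>

// Hypothetical counter: how many times the expensive call actually runs.
static int g_expCalls = 0;

// Hypothetical wrapper standing in for Math.Exp in the repro.
static double CountedExp(double v)
{
    ++g_expCalls;
    return std::exp(v);
}

// Stand-in for the non-inlined Consume method; it only accumulates its input.
static double g_sink = 0.0;
static void Consume(double v) { g_sink += v; }

// Shape of the code without hoisting: the loop-invariant
// expression is re-evaluated on every iteration.
static void TestUnhoisted(double x)
{
    for (int i = 0; i < 1000; i++)
    {
        Consume(CountedExp(-x * 2));
    }
}

// Shape of the code with hoisting: the invariant expression is
// evaluated once before the loop and the result is reused.
static void TestHoisted(double x)
{
    const double hoisted = CountedExp(-x * 2);
    for (int i = 0; i < 1000; i++)
    {
        Consume(hoisted);
    }
}
```

Calling `TestUnhoisted` runs the wrapped call 1000 times, while `TestHoisted` runs it once and feeds the same value to `Consume`. This mirrors the difference between the two listings below, where the call to `System.Math:Exp` moves from IG03 (the loop body) to IG02.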

Current codegen:

; Method Program:Test(double)
G_M55510_IG01:
       56                   push     rsi
       4883EC30             sub      rsp, 48
       C5F877               vzeroupper 
       C5F829742420         vmovaps  qword ptr [rsp+20H], xmm6
						;; bbWeight=1    PerfScore 5.25

G_M55510_IG02:
       33F6                 xor      esi, esi
       C5F828F0             vmovaps  xmm6, xmm0
       C5FB100D3C000000     vmovsd   xmm1, qword ptr [reloc @RWD08]
       C5C857F1             vxorps   xmm6, xmm1
						;; bbWeight=1    PerfScore 2.83

G_M55510_IG03:
       C5F828C6             vmovaps  xmm0, xmm6
       C5FB58C0             vaddsd   xmm0, xmm0, xmm0
       E85323B45F           call     System.Math:Exp(double):double
       E88699FEFF           call     Program:Consume(double)
       FFC6                 inc      esi
       81FEE8030000         cmp      esi, 0x3E8
       7CE4                 jl       SHORT G_M55510_IG03
						;; bbWeight=4    PerfScore 27.00

G_M55510_IG04:
       C5F828742420         vmovaps  xmm6, qword ptr [rsp+20H]
       4883C430             add      rsp, 48
       5E                   pop      rsi
       C3                   ret      
						;; bbWeight=1    PerfScore 5.75
RWD00  dq	0000000000000000h
RWD08  dq	8000000000000000h
; Total bytes of code: 72

New codegen:

; Method Program:Test(double)
G_M55510_IG01:
       56                   push     rsi
       4883EC30             sub      rsp, 48
       C5F877               vzeroupper 
       C5F829742420         vmovaps  qword ptr [rsp+20H], xmm6
						;; bbWeight=1    PerfScore 5.25

G_M55510_IG02:
       33F6                 xor      esi, esi
       C5FB100D40000000     vmovsd   xmm1, qword ptr [reloc @RWD08]
       C5F857C1             vxorps   xmm0, xmm1
       C5FB58C0             vaddsd   xmm0, xmm0, xmm0
       E85B23B75F           call     System.Math:Exp(double):double
       C5F828F0             vmovaps  xmm6, xmm0
						;; bbWeight=1    PerfScore 6.83

G_M55510_IG03:
       C5F828C6             vmovaps  xmm0, xmm6
       E88699FEFF           call     Program:Consume(double)
       FFC6                 inc      esi
       81FEE8030000         cmp      esi, 0x3E8
       7CED                 jl       SHORT G_M55510_IG03
						;; bbWeight=4    PerfScore 11.00

G_M55510_IG04:
       C5F828742420         vmovaps  xmm6, qword ptr [rsp+20H]
       4883C430             add      rsp, 48
       5E                   pop      rsi
       C3                   ret      
						;; bbWeight=1    PerfScore 5.75
RWD00  dq	0000000000000000h
RWD08  dq	8000000000000000h
; Total bytes of code: 72

Diff: https://www.diffchecker.com/y6dL37Qi

/cc @AndyAyersMS

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 18, 2020

EgorBo commented Jun 18, 2020

It seems a cast is not hoisted either:

public class Program
{
    static void Test(object o)
    {
        for (int i = 0; i < 1000; i++) 
            Consume((Program) o); // cast o to Program is not hoisted from the loop
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Consume(Program p) {}
}

EgorBo commented Jun 18, 2020

GT_MOD by a constant also inserts a GT_COMMA in the global morph phase, which prevents the following code from being optimized:

static void Test(int x)
{
    for (int i = 0; i < 1000; i++) 
        Consume(-x % 10); // -x % 10 is not hoisted out of the loop
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void Consume(int x) {}

@AndyAyersMS AndyAyersMS left a comment

Cast can cause exceptions, so hoisting is problematic.
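
As a hedged illustration of that concern, the sketch below models the cast in C++ (not the C# from the repro) and shows one way a naive hoist can change observable behavior: if the loop runs zero times, the in-loop version never performs the cast, while the hoisted version throws. All names (`Base`, `Derived`, `Other`, `CheckedCast`, `TestInLoop`, `TestNaivelyHoisted`) are hypothetical.

```cpp
#include <typeinfo>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Other : Base {};

// Stand-in for the C# cast `(Program) o`: throws when the runtime type does not match.
static Derived& CheckedCast(Base& o)
{
    return dynamic_cast<Derived&>(o); // throws std::bad_cast on mismatch
}

static int g_consumed = 0;
static void Consume(Derived&) { ++g_consumed; }

// The cast stays inside the loop: with n == 0 it is never executed.
static void TestInLoop(Base& o, int n)
{
    for (int i = 0; i < n; i++)
    {
        Consume(CheckedCast(o));
    }
}

// The cast is moved in front of the loop without any guard:
// it now executes (and may throw) even when n == 0.
static void TestNaivelyHoisted(Base& o, int n)
{
    Derived& d = CheckedCast(o);
    for (int i = 0; i < n; i++)
    {
        Consume(d);
    }
}
```

With an `Other` instance and `n == 0`, `TestInLoop` returns normally and `TestNaivelyHoisted` throws `std::bad_cast`. A compiler that hoists throwing operations therefore has to prove the operation would have executed anyway and that no earlier side effect in the loop is reordered past the exception.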

How hard is it to fix the GT_MOD case?

// if op1 is not a leaf/local we have to introduce a temp via GT_COMMA.
// Unfortunately, it's not optHoistLoopCode-friendly yet so let's do it later
// in the Rationalization phase.
if (!needsComma || (mostRecentlyActivePhase > PHASE_HOIST_LOOP_CODE))

Does this actually kick in later? If so you might instead check for something like

comp->fgOrder == Compiler::FGOrderLinear

as mostRecentlyActivePhase is something we use for diagnostics but not to influence jit behavior.
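
For context on why a temp is involved at all: the quoted code comment says a temp is needed when op1 is not a leaf. One reading, consistent with the `vaddsd xmm0, xmm0, xmm0` in the listings in the description, is that `op1 * 2` is rewritten as `op1 + op1`, which uses op1 twice. The sketch below is a plain C++ model of that idea under this assumption, not JIT code; `Operand`, `g_evals`, and the three functions are hypothetical.

```cpp
// Hypothetical counter: how many times the operand expression is evaluated.
static int g_evals = 0;

// Hypothetical non-leaf operand (models the negation `-x` in the repro).
static double Operand(double x)
{
    ++g_evals;
    return -x;
}

// Original form: op1 * 2, operand evaluated once.
static double MulByTwo(double x)
{
    return Operand(x) * 2.0;
}

// Naive rewrite to op1 + op1: the operand tree is duplicated and evaluated twice.
static double AddNaive(double x)
{
    return Operand(x) + Operand(x);
}

// Rewrite with a temp: evaluate the operand once, then use the temp twice.
// This is the role the temp introduced via GT_COMMA plays in the quoted comment.
static double AddWithTemp(double x)
{
    const double tmp = Operand(x);
    return tmp + tmp;
}
```

All three functions return the same value, but only `AddNaive` evaluates the operand twice. When op1 is already a leaf or local, duplicating it is free, which matches the `!needsComma` part of the condition under review.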

@AndyAyersMS It does for the snippet in the description: that MUL is visited from RewriteIntrinsicAsUserCall during the rationalization phase (Exp is not an intrinsic, so we convert it back to a normal call and call fgMorphArgs for the argument). However, it won't be visited for:

static void Test(double x)
{
    for (int i = 0; i < 1000; i++)
    {
        Consume(-x * 2); // -x * 2 is hoisted from the loop
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void Consume(double x) { }

Here the expression will be hoisted, but the JIT won't revisit that node during rationalization.

@AndyAyersMS AndyAyersMS left a comment

LGTM. Thanks for fixing this!

@AndyAyersMS AndyAyersMS merged commit 3b5522c into dotnet:master Jun 25, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020

Development

Successfully merging this pull request may close these issues.

[Perf -40%] Benchstone.BenchF.Simpsn.Test
