@AndyAyersMS
Member

Update the check for arg invariance to include addresses of fields in local
structs. This allows the inliner to directly substitute more arguments into
the body of the inlinee.

Resolves dotnet/coreclr#27630.
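As a rough illustration of the widened check, here is a minimal sketch. It uses a hypothetical, heavily simplified node type (`Node`, `Oper`) rather than the JIT's real `GenTree`, and it is not the code from this PR. The sketch treats an argument as invariant if it is a constant, the address of a local (`ADDR(LCL_VAR)`), or the address of a field reached through such an address (`ADDR(FIELD(ADDR(LCL_VAR)))`, the shape that appears in the tree dumps in this thread). Rejecting static fields (no object operand) and fields reached through an object reference is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical, heavily simplified IR; the real JIT uses GenTree with many more opers.
enum class Oper { Const, LclVar, Field, Addr, Call };

struct Node {
    Oper oper;
    Node* op1; // ADDR: the location whose address is taken; FIELD: the object address (null for statics)
    Node(Oper o, Node* operand = nullptr) : oper(o), op1(operand) {}
};

// True for ADDR(LCL_VAR) and for ADDR(FIELD(ADDR(... LCL_VAR))) with any number of nested fields.
bool isAddressInLocal(const Node* tree) {
    if (tree == nullptr || tree->oper != Oper::Addr) {
        return false;
    }
    const Node* op = tree->op1;
    while (op != nullptr && op->oper == Oper::Field) {
        const Node* obj = op->op1;
        if (obj == nullptr || obj->oper != Oper::Addr) {
            return false; // static field, or a field reached through an object reference
        }
        op = obj->op1;
    }
    return op != nullptr && op->oper == Oper::LclVar;
}

// An invariant argument can be substituted directly into the inlinee body
// instead of first being evaluated into a temp.
bool isInvariantArg(const Node* arg) {
    return arg->oper == Oper::Const || isAddressInLocal(arg);
}
```

With this predicate, `ADDR(FIELD(ADDR(LCL_VAR)))` now counts as invariant, while a bare `LCL_VAR` or a call result still does not.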

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 13, 2019
@AndyAyersMS
Member Author

AndyAyersMS commented Dec 13, 2019

@dotnet/jit-contrib PTAL

Fixes the M1 and M2 CQ divergence in dotnet/coreclr#27630.

Diff summary for x86:

Total bytes of diff: -127 (-0.00% of base)
    diff is an improvement.

Top file regressions (bytes):
          17 : CommandLine.dasm (0.01% of base)
           9 : NuGet.Packaging.dasm (0.01% of base)
           2 : xunit.execution.dotnet.dasm (0.00% of base)
           1 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.00% of base)
           1 : NuGet.Frameworks.dasm (0.00% of base)
           1 : NuGet.Protocol.Core.v3.dasm (0.00% of base)
           1 : System.Linq.Queryable.dasm (0.00% of base)
           1 : xunit.assert.dasm (0.00% of base)
           1 : xunit.core.dasm (0.00% of base)

Top file improvements (bytes):
         -60 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base)
         -28 : Microsoft.CodeAnalysis.dasm (-0.00% of base)
         -25 : System.Private.CoreLib.dasm (-0.00% of base)
         -18 : System.Reflection.Metadata.dasm (-0.00% of base)
         -11 : System.Collections.dasm (-0.00% of base)
          -6 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
          -5 : System.Linq.Expressions.dasm (-0.00% of base)
          -4 : System.Drawing.Primitives.dasm (-0.01% of base)
          -3 : System.Data.Common.dasm (-0.00% of base)
          -1 : System.Linq.Parallel.dasm (-0.00% of base)

19 total files with Code Size differences (10 improved, 9 regressed), 110 unchanged.

Top method regressions (bytes):
           8 ( 0.98% of base) : System.Collections.dasm - HashSet`1:Remove(double):bool:this
           7 ( 1.61% of base) : System.Private.CoreLib.dasm - Vector64`1:GetHashCode():int:this (6 methods)
           2 ( 0.89% of base) : System.Private.CoreLib.dasm - <>f__AnonymousType0`1:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType0`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType3`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType4`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType5`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType7`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType8`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType9`2:GetHashCode():int:this (7 methods)

Top method improvements (bytes):
         -36 (-4.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - DeclarationOrderSymbolComparer:Compare(ISymbol,ISymbol):int:this
         -13 (-1.18% of base) : Microsoft.CodeAnalysis.dasm - Hash:CombineValues(ref,int):int (7 methods)
         -12 (-1.83% of base) : System.Private.CoreLib.dasm - Vector128`1:GetHashCode():int:this (6 methods)
          -9 (-23.68% of base) : System.Reflection.Metadata.dasm - EntityHandle:Compare(int,int):int
          -9 (-20.93% of base) : System.Reflection.Metadata.dasm - HandleComparer:Compare(int,int):int:this
          -8 (-5.93% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DeclarationOrderTypeSymbolComparer:Compare(Symbol,Symbol):int:this
          -8 (-0.72% of base) : Microsoft.CodeAnalysis.dasm - Hash:CombineValues(ImmutableArray`1,int):int (7 methods)
          -8 (-3.02% of base) : Microsoft.CodeAnalysis.dasm - TypesByNamespaceSortComparer:Compare(IGrouping`2,IGrouping`2):int:this
          -8 (-2.01% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VB$AnonymousType_0`2:GetHashCode():int:this (7 methods)
          -8 (-2.01% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VB$AnonymousType_1`2:GetHashCode():int:this (7 methods)

Top method regressions (percentages):
           7 ( 1.61% of base) : System.Private.CoreLib.dasm - Vector64`1:GetHashCode():int:this (6 methods)
           8 ( 0.98% of base) : System.Collections.dasm - HashSet`1:Remove(double):bool:this
           2 ( 0.89% of base) : System.Private.CoreLib.dasm - <>f__AnonymousType0`1:GetHashCode():int:this (7 methods)
           1 ( 0.33% of base) : System.Private.CoreLib.dasm - ValueTask`1:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType0`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType3`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType4`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType5`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType7`2:GetHashCode():int:this (7 methods)
           1 ( 0.25% of base) : CommandLine.dasm - <>f__AnonymousType8`2:GetHashCode():int:this (7 methods)

Top method improvements (percentages):
          -9 (-23.68% of base) : System.Reflection.Metadata.dasm - EntityHandle:Compare(int,int):int
          -9 (-20.93% of base) : System.Reflection.Metadata.dasm - HandleComparer:Compare(int,int):int:this
          -7 (-7.61% of base) : System.Collections.dasm - HashSet`1:InternalGetHashCode(double,IEqualityComparer`1):int
          -8 (-5.93% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DeclarationOrderTypeSymbolComparer:Compare(Symbol,Symbol):int:this
          -5 (-5.05% of base) : System.Private.CoreLib.dasm - HashCode:Add(double,IEqualityComparer`1):this
         -36 (-4.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - DeclarationOrderSymbolComparer:Compare(ISymbol,ISymbol):int:this
          -4 (-4.35% of base) : System.Drawing.Primitives.dasm - PointF:GetHashCode():int:this
          -2 (-3.39% of base) : System.Private.CoreLib.dasm - HashCode:Add(double):this
          -8 (-3.02% of base) : Microsoft.CodeAnalysis.dasm - TypesByNamespaceSortComparer:Compare(IGrouping`2,IGrouping`2):int:this
          -2 (-2.90% of base) : System.Data.Common.dasm - SqlSingle:GetHashCode():int:this

88 total methods with Code Size differences (38 improved, 50 regressed), 208371 unchanged.

and for x64:

Total bytes of diff: -326 (-0.00% of base)
    diff is an improvement.

Top file improvements (bytes):
         -80 : Microsoft.CodeAnalysis.dasm (-0.00% of base)
         -51 : CommandLine.dasm (-0.01% of base)
         -50 : System.Private.CoreLib.dasm (-0.00% of base)
         -27 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base)
         -27 : NuGet.Packaging.dasm (-0.02% of base)
         -19 : System.Collections.dasm (-0.00% of base)
         -18 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
          -6 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
          -6 : System.Data.Common.dasm (-0.00% of base)
          -6 : System.Drawing.Primitives.dasm (-0.02% of base)

20 total files with Code Size differences (20 improved, 0 regressed), 109 unchanged.

Top method improvements (bytes):
         -13 (-1.13% of base) : Microsoft.CodeAnalysis.dasm - Hash:CombineValues(ref,int):int (7 methods)
         -10 (-8.33% of base) : Microsoft.CodeAnalysis.dasm - ConstantValueDouble:GetHashCode():int:this
         -10 (-8.33% of base) : Microsoft.CodeAnalysis.dasm - ConstantValueSingle:GetHashCode():int:this
          -9 (-0.39% of base) : Microsoft.CodeAnalysis.dasm - Hash:CombineValues(IEnumerable`1,int):int (7 methods)
          -9 (-2.20% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VB$AnonymousType_0`2:GetHashCode():int:this (7 methods)
          -9 (-2.20% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VB$AnonymousType_1`2:GetHashCode():int:this (7 methods)
          -9 (-2.20% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VB$AnonymousType_2`2:GetHashCode():int:this (7 methods)
          -8 (-0.60% of base) : Microsoft.CodeAnalysis.dasm - Hash:CombineValues(ImmutableArray`1,int):int (7 methods)
          -8 (-1.11% of base) : System.Private.CoreLib.dasm - Vector128`1:GetHashCode():int:this (6 methods)
          -6 (-6.00% of base) : System.Drawing.Primitives.dasm - PointF:GetHashCode():int:this

Top method improvements (percentages):
         -10 (-8.33% of base) : Microsoft.CodeAnalysis.dasm - ConstantValueDouble:GetHashCode():int:this
         -10 (-8.33% of base) : Microsoft.CodeAnalysis.dasm - ConstantValueSingle:GetHashCode():int:this
          -6 (-6.00% of base) : System.Drawing.Primitives.dasm - PointF:GetHashCode():int:this
          -3 (-4.69% of base) : System.Data.Common.dasm - SqlSingle:GetHashCode():int:this
          -3 (-4.35% of base) : System.Private.CoreLib.dasm - GenericEqualityComparer`1:GetHashCode(double):int:this
          -3 (-4.35% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:GetHashCode(double):int:this
          -3 (-3.90% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - InternTable`1:BucketNumberFromValue(double,int):int
          -3 (-3.61% of base) : System.Private.CoreLib.dasm - HashCode:Add(double):this
          -4 (-3.28% of base) : System.Collections.dasm - HashSet`1:InternalGetHashCode(double,IEqualityComparer`1):int
          -3 (-3.09% of base) : System.Data.Common.dasm - SqlDouble:GetHashCode():int:this

88 total methods with Code Size differences (88 improved, 0 regressed), 208845 unchanged.


    -if ((curArgVal->OperKind() & GTK_CONST) ||
    -    ((curArgVal->gtOper == GT_ADDR) && (curArgVal->AsOp()->gtOp1->gtOper == GT_LCL_VAR)))
    +if ((curArgVal->OperKind() & GTK_CONST) || isAddressInLocal)
Contributor

There's some code between these 2 changes that doesn't appear in GitHub:

    if (curArgVal->gtFlags & GTF_ALL_EFFECT)
    {
        inlCurArgInfo->argHasGlobRef = (curArgVal->gtFlags & GTF_GLOB_REF) != 0;
        inlCurArgInfo->argHasSideEff = (curArgVal->gtFlags & (GTF_ALL_EFFECT & ~GTF_GLOB_REF)) != 0;
    }

It looks to me like, in some cases, this will unnecessarily set argHasGlobRef when the arg is a local address. Not sure what impact that has on CQ, if any.
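To make the concern concrete, here is a small model of the quoted flag logic. The flag names and bit values (`kFlagGlobRef` and friends) and the effect set are made up for illustration; the real `GTF_*` constants and `GTF_ALL_EFFECT` differ. The `bypassInvariant` option is a hypothetical sketch of "bypassing that for invariant args", not what the JIT currently does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits and effect set; the real GTF_* values and GTF_ALL_EFFECT differ.
constexpr std::uint32_t kFlagAsg = 0x1;
constexpr std::uint32_t kFlagCall = 0x2;
constexpr std::uint32_t kFlagExcept = 0x4;
constexpr std::uint32_t kFlagGlobRef = 0x8;
constexpr std::uint32_t kAllEffect = kFlagAsg | kFlagCall | kFlagExcept | kFlagGlobRef;

struct ArgInfo {
    bool hasGlobRef = false;
    bool hasSideEff = false;
};

// Mirrors the quoted logic: split the arg's effect flags into "global ref" and
// "any other side effect". When bypassInvariant is true (hypothetical option),
// invariant args skip the classification and stay clean.
ArgInfo classifyArg(std::uint32_t flags, bool isInvariant, bool bypassInvariant) {
    ArgInfo info;
    if (isInvariant && bypassInvariant) {
        return info;
    }
    if (flags & kAllEffect) {
        info.hasGlobRef = (flags & kFlagGlobRef) != 0;
        info.hasSideEff = (flags & (kAllEffect & ~kFlagGlobRef)) != 0;
    }
    return info;
}
```

An invariant local-address arg whose tree carries only the global-ref bit gets `hasGlobRef == true` under the quoted logic, and a clean `ArgInfo` with the bypass; a non-invariant arg is classified the same way either way.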

Member Author

I saw your comment about that over on dotnet/coreclr#27630. But I didn't see any global ref flag on the test case here, at least when inlining.

I can look into bypassing that for invariant args.

Contributor

Interesting, here I see it in M1 on FIELD and ADDR:

STMT00002 (IL 0x00F...  ???)
               [000012] I-C-G-------              *  CALL      ref    System.Int32.ToString (exactContextHnd=0x02F7E975)
               [000011] ----G------- this in ecx  \--*  ADDR      byref 
               [000010] ----G-------                 \--*  FIELD     int    i
               [000009] ------------                    \--*  ADDR      byref 
               [000008] ------------                       \--*  LCL_VAR   int    V01 loc0         

@AndyAyersMS
Member Author

Note this change (when it fires, which isn't often) results in fewer temps.

Looking at the regression for x86 on HashSet`1:Remove(double):bool:this, there is an awkward sequence early in the method from long decomposition:

N001 (  3,  4) [000339] ------------       t339 =    LCL_FLD   long   V01 arg1         u:1[+0] $340
                                                  /--*  t339   long   
N003 ( 10, 10) [000310] DA--G-------              *  STORE_LCL_VAR long   V17 tmp6         d:1

==>

N001 (  3,  4) [000339] ------------       t339 =    LCL_FLD   int    V01 arg1         u:1[+0] $340
               [001047] ------------      t1047 =    LCL_FLD   int    V01 arg1         [+4]
                                                  /--*  t339   int    
N003 ( 10, 10) [000310] DA--G-------              *  STORE_LCL_VAR int    V39 rat0         
                                                  /--*  t1047  int    
               [001049] D-----------              *  STORE_LCL_VAR int    V40 rat1

Seems like it would be better when decomposing a long copy to not create overlapping lifetimes, else we might need a lot of spilling, as we end up doing here:

       mov      dword ptr [ebp-40H], ecx
       mov      ecx, dword ptr [ebp+0CH]
       mov      dword ptr [ebp-44H], ecx
       mov      ecx, dword ptr [ebp-40H]
       mov      dword ptr [ebp-28H], ecx
       mov      ecx, dword ptr [ebp-44H]
       mov      dword ptr [ebp-2CH], ecx
       mov      ecx, dword ptr [ebp-28H]
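A toy model of why the overlapping order matters: in the decomposed dump, both 32-bit halves are loaded before either is stored, so two values are live at once; copying each half completely before starting the next keeps only one live. The `Op` type and helper names are hypothetical. The model counts only these two values; whether the overlap actually forces spills like the ones above depends on everything else live at that point, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical linear IR for one decomposed copy: a Load produces a 32-bit value,
// and the matching Store is that value's only use.
struct Op {
    enum Kind { Load, Store } kind;
    int half; // 0 = low half, 1 = high half
};

// Peak number of loaded-but-not-yet-stored values, i.e. values that need a
// register (or a spill slot) at the same time.
int peakLive(const std::vector<Op>& ops) {
    int live = 0;
    int peak = 0;
    for (const Op& op : ops) {
        if (op.kind == Op::Load) {
            peak = std::max(peak, ++live);
        } else {
            --live;
        }
    }
    return peak;
}

// Shape seen in the dump: both halves are loaded before either is stored.
std::vector<Op> overlappingCopy() {
    return {{Op::Load, 0}, {Op::Load, 1}, {Op::Store, 0}, {Op::Store, 1}};
}

// Alternative ordering: copy each half completely before starting the next.
std::vector<Op> interleavedCopy() {
    return {{Op::Load, 0}, {Op::Store, 0}, {Op::Load, 1}, {Op::Store, 1}};
}
```

`peakLive(overlappingCopy())` is 2 and `peakLive(interleavedCopy())` is 1, which is the sense in which the overlapping order raises register pressure on a register-poor target like x86.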

I also see the global flag set on the arg tree that ultimately feeds into this, perhaps related in some way.

Argument #0: is a constant has global refs
               [000303] ----G-------              *  ADDR      byref 
               [000302] ----G-------              \--*  FIELD     double m_value
               [000300] ------------                 \--*  ADDR      byref 
               [000301] ------------                    \--*  LCL_VAR   double V01 arg1 

@AndyAyersMS
Member Author

gtNewFieldRef goes conservative if it sees a field of a non-struct.

@AndyAyersMS
Member Author

gtNewFieldRef goes conservative if it sees a field of a non-struct.

Also conservative for a field of a struct param, on x64/arm64.

Looking into cleaning up the global effect flag separately. I think the x64/arm64 conservatism for struct params may be needed because reflection can invoke methods with implicit by-ref structs referring to heap-allocated structs. One would hope it would not matter, though: even if they are on the heap, they should not alias one another, so I'm not sure why we'd care. It would have been nice for the comment here to be a bit more insightful.

#if defined(_TARGET_AMD64_) || defined(_TARGET_ARM64_)
            // These structs are passed by reference; we should probably be able to treat these
            // as non-global refs, but downstream logic expects these to be marked this way.
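The conservative cases described above can be summarized as a small decision function. This is a hypothetical model, not `gtNewFieldRef` itself: the `FieldObj` and `Target` types are invented, the "object not rooted at a local" case is an added assumption, and treating every struct param on x64/arm64 as implicitly by-ref is a simplification of the `#if` block quoted above.

```cpp
#include <cassert>

// Hypothetical summary of the object a field access goes through.
struct FieldObj {
    bool isAddrOfLocal; // object operand is ADDR(LCL_VAR)
    bool localIsStruct; // that local has struct type
    bool localIsParam;  // that local is an incoming parameter
};

enum class Target { X86, Arm, X64, Arm64 };

// Returns true if the field reference would be conservatively flagged as a
// global reference under the cases described in this thread.
bool fieldRefIsGlobal(const FieldObj& obj, Target target) {
    if (!obj.isAddrOfLocal) {
        return true; // assumption: anything not rooted at a local is a global ref
    }
    if (!obj.localIsStruct) {
        return true; // field of a non-struct, e.g. double's m_value
    }
    // Simplification: every struct param on these targets is modeled as implicit by-ref.
    const bool structParamsByRef = (target == Target::X64 || target == Target::Arm64);
    if (obj.localIsParam && structParamsByRef) {
        return true;
    }
    return false;
}
```

Under this model the `FIELD double m_value` arg from the `HashSet`1:Remove` dump is flagged on every target, while a field of a struct local is not.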

@CarolEidt
Contributor

Seems like it would be better when decomposing a long copy to not create overlapping lifetimes, else we might need a lot of spilling, as we end up doing here:

Maybe you could open an issue, and point to that case.

@AndyAyersMS
Member Author

Maybe you could open an issue, and point to that case.

Sure, though I don't know where it shows up without this change.

Long decomposition in general is something we ought to improve (see https://github.com/dotnet/coreclr/issues/18339#issuecomment-468355602), but I'm not sure how much priority we should put on x86 CQ...

@CarolEidt
Contributor

@AndyAyersMS - thanks for opening the issues. Presumably the long decomposition matters on arm32 as well as x86, though I'm not certain that changes the priority question.

Contributor

@CarolEidt CarolEidt left a comment

LGTM - thanks

@AndyAyersMS
Member Author

I also see the global flag set on the arg tree that ultimately feeds into this, perhaps related in some way.

Nope, not related... still see the long decomp issue even with the prototype fix for global flags.

@AndyAyersMS
Member Author

I think the arm test run finished but did not get reported back. Am going to ignore it.

@AndyAyersMS AndyAyersMS merged commit 3fb2aba into dotnet:master Dec 16, 2019
@AndyAyersMS AndyAyersMS deleted the InlineDirectSubFieldAddress branch December 16, 2019 20:03
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020