Skip to content

Conversation

@AndyAyersMS
Copy link
Member

Allow copy prop to update GT_LCL_FLD nodes.

Update local assertion gen for block opts to use the pre-morph tree
to generate copy or zero assertions, since the semantics of the post-morph
tree are often obscured by the copy/zero expansions.

Allow copy prop to update GT_LCL_FLD nodes.

Update local assertion gen for block opts to use the pre-morph tree
to generate copy or zero assertions, since the semantics of the post-morph
tree are often obscured by the copy/zero expansions.
@ghost ghost assigned AndyAyersMS Aug 22, 2022
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 22, 2022
@ghost
Copy link

ghost commented Aug 22, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Allow copy prop to update GT_LCL_FLD nodes.

Update local assertion gen for block opts to use the pre-morph tree
to generate copy or zero assertions, since the semantics of the post-morph
tree are often obscured by the copy/zero expansions.

Author: AndyAyersMS
Assignees: AndyAyersMS
Labels:

area-CodeGen-coreclr

Milestone: -

@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone Aug 22, 2022
@AndyAyersMS
Copy link
Member Author

This is an alternate take on #73719.

Lots of good looking improvements. A few big regressions that I haven't investigated yet. Also some opportunities for follow up, eg if we have V06 == 0 and V04 == V06 and a lcl field use off of V04 we should be creating a zero, it seems.

@AndyAyersMS
Copy link
Member Author

cc @dotnet/jit-contrib

Comment on lines 14399 to 14402
if ((oldTree != nullptr) && oldTree->OperIsBlkOp())
{
optAssertionGen(oldTree);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can really use oldTree here... It should really be a DEBUGARG, the contract is that it can be reshapen/destroyed/reused in arbitrary ways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All we really need to know is the identity of the local(s) involved, so I suppose we can find some other way to convey this information.

Copy link
Contributor

@SingleAccretion SingleAccretion Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would calling this from somewhere inside fgMorphCopy/InitBlock not work? We are already doing this "manual" assertion generation for field assigns there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to survive fgKillDependentAssertions somehow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do it after the killing?

Copy link
Contributor

@SingleAccretion SingleAccretion Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I am missing something here.

By the time we get to fgMorphTreeDone, for a decomposed store, we'll have a COMMA chain, which will not be recognized as a store (and thus will not kill the assertions).

What I was thinking as "the kill point" is this code:

// Kill everything about m_dstLclNum (and its field locals)
if (m_comp->optLocalAssertionProp && (m_comp->optAssertionCount > 0))
{
m_comp->fgKillDependentAssertions(m_dstLclNum DEBUGARG(m_asg));
}

Where we still have the original assign.

Edit: this does still leave the wrinkle of killing the assertions of non-decomposed stores, but that can, presumably, be dealt with by moving out all of struct copy assertion generation from fgMorphTreeDone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might work. It scatters the gen/kill logic around a bit more, but I suppose we've already crossed that bridge.

I certainly want to avoid having to inspect the expanded tree to figure out what to kill/preserve/gen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange that we kill assertions for init block but not for copy block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess we inherit that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried a variant of what's discussed above: putting the assertion gen over in MorphInitBlockHelper::Morph after the dest is prepared but before the copy is morphed.

This misses some cases where we turn the copy into a single assignment to its field.

;;; current PR ("hacky revival of old tree")

Morphing BB01 of 'SocketConnection:DisposeAsync():ValueTask:this'

fgMorphTree BB01, STMT00008 (before)
               [000037] IA---------                         *  ASG       struct (init)
               [000034] D------N---                         +--*  LCL_VAR   struct<System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder, 8>(P) V04 tmp1         
                                                            +--*    ref    V04.m_task (offs=0x00) -> V09 tmp6         
               [000036] -----------                         \--*  CNS_INT   int    0
Notify VM instruction set (SSE2) must be supported.
MorphInitBlock:
MorphBlock for dst tree, before:
               [000034] D----+-N---                         *  LCL_VAR   struct<System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder, 8>(P) V04 tmp1         
                                                            *    ref    V04.m_task (offs=0x00) -> V09 tmp6         
MorphBlock after:
               [000034] D----+-N---                         *  LCL_VAR   struct<System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder, 8>(P) V04 tmp1         
                                                            *    ref    V04.m_task (offs=0x00) -> V09 tmp6         
PrepareDst for [000034] have found a local var V04.
 using field by field initialization.
GenTreeNode creates assertion:
               [000047] -A---------                         *  ASG       ref   
In BB01 New Local Constant Assertion: V09 == null, index = #01
MorphInitBlock (after):
               [000047] -A---+-----                         *  ASG       ref   
               [000045] D------N---                         +--*  LCL_VAR   ref    V09 tmp6         
               [000046] -----------                         \--*  CNS_INT   ref    null

The assignment [000047] using V09 removes: Constant Assertion: V09 == null
GenTreeNode creates assertion:
               [000047] -A---+-----                         *  ASG       ref   
In BB01 New Local Constant Assertion: V09 == null, index = #01
GenTreeNode creates assertion:
               [000037] IA---------                         *  ASG       struct (init)
In BB01 New Local Constant Assertion: V04 == ZeroObj, index = #02

compared to

;;; alternate version (revive during morph block init)

Morphing BB01 of 'SocketConnection:DisposeAsync():ValueTask:this'

fgMorphTree BB01, STMT00008 (before)
               [000037] IA---------                         *  ASG       struct (init)
               [000034] D------N---                         +--*  LCL_VAR   struct<System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder, 8>(P) V04 tmp1         
                                                            +--*    ref    V04.m_task (offs=0x00) -> V09 tmp6         
               [000036] -----------                         \--*  CNS_INT   int    0
Notify VM instruction set (SSE2) must be supported.
MorphInitBlock:
MorphBlock for dst tree, before:
               [000034] D----+-N---                         *  LCL_VAR   struct<System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder, 8>(P) V04 tmp1         
                                                            *    ref    V04.m_task (offs=0x00) -> V09 tmp6         
MorphBlock after:
               [000034] D----+-N---                         *  LCL_VAR   struct<System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder, 8>(P) V04 tmp1         
                                                            *    ref    V04.m_task (offs=0x00) -> V09 tmp6         
PrepareDst for [000034] have found a local var V04.
GenTreeNode creates assertion:
               [000037] IA---------                         *  ASG       struct (init)
In BB01 New Local Constant Assertion: V04 == ZeroObj, index = #01
 using field by field initialization.
GenTreeNode creates assertion:
               [000047] -A---------                         *  ASG       ref   
In BB01 New Local Constant Assertion: V09 == null, index = #02
MorphInitBlock (after):
               [000047] -A---+-----                         *  ASG       ref   
               [000045] D------N---                         +--*  LCL_VAR   ref    V09 tmp6         
               [000046] -----------                         \--*  CNS_INT   ref    null

The assignment [000047] using V09 removes: Constant Assertion: V09 == null

The assignment [000047] using V04 removes: Constant Assertion: V04 == ZeroObj
GenTreeNode creates assertion:
               [000047] -A---+-----                         *  ASG       ref   
In BB01 New Local Constant Assertion: V09 == null, index = #01

I could either try and recognize those cases in fgMorphTreeDone and avoid the kill, or update optAssertionGen to understand that assertions about an entire field of a struct also apply to the struct, so that the assertion gets killed but then gets revived again.

//
if (tree->OperIs(GT_LCL_FLD))
{
if (copyVarDsc->IsEnregisterableLcl() || copyVarDsc->lvPromotedStruct())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: one almost never wants to call lvPromotedStruct() instead of just checking lvPromoted. If we start having non-struct promoted variables in the future, we'll want this behavior for them as well.

Orthogonally, this is a bit conservative. If we already know copyVarDsc is not going to be enregistered, it should be ok to propagate it, so lvaGetPromotionType(copyVarDsc) == PROMOTION_TYPE_INDEPENDENT should in theory be a better check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw quite a few more regressions with promoted structs. Will have to look more closely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw quite a few more regressions with promoted structs

I suppose that addresses the PROMOTION_TYPE_INDEPENDENT idea. Guessing these could have been from SSA-based optimization not working as well (since promoted structs are excluded from all of them).

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Aug 22, 2022

Also wondering if it's worth checking for transitive copies. For scalar types it matters less -- one hopes LSRA, say can clean up all the residual copies. But for structs it could be a bigger deal.

Edit: I suppose this should just fall out, if we have say a chain of struct assigns and then a partial use:

y = x;
z = y;
   = lclFld(z)

we should produce

y = x;
z = x;
   = lclFld(z);

and then

y = x;
z = x;
   = lclFld(x);

But perhaps we could see cases where we think forwarding y is a bad idea but forwarding x is ok.

@AndyAyersMS
Copy link
Member Author

SPMI: despite the overall code size reduction, throughput increases, which is a little puzzling.

@AndyAyersMS
Copy link
Member Author

Interestingly we'll CSE LCL_FLD with zero VNs but will not constant prop zeros into LCL_FLDs. So will look into enabling constant prop I guess.

Still not settled on how to actually plumb through the info.

Comment on lines 3459 to 3466
if (varTypeIsStruct(tree))
{
tree->BashToZeroConst(TYP_INT);
}
else
{
tree->BashToZeroConst(tree->TypeGet());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the !varTypeIsStruct branch dead code?

(We have an assert to that effect at the top)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

Comment on lines 3447 to 3452
// TODO: create proper simd zero constant
//
if (varTypeIsSIMD(tree))
{
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should skip creating ZeroObj assertions for varTypeIsSIMD. ZeroObj is really meant for TYP_STRUCT only.

Comment on lines 1639 to 1658
case GT_OBJ:
case GT_BLK:
{
GenTree* const addr = op2->AsIndir()->Addr();

if (addr->OperIs(GT_ADDR))
{
GenTree* const base = addr->AsOp()->gtOp1;

if (base->OperIs(GT_LCL_VAR))
{
// layout compat?
op2 = base;
goto IS_COPY;
}
}

goto DONE_ASSERTION;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still necessary after #71584?

(If it is, it should have a TODO-ADDR: delete once <...> comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will check once I pick up those changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some cases where it helps -- about 700 all told across SPMI

;;; String:Replace(ushort,ushort):String:this (MethodHash=5aa7fa78) #16249 in ASP.NET

fgMorphTree BB15, STMT00064 (before)
               [000326] -A---------                         *  ASG       simd32 (copy)
               [000324] D------N---                         +--*  LCL_VAR   simd32<System.Numerics.Vector`1[System.UInt16]> V29 tmp12        
               [000208] n----------                         \--*  OBJ       simd32<System.Numerics.Vector`1[System.UInt16]>
               [000207] -----------                            \--*  ADDR      byref 
               [000203] -------N---                               \--*  LCL_VAR   simd32<System.Numerics.Vector`1[System.UInt16]> V14 loc11

;; (with above)

GenTreeNode creates assertion:
               [000326] -A---------                         *  ASG       simd32 (copy)
In BB15 New Local Copy     Assertion: V29 == V14, index = #01

...
Assertion prop in BB15:
Copy     Assertion: V29 == V14, index = #01
               [000320] -----------                         *  LCL_VAR   simd32<System.Numerics.Vector`1[System.UInt16]> V14 loc11  

;; (without above)

no assertion, no copy prop        

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIMD types, as one would suspect...

So the comment should be TODO-ADDR: delete once local morph folds SIMD-typed indirections. and we definitely need to check for layout compatibility.

if (tree->OperIsConst())
{
goto DONE;
assert("ERROR: Did not morph this node!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert("ERROR: Did not morph this node!");
assert(!"ERROR: Did not morph this node!");

Is it intentional that we no longer check for double morphing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks. That checking depended on old tree so I'd have to pass something like that down.

Seems like an odd contract though.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Aug 26, 2022

@dotnet/jit-contrib think this is getting close.

I want to spend some more time looking at regressions since some of them are sizeable. So far they fall into a few camps:

  • early assertion/prop gen/kill for block copies/inits will sometimes lose assertions if those expanded into a single promoted field copy. I have a variant where I rerun assertion gen late but it was not a consistent win.
  • some cases where value numbering's memory model is unable to reason about structs with known values; I think there is possibly something off with in the model (73 (192.11 % of base) : 44847.dasm - N.Repro1:RunTest():int from coreclr_tests.pmi.windows.x64.checked
  • cases where in-body zeroing is more compact that prolog zeroing
  • cases where LSRA make different choices
  • struct copyprop may lead to different copy expansions. These variations do not seem at times to be all that well motivated. At any rate, if it turns out that all the appearances of a struct are now per-field copy destinations, those dests get marked as GTF_VAR_USEASG and keep the struct and all writes alive even if there are no actual (explicit) reads. This doesn't happen if the struct is written wholesale. This explains quite a few of the bigger asp.net regressions. For example, in 72 ( 5.14% of base) : 92282.dasm - <ReadBufferAsync>d__72:MoveNext():this from asp.net we see:
;; before

       mov      rdx, gword ptr [rbp-40H]
       ; gcrRegs +[rdx]
       mov      ecx, dword ptr [rbp-38H]
       movsx    rax, word  ptr [rbp-34H]
       mov      gword ptr [rbp-60H], rdx
       mov      dword ptr [rbp-58H], ecx
       mov      word  ptr [rbp-54H], ax
       mov      byte  ptr [rbp-52H], 0
						;; size=77 bbWeight=4    PerfScore 97.00
G_M43996_IG07:        ; , nogc, extend
       vmovdqu  xmm0, xmmword ptr [rbp-60H]
       vmovdqu  xmmword ptr [rbp-30H], xmm0

						;; size=10 bbWeight=4    PerfScore 16.00
G_M43996_IG08:        ; , isz, extend
       mov      rdi, gword ptr [rbp-30H]

;; after

       mov      rdx, gword ptr [rbp-40H]
       ; gcrRegs +[rdx]
       mov      ecx, dword ptr [rbp-38H]
       movsx    rax, word  ptr [rbp-34H]
       mov      gword ptr [rbp-70H], rdx
       mov      dword ptr [rbp-68H], ecx
       mov      word  ptr [rbp-64H], ax
       mov      byte  ptr [rbp-62H], 0

       mov      gword ptr [rbp-50H], rdx   ;; dead copy
       mov      dword ptr [rbp-48H], ecx
       mov      word  ptr [rbp-44H], ax
       mov      byte  ptr [rbp-42H], 0

       mov      gword ptr [rbp-30H], rdx
       mov      dword ptr [rbp-28H], ecx
       mov      word  ptr [rbp-24H], ax
       mov      byte  ptr [rbp-22H], 0
       mov      rdi, gword ptr [rbp-30H]

(plus another identical case later)

I suspect that second copy is perhaps partially dead but perhaps kept alive by the same issue with GTF_VAR_USEASG. Though one would hope that last read could just be rdx so maybe more is going on.

I wonder how hard it would be to model this in liveness.

@AndyAyersMS
Copy link
Member Author

@kunalspathak can you review?

@kunalspathak
Copy link
Contributor

@kunalspathak can you review?

Sure, will do it later today.

Copy link
Contributor

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some questions specifically around optAssertionPropDone. I didn't quite understand it. Can you please explain the design around it?

// Block ops handle assertion kill/gen specially.
// See PrepareDst and PropagateAssertions
//
if (optAssertionPropDone != nullptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we say that we are doing struct1 = struct2 and is candidate for copy prop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, all that happens over in optAssertionProp_LclVar and the new optAssertionProp_LclFld. before the tree is morphed.

This check is asking whether or not we've already killed and generated assertions from the post-morph tree.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Aug 30, 2022

Added some questions specifically around optAssertionPropDone. I didn't quite understand it. Can you please explain the design around it?

Global morph runs over each basic block in order, and in each block, over each statement in order, and in each statement, each node in order. During global morph we enable "local" assertion prop; at the start of each block, we clear out the assertion table.

Before morphing each node, we try and apply the currently active assertions to the tree. Then when we're done morphing the node, we kill off any assertions that are no longer true and try and generate new assertions. This is more or less standard forward data flow on assertions. The set of assertion we track is pretty limited: var = constant; var is nonnull; var = other var.

The first part of this PR is to enable assertion prop into LCL_FLD uses. It was inspired by code I saw in #66776 where we copied a struct and then just read one field of the copy.

The approach outlines above works pretty well for most nodes, but does not work well for struct inits and struct copies, because the code in assertion prop can't always recognize the zeroings and copes once they've been expanded. In particular when a struct copy gets expanded as field by field copy, the sequence of stores to the LCL_FLDs of the dest would kill any assertion about the dest.

The second part of this PR tries to improve things generating assertions for struct inits and copies after the source and dest are processed but before the assign node itself is processed. (note we were already doing the kills early on). That way we know what assertions to gen.

We need to remember that we've done this "early assertion gen / kill" for these particular trees so we don't do kills again later on in fgMorphTreeDone. That's what optAssertionPropDone is for. We can't make this part of ambient morph state because it applies just to the root node of one particular tree, not to an entire tree.

@kunalspathak
Copy link
Contributor

We need to remember that we've done this "early assertion gen / kill" for these particular trees so we don't do kills again later on in fgMorphDone.

That makes sense. Thank you for the explaination.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Sep 1, 2022

Regressions:

Improvements

@AndyAyersMS
Copy link
Member Author

There are two tests that show substantial regressions across os/isa combinations:

(ubuntu x64)

newplot - 2022-09-07T180930 418
newplot - 2022-09-07T180948 247

A few other tests show small regressions.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Sep 9, 2022
In dotnet#74384 I modified morph to generate assertions before a block op
was morphed, but after the source and dest were morphed. This misses
generating some assertions that arise once the expanded form is
known.

We now re-run assertion gen on the expanded tree, when we do an
`OneAsgBlock` expansion. Other cases my also prove profitable.

Closes dotnet#75229.
AndyAyersMS added a commit that referenced this pull request Sep 9, 2022
In #74384 I modified morph to generate assertions before a block op
was morphed, but after the source and dest were morphed. This misses
generating some assertions that arise once the expanded form is
known.

We now re-run assertion gen on the expanded tree, when we do an
`OneAsgBlock` expansion. Other cases my also prove profitable.

Closes #75229.
@ghost ghost locked as resolved and limited conversation to collaborators Oct 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants