Skip to content

Conversation

@sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Apr 8, 2020

The failing assert was:

Assertion failed 'IsCompatibleType(cseLclVarTyp, expTyp)'  
F:\git\runtime\src\coreclr\src\jit\optcse.cpp Line: 2768

That was a combination of two issues:

  1. There was an IND struct under ADDR not marked with DONT_CSE, it came from INDEX morphing;
  2. There was another IND for the same LCL_VAR, but as ref from call return retyping, VNApplySelectorsTypeCheck saw this IND ref and returned struct type for it without checking sizes/structHandles (we don't have handles in this case, because we don't have a mechanism to keep them on pointers and we access array through a ref (pointer) local var);
    -> CSE saw IND struct and IND ref with the same VN and without DONT_CSE and failed.

The first issue was fixed the third commit 86a9f22,
it introduced a few diffs (20 methods in frameworks, then a few in BING/pri1 SPMI), all textual, not size diffs. I have fixed them in cd24383 commit.

The solution adds this flag on each node under ADDR and checks it in debugCheckFlags. It is questionable because we will for sure forget to clear DONT_CSE flag from time to time and we don't have a checker to assert when it is set without a reason, as we do for CALL, ASG, EXCEPT flags.
We could add a checker (in another PR) but it won't be cheap because:

  1. DONT_CSE is used not only for correctness but for profitability as well. I had a change that cleaned it from NULLCHECK nodes, but diffs were negative: CSE started to create CSE copies for them, but the null checks were later deleted by assertion propagation, so changing CSE logic to have zero diffs would be non-trivial;
  2. we don't have a clear contract where DONT_CSE is required for the correctness and there are comments with TODO to delete some that reference failing tests if you do.

Maybe it would be better to drop the commit for SPMI diffs and accept textual diffs instead of adding these non-obvious conditions. I am sure there are some other places where we forgot to clean that, they just don't produce any diffs.

Commits:
6128613: Fix old printing issues.
df5266c: add a repro test that doesn't require crossgen2.
86a9f22: Check that all ADDR sources are marked as DONT_CSE.
cd24383: Fix text(no size) diffs found by SPMI (could be dropped).
c0300c4: Add a question/comment (will be dropped/fixed before merge).

Fixes #33884.

@sandreenko sandreenko added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 8, 2020
@sandreenko sandreenko changed the title test run Fix an incorrect CSE case with struct retyping. Apr 11, 2020
@sandreenko
Copy link
Contributor Author

PTAL @erozenfeld @briansull, cc @dotnet/jit-contrib
I would like to hear your opinion about adding a checker for DONT_CSE or move its setting to another place/phase.

@sandreenko sandreenko marked this pull request as ready for review April 11, 2020 16:46
bool canCSE = tree->CanCSE();
// That doesn't make any sense, was it supposed to clean all flags except GTF_NODE_MASK here? 2012<
tree->gtFlags &= GTF_NODE_MASK;
// If yes then why a year after (2013) was this cleaning added, that obviously does nothing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that you want to checkin these two comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will drop that commit before merge, want to see if somebody knows more about these changes.

op = op1->AsOp()->gtOp1;
if (canCSE)
{
op->ClearDoNotCSE();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't necessary to guard this call to ClearDoNotCSE()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is not it? It could be a volatile LCL_VAR that needs that flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of CanCSE is
return ((gtFlags & GTF_DONT_CSE) == 0);
So this is basically saying if the flag is set then clear it using the AND NOT operation.
Not sure how a volatile LCL_VAR would behave any differently here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is basically saying if the flag is set then clear it using the AND NOT operation.

Maybe I am missing something, but I don't understand.
We are checking the flag on IND(ADDR(X)) tree and clear it on X. X was marked as DONT_CSE because it was ADDR source, IND(ADDR(X)) could be marked as DONT_CSE for other reasons that we don't change here, so if IND(ADDR(X)) was marked with that flag then after the transformation X should have this flag as well.

For example, IND1(ADDR2(IND3(ADDR4(X)) in the beginning X and IND3 have DONT_CSE because they are ADDR sources, when we optimize IND3(ADDR4(X)) we have to keep DONT_CSE on X .

Copy link
Contributor

@briansull briansull Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see.
I missed that we were overwriting op after setting canCSE.

 bool canCSE = op->CanCSE();
          op        = op1->AsOp()->gtOp1;

This sequence is confusing and could use a comment, saying we are preserving any DONT_CSE that was set on the original version of op

Actually I'm not sure that this implementation does properly preserve this flag.

When a parent node sets DONT_CSE on a node it means that it wants the child node to always have this flag set.

GenTree* byReferenceStruct = gtCloneExpr(thisptr->gtGetOp1());
assert(byReferenceStruct != nullptr);
GenTreeLclVar* byReferenceStruct = gtCloneExpr(lclVar)->AsLclVar();
assert(!byReferenceStruct->CanCSE());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to remove the assert? I believe that gtCloneExpr can return nullptr for some complex trees.
assert(byReferenceStruct != nullptr);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CanCSE does this dereferencing, so it is not necessary to check it separately, but I will return it if it reads better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An A/V in a Checked build is worse than an assert firing, in terms of debuggablity.

@briansull
Copy link
Contributor

I would like to hear your opinion about adding a checker for DONT_CSE or move its setting to another place/phase.

Checker phases are typically used to enforce correctness issues.
The DONT_CSE is a flag that prevents an optimization, presumably because it is unsafe.
When code sets it is could be unsafe for all cases or if is is difficult to determine it could be unsafe just for some cases.

Failing to perform a CSE isn't a correctness issue and often won't even significantly change the performance of the generated code. It will show up in code diffs, which looks like what your bug fix is trying to correct here.

Maybe it would be better to drop the commit for SPMI diffs and accept textual diffs instead of adding these non-obvious conditions.

I am fine with accepting a small amount of textual diffs and we often get them when code changes the behavior of the CSE heuristics .

@sandreenko
Copy link
Contributor Author

Checker phases are typically used to enforce correctness issues.
The DONT_CSE is a flag that prevents an optimization, presumably because it is unsafe.
When code sets it is could be unsafe for all cases or if is is difficult to determine it could be unsafe just for some cases.

How is it different from, for example, GTF_EXCEPT? When we don't set GTF_EXCEPT where we need it - it is a correctness issue. When we set it if we don't need it - it forbids some optimizations.

@briansull
Copy link
Contributor

briansull commented Apr 12, 2020

GTF_EXCEPT flags are a bit different IMO, these are flags that get push up the tree to all of the parent nodes.
Thus it summarizes the state below, so that we don't have to walk the entire tree to look for operations that have exceptional side effects. Since when inserting new node it is easy to forget to set the GTF_EXCEPT flags , a checker is great to insure that the summery is always correct. I am sure that correctness of our optimization depend upon having correct GTF_EXCEPT flags, although having an extra flag would only disable an optimization (similar to having an extra DONT_CSE flag)

The DONT_CSE flag is a local flag that only applies to the current node, and typically the parent node is responsible for setting it on one or both of its child nodes.

That said I am not opposed to adding some kind of checker if it provides good value.

@erozenfeld
Copy link
Member

We currently have 56 places in the code where we set GTF_DONT_CSE and an additional 11 places where we propagate it. I think it would be beneficial to audit these places and do three things:

  1. For places where there are comments that indicate that it's not clear why the flag is set, try not to set it and figure out the right solution.
  2. For places that where there is a good reason to set the flag for correctness, see if it's possible to add a check to the checker to make sure we don't lose the flag.
  3. For places where we set the flag for performance reasons we probably don't want to add a check to the checker but we should make sure there is a good comment explaining why the flag is set.
    That will also make it impossible to add a check in the checker that we shouldn't see an unexpected GTF_DONT_CSE.


if ((indType == TYP_REF) && (varTypeIsStruct(elemTyp)))
{
// This whole block is over-optimistic, we don't have any information
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that if we delete that unsafe condition we will get 8 bytes regression in framework assemblies (1 method) and 2 in SPMI (pri1 and BING).
That regression will be fixed with #33225, so I think I will just delete this block before the merge.

Copy link
Member

@erozenfeld erozenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments.

if (!canCSE)
{
tree->SetDoNotCSE();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment: why not set DONT_CSE without checking it first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I misread the change.

tree->AsLclVarCommon()->SetLclNum(fieldLclIndex);
tree->gtType = fieldType;
bool canCSE = tree->CanCSE();
// That doesn't make any sense, was it supposed to clean all flags except GTF_NODE_MASK here? 2012<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can help you find the change that introduced this.

Copy link
Contributor Author

@sandreenko sandreenko Apr 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change gives us one regression in framework crossgening:

8 (12.70% of base) : Newtonsoft.Json.dasm -  Newtonsoft.Json.Serialization.JsonTypeReflector:get_DynamicCodeGeneration():bool

@sandreenko
Copy link
Contributor Author

I have updated the PR, deleted the suspicious optimization from VNApplySelectorsTypeCheck and allowed textual diffs where we forget to clean DONT_CSE.
Please take another look.

Copy link
Contributor

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now

Copy link
Member

@erozenfeld erozenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sandreenko sandreenko merged commit e405ca0 into dotnet:master Apr 17, 2020
@sandreenko sandreenko deleted the crossgen2Failure branch April 17, 2020 08:37
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
@sandreenko sandreenko restored the crossgen2Failure branch September 2, 2021 05:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

System.CodeAnalysis.dll crashes building with Crossgen2

3 participants