Skip to content

Conversation

@sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Jul 16, 2020

This PR fixes #39403 introduced by #38316 (first two commits).

The safest solution would be to revert eefeb7e and ceec9c8 but it will lead to significant regressions, like

  Total size diff, bytes Methods imrpoved Methods regressed
arm64 Windows crossgen 18912 0 1156

so after advising with Carol I have decided to fix the optimization rather than disable it.

The two bad things about the issue were:

  1. we did not catch it with our testing;
  2. Jit was producing bad code instead of failing with an assert;

so the main goal in the fix was to add as many relevant asserts as possible.

I recommend reviewing by commits.

Changes:
aaf2e99: Add a repro test.

69d33fb: Small ref.

Combine long chains of addr->OperGet() == GT_* || with GT_LCL_VAR_ADDR using OperIs to simplify future changes.

f8d8cac: Add an assert that would fire in the repro test.

emitter::emitHandleMemOp has special logic for contained memBase but the last else does not
expect a contained node. A contained node doesn't produce a register so it is not correct to use the result
of GetRegNum() from a contained node as a valid register.
However, adding an assert to GetRegNum() that !this->isContained is a bigger task that is out of this PR.

578d0ed: Assert that LCL_FLD_ADDR is not contained in genPutArgStk(Split)

We have contained LCL_VAR_ADDR support there but make sure that contained LCL_FLD_ADDR can't reach it.

cd9d0b3: Contain GT_LCL_FLD_ADDR under HW_INTRINSIC.

This is an additional optimization that makes future changes simpler.

7d68d3f: Add contained checks.

In all these places we expect LCL_VAR_ADDR to be contained.
If we had gotten a LCL_VAR_ADDR that is not contained we would have instantiated LCL_VAR_ADDR twice:
in the register and the parent instruction.
The register value would have been unused.

1865d29: Support FLD_ADDR where LCL_ADDR is supported.

However, fire an assert if we think that this path is unreachable for now.

69ecffc: Delete asserts in the reachable blocks.

We have coverage for this asserts in the following tests:
hwintrinsic 478: Ssse3_ro
instr 11645: Runtime_39403
instr 1028 : Aes_ro
hwintrinsic 716: pmi of Microsoft.Diagnostics.Tracing.TraceEvent

From HWIntrinsic support we have some improvements (x64 Pmi):

Total bytes of diff: -2267 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
        -604 : JIT\HardwareIntrinsics\X86\Sse2\Sse2_r\Sse2_r.dasm (-0.02% of base)
        -500 : JIT\HardwareIntrinsics\X86\Avx2\Avx2_r\Avx2_r.dasm (-0.01% of base)
        -364 : JIT\HardwareIntrinsics\X86\Avx\Avx_r\Avx_r.dasm (-0.02% of base)
        -300 : JIT\HardwareIntrinsics\X86\Sse41\Sse41_r\Sse41_r.dasm (-0.02% of base)
        -208 : JIT\HardwareIntrinsics\X86\Sse\Sse_r\Sse_r.dasm (-0.02% of base)
         -80 : JIT\HardwareIntrinsics\X86\Fma_Vector128\Fma_r\Fma_r.dasm (-0.02% of base)
         -64 : JIT\HardwareIntrinsics\X86\Ssse3\Ssse3_r\Ssse3_r.dasm (-0.01% of base)
         -48 : JIT\HardwareIntrinsics\X86\Fma_Vector256\Fma_r\Fma_r.dasm (-0.02% of base)
         -32 : JIT\HardwareIntrinsics\X86\Sse41_Overloaded\Sse41_Overloaded_r\Sse41_Overloaded_r.dasm (-0.03% of base)
         -24 : JIT\HardwareIntrinsics\X86\Aes\Aes_r\Aes_r.dasm (-0.03% of base)
         -24 : JIT\HardwareIntrinsics\X86\Sse3\Sse3_r\Sse3_r.dasm (-0.02% of base)
          -8 : JIT\HardwareIntrinsics\X86\Avx_Vector128\Avx_r\Avx_r.dasm (-0.01% of base)
          -4 : JIT\HardwareIntrinsics\X86\Avx2_Vector128\Avx2_r\Avx2_r.dasm (-0.00% of base)
          -4 : JIT\HardwareIntrinsics\X86\Sse42\Sse42_r\Sse42_r.dasm (-0.02% of base)
          -3 : JIT\Regression\JitBlue\Runtime_39403\Runtime_39403\Runtime_39403.dasm (-1.03% of base)
15 total files with Code Size differences (15 improved, 0 regressed), 3365 unchanged.

crossgen linuxnonjit:

Top file improvements (bytes):
        -662 : System.Private.CoreLib.dasm (-0.01% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 261 unchanged.
Top method improvements (bytes):
        -184 (-20.54% of base) : System.Private.CoreLib.dasm - Matrix4x4:op_Multiply(Matrix4x4,Matrix4x4):Matrix4x4 (2 methods)
         -70 (-15.84% of base) : System.Private.CoreLib.dasm - Matrix4x4:Lerp(Matrix4x4,Matrix4x4,float):Matrix4x4 (2 methods)
         -70 (-23.33% of base) : System.Private.CoreLib.dasm - Matrix4x4:op_Addition(Matrix4x4,Matrix4x4):Matrix4x4 (2 methods)
         -70 (-23.33% of base) : System.Private.CoreLib.dasm - Matrix4x4:op_Subtraction(Matrix4x4,Matrix4x4):Matrix4x4 (2 methods)
         -70 (-27.34% of base) : System.Private.CoreLib.dasm - Matrix4x4:op_Equality(Matrix4x4,Matrix4x4):bool (2 methods)
         -70 (-26.72% of base) : System.Private.CoreLib.dasm - Matrix4x4:op_Inequality(Matrix4x4,Matrix4x4):bool (2 methods)
         -32 (-11.76% of base) : System.Private.CoreLib.dasm - Matrix4x4:Transpose(Matrix4x4):Matrix4x4 (2 methods)
         -32 (-10.06% of base) : System.Private.CoreLib.dasm - Matrix4x4:op_UnaryNegation(Matrix4x4):Matrix4x4 (2 methods)
         -32 (-10.06% of base) : System.Private.CoreLib.dasm - Matrix4x4:op_Multiply(Matrix4x4,float):Matrix4x4 (2 methods)
         -32 (-1.24% of base) : System.Private.CoreLib.dasm - Matrix4x4:<Invert>g__SseImpl|59_0(Matrix4x4,byref):bool (2 methods)

I am planning to write repro cases for the rest of assert(!"don't expect GT_LCL_FLD_ADDR") before I finish this PR.

@sandreenko sandreenko added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 16, 2020
@sandreenko sandreenko force-pushed the fixLclFldAddr branch 3 times, most recently from 4c45fe4 to e54b816 Compare July 17, 2020 08:03
@sandreenko sandreenko marked this pull request as ready for review July 17, 2020 22:55
@sandreenko
Copy link
Contributor Author

PTAL @CarolEidt @dotnet/jit-contrib

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the same pattern of code added to quite a number of places. Should we make this into a new helper method?

@sandreenko
Copy link
Contributor Author

Seems like the same pattern of code added to quite a number of places. Should we make this into a new helper method?

I like the idea. I was thinking about adding GetLclOffs to LclVarCommon returning 0 for LCL_VAR,
or

	assert(addr->isContained());
	varNum = addr->AsLclVarCommon()->GetLclNum();
	if (addr->OperIs(GT_LCL_FLD_ADDR))
	{
		offset = addr->AsLclFld()->GetLclOffs();
	}
	else
	{
		offset = 0;
	}

can be extracted (but will have to use out params to pass results).
What do you think is better? The first could be used in other places as well.

I can make a try and see how it looks.

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, with a couple of questions & suggestions. I also like Andy's suggestion to extract the repeated code, if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are asserting that addrNode is not contained, and yet calling genConsumeAddress which handles a contained LEA. Is that so that it will handle a contained address even if we don't expect it? If so, a comment would be good. If not, it should be calling genConsumeReg().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here would be good - i.e. that this path is unexpected and untested, but should be correct in case it is encountered in a non-checked JIT.

Copy link
Contributor Author

@sandreenko sandreenko Jul 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I am planning to write a test that hits this path, if I won't be able to construct it I will add a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the repro cases and deleted the asserts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that it could potentially be the case that we have no base or index, but in that case we should have a handle for the offset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a check || (indir->Offset() != 0), I think it is unreachable (based on my analysis of GenTreeIndir::Offset()) but it is just safer for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ldloca.s   V_0 
ldflda float64 Runtime_39424/Container::Double
volatile. ldind.r8

were generating what we needed. I have disabled CSE, added more than 4 fields to avoid struct promotion and marked indir as volatile to avoid the optimizations that were hiding the issue.

Sergey Andreenko added 11 commits July 22, 2020 02:04
Combine long chains of `addr->OperGet() == GT_* ||` with `GT_LCL_VAR_ADDR` using `OperIs` to simplify future changes.
`emitter::emitHandleMemOp` has special logic for contained `memBase` and but the last block does not
expect a contained node. A contained node doesn't produce a register so it is not correct to use result
of `GetRegNum()` from a contained node as a valid register.
However, adding an assert to `GetRegNum()` that `!this->isContained` is a bigger task that is out of this PR.
We have contained `LCL_VAR_ADDR` support there but make sure that contained `LCL_FLD_ADDR` can't reach it.
This is an additional optimization that makes future changes simpler.
In all these places we expect `LCL_VAR_ADDR` to be contained.
If we had gotten a `LCL_VAR_ADDR` that is not contained we would have instantiated `LCL_VAR_ADDR` twice:
in the register and the parent instruction.
The register value would have been unused.
However, fire an assert if we think that this path is unreachable for now.
We have coverage for this asserts in the following tests:
hwintrinsic 478: Ssse3_ro
instr 11645: Runtime_39403
instr 1028 : Aes_ro
hwintrinsic 716: pmi of Microsoft.Diagnostics.Tracing.TraceEvent
Delete the rest `assert(!"don't expect GT_LCL_FLD_ADDR");`.
@sandreenko
Copy link
Contributor Author

The PR has been updated and is ready for the next round of review.

@sandreenko
Copy link
Contributor Author

ping @dotnet/jit-contrib

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, though I don't know how to be confident that all the places that needed to be changed have been changed.

@sandreenko
Copy link
Contributor Author

Changes look good, though I don't know how to be confident that all the places that needed to be changed have been changed.

I estimate the risk as low. We had had support for contained GT_CLS_VAR_ADDR on all platforms and for contained GT_LCL_VAR_ADDR on x64, so I have checked all places where we were referencing one of them and add support for the new nodes. Then I made sure that these new code paths are visited by adding a repro test for each of them.

I think the new asserts in emitHandleMemOp would help to catch similar issues in the future on x64/x86. Arm arch logic is simpler and because of that it needs less changes.

@sandreenko sandreenko merged commit 557c7af into dotnet:master Jul 24, 2020
Jacksondr5 pushed a commit to Jacksondr5/runtime that referenced this pull request Aug 10, 2020
* Add a repro test.

* Small ref.

Combine long chains of `addr->OperGet() == GT_* ||` with `GT_LCL_VAR_ADDR` using `OperIs` to simplify future changes.

* Add an assert that would fire in the repro test.

`emitter::emitHandleMemOp` has special logic for contained `memBase` and but the last block does not
expect a contained node. A contained node doesn't produce a register so it is not correct to use result
of `GetRegNum()` from a contained node as a valid register.
However, adding an assert to `GetRegNum()` that `!this->isContained` is a bigger task that is out of this PR.

* Assert that `LCL_FLD_ADDR` is not contained in `genPutArgStk(Split)`

We have contained `LCL_VAR_ADDR` support there but make sure that contained `LCL_FLD_ADDR` can't reach it.

* Contain `GT_LCL_FLD_ADDR` under HW_INTRINSIC.

This is an additional optimization that makes future changes simpler.

* Add contained checks.

In all these places we expect `LCL_VAR_ADDR` to be contained.
If we had gotten a `LCL_VAR_ADDR` that is not contained we would have instantiated `LCL_VAR_ADDR` twice:
in the register and the parent instruction.
The register value would have been unused.

* Support `FLD_ADDR` where `LCL_ADDR` is supported.

However, fire an assert if we think that this path is unreachable for now.

* Delete asserts in the  reachable blocks.

We have coverage for this asserts in the following tests:
hwintrinsic 478: Ssse3_ro
instr 11645: Runtime_39403
instr 1028 : Aes_ro
hwintrinsic 716: pmi of Microsoft.Diagnostics.Tracing.TraceEvent

* Review response.

* Add repro cases.

Delete the rest `assert(!"don't expect GT_LCL_FLD_ADDR");`.

* Use `GetLclOffs` from `LclVarCommon`.

* missed file.
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimizations introduce null reference exception on latest preview 8 daily build

4 participants