Fold away Sse.StaticCast and Avx.StaticCast in the importer #18519
For AVX, this gets rid of some stack shuffling on method entry, and in some cases removes the need for
-vmovaps xmm8, qword ptr [rsp+100H]
-vmovaps xmm9, qword ptr [rsp+F0H]
-vzeroupper
However, in some cases, it results in stack spillage where it was previously just using an extra register:
lea rcx, bword ptr [rbp-*8H]
call System.Runtime.InteropServices.GCHandle:AddrOfPinnedObject():long:this
vmovupd ymm6, ymmword ptr[rax]
-vmovdqa ymm7, ymm6
+vmovupd ymmword ptr[rbp-B0H], ymm6
lea rcx, bword ptr [rbp-*0H]
-vextractf128 ymm8, ymm6, 1
-vextractf128 ymm9, ymm7, 1
+vextractf128 ymm7, ymm6, 1
call System.Runtime.InteropServices.GCHandle:AddrOfPinnedObject():long:this
-vinsertf128 ymm7, ymm9, 1
-vinsertf128 ymm6, ymm8, 1
-vmovupd ymmword ptr[rax], ymm7
+vinsertf128 ymm6, ymm7, 1
+vmovupd ymm0, ymmword ptr[rbp-B0H]
+vmovupd ymmword ptr[rax], ymm0 |
|
AFAIR, the last time I tried this an assert fired. Some bit of code does not like a struct handle, most likely because vectors with different base types are really different types. Not sure if anything has changed since then. |
I think we've cleaned up some of the code since then. There weren't any asserts firing locally. However, there are asserts firing if I try to do the same with |
Cool. Make sure to try it with the examples from #18069, just in case. I think it's the one I used to experiment with. |
|
Seems we also get worse codegen occasionally for
lea rcx, bword ptr [rbp-**H]
call System.Runtime.InteropServices.GCHandle:AddrOfPinnedObject():long:this
-vmovupd xmm6, xmmword ptr [rax]
-vmovaps xmm7, xmm6
+mov rcx, qword ptr [rax]
+mov qword ptr [rbp-60H], rcx
+mov rcx, qword ptr [rax+8]
+mov qword ptr [rbp-58H], rcx
+vmovapd xmm0, xmmword ptr [rbp-60H]
+vmovapd xmmword ptr [rbp-70H], xmm0
lea rcx, bword ptr [rbp-**H]
call System.Runtime.InteropServices.GCHandle:AddrOfPinnedObject():long:this
-vmovapd xmmword ptr [rbp-B0H], xmm7
-vmovupd xmmword ptr [rax], xmm7
+mov rcx, qword ptr [rbp-70H]
+mov rdx, qword ptr [rbp-68H]
+mov qword ptr [rax], rcx
+mov qword ptr [rax+8], rdx
mov rcx, 0xD1FFAB1E
This looks like it is part of 1st class structs, however and is due to |
|
The non-VEX diff is basically the same as #18519 (comment), just replace Will get diffs for #18069 shortly... @AndyAyersMS, @CarolEidt. Is there some smaller change that could fix the stack shuffling that happens for |
{
    // We fold away the static cast here, as it only exists to satisfy
    // the type system. It is safe to do this here since the retNode type
    // and the signature return type are both TYP_SIMD16.
Can an assert be added for this comment as well? (assert the return type)
Probably. Will update.
    // the type system. It is safe to do this here since the retNode type
    // and the signature return type are both TYP_SIMD16.
    assert(sig->numArgs == 1);
    retNode = impPopStack().val;
Does this change handle the nested StaticCast?
Any nested StaticCast should have already been handled by this point.
But impPopStack().val seems not working with SIMD types. Shall we use impSIMDPopStack(TYP_SIMDxx)?
Was there a reason that you decided not to use impSIMDPopStack?
Just an oversight. Fixing now, will push an update shortly (after build completes locally).
Also, to address @fiigii's comment ("But impPopStack().val seems not working with SIMD types."):
impPopStack() works just fine with SIMD types, and is what impSIMDPopStack uses under the hood. The benefit of using impSIMDPopStack is that it performs additional validation on the type and, in certain cases, normalization of the type for existing nodes.
"impPopStack() works just fine with SIMD types"
In ordinary situations (i.e., on the local stack), it indeed works fine. But, IIRC, impSIMDPopStack is necessary in some more complex situations, such as passing/returning or referencing SIMD variables.
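A rough way to picture the relationship described in this thread is a plain pop plus a thin wrapper that validates and normalizes. The sketch below is a toy model only: `ToyImporter`, `Node`, `VarType`, `popStack`, and `simdPopStack` are hypothetical names invented for illustration and are not the JIT's actual types or signatures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for JIT concepts (not the real JIT definitions).
enum class VarType { Int, Simd16, Simd32 };

struct Node {
    VarType type;
    bool    normalized; // toy flag: has the node been put in canonical form?
};

struct ToyImporter {
    std::vector<Node> stack;

    // Plain pop: hands back whatever is on top, with no type checks.
    Node popStack() {
        assert(!stack.empty());
        Node n = stack.back();
        stack.pop_back();
        return n;
    }

    // SIMD pop: built on popStack, but also validates the expected type
    // and normalizes the node (modeled here as setting a flag).
    Node simdPopStack(VarType expected) {
        Node n = popStack();
        assert(n.type == expected);
        n.normalized = true;
        return n;
    }
};
```

In this model both functions return the same underlying entry; the wrapper only adds checks and a canonicalization step, which is why using it for SIMD operands is the more defensive choice.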
@tannergooding Could you take a look at the JITDump? I remember that |
I'd have to look at the dump first to better understand what's going on. If you could post a dump, that would save me having to download & build your changes. |
|
I'll get dumps put up sometime this evening. |
|
Full JitDumps for both AVX and SSE are here: StaticCast_ro.zip |
|
From what I can tell, the struct is being promoted at times. For example: |
|
@tannergooding - sorry to have taken so long to get around to looking at this. I don't think this is really related to first class structs. This is, I believe, due to problems with the way we decide when to promote SIMD types. In general, if a struct can be promoted, it will be. In the SIMD case, however, we don't necessarily want to always do that. Looking at the code, it seems that the only case in which we disable promotion is when One thing would be to try modifying |
|
@CarolEidt, thanks! Would it also make sense, in the case of HardwareIntrinsics, to never promote, since there are no (public) fields to access? |
Yes, that would be even better, I would say. |
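The idea in the exchange above (promote structs in general, be more selective for SIMD types, and never promote hardware intrinsic vectors) could be sketched as a simple policy function. Everything here is an assumption made for illustration: `StructInfo`, `shouldPromote`, the field-count limit, and the "SIMD only when its fields are accessed" rule are hypothetical and do not describe the JIT's actual promotion logic.

```cpp
#include <cassert>

// Hypothetical description of a struct local (not a real JIT type).
struct StructInfo {
    bool isSimdType;          // maps to a SIMD register type
    bool isHWIntrinsicVector; // hardware intrinsic vector with no public fields
    int  fieldCount;
    bool fieldsAccessed;      // does the method read or write individual fields?
};

// Toy promotion policy following the discussion:
//  - hardware intrinsic vectors are never promoted,
//  - other SIMD types are promoted only if fields are accessed (assumption),
//  - ordinary structs are promoted when they are small (arbitrary toy limit).
bool shouldPromote(const StructInfo& s) {
    if (s.isHWIntrinsicVector) {
        return false;
    }
    if (s.isSimdType) {
        return s.fieldsAccessed;
    }
    return s.fieldCount > 0 && s.fieldCount <= 4;
}
```

Keeping the hardware intrinsic check first means such a vector stays a single value and is never split into per-field locals, which is the kind of splitting visible in the qword-by-qword moves in the diffs above.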
|
The simplest "fix" for the struct promotion issue is to continue calling
We now get
vmovupd xmm6, xmmword ptr [rax]
vmovapd xmmword ptr [rbp-80H], xmm6
lea rcx, bword ptr [rbp-50H]
instead of
mov rcx, qword ptr [rax]
mov qword ptr [rbp-60H], rcx
mov rcx, qword ptr [rax+8]
mov qword ptr [rbp-58H], rcx
vmovapd xmm0, xmmword ptr [rbp-60H]
vmovapd xmmword ptr [rbp-70H], xmm0
lea rcx, bword ptr [rbp-38H] |
|
Going to see if I can get a bigger/better fix as part of a separate PR. |
|
Rebased onto master. |
|
@CarolEidt, could you review when you get the chance? #18519 (comment) shows the codegen after ensuring the operands are not promoted. https://github.com/dotnet/coreclr/issues/18069 shows the better codegen for the issue this is fixing. |
CarolEidt
left a comment
LGTM overall, but have one question.
    // the type system. It is safe to do this here since the retNode type
    // and the signature return type are both TYP_SIMD16.
    assert(sig->numArgs == 1);
    retNode = impPopStack().val;
Was there a reason that you decided not to use impSIMDPopStack?
|
... and sorry for the delay in reviewing! |
|
Fixed to use |
|
JitDump is here: JitDump.zip Using The assertion is here: https://github.com/dotnet/coreclr/blob/master/src/jit/importer.cpp#L1255 I'll try to take a closer look tomorrow. |
|
NOTE: The break only happens for the |
|
Found the issue.
In this case, Updating the
@CarolEidt, thoughts? |
|
I opted for modifying |
|
Test failures for |
|
@dotnet-bot test Windows_NT x64 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x86 Checked jitincompletehwintrinsic
@dotnet-bot test Ubuntu x64 Checked jitincompletehwintrinsic |
|
Rebased onto dotnet/master to pick up the test fixes. @CarolEidt, when you get the chance, it would be nice if you could review the additional commit made since you signed off (which passes the class handle down through |
// Arguments:
//    type       - the type of value that the caller expects to be popped off the stack.
//    expectAddr - if true indicates we are expecting type stack entry to be a TYP_BYREF.
//    structType - the class handle to use when normalizing if it is not the same as the stack entry class handle
I would like for this comment to explain the scenario where this matters - even though I originally understood the context, once I came back to this for review, I couldn't imagine why it wouldn't be the same as the stack entry class handle (even though it's obvious in the context of folding away a static cast!). Anyway, it wouldn't hurt to add something like "(this can happen, e.g., when folding away a static cast, where we want the value popped to have the type that would have been returned)"
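The scenario described in this review comment can be sketched with a toy stack whose pop accepts an optional class handle override. This is a simplified illustration under stated assumptions: `ToyStack`, `ClassHandle`, `kNoHandle`, `pop`, and `foldStaticCast` are hypothetical names, and the real importer works on trees and type info rather than integers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a class handle (not the real JIT type).
using ClassHandle = int;
constexpr ClassHandle kNoHandle = -1;

struct Node {
    ClassHandle handle; // the vector type this value is considered to have
};

struct ToyStack {
    std::vector<Node> entries;

    // Pops the top entry. If structType is supplied, the popped value is
    // retagged with it instead of keeping the stack entry's own handle.
    Node pop(ClassHandle structType = kNoHandle) {
        assert(!entries.empty());
        Node n = entries.back();
        entries.pop_back();
        if (structType != kNoHandle) {
            n.handle = structType;
        }
        return n;
    }
};

// Folding away a static cast: no cast node is created. The operand is
// popped and given the handle of the type the cast would have returned.
Node foldStaticCast(ToyStack& stack, ClassHandle returnHandle) {
    return stack.pop(returnHandle);
}
```

For example, if the operand on the stack carries the handle of one vector type and the cast's signature returns another, `foldStaticCast` yields the same value tagged with the return type's handle, so later code never sees the cast at all.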
… the returned node.
|
Thanks @CarolEidt! I am not going to rerun the full suite (of HWIntrinsic tests) since the delta was just clarifying the parameter documentation comment (and all tests were passing). |
FYI. @CarolEidt, @eerhardt, @fiigii
This improves the codegen for code using StaticCast, as the rest of the JIT no longer has to care that it exists at all.