JIT: extend forward sub to local fields #73719
Optimize cases where we copy a struct and then immediately read just one field, so that we read from the original struct instead.
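As a source-level analogue of the pattern described above, here is a minimal C++ sketch. The `Context` struct, its field names, and both function names are hypothetical; the layout is only chosen so that, on typical 64-bit ABIs, the struct is 56 bytes with the flag at offset 24, mirroring the sizes visible in the disassembly in this thread. The JIT works on its own IR rather than on source like this, so treat it as an illustration of the shape of the code, not of the implementation.

```cpp
#include <cstdint>

// Hypothetical struct for illustration only (not a real .NET type).
// On typical 64-bit ABIs this is 56 bytes, with isBlocking at offset 24.
struct Context {
    std::uint64_t a;
    std::uint64_t b;
    std::uint64_t c;
    bool isBlocking;
    std::uint64_t d;
    std::uint64_t e;
    std::uint64_t f;
};

// "Before" shape: the whole struct is copied into a temp,
// and then a single field is read from the copy.
bool IsBlockingViaCopy(const Context& ctx) {
    Context tmp = ctx;      // full struct copy
    return tmp.isBlocking;  // only one field of the copy is used
}

// "After" shape: the field is read straight from the original,
// so the copy (and the stack space for it) is unnecessary.
bool IsBlockingDirect(const Context& ctx) {
    return ctx.isBlocking;
}
```

Both functions return the same value for any input; the second simply avoids materializing the temporary.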
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
cc @dotnet/jit-contrib
Partially inspired by #66776. Just a small simple piece of struct copy prop.
BenchmarkDotNet=v0.13.1.1786-nightly, OS=Windows 11 (10.0.22000.856/21H2)
PowerPlanMode=00000000-0000-0000-0000-000000000000 InvocationCount=5000 IterationTime=250.0000 ms
Still a bit rough. Some things to sort through:
Some nice looking diffs, e.g.:
;;; BEFORE
; Assembly listing for method Microsoft.CodeAnalysis.ExternalAccess.VSTypeScript.Api.VSTypeScriptCodeFixContextExtensions:IsBlocking(Microsoft.CodeAnalysis.CodeFixes.CodeFixContext):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No matching PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 6 ) byref -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T01] ( 2, 4 ) struct (56) [rsp+00H] do-not-enreg[SF] class-hnd exact "Single-def Box Helper"
;
; Lcl frame size = 56
G_M40065_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
sub rsp, 56
vzeroupper
;; size=7 bbWeight=1 PerfScore 1.25
G_M40065_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, nogc
; byrRegs +[rcx]
vmovdqu ymm0, ymmword ptr[rcx]
vmovdqu ymmword ptr[rsp], ymm0
vmovdqu xmm0, xmmword ptr [rcx+32]
vmovdqu xmmword ptr [rsp+20H], xmm0
mov rax, qword ptr [rcx+48]
mov qword ptr [rsp+30H], rax
;; size=29 bbWeight=1 PerfScore 14.00
G_M40065_IG03: ; , extend
movzx rax, byte ptr [rsp+18H]
;; size=5 bbWeight=1 PerfScore 1.00
G_M40065_IG04: ; , epilog, nogc, extend
add rsp, 56
ret
becomes
;;;; AFTER
; Assembly listing for method Microsoft.CodeAnalysis.ExternalAccess.VSTypeScript.Api.VSTypeScriptCodeFixContextExtensions:IsBlocking(Microsoft.CodeAnalysis.CodeFixes.CodeFixContext):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No matching PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 6 ) byref -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;* V02 tmp1 [V02 ] ( 0, 0 ) struct (56) zero-ref do-not-enreg[SF] class-hnd exact "Single-def Box Helper"
;
; Lcl frame size = 0
G_M40065_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
;; size=0 bbWeight=1 PerfScore 0.00
G_M40065_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref
; byrRegs +[rcx]
movzx rax, byte ptr [rcx+24]
;; size=4 bbWeight=1 PerfScore 2.00
G_M40065_IG03: ; , epilog, nogc, extend
ret
Failure is from folding to
Probably need some vestige of the old handle check back; layout compatibility isn't enough. It looks like in crossgen Vector4 and Vector128 aren't ABI compatible, but we need to wait until after folding obj(addr(lcl)) to spot this.
@AndyAyersMS, is this PR for .NET 7 or 8? Please set the milestone accordingly.
It is a draft PR, would be for 8 if anything.
Yes, this is one case where we still need the handle check. It will be fixed "soon" (#72887 is the prelude).
I notice in #66776 we have a tree characteristic of struct arguments to inlinees, with an unnecessary copy made due to the
Probably not worth it? If we see a
Forwarding enregisterable/promoted structs, it would be the conservative option. The worry is of course that we'd pessimize all other defs/uses.
We only have layout on
Overall, I am wondering whether we should slot the
You may be right. I'll probably give this a try.
I've been looking into this alternative, and it's not working anywhere near as well as forward sub (despite being potentially more capable). We call assertion gen after morphing the trees, and the expansions we do for many struct copies aren't recognizable as struct copies anymore. One idea I may try is to invoke assertion gen on the pre-morphed tree (at least for struct copies). For example, will not generate a
Are the issues caused by promotion expansions or something else? (Edit: I see they are.) Generating assertions in pre-order is "wrong" because the copying hasn't happened yet at that point (so the assertions created would not be valid for subtrees); creating them just before block morphing should be fine, though.
This assumes we have appropriate heuristics in propagation itself, and will not try to replace something with a struct that has just been eliminated through decomposition. Substitution also has to have these heuristics (the aforementioned DNER question), but it's easier for it to stay out of questionable territory because the use/def count is known.
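To make the point about known use/def counts concrete, here is a toy model of forward substitution over a made-up two-statement IR. Everything in it (`Stmt`, `ForwardSubLocalFields`, the string-named locals) is hypothetical and far simpler than the JIT's real trees; it only sketches the idea that a copy `tmp = src` followed directly by `x = tmp.field` can be rewritten to `x = src.field` when `tmp` has exactly one def and one use.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy IR, hypothetical and unrelated to the JIT's actual node types.
//   Copy:      dst = src          (whole-struct copy)
//   FieldRead: dst = src.field
struct Stmt {
    enum Kind { Copy, FieldRead } kind;
    std::string dst;
    std::string src;
    std::string field;  // empty for Copy
};

// Rewrites `tmp = s; x = tmp.f` into `x = s.f` when tmp is defined once,
// used once, and the use is the very next statement.
std::vector<Stmt> ForwardSubLocalFields(const std::vector<Stmt>& in) {
    // Count defs and uses of every local up front.
    std::unordered_map<std::string, int> defs, uses;
    for (const Stmt& s : in) {
        defs[s.dst]++;
        uses[s.src]++;
    }

    std::vector<Stmt> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Stmt& def = in[i];
        if (def.kind == Stmt::Copy && i + 1 < in.size()) {
            const Stmt& use = in[i + 1];
            if (use.kind == Stmt::FieldRead && use.src == def.dst &&
                defs[def.dst] == 1 && uses[def.dst] == 1) {
                // Read the field from the original struct; drop the copy.
                out.push_back(Stmt{Stmt::FieldRead, use.dst, def.src, use.field});
                ++i;  // the use statement has been consumed
                continue;
            }
        }
        out.push_back(def);
    }
    return out;
}
```

Because the counts are computed before any rewriting, a temp with a second use is simply left alone, which is the conservative behavior described above.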
Yeah, seems to be caused by the expansions. It would be impractical to pattern match these later, I think.
Right, was looking for a good place to try this.
May be a hack, but
Abandoning this in favor of #74384 |