JIT: Clean up and optimize StackLevelSetter#112561
Conversation
The stack levels computed are not actually used for anything outside x86, so avoid computing them in that case. Also, the max stack level does not actually get used by the backend, even for x86, so no need to save it in `Compiler`.
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
| // Check that we have place for the push. | ||
| assert(compiler->fgGetPtrArgCntMax() >= 1); |
There was a problem hiding this comment.
I am not sure what the comment is referring to. This value has not been used to allocate anything by the backend. The only thing it is used for is deciding between EBP/ESP frame and whether we need to switch to partial interruptible mode, but that's all done during StackLevelSetter.
| #endif | ||
|
|
||
| if (compCanEncodePtrArgCntMax() && fgHasCycleWithoutGCSafePoint()) | ||
| if (fgHasCycleWithoutGCSafePoint()) |
There was a problem hiding this comment.
fgPtrArgCntMax is computed much after this, so the check was always true.
|
Actually |
|
cc @dotnet/jit-contrib PTAL @AndyAyersMS No diffs except for throughput improvements since we no longer run this phase outside x86 in the vast majority of cases. I think in the future we might want to rethink the heuristics for when a frame pointer is allocated and remove the quirk added here. I think for linux-x64 it might be necessary for diagnostics, but for win-x64 it would free up a GP register quite often, and I'm not sure if there are any significant downsides there. |
|
ping @AndyAyersMS |
AndyAyersMS
left a comment
There was a problem hiding this comment.
LGTM, though perhaps you can make it a bit faster still.
src/coreclr/jit/stacklevelsetter.cpp
Outdated
| void StackLevelSetter::ProcessBlock(BasicBlock* block) | ||
| { | ||
| assert(currentStackLevel == 0); | ||
| #ifndef TARGET_X86 |
There was a problem hiding this comment.
Couldn't we host this and skip the whole process blocks loop in the caller?
There was a problem hiding this comment.
Ah, good catch. Addressed.
StackLevelSetteris used to compute the max number of pushed stack slots that the function is going to have. Today we use that as a breadcrumb for whether a frame pointer is going to be required or not.It turns out that we have already computed this information outside x86: it is basically the same thing as the size of the fixed outgoing arg area. Hence this PR changes things to avoid running
StackLevelSetterand instead reuse the information computed earlier.This is not always possible as
StackLevelSetteris also used to optimize throw helpers. But we can detect this situation ahead of time and skip it in most cases.StackLevelSetterwas running in MinOpts before, so this shows up as a nice MinOpts TP win.On a side note: it seems unlikely that we actually need to switch to frame pointer outside x86 here, but I'm leaving the logic intact for now (hence this is no-diff).