JIT: yet another OSR stress mode #62980
|
Tagging subscribers to this area: @JulieLeeMSFT
|
|
cc @dotnet/jit-contrib |
src/coreclr/jit/importer.cpp
Outdated
| CLRRandom* const random = impInlineRoot()->m_inlineStrategy->GetRandom(doRandomOSR); | ||
| const int randomValue = (int)random->Next(101); |
Shouldn't this be random->Next(100) for 0x64 to be 100% probability?
Yes.
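The off-by-one discussed above can be checked by enumeration. The sketch below assumes `Next(n)` returns a uniformly distributed integer in `[0, n)` and that a patchpoint is added when the drawn value is strictly less than the configured likelihood. Both that comparison and the helper name `countHits` are assumptions for illustration, not code from this PR.

```cpp
#include <cassert>

// Hypothetical helper: counts how many of the possible draws in [0, range)
// would pass the assumed test "draw < threshold".
int countHits(int range, int threshold)
{
    int hits = 0;
    for (int draw = 0; draw < range; ++draw)
    {
        if (draw < threshold)
        {
            ++hits;
        }
    }
    return hits;
}
```

Under these assumptions, `countHits(101, 100)` covers only 100 of the 101 possible draws, so a setting of 0x64 would occasionally skip a block, while `countHits(100, 100)` covers all 100 draws.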
jakobbotsch
left a comment
LGTM apart from the above.
|
/azp run runtime-jit-experimental |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Looks like a couple of new failures on arm64 from the new random stress. Have started poking at these.
|
OSR wasn't aggressive enough in importing the original method entry, so if an inlinee introduced a recursive tail call that we wanted to turn into a loop, we might find that the target block for the loop branch never got created. Update the logic so that we import the entry if we're in the root method and we have an inlineable call in tail position. This will over-import in many cases but if those blocks turn out to be unreachable they will usually be removed without impacting final code gen. Fixes one of the OSR stress mode failures seen in #62980.
Existing OSR stress uses the existing patchpoint placement logic and alters the policy settings so that OSR methods are created more eagerly.

This new version of OSR stress alters the patchpoint placement logic to add patchpoints in more places. In conjunction with the eager policy stress above, this leads to the creation and execution of large numbers of OSR methods.

Any IL offset in the method with an empty stack (and not in a handler) is fair game for a patchpoint, so this new mode randomly adds patchpoints to the starts of blocks where the stack is empty (future work may extend this to mid-block stack-empty points).

The new mode is enabled by setting `COMPlus_JitRandomOnStackReplacement` to a non-zero value; this value represents the percentage likelihood of adding a patchpoint at a stack-empty block start, and also factors into the random seed. (Recall this is parsed in hex, so a value of 0x64 == 100 or larger will put a patchpoint at the start of every stack-empty block.)

Various values are interesting because once a method reaches a patchpoint and triggers OSR, the remainder of that method's execution is in the OSR method, so later patchpoints in the original method may never be reached. Some sort of random/selective patchpoint approach (in conjunction with varying policy settings) is therefore needed to ensure that we create and execute as many different OSR variants as possible.

This PR also includes a couple of fixes exposed by local testing of this new stress mode:
* The OSR prolog may end up empty, which gcencoder doesn't like. Detect this and add a `nop` for the prolog.
* If we're importing the `fgEntryBB` during OSR, we don't need to schedule it for re-importation. This happens if we put a patchpoint at IL offset zero.
* Update the selective dumping policy `COMPlus_JitDumpAtOSROoffset` to only dump OSR method compilations.

A new test leg is added to `jit-experimental` to run with this mode enabled with a probability of 21% (0x15) and quick OSR triggers.
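As a rough illustration of the placement policy described above, the following sketch parses the setting as hex and rolls once per block. `BlockInfo`, `parseHexConfig`, and `choosePatchpoints` are hypothetical names, and seeding the generator directly from the likelihood is an assumption; the description only says the value factors into the seed. This is not the JIT's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical per-block summary; the real JIT works on its own block type.
struct BlockInfo
{
    bool stackEmptyAtStart; // IL stack is empty at the block's first offset
    bool inHandler;         // block lies inside an exception handler
};

// Parse a config string as hexadecimal, so "15" yields 21 and "64" yields 100.
int parseHexConfig(const std::string& text)
{
    return static_cast<int>(std::stoul(text, nullptr, 16));
}

// Decide, block by block, whether to place a stress patchpoint.
// A block is eligible only if the stack is empty and it is not in a handler.
std::vector<bool> choosePatchpoints(const std::vector<BlockInfo>& blocks, int likelihood)
{
    std::mt19937 rng(static_cast<std::uint32_t>(likelihood)); // assumed seeding scheme
    std::uniform_int_distribution<int> roll(0, 99);           // draws in [0, 100)

    std::vector<bool> result;
    result.reserve(blocks.size());
    for (const BlockInfo& block : blocks)
    {
        const bool eligible = block.stackEmptyAtStart && !block.inHandler;
        const int  draw     = roll(rng);
        result.push_back(eligible && (draw < likelihood));
    }
    return result;
}
```

With a likelihood of 100 (0x64) every eligible block gets a patchpoint; with 21 (0x15) roughly one in five does, which leaves later patchpoints in a method some chance of being the first one reached.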
Force-pushed from e814901 to 038b4b7.
|
Updated with fixes. May still be one or two failures left. |
|
/azp run runtime-jit-experimental |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Fix interaction of stress patchpoints and profile instrumentation. We need to force block-based instrumentation if we might have stress patchpoints.
|
/azp run runtime-jit-experimental |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@dotnet/jit-contrib PTAL. There is still one OSR stress failure I'd like to fix, but it can wait for a future PR. |
| // For OSR we may have a zero-length prolog. That's not supported | ||
| // when the method must report a generics context, so add a nop if so. | ||
| // | ||
| if (compiler->opts.IsOSR() && (GetEmitter()->emitGetPrologOffsetEstimate() == 0) && | ||
| (compiler->lvaReportParamTypeArg() || compiler->lvaKeepAliveAndReportThis())) | ||
| { | ||
| JITDUMP("OSR: prolog was zero length and has generic context to report: adding nop to pad prolog.\n"); | ||
| instGen(INS_nop); | ||
| } |
For my curiosity, why/where is this not supported?
I think it may well be a superfluous assert:
runtime/src/coreclr/gcinfo/gcinfoencoder.cpp
Lines 706 to 715 in 0266f03
| void GcInfoEncoder::SetPrologSize( UINT32 prologSize ) | |
| { | |
| _ASSERTE(prologSize != 0); | |
| _ASSERTE(m_GSCookieValidRangeStart == 0 || m_GSCookieValidRangeStart == prologSize); | |
| _ASSERTE(m_GSCookieValidRangeEnd == (UINT32)(-1) || m_GSCookieValidRangeEnd == prologSize+1); | |
| m_GSCookieValidRangeStart = prologSize; | |
| // satisfy asserts that assume m_GSCookieValidRangeStart != 0 ==> m_GSCookieValidRangeStart < m_GSCookieValidRangeEnd | |
| m_GSCookieValidRangeEnd = prologSize+1; | |
| } |
The jit only uses SetPrologSize if we're reporting a generics context.
Ok, perhaps something to relax in another PR.
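The padding rule from the snippet under review can be restated in isolation. In the sketch below, `PrologState`, `kNopSize`, and `paddedPrologSize` are hypothetical stand-ins, and the one-byte `nop` size is an assumption that varies by target; it only models the condition, not the JIT's emitter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the facts the check consults.
struct PrologState
{
    bool          isOSR;                  // compiling an OSR method
    std::uint32_t prologSizeEstimate;     // bytes emitted for the prolog so far
    bool          reportsGenericsContext; // method must report a generics context
};

constexpr std::uint32_t kNopSize = 1; // assumption: one-byte nop

// Returns the prolog size after padding. A nop is added only when an OSR
// method has an empty prolog and must report a generics context, which is
// the one case where a zero prolog size would reach the encoder's assert.
std::uint32_t paddedPrologSize(const PrologState& state)
{
    if (state.isOSR && (state.prologSizeEstimate == 0) && state.reportsGenericsContext)
    {
        return kNopSize;
    }
    return state.prologSizeEstimate;
}
```

An OSR method with an empty prolog and no generics context is left at size zero, matching the remark above that the prolog size is only passed to the encoder when a generics context is reported.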