Skip to content

Conversation

@AndyAyersMS
Copy link
Member

Existing OSR stress use existing patchpoint placement logic and alter the
policy settings so that OSR methods are created more eagerly.

This new version of OSR stress alters the patchpoint placement logic, to
add patchpoints to more places. In conjunction with the eager policy stress
above this leads to creation and execution of large numbers of OSR methods.

Any IL offset in the method with an empty stack (and not in a handler) is fair
game for a patchpoint, so this new method randomly adds patchpoints to the
starts of blocks when stack empty (future work may extend this to mid-block
stack empty points).

The new mode is enabled by setting COMPlus_JitRandomOnStackReplacement to a
non-zero value; this value represents the likelihood of adding a patchpoint
at a stack-empty block start, and also factors into the random seed. (Recall
this is parsed in hex, so 0x64 == 100 or larger will put a patchpoint at the
start of every stack empty block).

Various values are interesting because once a method reaches a patchpoint
and triggers OSR, the remainder of that method's execution is in the OSR method,
so later patchpoints in the original method may never be reached. So some sort
of random/selective patchpoint approach (in conjunction with varying policy
settings) is needed to ensure that we create an execute as many different OSR
variants as possible.

This PR also includes a couple of fixes exposed by local testing of this new
stress mode:

  • The OSR prolog may end up empty, which gcencoder doesn't like. Detect this
    and add a nop for the prolog.
  • If we're importing the fgEntryBB during OSR, we don't need to schedule it
    for re-importation. This happens if we put a patchpoint at IL offset zero.
  • Update the selective dumping policy COMPlus_JitDumpAtOSROoffset to only
    dump OSR method compilations.

A new test leg is added to jit-experimental to run with this mode enabled
with a probability of 21% (0x15) and quick OSR triggers.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 18, 2021
@ghost ghost assigned AndyAyersMS Dec 18, 2021
@ghost
Copy link

ghost commented Dec 18, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Existing OSR stress use existing patchpoint placement logic and alter the
policy settings so that OSR methods are created more eagerly.

This new version of OSR stress alters the patchpoint placement logic, to
add patchpoints to more places. In conjunction with the eager policy stress
above this leads to creation and execution of large numbers of OSR methods.

Any IL offset in the method with an empty stack (and not in a handler) is fair
game for a patchpoint, so this new method randomly adds patchpoints to the
starts of blocks when stack empty (future work may extend this to mid-block
stack empty points).

The new mode is enabled by setting COMPlus_JitRandomOnStackReplacement to a
non-zero value; this value represents the likelihood of adding a patchpoint
at a stack-empty block start, and also factors into the random seed. (Recall
this is parsed in hex, so 0x64 == 100 or larger will put a patchpoint at the
start of every stack empty block).

Various values are interesting because once a method reaches a patchpoint
and triggers OSR, the remainder of that method's execution is in the OSR method,
so later patchpoints in the original method may never be reached. So some sort
of random/selective patchpoint approach (in conjunction with varying policy
settings) is needed to ensure that we create an execute as many different OSR
variants as possible.

This PR also includes a couple of fixes exposed by local testing of this new
stress mode:

  • The OSR prolog may end up empty, which gcencoder doesn't like. Detect this
    and add a nop for the prolog.
  • If we're importing the fgEntryBB during OSR, we don't need to schedule it
    for re-importation. This happens if we put a patchpoint at IL offset zero.
  • Update the selective dumping policy COMPlus_JitDumpAtOSROoffset to only
    dump OSR method compilations.

A new test leg is added to jit-experimental to run with this mode enabled
with a probability of 21% (0x15) and quick OSR triggers.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

cc @dotnet/jit-contrib

Comment on lines 11804 to 11805
CLRRandom* const random = impInlineRoot()->m_inlineStrategy->GetRandom(doRandomOSR);
const int randomValue = (int)random->Next(101);
Copy link
Member

@jakobbotsch jakobbotsch Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be random->Next(100) for 0x64 to be 100% probability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from the above.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Dec 20, 2021

Looks like a couple of new failures on arm64 from the new random stress. Have started poking at these.

  • Runtime_1241 -- we're asserting that an osr local can't be a multireg arg; this seems unnecessary as OSR methods are not passed args, they read homed values from the Tier0 frame. Think we can just delete the assert.
  • Runtime_33529 -- jit crash: a recursive tail call gets introduced by inlining, making the original entry block reachable, but OSR didn't import the original entry block. Seems like if we see a tail call that's an inline candidate, we may have to assume it could eventually introduce a tail recursive call and do speculative importation of the entry path for the root method.
  • GitHub_17777 -- Enormous frame plus lots of args. OSR prolog overflows the prolog IG as we materialize a lot of big constants for addressing. Not padding the OSR frame might help some, seems like we also might want to consider CSEing some of the addressing math. Even that might eventually lead to prolog overflow, though it might not, as there can only be so many enregistered args and locals. We might need a more general fix like moving this initialization outside the prolog...?

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Jan 5, 2022
OSR wasn't aggressive enough in importing the original method entry,
so if an inlinee introduced a recursive tail call that we wanted to
turn into a loop, we might find that the target block for the loop
branch never got created.

Update the logic so that we import the entry if we're in the root
method and we have an inlineable call in tail position. This will
over-import in many cases but if those blocks turn out to be unreachable
they will usually be removed without impacting final code gen.

Fixes one of the OSR stress mode failures seen in dotnet#62980.
AndyAyersMS added a commit that referenced this pull request Jan 11, 2022
OSR wasn't aggressive enough in importing the original method entry,
so if an inlinee introduced a recursive tail call that we wanted to
turn into a loop, we might find that the target block for the loop
branch never got created.

Update the logic so that we import the entry if we're in the root
method and we have an inlineable call in tail position. This will
over-import in many cases but if those blocks turn out to be unreachable
they will usually be removed without impacting final code gen.

Fixes one of the OSR stress mode failures seen in #62980.
Existing OSR stress use existing patchpoint placement logic and alter the
policy settings so that OSR methods are created more eagerly.

This new version of OSR stress alters the patchpoint placement logic, to
add patchpoints to more places. In conjunction with the eager policy stress
above this leads to creation and execution of large numbers of OSR methods.

Any IL offset in the method with an empty stack (and not in a handler) is fair
game for a patchpoint, so this new method randomly adds patchpoints to the
starts of blocks when stack empty (future work may extend this to mid-block
stack empty points).

The new mode is enabled by setting `COMPlus_JitRandomOnStackReplacement` to a
non-zero value; this value represents the likelihood of adding a patchpoint
at a stack-empty block start, and also factors into the random seed. (Recall
this is parsed in hex, so 0x64 == 100 or larger will put a patchpoint at the
start of every stack empty block).

Various values are interesting because once a method reaches a patchpoint
and triggers OSR, the remainder of that method's execution is in the OSR method,
so later patchpoints in the original method may never be reached. So some sort
of random/selective patchpoint approach (in conjunction with varying policy
settings) is needed to ensure that we create an execute as many different OSR
variants as possible.

This PR also includes a couple of fixes exposed by local testing of this new
stress mode:
* The OSR prolog may end up empty, which gcencoder doesn't like. Detect this
and add a `nop` for the prolog.
* If we're importing the `fgEntryBB` during OSR, we don't need to schedule it
for re-importation. This happens if we put a patchpoint at IL offset zero.
* Update the selective dumping policy `COMPlus_JitDumpAtOSROoffset` to only
dump OSR method compilations.

A new test leg is added to `jit-experimental` to run with this mode enabled
with a probability of 21% (0x15) and quick OSR triggers.
@AndyAyersMS
Copy link
Member Author

Updated with fixes. May still be one or two failures left.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Fix interaction of stress patchpoints and profile instrumentation.
We need to force block-based instrumentation if we might have stress
patchpoints.
@AndyAyersMS
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib PTAL.

There is still one OSR stress failure I'd like to fix, but it can wait for a future PR.

Comment on lines +7539 to +7547
// For OSR we may have a zero-length prolog. That's not supported
// when the method must report a generics context,/ so add a nop if so.
//
if (compiler->opts.IsOSR() && (GetEmitter()->emitGetPrologOffsetEstimate() == 0) &&
(compiler->lvaReportParamTypeArg() || compiler->lvaKeepAliveAndReportThis()))
{
JITDUMP("OSR: prolog was zero length and has generic context to report: adding nop to pad prolog.\n");
instGen(INS_nop);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my curiosity, why/where is this not supported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may well be a superfluous assert:

void GcInfoEncoder::SetPrologSize( UINT32 prologSize )
{
_ASSERTE(prologSize != 0);
_ASSERTE(m_GSCookieValidRangeStart == 0 || m_GSCookieValidRangeStart == prologSize);
_ASSERTE(m_GSCookieValidRangeEnd == (UINT32)(-1) || m_GSCookieValidRangeEnd == prologSize+1);
m_GSCookieValidRangeStart = prologSize;
// satisfy asserts that assume m_GSCookieValidRangeStart != 0 ==> m_GSCookieValidRangeStart < m_GSCookieValidRangeEnd
m_GSCookieValidRangeEnd = prologSize+1;
}

The jit only uses SetPrologSize if we're reporting a generics context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, perhaps something to relax in another PR.

@AndyAyersMS AndyAyersMS merged commit cde93b6 into dotnet:main Jan 19, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants