JIT: yet another OSR stress mode #62980

AndyAyersMS · 2021-12-18T07:10:38Z

Existing OSR stress use existing patchpoint placement logic and alter the
policy settings so that OSR methods are created more eagerly.

This new version of OSR stress alters the patchpoint placement logic, to
add patchpoints to more places. In conjunction with the eager policy stress
above this leads to creation and execution of large numbers of OSR methods.

Any IL offset in the method with an empty stack (and not in a handler) is fair
game for a patchpoint, so this new method randomly adds patchpoints to the
starts of blocks when stack empty (future work may extend this to mid-block
stack empty points).

The new mode is enabled by setting COMPlus_JitRandomOnStackReplacement to a
non-zero value; this value represents the likelihood of adding a patchpoint
at a stack-empty block start, and also factors into the random seed. (Recall
this is parsed in hex, so 0x64 == 100 or larger will put a patchpoint at the
start of every stack empty block).

Various values are interesting because once a method reaches a patchpoint
and triggers OSR, the remainder of that method's execution is in the OSR method,
so later patchpoints in the original method may never be reached. So some sort
of random/selective patchpoint approach (in conjunction with varying policy
settings) is needed to ensure that we create an execute as many different OSR
variants as possible.

This PR also includes a couple of fixes exposed by local testing of this new
stress mode:

The OSR prolog may end up empty, which gcencoder doesn't like. Detect this
and add a nop for the prolog.
If we're importing the fgEntryBB during OSR, we don't need to schedule it
for re-importation. This happens if we put a patchpoint at IL offset zero.
Update the selective dumping policy COMPlus_JitDumpAtOSROoffset to only
dump OSR method compilations.

A new test leg is added to jit-experimental to run with this mode enabled
with a probability of 21% (0x15) and quick OSR triggers.

ghost · 2021-12-18T07:10:44Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Existing OSR stress use existing patchpoint placement logic and alter the
policy settings so that OSR methods are created more eagerly.

This new version of OSR stress alters the patchpoint placement logic, to
add patchpoints to more places. In conjunction with the eager policy stress
above this leads to creation and execution of large numbers of OSR methods.

Any IL offset in the method with an empty stack (and not in a handler) is fair
game for a patchpoint, so this new method randomly adds patchpoints to the
starts of blocks when stack empty (future work may extend this to mid-block
stack empty points).

The new mode is enabled by setting COMPlus_JitRandomOnStackReplacement to a
non-zero value; this value represents the likelihood of adding a patchpoint
at a stack-empty block start, and also factors into the random seed. (Recall
this is parsed in hex, so 0x64 == 100 or larger will put a patchpoint at the
start of every stack empty block).

Various values are interesting because once a method reaches a patchpoint
and triggers OSR, the remainder of that method's execution is in the OSR method,
so later patchpoints in the original method may never be reached. So some sort
of random/selective patchpoint approach (in conjunction with varying policy
settings) is needed to ensure that we create an execute as many different OSR
variants as possible.

This PR also includes a couple of fixes exposed by local testing of this new
stress mode:

The OSR prolog may end up empty, which gcencoder doesn't like. Detect this
and add a nop for the prolog.
If we're importing the fgEntryBB during OSR, we don't need to schedule it
for re-importation. This happens if we put a patchpoint at IL offset zero.
Update the selective dumping policy COMPlus_JitDumpAtOSROoffset to only
dump OSR method compilations.

A new test leg is added to jit-experimental to run with this mode enabled
with a probability of 21% (0x15) and quick OSR triggers.

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2021-12-18T07:10:51Z

cc @dotnet/jit-contrib

jakobbotsch · 2021-12-18T10:13:58Z

src/coreclr/jit/importer.cpp

+            CLRRandom* const random      = impInlineRoot()->m_inlineStrategy->GetRandom(doRandomOSR);
+            const int        randomValue = (int)random->Next(101);


Shouldn't this be random->Next(100) for 0x64 to be 100% probability?

jakobbotsch

LGTM apart from the above.

AndyAyersMS · 2021-12-18T16:33:45Z

/azp run runtime-jit-experimental

azure-pipelines · 2021-12-18T16:33:59Z

Azure Pipelines successfully started running 1 pipeline(s).

AndyAyersMS · 2021-12-20T01:50:35Z

Looks like a couple of new failures on arm64 from the new random stress. Have started poking at these.

Runtime_1241 -- we're asserting that an osr local can't be a multireg arg; this seems unnecessary as OSR methods are not passed args, they read homed values from the Tier0 frame. Think we can just delete the assert.
Runtime_33529 -- jit crash: a recursive tail call gets introduced by inlining, making the original entry block reachable, but OSR didn't import the original entry block. Seems like if we see a tail call that's an inline candidate, we may have to assume it could eventually introduce a tail recursive call and do speculative importation of the entry path for the root method.
GitHub_17777 -- Enormous frame plus lots of args. OSR prolog overflows the prolog IG as we materialize a lot of big constants for addressing. Not padding the OSR frame might help some, seems like we also might want to consider CSEing some of the addressing math. Even that might eventually lead to prolog overflow, though it might not, as there can only be so many enregistered args and locals. We might need a more general fix like moving this initialization outside the prolog...?

OSR wasn't aggressive enough in importing the original method entry, so if an inlinee introduced a recursive tail call that we wanted to turn into a loop, we might find that the target block for the loop branch never got created. Update the logic so that we import the entry if we're in the root method and we have an inlineable call in tail position. This will over-import in many cases but if those blocks turn out to be unreachable they will usually be removed without impacting final code gen. Fixes one of the OSR stress mode failures seen in dotnet#62980.

OSR wasn't aggressive enough in importing the original method entry, so if an inlinee introduced a recursive tail call that we wanted to turn into a loop, we might find that the target block for the loop branch never got created. Update the logic so that we import the entry if we're in the root method and we have an inlineable call in tail position. This will over-import in many cases but if those blocks turn out to be unreachable they will usually be removed without impacting final code gen. Fixes one of the OSR stress mode failures seen in #62980.

Existing OSR stress use existing patchpoint placement logic and alter the policy settings so that OSR methods are created more eagerly. This new version of OSR stress alters the patchpoint placement logic, to add patchpoints to more places. In conjunction with the eager policy stress above this leads to creation and execution of large numbers of OSR methods. Any IL offset in the method with an empty stack (and not in a handler) is fair game for a patchpoint, so this new method randomly adds patchpoints to the starts of blocks when stack empty (future work may extend this to mid-block stack empty points). The new mode is enabled by setting `COMPlus_JitRandomOnStackReplacement` to a non-zero value; this value represents the likelihood of adding a patchpoint at a stack-empty block start, and also factors into the random seed. (Recall this is parsed in hex, so 0x64 == 100 or larger will put a patchpoint at the start of every stack empty block). Various values are interesting because once a method reaches a patchpoint and triggers OSR, the remainder of that method's execution is in the OSR method, so later patchpoints in the original method may never be reached. So some sort of random/selective patchpoint approach (in conjunction with varying policy settings) is needed to ensure that we create an execute as many different OSR variants as possible. This PR also includes a couple of fixes exposed by local testing of this new stress mode: * The OSR prolog may end up empty, which gcencoder doesn't like. Detect this and add a `nop` for the prolog. * If we're importing the `fgEntryBB` during OSR, we don't need to schedule it for re-importation. This happens if we put a patchpoint at IL offset zero. * Update the selective dumping policy `COMPlus_JitDumpAtOSROoffset` to only dump OSR method compilations. A new test leg is added to `jit-experimental` to run with this mode enabled with a probability of 21% (0x15) and quick OSR triggers.

AndyAyersMS · 2022-01-11T20:32:55Z

Updated with fixes. May still be one or two failures left.

AndyAyersMS · 2022-01-11T20:32:59Z

/azp run runtime-jit-experimental

azure-pipelines · 2022-01-11T20:33:15Z

Azure Pipelines successfully started running 1 pipeline(s).

Fix interaction of stress patchpoints and profile instrumentation. We need to force block-based instrumentation if we might have stress patchpoints.

AndyAyersMS · 2022-01-15T17:53:30Z

/azp run runtime-jit-experimental

azure-pipelines · 2022-01-15T17:53:44Z

Azure Pipelines successfully started running 1 pipeline(s).

AndyAyersMS · 2022-01-18T01:05:46Z

@dotnet/jit-contrib PTAL.

There is still one OSR stress failure I'd like to fix, but it can wait for a future PR.

jakobbotsch · 2022-01-18T20:20:56Z

src/coreclr/jit/codegencommon.cpp

+    // For OSR we may have a zero-length prolog. That's not supported
+    // when the method must report a generics context,/ so add a nop if so.
+    //
+    if (compiler->opts.IsOSR() && (GetEmitter()->emitGetPrologOffsetEstimate() == 0) &&
+        (compiler->lvaReportParamTypeArg() || compiler->lvaKeepAliveAndReportThis()))
+    {
+        JITDUMP("OSR: prolog was zero length and has generic context to report: adding nop to pad prolog.\n");
+        instGen(INS_nop);
+    }


For my curiosity, why/where is this not supported?

I think it may well be a superfluous assert:

runtime/src/coreclr/gcinfo/gcinfoencoder.cpp

Lines 706 to 715 in 0266f03

void GcInfoEncoder::SetPrologSize( UINT32 prologSize )

{

_ASSERTE(prologSize != 0);

_ASSERTE(m_GSCookieValidRangeStart == 0 || m_GSCookieValidRangeStart == prologSize);

_ASSERTE(m_GSCookieValidRangeEnd == (UINT32)(-1) || m_GSCookieValidRangeEnd == prologSize+1);

m_GSCookieValidRangeStart = prologSize;

// satisfy asserts that assume m_GSCookieValidRangeStart != 0 ==> m_GSCookieValidRangeStart < m_GSCookieValidRangeEnd

m_GSCookieValidRangeEnd = prologSize+1;

}

The jit only uses SetPrologSize if we're reporting a generics context.

Ok, perhaps something to relax in another PR.

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 18, 2021

ghost assigned AndyAyersMS Dec 18, 2021

jakobbotsch reviewed Dec 18, 2021

View reviewed changes

jakobbotsch approved these changes Dec 18, 2021

View reviewed changes

This was referenced Jan 5, 2022

JIT: rework logic for when OSR imports method entry #63406

Merged

On Stack Replacement Next Steps #33658

Open

AndyAyersMS added 6 commits January 11, 2022 12:31

fix probability calc

9fb2bb4

remove obsolete assert

fb0488f

tweak

9ce139d

osr exposed does not apply to simd

c637486

only pad zero-sized OSR prolog if we're reporting generic context

038b4b7

AndyAyersMS force-pushed the RandomOSRStress branch from e814901 to 038b4b7 Compare January 11, 2022 20:32

Add ability to have patchpoint at specified offset.

ed8f67f

Fix interaction of stress patchpoints and profile instrumentation. We need to force block-based instrumentation if we might have stress patchpoints.

JulieLeeMSFT requested a review from BruceForstall January 18, 2022 19:06

jakobbotsch reviewed Jan 18, 2022

View reviewed changes

jakobbotsch approved these changes Jan 18, 2022

View reviewed changes

AndyAyersMS merged commit cde93b6 into dotnet:main Jan 19, 2022

AndyAyersMS mentioned this pull request Jan 19, 2022

Enable QJFL and OSR by default for x64 and arm64 #63642

Closed

ghost locked as resolved and limited conversation to collaborators Feb 18, 2022

		CLRRandom* const random = impInlineRoot()->m_inlineStrategy->GetRandom(doRandomOSR);
		const int randomValue = (int)random->Next(101);

	void GcInfoEncoder::SetPrologSize( UINT32 prologSize )
	{
	_ASSERTE(prologSize != 0);
	_ASSERTE(m_GSCookieValidRangeStart == 0 \|\| m_GSCookieValidRangeStart == prologSize);
	_ASSERTE(m_GSCookieValidRangeEnd == (UINT32)(-1) \|\| m_GSCookieValidRangeEnd == prologSize+1);

	m_GSCookieValidRangeStart = prologSize;
	// satisfy asserts that assume m_GSCookieValidRangeStart != 0 ==> m_GSCookieValidRangeStart < m_GSCookieValidRangeEnd
	m_GSCookieValidRangeEnd = prologSize+1;
	}

JIT: yet another OSR stress mode #62980

JIT: yet another OSR stress mode #62980

Uh oh!

Conversation

AndyAyersMS commented Dec 18, 2021

Uh oh!

ghost commented Dec 18, 2021

Uh oh!

AndyAyersMS commented Dec 18, 2021

Uh oh!

jakobbotsch Dec 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AndyAyersMS Dec 18, 2021

Choose a reason for hiding this comment

Uh oh!

jakobbotsch left a comment

Choose a reason for hiding this comment

Uh oh!

AndyAyersMS commented Dec 18, 2021

Uh oh!

azure-pipelines bot commented Dec 18, 2021

Uh oh!

AndyAyersMS commented Dec 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AndyAyersMS commented Jan 11, 2022

Uh oh!

AndyAyersMS commented Jan 11, 2022

Uh oh!

azure-pipelines bot commented Jan 11, 2022

Uh oh!

AndyAyersMS commented Jan 15, 2022

Uh oh!

azure-pipelines bot commented Jan 15, 2022

Uh oh!

AndyAyersMS commented Jan 18, 2022

Uh oh!

jakobbotsch Jan 18, 2022

Choose a reason for hiding this comment

Uh oh!

AndyAyersMS Jan 18, 2022

Choose a reason for hiding this comment

Uh oh!

jakobbotsch Jan 18, 2022

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jakobbotsch Dec 18, 2021 •

edited

Loading

AndyAyersMS commented Dec 20, 2021 •

edited

Loading