JIT: Add a runtime async optimization to skip saving unmutated locals into reused continuations #125615
jakobbotsch wants to merge 25 commits into dotnet:main
This reverts commit 8e54df1.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR adds a runtime async optimization to skip saving locals that haven't been mutated since the last resumption point when reusing continuation objects. It builds on PR #125556 (which added continuation reuse) by leveraging the knowledge that a reused continuation already holds correct values for unmutated locals, thus eliminating unnecessary write barriers and improving performance by ~10%.
Changes:
- Introduces `PreservedValueAnalysis`, a forward dataflow analysis that computes which tracked locals may have been mutated since the previous resumption point, enabling the optimization to skip saving unchanged locals.
- Restructures continuation layout handling: replaces the old per-call `ContinuationLayout` with a `ContinuationLayoutBuilder`/`ContinuationLayout` split where a shared layout can be computed across all suspension points, and switches flag encoding from `HAS_*` bitmasks to index-based encoding of exception/context/result offsets.
- Splits `CreateSuspension` and `CreateResumption` into block-creation and IR-population phases, with the new `CreateResumptionsAndSuspensions` method driving the two-phase approach and handling shared vs per-call layouts.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/coreclr/inc/corinfo.h | Replaces HAS_* flag bits with index-based encoding for exception, context, and result offsets |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs | Updates managed ContinuationFlags enum and access methods to use new index-based encoding |
| src/coreclr/vm/object.h | Updates GetResultStorage and GetExceptionObjectStorage to use index-based decoding |
| src/coreclr/vm/interpexec.cpp | Updates interpreter suspension/resumption to use index-based flag encoding |
| src/coreclr/interpreter/compiler.cpp | Updates interpreter compiler to emit index-based flag encoding |
| src/coreclr/jit/async.h | Introduces ReturnTypeInfo, ReturnInfo, ContinuationLayoutBuilder, AsyncState, SaveSet; restructures ContinuationLayout and AsyncTransformation |
| src/coreclr/jit/async.cpp | Core implementation: PreservedValueAnalysis, CreateSharedLayout, continuation reuse logic, split save sets |
| src/coreclr/jit/jitconfigvalues.h | Adds JitAsyncReuseContinuations and JitAsyncPreservedValueAnalysisRange config knobs |
| src/coreclr/jit/jitstd/vector.h | Adds const overload of data() to support ContainsLocal const method |
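
To make the difference between `HAS_*` bits and index-based encoding concrete, here is a minimal sketch. It is not the actual `corinfo.h` layout: the 4-bit field width, the `index + 1` convention, and the names `Slot`, `EncodeSlotIndex`, and `DecodeSlotIndex` are all assumptions chosen for illustration. The point is only that a small field per slot can say both whether the slot exists and where it sits, which a single presence bit cannot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding, NOT the real corinfo.h layout: each optional slot
// gets a 4-bit field that stores (index + 1); a field value of 0 means the
// slot is not present in the continuation.
enum class Slot : uint32_t
{
    Exception = 0,
    Context   = 1,
    Result    = 2,
};

constexpr uint32_t kBitsPerSlot = 4;
constexpr uint32_t kSlotMask    = (1u << kBitsPerSlot) - 1;
constexpr int      kAbsent      = -1;

// Returns 'flags' with the field for 'slot' set to 'index' (or to absent).
inline uint32_t EncodeSlotIndex(uint32_t flags, Slot slot, int index)
{
    // index + 1 must fit in the field, so valid indices are -1..14.
    assert(index >= kAbsent && index < static_cast<int>(kSlotMask));
    const uint32_t shift = static_cast<uint32_t>(slot) * kBitsPerSlot;
    flags &= ~(kSlotMask << shift);                     // clear the old field
    flags |= static_cast<uint32_t>(index + 1) << shift; // store index + 1
    return flags;
}

// Returns the stored index for 'slot', or kAbsent if the slot is not present.
inline int DecodeSlotIndex(uint32_t flags, Slot slot)
{
    const uint32_t shift = static_cast<uint32_t>(slot) * kBitsPerSlot;
    return static_cast<int>((flags >> shift) & kSlotMask) - 1;
}
```

With a scheme like this, a shared layout could place the exception, context, and result slots at positions that differ from one layout to another, and the runtime would read the position from the flags instead of deriving it from which presence bits are set.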
Pull request overview
This PR extends the CoreCLR JIT’s runtime-async transformation to avoid re-saving locals into a reused continuation when they are guaranteed not to have been mutated since the prior resumption, reducing unnecessary stores/write barriers on suspension-heavy async code.
Changes:
- Added a preserved-value analysis to compute “mutated since previous resumption” tracked-local sets and “resume-reachable” regions.
- Extended async suspension/state tracking to carry resumption reachability and mutation information into suspension codegen.
- Introduced a new config range knob for selectively enabling the preserved-value analysis (debug gating).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/coreclr/jit/jitconfigvalues.h | Adds a new config range for preserved value analysis enablement. |
| src/coreclr/jit/async.h | Extends async state and suspension plumbing with preserved-value analysis inputs; introduces SaveSet. |
| src/coreclr/jit/async.cpp | Implements preserved value analysis and uses it to selectively skip saving locals when reusing continuations. |
Pull request overview
This PR extends CoreCLR JIT’s runtime-async transformation to reduce suspension overhead when reusing continuation instances by skipping stores for locals proven not to have changed since the prior resumption point.
Changes:
- Add a new preserved-value dataflow analysis to compute “mutated since previous resumption” local sets.
- Thread the analysis results into async suspension generation and selectively store only mutated locals when a continuation is being reused.
- Add a new JIT config knob for enabling/debugging the preserved-value analysis by method-hash range.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/jit/jitconfigvalues.h | Adds JitAsyncPreservedValueAnalysisRange config to gate/debug preserved-value analysis by method hash range. |
| src/coreclr/jit/async.h | Extends AsyncState with resumption reachability + mutation set; introduces SaveSet to control selective saving. |
| src/coreclr/jit/async.cpp | Implements preserved-value analysis and uses it to split suspension save paths, avoiding stores for unmutated locals on continuation reuse. |
Comments suppressed due to low confidence (1)
src/coreclr/jit/async.cpp:1868
`mutatedSinceResumption` is captured from `life.GetMutatedSinceResumption()` before the call is canonicalized and before `life->Update(call)` runs. Since `AsyncLiveness::Update` deliberately marks call defs/address-taken as mutated-since-resumption, the set stored in `AsyncState` can miss mutations introduced by the await call itself (or its associated definition store), which can cause `SaveSet::MutatedLocals` to skip saving a local into a reused continuation. Consider capturing `mutatedSinceResumption` after `CanonicalizeCallDefinition(..., &life)` (or otherwise ensuring `life.Update(call)` has run) so the saved set reflects mutations up to the actual suspension point.

    bool resumeReachable = life.IsResumeReachable();
    VARSET_TP mutatedSinceResumption(VarSetOps::MakeCopy(m_compiler, life.GetMutatedSinceResumption()));
    JITDUMP("  This suspension point is%s resume-reachable\n", resumeReachable ? "" : " NOT");
    if (resumeReachable)
    {
        JITDUMP("  Locals mutated since previous resumption: ");
        DBEXEC(VERBOSE, PrintVarSet(m_compiler, mutatedSinceResumption));
    }
    ContinuationLayoutBuilder* layoutBuilder = new (m_compiler, CMK_Async) ContinuationLayoutBuilder(m_compiler);
    CreateLiveSetForSuspension(block, call, defs, life, layoutBuilder);
    BuildContinuation(block, call, ContinuationNeedsKeepAlive(life), layoutBuilder);
    CallDefinitionInfo callDefInfo = CanonicalizeCallDefinition(block, call, &life);

src/coreclr/jit/async.cpp

    if ((saveSet != SaveSet::All) &&
        ((saveSet == SaveSet::UnmutatedLocals) != IsLocalUnmutatedSinceLastResumption(dsc, mutatedSinceResumption)))
    {
        continue;

The `SaveSet` filter in `FillInDataOnSuspension` uses a boolean inequality check to decide whether to store a local: `((saveSet == SaveSet::UnmutatedLocals) != IsLocalUnmutatedSinceLastResumption(...))`. This is correct but fairly hard to reason about and easy to accidentally invert during future edits. Consider rewriting this as an explicit switch/if with clearly named predicates for the `MutatedLocals` vs `UnmutatedLocals` cases to reduce maintenance risk.

    - if ((saveSet != SaveSet::All) &&
    -     ((saveSet == SaveSet::UnmutatedLocals) != IsLocalUnmutatedSinceLastResumption(dsc, mutatedSinceResumption)))
    - {
    -     continue;
    + if (saveSet != SaveSet::All)
    + {
    +     const bool isUnmutatedLocal = IsLocalUnmutatedSinceLastResumption(dsc, mutatedSinceResumption);
    +     switch (saveSet)
    +     {
    +         case SaveSet::UnmutatedLocals:
    +             if (!isUnmutatedLocal)
    +             {
    +                 continue;
    +             }
    +             break;
    +         case SaveSet::MutatedLocals:
    +             if (isUnmutatedLocal)
    +             {
    +                 continue;
    +             }
    +             break;
    +         default:
    +             break;
    +     }

With #125556 we learn something whenever we reuse a continuation -- specifically that the continuation was created at one of the other suspension points that can reach the current suspension point. We can use that knowledge to skip saving all locals that cannot possibly have been mutated since any previous suspension point. This saves a lot of write barriers when we reuse continuations.
Micro benchmark with warmup
(with `DOTNET_TC_OnStackReplacement=0` due to #120865) this improves performance by about 10%.
Codegen diff
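
The idea in the description, tracking which locals may have been written since the last resumption, can be sketched as a small forward dataflow problem. This is a toy model under stated assumptions, not the JIT's `PreservedValueAnalysis`: the `Block`, `State`, and `AnalyzeMutatedSinceResumption` names, the block-level granularity, and the use of `std::bitset` are all hypothetical. Each block may begin with a resumption point, which resets the mutated set; definitions in the block then add to it, and states are unioned along edges that are reachable from some resumption.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxLocals = 64;
using LocalSet = std::bitset<kMaxLocals>;

// Hypothetical, simplified basic block.
struct Block
{
    bool                     resumesAtEntry = false; // a resumption lands at the start of this block
    LocalSet                 defs;                   // locals written in this block
    std::vector<std::size_t> succs;                  // successor block indices
};

// Dataflow fact: are we reachable from a resumption, and if so, which locals
// may have been written since the most recent one?
struct State
{
    bool     resumeReachable = false;
    LocalSet mutated;
};

// Computes the state at the END of each block by iterating to a fixpoint.
std::vector<State> AnalyzeMutatedSinceResumption(const std::vector<Block>& blocks)
{
    std::vector<State> in(blocks.size());
    std::vector<State> out(blocks.size());

    bool changed = true;
    while (changed)
    {
        changed = false;
        for (std::size_t i = 0; i < blocks.size(); i++)
        {
            State s = in[i];
            if (blocks[i].resumesAtEntry)
            {
                // Right after a resumption the continuation matches the locals.
                s.resumeReachable = true;
                s.mutated.reset();
            }
            if (s.resumeReachable)
            {
                s.mutated |= blocks[i].defs;
            }
            out[i] = s;

            if (!s.resumeReachable)
            {
                continue; // nothing to propagate from paths that never resumed
            }
            for (std::size_t succ : blocks[i].succs)
            {
                assert(succ < blocks.size());
                State merged           = in[succ];
                merged.resumeReachable = true;
                merged.mutated |= s.mutated; // "may be mutated" is a union
                if ((merged.resumeReachable != in[succ].resumeReachable) || (merged.mutated != in[succ].mutated))
                {
                    in[succ] = merged;
                    changed  = true;
                }
            }
        }
    }
    return out;
}
```

In this model, a suspension at the end of block `i` that reuses a continuation would only need to store local `v` if `out[i].mutated.test(v)` is true; when `out[i].resumeReachable` is false there is no earlier resumption whose continuation could be reused, so everything is saved.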