Optimize runtime async suspend/resume machinery #127336
jakobbotsch merged 8 commits into dotnet:main from
Tagging subscribers to this area: @agocke
Several optimizations around suspension/resumption:
- Reduce the number of TLS accesses by storing `Thread.CurrentThread` and `&AsyncDispatcherInfo.t_current` inside `RuntimeAsyncAwaitState`, and only accessing `RuntimeAsyncAwaitState`.
- Remove a number of write barriers by moving TLS object fields into a `ref struct`. Allocate this ref struct on the stack in the two places that initiate runtime async chains: task-returning thunks and `DispatchContinuations`. Keep a pointer to this in the TLS.
- Use `Unsafe` in a couple of places to avoid unnecessary cast checks on the hot path.

For a suspension-heavy benchmark this improves performance by around 25%.
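The stack-state pattern described above can be sketched as a C++ analog. This is a minimal illustration, not the CoreLib code: `StackState`, `AwaitState`, `t_await_state`, `StackStateScope`, `run_chain` and `leaf_await` are hypothetical names, C++ `thread_local` stands in for the thread-static `RuntimeAsyncAwaitState`, and saving/restoring the previous pointer so that chains can nest is an assumption rather than something the description states.

```cpp
#include <cassert>

// Hypothetical analog of RuntimeAsyncStackState: per-chain state that lives
// on the stack of whichever frame starts the async chain.
struct StackState {
    const void* notifier = nullptr;  // set by a leaf await that suspends
};

// Hypothetical analog of RuntimeAsyncAwaitState: the thread-local part only
// holds a pointer to the current chain's StackState.
struct AwaitState {
    StackState* stack_state = nullptr;
};

thread_local AwaitState t_await_state;

// Leaf await helper: one TLS read, then a plain store into stack memory.
void leaf_await(const void* notifier) {
    StackState* state = t_await_state.stack_state;
    assert(state != nullptr && "leaf_await called outside an async chain");
    state->notifier = notifier;
}

// RAII push/pop of the thread's current StackState pointer. Restoring the
// previous pointer (an assumption of this sketch) lets chains nest.
class StackStateScope {
public:
    explicit StackStateScope(StackState& state)
        : tls_(t_await_state), previous_(tls_.stack_state) {
        tls_.stack_state = &state;
    }
    ~StackStateScope() { tls_.stack_state = previous_; }
    StackStateScope(const StackStateScope&) = delete;
    StackStateScope& operator=(const StackStateScope&) = delete;

private:
    AwaitState& tls_;
    StackState* previous_;
};

// Chain entry point, playing the role of a task-returning thunk.
template <typename Body>
const void* run_chain(Body body) {
    StackState state;                  // lives in this frame, not on the heap
    {
        StackStateScope scope(state);  // push
        body();
    }                                  // pop, even if body() throws
    return state.notifier;             // non-null means the body suspended
}
```

Because the state lives in the initiating frame, `run_chain` reads the notifier straight out of its own local once the body returns; only the leaf helper has to go through TLS to find it.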
Pull request overview
This PR optimizes CoreCLR “runtime async” suspension/resumption by reducing TLS traffic, minimizing write barriers on hot paths, and consolidating context-handling work into new helpers used by the JIT’s async transformation.
Changes:
- Refactors runtime-async state to cache TLS-derived values and to move notifier/context references into a stack-allocated “stack state” accessed via the thread-static await state.
- Extends `CORINFO_ASYNC_INFO` and JIT/EE plumbing with new “finish suspension” helper method handles, and updates the JIT async transform to use them.
- Updates task-returning thunk emission (VM + ILCompiler stubs) and related tooling (SuperPMI, R2R/AOT scanners) to match the new runtime-async APIs.
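The first bullet's caching of TLS-derived values can be illustrated with a small C++ sketch. It assumes a hypothetical `Thread` type and a `current_thread()` accessor standing in for `Thread.CurrentThread`, which the PR description counts as a separate TLS access; the lookup counter only models that cost. Filling the cache lazily on first use is also an assumption of the sketch, not something the PR states.

```cpp
#include <cassert>

// Hypothetical stand-in for a managed Thread object.
struct Thread {
    int managed_id;
};

// Hypothetical stand-in for Thread.CurrentThread: its own thread-local slot,
// plus a counter so the sketch can show how often that slot is consulted.
thread_local Thread t_thread_object{1};
thread_local int t_thread_lookups = 0;

Thread* current_thread() {
    ++t_thread_lookups;
    return &t_thread_object;
}

// Hypothetical per-thread await state. Besides the chain-state pointer
// (omitted here), it caches the current Thread so hot paths read one slot.
struct AwaitState {
    Thread* cached_thread = nullptr;
};

thread_local AwaitState t_await_state;

// Hot-path accessor: only the first call on a given thread falls through
// to current_thread(); later calls read the cached pointer.
Thread* await_state_thread() {
    AwaitState& state = t_await_state;
    if (state.cached_thread == nullptr) {
        state.cached_thread = current_thread();
    }
    return state.cached_thread;
}
```

However many times the hot path runs on a thread, `current_thread()` is consulted once; every later access goes through the await state alone.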
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs | Exposes InstanceIsFlowSuppressed for optimized flow-suppression checks. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.cs | Updates leaf await helpers to use stack-backed runtime-async state. |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs | Introduces stack-based runtime-async state, new finish-suspension helpers, and updates dispatch/suspension logic. |
| src/coreclr/inc/corinfo.h | Extends CORINFO_ASYNC_INFO with new helper method handles. |
| src/coreclr/vm/jitinterface.cpp | Populates new async helper handles in CEEInfo::getAsyncInfo. |
| src/coreclr/vm/metasig.h | Adds metasigs for updated thunk finalization signatures. |
| src/coreclr/vm/corelib.h | Updates CoreLib binder entries for new helpers, thunk signatures, and runtime-async nested types/field. |
| src/coreclr/vm/asyncthunks.cpp | Updates task-returning thunk IL emission to push/pop the new await state and pass it to finalizers. |
| src/coreclr/jit/async.h | Declares new JIT helper routines to finish suspension context handling. |
| src/coreclr/jit/async.cpp | Reworks suspension context handling to use new finish-suspension helpers and adjusts capture/restore sequence. |
| src/coreclr/tools/superpmi/superpmi-shared/agnostic.h | Extends SuperPMI agnostic async-info struct for new handles. |
| src/coreclr/tools/superpmi/superpmi-shared/methodcontext.cpp | Records/replays new async-info handles. |
| src/coreclr/tools/Common/JitInterface/CorInfoTypes.cs | Extends managed projection of CORINFO_ASYNC_INFO. |
| src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs | Emits new helper handles for the managed JIT interface implementation. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs | Adds R2R references to new finish-suspension helpers. |
| src/coreclr/tools/aot/ILCompiler.Compiler/IL/ILImporter.Scanner.cs | Ensures AOT scanning adds dependencies on new finish-suspension helpers. |
| src/coreclr/tools/Common/TypeSystem/IL/Stubs/AsyncThunks.cs | Updates IL stub emission for task-returning thunks to use new await-state push/pop + finalizer signatures. |
Pull request overview
This PR optimizes CoreCLR “runtime async” suspend/resume paths by reducing TLS accesses, lowering GC write barrier traffic, and avoiding some hot-path cast checks. It does so by introducing a stack-allocated state container that’s referenced via a per-thread TLS struct and by updating the thunk emitters to pass the TLS state byref.
Changes:
- Introduces a stack-allocated `RuntimeAsyncStackState` and threads it through runtime-async chains via `RuntimeAsyncAwaitState.Push`/`Pop`.
- Updates runtime-emitted task-returning thunks (VM + IL emitter) to initialize/tear down the new TLS stack state and pass `ref RuntimeAsyncAwaitState` into finalize helpers.
- Switches a few hot-path casts to `Unsafe.As` and refactors await helpers to write notifier/context data into the stack state.
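The `Unsafe.As` switch trades a runtime type check for a caller-side guarantee: `Unsafe.As<T>(object)` reinterprets a reference without the dynamic type check a normal cast performs. A rough C++ analogy, using hypothetical `Continuation`/`TaskContinuation` types that are not taken from the PR: `dynamic_cast` checks the dynamic type at run time, while `static_cast` on a downcast trusts the caller, much as `Unsafe.As` does.

```cpp
#include <cassert>

// Hypothetical continuation types for illustration only.
struct Continuation {
    virtual ~Continuation() = default;
    Continuation* next = nullptr;
};

struct TaskContinuation : Continuation {
    int result = 0;
};

// Checked: consults RTTI at run time and returns nullptr on a mismatch,
// similar to a C# "as" cast.
TaskContinuation* checked_cast(Continuation* c) {
    return dynamic_cast<TaskContinuation*>(c);
}

// Unchecked: no run-time check at all; undefined behavior if c is not really
// a TaskContinuation. Only appropriate when an invariant already guarantees
// the type, i.e. when the check would be unnecessary on the hot path.
TaskContinuation* unchecked_cast(Continuation* c) {
    return static_cast<TaskContinuation*>(c);
}
```

The unchecked form is only a win where the type is already known; used on an object of the wrong type it silently produces garbage instead of failing.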
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.cs | Updates public await helper intrinsics to write notifier into stack state via TLS. |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs | Replaces ExecutionAndSyncBlockStore with stack state + TLS Push/Pop, updates dispatch/finalize/handle-suspend flow. |
| src/coreclr/vm/metasig.h | Adds metasigs for finalize helpers that now take ref RuntimeAsyncAwaitState. |
| src/coreclr/vm/corelib.h | Updates CoreLib binder entries for new TLS field, nested types, and finalize helper signatures. |
| src/coreclr/vm/asyncthunks.cpp | Updates IL stub emission to Push/Pop TLS stack state and pass ref state into finalize helpers. |
| src/coreclr/tools/Common/TypeSystem/IL/Stubs/AsyncThunks.cs | Mirrors VM thunk emission changes in the managed IL emitter. |
Pull request overview
This PR optimizes CoreCLR “runtime async” suspension/resumption by reducing TLS lookups and write barriers, primarily by introducing a stack-allocated async state block that’s referenced from TLS during async-chain execution.
Changes:
- Rework async await state handling to route notifier/context storage through a stack-allocated `RuntimeAsyncStackState` linked via TLS `RuntimeAsyncAwaitState`.
- Update CoreCLR async thunk IL emission (VM + managed emitter) to `Push`/`Pop` the new TLS state and to pass the TLS state byref into `Finalize*ReturningThunk`.
- Adjust CoreLib binder/metasig definitions to match the new helper signatures and types.
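Passing the TLS state byref into the finalizer lets the thunk reuse the reference it already holds instead of paying for a second TLS lookup. A hedged C++ sketch: `AwaitState`, `StackState`, `t_await_state`, `leaf_await`, `finalize_thunk` and `task_returning_thunk` are hypothetical stand-ins for `RuntimeAsyncAwaitState`, `RuntimeAsyncStackState` and the `Finalize*ReturningThunk(ref RuntimeAsyncAwaitState)` helpers. The real helpers produce a `Task`/`ValueTask`; here that is reduced to an `int` result, or an empty `std::optional` meaning "suspended".

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for RuntimeAsyncStackState / RuntimeAsyncAwaitState.
struct StackState {
    const void* notifier = nullptr;  // set when a leaf await suspends
};

struct AwaitState {
    StackState* stack_state = nullptr;
};

thread_local AwaitState t_await_state;

// Leaf await helper: records what the chain is waiting on.
void leaf_await(const void* notifier) {
    StackState* stack = t_await_state.stack_state;
    assert(stack != nullptr);
    stack->notifier = notifier;
}

// Finalizer takes the per-thread state by reference, so it does no TLS
// lookup of its own. An empty optional stands in for "hand back an
// incomplete task because the body suspended".
std::optional<int> finalize_thunk(AwaitState& state, int result) {
    if (state.stack_state->notifier != nullptr) {
        return std::nullopt;
    }
    return result;
}

// Thunk: one TLS lookup, push a stack-allocated StackState, run the body,
// finalize using the same reference, then pop.
template <typename Body>
std::optional<int> task_returning_thunk(Body body) {
    StackState stack;
    AwaitState& state = t_await_state;
    StackState* previous = state.stack_state;
    state.stack_state = &stack;
    std::optional<int> outcome = finalize_thunk(state, body());
    state.stack_state = previous;
    return outcome;
}
```

`body()` is evaluated as an argument before `finalize_thunk` runs, so the finalizer always sees the notifier written during the body.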
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.cs | Update leaf await helpers to write notifier data into stack state via t_runtimeAsyncAwaitState.StackState. |
| src/coreclr/vm/metasig.h | Add new metasigs for Finalize*ReturningThunk(ref RuntimeAsyncAwaitState) (Task/ValueTask, generic and non-generic). |
| src/coreclr/vm/corelib.h | Update binder definitions: remove ExecutionAndSyncBlockStore, add TLS field and nested async state types/methods. |
| src/coreclr/vm/asyncthunks.cpp | Update VM-emitted task-returning thunk IL to initialize and push/pop the new runtime async state and pass it to finalizers. |
| src/coreclr/tools/Common/TypeSystem/IL/Stubs/AsyncThunks.cs | Mirror VM thunk emission updates in the managed IL emitter (push/pop + updated finalizer signatures). |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs | Implement new stack/TLS async state structs, update dispatch/suspension logic, and adjust hot-path casts using Unsafe. |
PTAL @VSadov |
Pull request overview
This PR optimizes CoreCLR’s runtime-async suspend/resume path by reducing TLS traffic and GC write barriers, primarily by moving per-suspension state into a stack-allocated ref struct and caching Thread.CurrentThread in the TLS state.
Changes:
- Introduce stack-allocated runtime-async state (`RuntimeAsyncStackState`) and keep only a pointer to it in TLS (`RuntimeAsyncAwaitState`).
- Update task-returning thunk emission (VM + managed typesystem) to `Push`/`Pop` runtime-async state and pass `ref RuntimeAsyncAwaitState` into finalization helpers.
- Update CoreLib binder signatures (`corelib.h`/`metasig.h`) to match the new helper method signatures and new nested types/fields.
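The write-barrier savings described in this overview come down to where the references are stored. In a generational GC, reference stores into heap objects go through a write barrier that records possible old-to-young pointers (CoreCLR uses a card table for this). Stack slots, by contrast, are scanned as roots at every collection, so stores into them need no such bookkeeping. The toy C++ model below only counts barrier executions for the two layouts. `Object`, `store_with_barrier`, `HeapState`, `StackState` and `record_suspension` are invented for illustration and do not reflect CoreCLR's actual barrier code.

```cpp
#include <cassert>

// Toy model only, not CoreCLR's barrier: counts how many reference stores
// needed a GC write barrier. A real barrier would mark a card-table entry.
struct Object {};

thread_local int t_barriers = 0;

void store_with_barrier(Object*& slot, Object* value) {
    slot = value;
    ++t_barriers;
}

// Before: notifier/context are fields of a heap-allocated per-thread object,
// so every store into them is a heap store that needs a barrier.
struct HeapState {
    Object* notifier = nullptr;
    Object* context = nullptr;
};

void record_suspension(HeapState& heap_state, Object* notifier, Object* context) {
    store_with_barrier(heap_state.notifier, notifier);
    store_with_barrier(heap_state.context, context);
}

// After: the same fields live in a struct on the chain initiator's stack.
// Stack memory is scanned as a GC root, so plain stores suffice.
struct StackState {
    Object* notifier = nullptr;
    Object* context = nullptr;
};

void record_suspension(StackState& stack_state, Object* notifier, Object* context) {
    stack_state.notifier = notifier;
    stack_state.context = context;
}
```

Every suspension in the heap layout pays two barriers in this model; the stack layout pays none, which is the saving the PR attributes to the `ref struct` move.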
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.cs | Switch await helpers to use stack-state via t_runtimeAsyncAwaitState.StackState and reduce TLS accesses. |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs | Add RuntimeAsyncStackState + new TLS layout; adjust suspension/dispatch/finalization paths; use Unsafe.As for hot casts. |
| src/coreclr/vm/asyncthunks.cpp | Update VM-emitted task-returning thunk IL to Push/Pop runtime-async state and call new finalize signatures. |
| src/coreclr/tools/Common/TypeSystem/IL/Stubs/AsyncThunks.cs | Mirror VM thunk emission changes in the managed typesystem IL stub emitter. |
| src/coreclr/vm/metasig.h | Add metasig variants for finalize helpers that take ref RuntimeAsyncAwaitState. |
| src/coreclr/vm/corelib.h | Bind new nested types/field and update method signatures used by the VM binder. |
Several optimizations around suspension/resumption:
- Reduce the number of TLS accesses by storing `Thread.CurrentThread` inside `RuntimeAsyncAwaitState`, and only accessing `RuntimeAsyncAwaitState`.
- Remove a number of write barriers by moving TLS object fields into a `ref struct`. Allocate this ref struct on the stack in the two places that initiate runtime async chains: task-returning thunks and `DispatchContinuations`. Keep a pointer to this in the TLS.
- Use `Unsafe` in a couple of places to avoid unnecessary cast checks on the hot path.

For a suspension-heavy benchmark this improves performance by around 17%.
Example benchmark
- Before: Took 350.3 ms
- After: Took 291.3 ms
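As a quick check on the numbers above, the before/after timings correspond to roughly a 16.8% reduction in elapsed time (consistent with the "around 17%" in the description), or about a 1.20× speedup:

```math
\frac{350.3\,\text{ms} - 291.3\,\text{ms}}{350.3\,\text{ms}} = \frac{59.0}{350.3} \approx 0.168,
\qquad
\frac{350.3\,\text{ms}}{291.3\,\text{ms}} \approx 1.203
```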