[Async v2] Support Environment.StackTrace#125396
[Async v2] Support Environment.StackTrace#125396tommcdon wants to merge 34 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @steveisok, @dotnet/area-system-diagnostics |
There was a problem hiding this comment.
Pull request overview
Adds support for reconstructing the logical async v2 continuation chain in Environment.StackTrace / StackTrace so stack traces taken during continuation dispatch include awaiting callers and hide internal dispatch machinery.
Changes:
- CoreCLR: capture async v2 continuation resume points during stack walking and inject them as
STEF_CONTINUATIONframes while filtering outDispatchContinuations. - NativeAOT: augment the collected IP array by truncating dispatch frames and appending continuation
DiagnosticIPs whenAsyncDispatcherInfo.t_currentis present. - Tests: update existing reflection stack-frame expectations and add a new
env-stacktraceregression test.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/tests/async/reflection/reflection.cs | Updates StackFrame.GetMethod() expectations to reflect injected logical async caller frames. |
| src/tests/async/env-stacktrace/env-stacktrace.csproj | Adds a new async test project for Environment.StackTrace async-v2 behavior. |
| src/tests/async/env-stacktrace/env-stacktrace.cs | New test validating that logical async callers appear and DispatchContinuations is filtered. |
| src/coreclr/vm/debugdebugger.h | Extends stack-walk data to track presence of async frames and collect continuation resume points. |
| src/coreclr/vm/debugdebugger.cpp | Extracts continuation chain from AsyncDispatcherInfo.t_current and injects continuation frames during stack trace collection. |
| src/coreclr/nativeaot/System.Private.CoreLib/src/System/Diagnostics/StackTrace.NativeAot.cs | Adds NativeAOT stack trace augmentation to append async continuation frames and truncate dispatch machinery. |
You can also share your feedback on Copilot code review. Take the survey.
92fe972 to
682ce22
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
You can also share your feedback on Copilot code review. Take the survey.
- Rename DOTNET_HideAsyncDispatchFrames to DOTNET_StackTraceAsyncBehavior - Update config description to cover both Environment.StackTrace and System.Diagnostics.StackTrace - Revert pDispatchContinuationsMD caching (CoreLibBinder already caches) - Remove unnecessary null checks: CoreLibBinder::GetField throws on failure, GetThread() is never null from managed code - Remove left-over mode description comment in NativeAOT - Pin DOTNET_StackTraceAsyncBehavior in all env-stacktrace test projects with RequiresProcessIsolation - Cache env var read in NativeAOT GetHideAsyncDispatchMode Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… comparison - Cache DispatchContinuations method start address in StackTraceMetadata during first metadata resolution instead of matching names at runtime - Add IsAsyncDispatchBoundaryMethod virtual callback for address comparison - Add fallback in NativeAOT: if boundary not found (e.g. inlined), append continuation frames at end of stack trace - Use fully-qualified const strings for boundary method identification Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Exception stack traces should not be modified by DOTNET_StackTraceAsyncBehavior to maintain consistency with CoreCLR, where exception traces go through a separate path that does not apply async frame hiding. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| /// | ||
| /// Async Stack Traces | ||
| /// | ||
| RETAIL_CONFIG_DWORD_INFO(EXTERNAL_StackTraceAsyncBehavior, W("StackTraceAsyncBehavior"), 1, "Controls async stack trace behavior for Environment.StackTrace and System.Diagnostics.StackTrace: 0 = show all frames above the dispatch boundary with async continuation stitching, 1 (default) = hide non-async frames below the first runtime async frame, 2 = keep non-async frames between runtime async frames but truncate trailing ones, 3 = disable async stitching entirely and show only physical call stack frames.") |
There was a problem hiding this comment.
Nit: #125417 (comment) proposes 0 as the default value. 0 means enable everything (that you probably do not want) in the current scheme. I think it would be more intuitive for 0 to mean default or to mean disable everything.
noahfalk
left a comment
There was a problem hiding this comment.
Mostly minor nits. The only thing that seemed potentially more serious is if the NativeAOT dispatcher IP caching still works when multiple generic instantiations are present.
| /// | ||
| /// Async Stack Traces | ||
| /// | ||
| RETAIL_CONFIG_DWORD_INFO(EXTERNAL_StackTraceAsyncBehavior, W("StackTraceAsyncBehavior"), 1, "Controls async stack trace behavior for Environment.StackTrace and System.Diagnostics.StackTrace: 0 = show all frames above the dispatch boundary with async continuation stitching, 1 (default) = hide non-async frames below the first runtime async frame, 2 = keep non-async frames between runtime async frames but truncate trailing ones, 3 = disable async stitching entirely and show only physical call stack frames.") |
There was a problem hiding this comment.
As we mentioned earlier I'd suggest we get rid of option (2) just to lower the number of things we are asking folks for feedback on.
| /// </summary> | ||
| internal bool IsAsyncDispatchBoundary() | ||
| { | ||
| if (!_isStackTraceHidden || _ipAddress == IntPtr.Zero) |
There was a problem hiding this comment.
| if (!_isStackTraceHidden || _ipAddress == IntPtr.Zero) | |
| if (_ipAddress == IntPtr.Zero) |
I'm not sure why we'd need to check _isStackTraceHidden? I assume the dispatch boundary will be hidden, it just seems superfluous to check unless this is a perf optimization?
| // Read the config to determine async behavior before collecting continuations. | ||
| int hideMode = GetHideAsyncDispatchMode(); | ||
|
|
||
| // Mode 3 (physical only): skip continuation collection entirely. |
There was a problem hiding this comment.
Nit: Creating a private enum seems clearer than having to annotate magic values everywhere they appear.
| return null; | ||
|
|
||
| // Only collect continuations from the innermost (first) dispatcher in the chain. | ||
| // Outer dispatchers represent already-completed async scopes and are not displayed. |
There was a problem hiding this comment.
| // Outer dispatchers represent already-completed async scopes and are not displayed. | |
| // Outer dispatchers represent logically distinct async stacks and are not displayed. |
I suspect in many cases the outer dispatchers aren't complete but it doesn't matter because they aren't (necessarily) part of the same async stack.
| } | ||
| Debug.Assert(outputFrameIndex == outputFrameCount); | ||
|
|
||
| // Fallback: if we have continuation IPs but didn't find the dispatch boundary |
There was a problem hiding this comment.
I'd suggest lets not have this fallback. I don't think stitching the continuations in the wrong place on the stack will necessarily be better than not stitching them at all. Ideally we just ensure that we don't get into this buggy situation in the first place.
|
|
||
| // Identify and cache the async dispatch boundary method on first encounter. | ||
| if (_asyncDispatchBoundaryMethodAddress == IntPtr.Zero && | ||
| isStackTraceHidden && |
There was a problem hiding this comment.
Not sure why hidden is part of the check?
| if (_asyncDispatchBoundaryMethodAddress == IntPtr.Zero && | ||
| isStackTraceHidden && | ||
| methodName is AsyncDispatchBoundaryMethodName && | ||
| owningTypeName is AsyncDispatchBoundaryOwningType) |
There was a problem hiding this comment.
Since the owning type is generic is there anything that guarantees us that only one copy of the code exists and it hasn't been specialized based on the generic type argument?
|
|
||
| string stackTrace = AsyncStitchingOuter().GetAwaiter().GetResult(); | ||
|
|
||
| Assert.Contains(nameof(AsyncStitchingMiddle), stackTrace); |
There was a problem hiding this comment.
Nit: It would be nice if the tests verified the order of the frames in addition to their presence. We could split the stack trace by line and assert the method names show up on the expected line.
| @@ -746,5 +746,101 @@ private static void VerifyFrames(StackTrace stackTrace, bool hasFileInfo, int sk | |||
| Assert.NotNull(stackFrame.GetMethod()); | |||
| } | |||
| } | |||
|
|
|||
| [ConditionalFact(typeof(RemoteExecutor), nameof(RemoteExecutor.IsSupported))] | |||
| public void EnvironmentStackTrace_AsyncContinuationStitching() | |||
There was a problem hiding this comment.
Was there any test which verified that synchronous frames at the leaf are always included? I may have missed it.
| { | ||
| IntPtr[] stackIPs = exception.GetStackIPs(); | ||
| InitializeForIpAddressArray(stackIPs, skipFrames, stackIPs.Length, needFileInfo); | ||
| InitializeForIpAddressArray(stackIPs, skipFrames, stackIPs.Length, needFileInfo, null, 0); |
There was a problem hiding this comment.
It is interesting that we changed exception stacktraces unconditionally without a configuration switch, but we are introducing a configuration switch for current StackTrace and Environment.StackTrace. What's different between the two that we treat them differently?
(If we have discussed it, could you please remind me what's the difference?)
There was a problem hiding this comment.
(Our default stance in .NET Core is to avoid introducing config switch proactively unless we have strong evidence that it is going to be needed.)
There was a problem hiding this comment.
Our default stance in .NET Core is to avoid introducing config switch
Its for short-term user feedback, not for permanent options. #125417 (comment)
What's different between the two that we treat them differently?
I would say Exception.StackTrace had a pre-existing design invariant that had a natural well-defined extension to async frames we agreed upon. Environment.StackTrace extends more ambiguously and we haven't reached a consensus which design is preferable. Lacking agreement on the dev team we hoped some user feedback would help.
As a minimal example:
async Task Main()
{
await NonAsyncHelper();
}
Task NonAsyncHelper()
{
return AsyncHelper();
}
async Task AsyncHelper()
{
Console.WriteLine("BEFORE\n" + Environment.StackTrace);
await Task.Yield();
Console.WriteLine("AFTER\n" + Environment.StackTrace);
}We expect the AFTER stack trace to be:
AsyncHelper
Main
Opinions differ on whether the 'NonAsyncHelper' frame should appear in the BEFORE stack or if the two stacks should be identical. Including it is truer to the physical stack abstraction and exposes the differences between async calls that complete with/without suspension. Excluding it gives more consistent stack traces that don't change behavior based on whether suspension occurred in child method calls.
|
I wanted to see what's the actual experience eventually going to look like. I have done the following:
I got the following with this setup (this is expected): Then, I have replaced the runtime with the runtime from this PR and I got the following: Is it expected behavior? It has the threadpool boilerplate removed, but it does not show where the call actually came from. I would expect the stacktrace to show the path all the way to the kestrel. |
Simplify StackTraceAsyncBehavior config to 3 modes: 0 = show all frames with async continuation stitching 1 = hide non-async frames below first async frame (default) 2 = physical only (no async stitching) The old mode 2 (keep interim non-async frames, truncate trailing) added complexity without clear value. Old mode 3 is renumbered to mode 2. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| { | ||
| "0" => 0, | ||
| "2" => 2, | ||
| "3" => 3, |
| AppDomain *pDomain; | ||
| BOOL fDoWeHaveAnyFramesFromForeignStackTrace; | ||
| BOOL fAsyncFramesPresent; // True if async frames were present in the stack | ||
| DWORD hideAsyncDispatchMode; // 0 = show all above dispatch boundary (with stitching), 1 = hide non-async, 2 = truncate trailing, 3 = physical only (no stitching) |
| GetThread()->StackWalkFrames(GetStackFramesCallback, pData, FUNCTIONSONLY | QUICKUNWIND, NULL); | ||
|
|
||
| // Do a 2nd pass outside of any locks. | ||
| // Do a 2nd passoutside of any locks. |
|
|
||
| if (pResumeInfo != NULL && pResumeInfo->Resume != NULL) | ||
| { | ||
| MethodDesc* pMD = NonVirtualEntry2MethodDesc((PCODE)pResumeInfo->Resume); |
There was a problem hiding this comment.
Is there a reason this is not using DiagnosticIP directly? The code below seems to be an indirect way to get to the method that contains DiagnosticIP anyway. It also would not work for interpreter continuations.
|
|
||
| // We'll init the IL offsets outside the TSL lock. | ||
| ++pData->cElements; | ||
| } |
There was a problem hiding this comment.
Does this inject all the async frames at the beginning of the frame list? It seems like odd behavior that is not going to make much sense when there are multiple dispatchers.
It would make more sense to me to do the walking in tandem, walking the native frames and replacing all dispatcher frames by continuation frames from that particular dispatcher frame's dispatcher information.
Pseudocode something like:
AsyncDispatcherInfo* dispatcherInfo = AsyncDispatcherInfo.t_current;
frame = InitiateStackWalk();
while (frame != null)
{
if (IsInFrame(frame, dispatcherInfo))
{
AppendContinuationFrames(dispatcherInfo);
dispatcherInfo = dispatcherInfo.Next;
}
else
{
AppendFrame(frame);
}
frame = NextFrame(frame);
}
Implementation for #125417
Summary
Enhance
Environment.StackTrace(and the underlyingStackTrace/StackFrameAPIs) to reconstruct the logical async call chain when executing inside a runtime async (v2) method that has been resumed via continuation dispatch. Today, the stack trace shows only physical stack frames — thread pool internals and the immediately executing method — losing the caller context that is useful for logging.Motivation
Current behavior (without this change)
When a runtime async v2 method yields and is later resumed by the thread pool,
Environment.StackTraceshows only what is physically on the stack:The logical callers — which async methods are awaiting
MiddleMethod— are completely absent. A developer has no way to determine whyMiddleMethodis running.Desired behavior
Internal dispatch machinery (
DispatchContinuations, thread pool frames) is hidden, and the logical async caller chain is reconstructed from the runtime's continuation data.Why this matters
Environment.StackTraceare a useful diagnostic tool. Without the async caller chain, developers cannot trace the origin of a call.STEF_CONTINUATIONinexcep.cpp.Environment.StackTraceshould have equivalent behavior.StackTraceobjects need the logical call chain to build meaningful traces.Background: Runtime Async v2 Continuation Infrastructure
Continuation chain
When a runtime async method suspends (e.g., at an
awaitthat doesn't complete synchronously), the JIT creates aContinuationobject that captures:NextContinuation?ResumeInfoResumeInfo*The
ResumeInfostruct contains:Resumedelegate*<Continuation, ref byte, Continuation?>DiagnosticIPvoid*Dispatch
When an async method completes,
RuntimeAsyncTaskCore.DispatchContinuations()iterates the continuation chain. It maintains a thread-localAsyncDispatcherInfopointer (AsyncDispatcherInfo.t_current) that tracks the current dispatch state, including theNextContinuationto be processed.IsAsyncMethod()A
MethodDescis considered an async v2 method if it hasAsyncMethodDatawith theAsyncCallflag set. This is populated during JIT compilation when the method uses runtime async calling conventions.Existing precedent: Exception stack traces
In
excep.cpp, when building exception stack traces, continuation frames are already injected withSTEF_CONTINUATION: