[cDAC] Stack walk GC reference scanning and bug fixes (1/5)#127395
[cDAC] Stack walk GC reference scanning and bug fixes (1/5)#127395max-charlamb wants to merge 4 commits intodotnet:mainfrom
Conversation
Implement GC reference scanning for stub/transition frames and fix stack walker state machine bugs: - PromoteCallerStack/PromoteCallerStackUsingGCRefMap for transition frames - GCRefMap decoder for ReadyToRun import section resolution - FindGCRefMap with FindReadyToRunModule fallback - SOSDacImpl.GetStackReferences using cDAC contract - Fix IsFirst preserved for skipped frames - Fix skipped frame handling moved to UpdateState - GCInfoDecoder goto removal (ReportUntrackedAndSucceed local function) - RequiresInstArg, IsAsyncMethod, HasRetBuffArg on IRuntimeTypeSystem - ExceptionInfo ClauseForCatch fields for catch handler detection - Data descriptor additions for frame types and TransitionBlock layout Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
There was a problem hiding this comment.
Pull request overview
Implements the first slice of cDAC stack-walk GC reference scanning, extending data descriptors/contracts and wiring SOSDacImpl.GetStackReferences to use the cDAC StackWalk contract (including transition-frame root scanning via GCRefMap and signature-based fallback).
Changes:
- Add/extend cDAC data descriptors and contract data types for transition frames, ReadyToRun import sections, and exception clause ranges needed for stack GC root enumeration.
- Refactor StackWalk filtering/state transitions and implement managed/live-slot + frame-based GC root scanning paths.
- Extend contract APIs (ExecutionManager/GCInfo/RuntimeTypeSystem) and update solution/test project structure to include StressTests.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.ExecutionManager.cs | Updates mock ReadyToRunInfo descriptor layout with import section fields. |
| src/native/managed/cdac/tests/Microsoft.Diagnostics.DataContractReader.Tests.csproj | Treats StressTests as a separate test project and excludes it from unit-test compilation. |
| src/native/managed/cdac/cdac.slnx | Adds StressTests project to the cdac solution filter. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs | Implements GetStackReferences using cDAC StackWalk contract, with DEBUG cross-check. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/ReadyToRunInfo.cs | Reads/imports R2R import section pointer + count into the contract data. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/TransitionBlock.cs | Exposes descriptor-provided layout offsets used for GCRefMap/arg-reg scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/StubDispatchFrame.cs | Adds GCRefMap + lazy-resolution fields for stub dispatch frame root scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/ExternalMethodFrame.cs | New contract data type for ExternalMethodFrame GCRefMap-based scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/DynamicHelperFrame.cs | New contract data type for DynamicHelperFrame flag-based scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/ExceptionInfo.cs | Adds catch-handler clause range fields for interruptible-offset override logic. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/StackWalk_1.cs | Refactors stack-walk filtering/Next integration and adds GC root enumeration behaviors. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcSignatureTypeProvider.cs | Adds signature-type classifier used by signature decoding for GC scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GCRefMapDecoder.cs | Adds managed decoder for the compact GCRefMap bitstream. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/FrameHandling/FrameIterator.cs | Adds return-address retrieval and frame-type-specific GC root scanning (GCRefMap/MetaSig paths). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/RuntimeTypeSystem_1.cs | Implements RequiresInstArg and IsAsyncMethod for TransitionFrame argument layout decisions. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs | Extends decoder surface to expose interruptible ranges. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfo_1.cs | Plumbs GetInterruptibleRanges through the GCInfo contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs | Removes goto, fixes untracked-slot reporting behavior, and keeps decoding accessible. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManager_2.cs | Exposes FindReadyToRunModule in ExecutionManager v2 contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManager_1.cs | Exposes FindReadyToRunModule in ExecutionManager v1 contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManagerCore.cs | Implements R2R module resolution via RangeSection lookup. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/DataType.cs | Adds DataType entries for new frame data types. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IRuntimeTypeSystem.cs | Adds RequiresInstArg and IsAsyncMethod to abstractions interface. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IGCInfo.cs | Adds InterruptibleRange and GetInterruptibleRanges to the public abstraction contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IExecutionManager.cs | Adds FindReadyToRunModule to the abstraction contract. |
| src/coreclr/vm/readytoruninfo.h | Adds cdac_data offsets for import sections in ReadyToRunInfo. |
| src/coreclr/vm/frames.h | Adds cdac_data for ExternalMethodFrame/DynamicHelperFrame and fields for StubDispatchFrame GCRefMap resolution. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Adds/extends descriptors for new frame fields, transition block offsets, and catch clause ranges. |
| docs/design/datacontracts/StackWalk.md | Documents new descriptors used by stack walking + GC reference scanning. |
| docs/design/datacontracts/RuntimeTypeSystem.md | Documents new RuntimeTypeSystem contract methods. |
| docs/design/datacontracts/GCInfo.md | Documents new GCInfo contract API for interruptible ranges. |
|
|
||
| /// <summary> | ||
| /// Finds the R2R module that contains the given address. | ||
| /// Used by FindGCRefMap to resolve m_pZapModule when it's null. |
There was a problem hiding this comment.
Can we do this unconditionally and drop ZapModule from the constract? ZapModule seems to be a nice-to-have cache.
(Also, ZapModule can be renamed to Module or ReadyToRunModule. Zap is a very old codename name for crossgen/ngen that we have almost eradicated from the codebase.)
7c419ec to
6c234d9
Compare
6c234d9 to
0ee4452
Compare
- Use ex.HResult instead of E_FAIL in GetStackReferences catch block - Remove ZapModule references from StackWalk.md documentation - Replace magic 0x11 with (byte)SignatureTypeKind.ValueType - Remove StressTests csproj reference from cdac.slnx (PR5 only) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
davidwrighton
left a comment
There was a problem hiding this comment.
In general this looks good, but there are LOTS of details missing their contract documentation.
| | `StubDispatchFrame` | `MethodDescPtr` | Pointer to Frame's method desc | | ||
| | `StubDispatchFrame` | `RepresentativeMTPtr` | Pointer to Frame's method table pointer | | ||
| | `StubDispatchFrame` | `RepresentativeSlot` | Frame's method table slot | | ||
| | `StubDispatchFrame` | `GCRefMap` | Cached pointer to GC reference map blob for caller stack promotion | |
There was a problem hiding this comment.
Where is the description of how the GCRefMap works, and its layout?
| /// Returns TargetPointer.Null if the Frame has no return address (e.g., non-active ICF, | ||
| /// base Frame types, FuncEvalFrame during exception eval). | ||
| /// </summary> | ||
| public TargetPointer GetReturnAddress() |
There was a problem hiding this comment.
This needs documentation.
0ee4452 to
d9d8bb7
Compare
d9d8bb7 to
762145c
Compare
|
|
||
| GC_SPBASE_FIRST = GC_CALLER_SP_REL, | ||
| GC_SPBASE_LAST = GC_FRAMEREG_REL, | ||
| } | ||
|
|
||
| public readonly record struct InterruptibleRange(uint StartOffset, uint EndOffset); | ||
|
|
||
| public readonly record struct GcSlotDesc | ||
| { |
There was a problem hiding this comment.
InterruptibleRange was moved out of this file, but GcInfoDecoder<TTraits> still references InterruptibleRange (e.g., List<InterruptibleRange>). Since this file is in ...Contracts.GCInfoHelpers and the type now lives in Microsoft.Diagnostics.DataContractReader.Contracts, the simple name won’t resolve without adding a using Microsoft.Diagnostics.DataContractReader.Contracts; (or fully-qualifying the type). As-is, this should fail to compile.
| _ => throw new InvalidOperationException($"Unknown stack slot base: {spBase}"), | ||
| }; | ||
|
|
||
| TargetPointer addr = new(baseAddr.Value + (ulong)(long)spOffset); |
There was a problem hiding this comment.
When computing the stack slot address, baseAddr.Value + (ulong)(long)spOffset relies on unsigned wraparound for negative offsets. That can produce incorrect addresses on 32-bit targets (high bits become set) and is generally fragile. Prefer computing with signed arithmetic (e.g., cast baseAddr.Value to long, add spOffset, then cast back to ulong) so negative offsets subtract correctly without wrap issues.
| TargetPointer addr = new(baseAddr.Value + (ulong)(long)spOffset); | |
| TargetPointer addr = new((ulong)((long)baseAddr.Value + spOffset)); |
| // InlinedCallFrame: returns 0 if inactive, else m_pCallerReturnAddress | ||
| case FrameType.InlinedCallFrame: | ||
| Data.InlinedCallFrame icf = target.ProcessedData.GetOrAdd<Data.InlinedCallFrame>(currentFramePointer); | ||
| return InlinedCallFrameHasActiveCall(icf) ? new TargetPointer(icf.CallerReturnAddress) : TargetPointer.Null; |
There was a problem hiding this comment.
In GetReturnAddress(), the InlinedCallFrame case wraps an existing TargetPointer in new TargetPointer(icf.CallerReturnAddress). Since CallerReturnAddress is already a TargetPointer, this extra conversion is unnecessary; returning icf.CallerReturnAddress directly would be simpler and avoids implicit conversions.
| return InlinedCallFrameHasActiveCall(icf) ? new TargetPointer(icf.CallerReturnAddress) : TargetPointer.Null; | |
| return InlinedCallFrameHasActiveCall(icf) ? icf.CallerReturnAddress : TargetPointer.Null; |
| CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, MethodDescPtr, cdac_data<FramedMethodFrame>::MethodDescPtr) | ||
| CDAC_TYPE_FIELD(StubDispatchFrame, T_UINT32, RepresentativeSlot, cdac_data<StubDispatchFrame>::RepresentativeSlot) | ||
| CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, GCRefMap, cdac_data<StubDispatchFrame>::GCRefMap) | ||
| CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, ZapModule, cdac_data<StubDispatchFrame>::ZapModule) |
There was a problem hiding this comment.
Can we delete ZapModule from the cdac contracts?
I see my comment about this got resolved, but it is still here - is it needed for some reason?
There was a problem hiding this comment.
Sorry Claude resolved that. I'm planning to push another iteration. I'll include this and the gcrefmap removal
| CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, RepresentativeMTPtr, cdac_data<StubDispatchFrame>::RepresentativeMTPtr) | ||
| CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, MethodDescPtr, cdac_data<FramedMethodFrame>::MethodDescPtr) | ||
| CDAC_TYPE_FIELD(StubDispatchFrame, T_UINT32, RepresentativeSlot, cdac_data<StubDispatchFrame>::RepresentativeSlot) | ||
| CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, GCRefMap, cdac_data<StubDispatchFrame>::GCRefMap) |
There was a problem hiding this comment.
GCRefMap field can be deleted too. It is a nice to have cache as well. It should be fine to do the full lookup in cdac.
Summary
Part 1 of 5 stacked PRs splitting #126408 into reviewable pieces.
What this PR contains
PromoteCallerStack/PromoteCallerStackUsingGCRefMapfor transition framesGCRefMapDecoder+FindGCRefMapwith ReadyToRun import section resolutionSOSDacImpl.GetStackReferencesusing cDAC contractGCInfoDecoder.EnumerateLiveSlotsgoto removalGcSignatureTypeProviderfor GC type classificationRequiresInstArg,IsAsyncMethod,HasRetBuffArgon IRuntimeTypeSystemStack overview
Testing
Note
This PR description was created with AI assistance from Copilot.