Skip to content

[cDAC] Stack walk GC stress verification and fixes#126408

Draft
max-charlamb wants to merge 23 commits intodotnet:mainfrom
max-charlamb:cdac-stackreferences-5
Draft

[cDAC] Stack walk GC stress verification and fixes#126408
max-charlamb wants to merge 23 commits intodotnet:mainfrom
max-charlamb:cdac-stackreferences-5

Conversation

@max-charlamb
Copy link
Copy Markdown
Member

@max-charlamb max-charlamb commented Apr 1, 2026

Summary

cDAC GC stress verification tool (DOTNET_CdacStress) that compares stack GC references between the cDAC and the runtime at allocation stress points. Includes stack walker fixes, GC reference scanning implementation, custom signature decoding, and calling convention argument iteration.

Note

This PR description was updated with AI assistance from Copilot.

Stack walker fixes

  • Fix SW_SKIPPED_FRAME: do not call UpdateContextFromFrame (matches native SFITER_SKIPPED_FRAME_FUNCTION which does not call UpdateRegDisplay)
  • Fix skipped-frame loop: check for more skipped frames before yielding managed method (prevents duplicate EnumGcRefs between consecutive skipped frames)
  • Restructure Filter() to drive Next() directly, matching native Filter()+NextRaw() integration (prevents funclet-to-parent walk cycles)
  • Remove SkipActiveICFOnce/SkipCurrentFrameInCheck — active ICF double-yield is natural and harmless
  • Remove IsAtFirstPassExceptionThrowSite — native does not suppress first-pass refs
  • Fix IsFirst not preserved for skipped frames (was causing IsActiveFrame=false for the topmost managed frame)

GC reference scanning

  • Implement PromoteCallerStack for stub frames (GCRefMap + MetaSig paths)
  • Implement SOSDacImpl.GetStackReferences using cDAC contract (was falling back to legacy DAC)
  • Read FilterContext for stack walk starting context
  • Three-way cDAC/DAC/RT comparison with InProcessDataTarget
  • DOTNET_CdacStress bit flags: ALLOC/INSTR/REFS/WALK/USE_DAC/UNIQUE

RuntimeSignatureDecoder

Custom signature decoder that handles ELEMENT_TYPE_INTERNAL (0x21) and ELEMENT_TYPE_CMOD_INTERNAL (0x22) which SRM's SignatureDecoder cannot parse. These occur in IL stubs, marshalling stubs, and unsafe accessor frames.

  • IRuntimeSignatureTypeProvider<TType, TGenericContext>: superset of SRM's ISignatureTypeProvider, adds GetInternalType and GetInternalModifiedType
  • ISignatureReader + SpanSignatureReader: endianness-aware reader abstraction
  • Correct ECMA-335 compliance: TypeDefOrRefOrSpecEncoded token decoding, sign-extension-by-width for compressed signed ints, bounds validation

ArgIterator (ported from crossgen2)

Proper calling convention analysis replacing the simplified 1-slot-per-param approach.

  • CallingConventionInfo: hybrid data descriptor layout values + ABI invariant constants for all architectures (x86, x64 Windows/Unix, ARM32, ARM64, LoongArch64, RISC-V64)
  • ArgIterator.GetNextOffset(): maps each argument to its actual register or stack offset
  • OffsetOfFloatArgumentRegisters added to TransitionBlock data descriptor
  • Handles multi-slot args, forced-byref params, return buffer placement, async continuation

CI infrastructure

  • cDAC stress tests run in Helix via cdac-stress-helix.proj
  • Extended runtime-diagnostics.yml CdacDumpTests buildArgs with +tools.cdacstresstests

Test results

Allocation-level stress (9 debuggees, ~46K verifications):

Debuggee Verifications Pass Fail
BasicAlloc 4,936 4,936 0
DeepStack 4,964 4,964 0
Generics 4,936 4,936 0
MultiThread 4,996 4,996 0
Comprehensive 4,994 4,992 2*
ExceptionHandling 4,958 4,958 0
StructScenarios 4,940 4,940 0
DynamicMethods 6,520 6,520 0
PInvoke 4,936 4,936 0

Instruction-level stress (9 debuggees, ~226K verifications):
All 9 debuggees pass with zero failures across 226,452 verifications.

*Comprehensive's 2 failures are pre-existing legacy DAC issues (DAC returns 0 refs for background threads in kernel waits; cDAC and runtime agree).

@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a cDAC GC-stress verification harness and extends the cDAC stack-walk / stack-GC-ref pipeline so cDAC stack reference enumeration can be compared against runtime scanning at stress points.

Changes:

  • Implements/extends cDAC stack reference enumeration (including Frame-based scanning paths like PromoteCallerStack via GCRefMap / MetaSig) and wires SOS GetStackReferences to the cDAC contract.
  • Introduces a new GC stress integration test project with debuggee apps and orchestration targets.
  • Extends CoreCLR cDAC stress/GC stress plumbing and data descriptors to support the new stack-walk and frame-scanning capabilities.

Reviewed changes

Copilot reviewed 46 out of 46 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.ExecutionManager.cs Updates mock type layout for execution manager-related data.
src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.cs Updates mock type layouts (ExceptionInfo/Thread) for contract tests.
src/native/managed/cdac/tests/Microsoft.Diagnostics.DataContractReader.Tests.csproj Excludes new GCStressTests folder from the existing unit test project compilation.
src/native/managed/cdac/tests/GCStressTests/README.md Documents how to build/run the new GC stress tests.
src/native/managed/cdac/tests/GCStressTests/Microsoft.Diagnostics.DataContractReader.GCStressTests.csproj Adds a dedicated GC stress test project.
src/native/managed/cdac/tests/GCStressTests/GCStressTests.targets MSBuild orchestration to discover/build debuggee projects.
src/native/managed/cdac/tests/GCStressTests/GCStressTestBase.cs Test harness to run debuggees under corerun and parse verification logs.
src/native/managed/cdac/tests/GCStressTests/GCStressResults.cs Parses the native verification log into structured pass/fail/skip counts.
src/native/managed/cdac/tests/GCStressTests/Debuggees/PInvoke/Program.cs Adds a P/Invoke-focused debuggee workload.
src/native/managed/cdac/tests/GCStressTests/Debuggees/PInvoke/PInvoke.csproj Debuggee project file for PInvoke scenario.
src/native/managed/cdac/tests/GCStressTests/Debuggees/MultiThread/Program.cs Adds a multi-threaded debuggee workload.
src/native/managed/cdac/tests/GCStressTests/Debuggees/MultiThread/MultiThread.csproj Debuggee project file for MultiThread scenario.
src/native/managed/cdac/tests/GCStressTests/Debuggees/Generics/Program.cs Adds a generics/interface/delegate debuggee workload.
src/native/managed/cdac/tests/GCStressTests/Debuggees/Generics/Generics.csproj Debuggee project file for Generics scenario.
src/native/managed/cdac/tests/GCStressTests/Debuggees/ExceptionHandling/Program.cs Adds an exception-handling/funclet debuggee workload.
src/native/managed/cdac/tests/GCStressTests/Debuggees/ExceptionHandling/ExceptionHandling.csproj Debuggee project file for ExceptionHandling scenario.
src/native/managed/cdac/tests/GCStressTests/Debuggees/Directory.Build.props Shared build props for debuggee projects (output layout, TFM, etc.).
src/native/managed/cdac/tests/GCStressTests/Debuggees/DeepStack/Program.cs Adds deep-recursion debuggee workload.
src/native/managed/cdac/tests/GCStressTests/Debuggees/DeepStack/DeepStack.csproj Debuggee project file for DeepStack scenario.
src/native/managed/cdac/tests/GCStressTests/Debuggees/Comprehensive/Program.cs Adds comprehensive “all scenarios” debuggee workload.
src/native/managed/cdac/tests/GCStressTests/Debuggees/Comprehensive/Comprehensive.csproj Debuggee project file for Comprehensive scenario.
src/native/managed/cdac/tests/GCStressTests/Debuggees/BasicAlloc/Program.cs Adds basic allocation/live-ref debuggee workload.
src/native/managed/cdac/tests/GCStressTests/Debuggees/BasicAlloc/BasicAlloc.csproj Debuggee project file for BasicAlloc scenario.
src/native/managed/cdac/tests/GCStressTests/BasicGCStressTests.cs Theory-based test suite that runs the debuggees and asserts pass rate.
src/native/managed/cdac/tests/gcstress/known-issues.md Captures known mismatch classes and limitations.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs Implements GetStackReferences using the cDAC contract rather than legacy DAC fallback.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/StubDispatchFrame.cs Extends StubDispatchFrame data with GCRefMap pointer.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/ExternalMethodFrame.cs Adds ExternalMethodFrame contract data type (GCRefMap).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/DynamicHelperFrame.cs Adds DynamicHelperFrame contract data type (flags).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/ExceptionInfo.cs Adds clause-range fields used for catch-handler resumption offset selection.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/StackWalk_1.cs Refactors stack-walk filtering and adds Frame-based GC root scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanner.cs Adds optional relOffset override support for GC ref enumeration.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GCRefMapDecoder.cs Implements GCRefMap decoding for transition-block scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/CorSigParser.cs Adds minimal signature parsing to classify parameters for MetaSig-based scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs Adds FindFirstInterruptiblePoint API to GCInfo decoders.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs Implements FindFirstInterruptiblePoint using decoded interruptible ranges.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManagerCore.EEJitManager.cs Fixes code-start lookup for exception clause enumeration and adds minor flow adjustments.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Constants.cs Adds TransitionBlock-related global names used by frame scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/DataType.cs Adds new DataType enum values for new frame contracts.
src/native/managed/cdac/cdac.slnx Adds the new GC stress test project to the cDAC solution.
src/coreclr/vm/gccover.cpp Adds step-based skipping to reduce overhead when throttling verification.
src/coreclr/vm/frames.h Exposes additional frame fields to cDAC via cdac_data<> descriptors.
src/coreclr/vm/datadescriptor/datadescriptor.inc Adds new contract fields/globals for frames and TransitionBlock layout.
src/coreclr/vm/cdacstress.cpp Updates in-process cDAC/DAC verification logic, logging, and step behavior.
eng/Subsets.props Adds an on-demand subset for running GC stress tests.
docs/design/datacontracts/StackWalk.md Documents the new frame fields and TransitionBlock globals in the StackWalk contract.
Comments suppressed due to low confidence (1)

src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.cs:228

  • ThreadFields contains ProfilerFilterContext twice (lines 224 and 228). Duplicate field entries will skew offsets and make the mock descriptors inconsistent with the real data descriptor. Keep a single ProfilerFilterContext entry in the correct order.

Comment thread src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.ExecutionManager.cs Outdated
Comment thread src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.cs Outdated
Comment thread src/coreclr/vm/cdacstress.cpp Outdated
Comment thread src/coreclr/vm/cdacstress.cpp Outdated
Comment thread src/coreclr/vm/cdacstress.cpp
Comment thread src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs
Comment thread src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs Outdated
Comment thread src/native/managed/cdac/tests/GCStressTests/README.md Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 50 out of 50 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (2)

src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.cs:229

  • ThreadFields now includes DebuggerFilterContext/ProfilerFilterContext twice (duplicate entries at the end of the list). This can cause ambiguous/incorrect field offsets in the mock type layout. Keep each field only once.
    src/coreclr/vm/cdacstress.cpp:963
  • CompareRefSets uses a fixed-size matched[MAX_COLLECTED_REFS] buffer but no longer validates that countA/countB are <= MAX_COLLECTED_REFS. Since CollectStackRefs appends without a hard cap, this can lead to out-of-bounds access when countA or countB exceeds 4096. Reintroduce a guard or allocate the match state sized to the counts.
        return true;

    bool matched[MAX_COLLECTED_REFS] = {};

    for (int i = 0; i < countA; i++)

Comment thread src/coreclr/vm/cdacstress.cpp Outdated
Comment thread src/coreclr/vm/cdacstress.cpp Outdated
Comment thread src/coreclr/vm/gccover.cpp Outdated
Comment thread src/native/managed/cdac/tests/GCStressTests/GCStressTestBase.cs Outdated
Comment thread src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs Outdated
@max-charlamb max-charlamb force-pushed the cdac-stackreferences-5 branch from 75ae7fe to 5419d15 Compare April 13, 2026 15:30
Copilot AI review requested due to automatic review settings April 13, 2026 15:47
@max-charlamb max-charlamb force-pushed the cdac-stackreferences-5 branch from 5419d15 to 8ef9b22 Compare April 13, 2026 15:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 53 out of 53 changed files in this pull request and generated 7 comments.

Comment thread src/coreclr/vm/cdacstress.cpp
Comment thread src/native/managed/cdac/tests/StressTests/StressTests.targets
Comment thread src/native/managed/cdac/tests/StressTests/README.md
Comment thread src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs
Comment thread src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs Outdated
Copy link
Copy Markdown
Member Author

@max-charlamb max-charlamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feedback for local copilot

Comment thread src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.ExecutionManager.cs Outdated
Comment thread src/native/managed/cdac/tests/StressTests/analysis/analyze-refs.cs Outdated
Comment thread src/native/managed/cdac/tests/StressTests/README.md
Comment thread src/native/managed/cdac/tests/StressTests/StressTests.targets Outdated
Comment thread src/coreclr/vm/datadescriptor/datadescriptor.inc Outdated
Copilot AI review requested due to automatic review settings April 13, 2026 17:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 51 out of 51 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

src/coreclr/vm/cdacstress.cpp:625

  • CollectStackRefs appends to pRefs without any cap. Later comparisons allocate fixed-size arrays sized MAX_COLLECTED_REFS (e.g., matched[MAX_COLLECTED_REFS], aUsed/bUsed) and assume counts fit. If the DAC returns > MAX_COLLECTED_REFS refs, this will lead to out-of-bounds writes/reads. Add an explicit limit/overflow handling in CollectStackRefs (stop at MAX_COLLECTED_REFS and mark overflow / SKIP), or change the later comparison logic to handle arbitrary counts safely.
    SOSStackRefData refData;
    unsigned int fetched = 0;
    while (true)
    {
        hr = pEnum->Next(1, &refData, &fetched);
        if (FAILED(hr) || fetched == 0)
            break;

        StackRef ref;
        ref.Address = refData.Address;
        ref.Object = refData.Object;
        ref.Flags = refData.Flags;
        ref.Source = refData.Source;
        ref.SourceType = refData.SourceType;
        ref.Register = refData.Register;
        ref.Offset = refData.Offset;
        ref.StackPointer = refData.StackPointer;
        pRefs->Append(ref);
    }

Comment thread src/coreclr/vm/cdacstress.cpp
Comment thread src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs Outdated
Comment thread src/native/managed/cdac/tests/StressTests/BasicCdacStressTests.cs
Comment thread src/coreclr/vm/cdacstress.cpp
Comment thread src/native/managed/cdac/tests/StressTests/README.md
Comment thread src/native/managed/cdac/tests/StressTests/known-issues.md
Comment thread docs/design/datacontracts/StackWalk.md
Comment thread src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs
Copilot AI review requested due to automatic review settings April 13, 2026 19:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 52 out of 52 changed files in this pull request and generated 3 comments.

Comment thread src/coreclr/vm/cdacstress.cpp
Comment thread src/coreclr/vm/cdacstress.cpp
Copilot AI review requested due to automatic review settings April 13, 2026 20:24
Max Charlamb and others added 12 commits April 22, 2026 15:23
- Add FindReadyToRunModule to IExecutionManager to resolve R2R modules
  from import section addresses when ExternalMethodFrame.ZapModule is null.
  Uses RangeSection.Find since R2R range sections cover the full PE image.
- Refactor IsInterrupted detection from Next() to UpdateState(), tracking
  LastProcessedFrameType on StackWalkData so the exception frame detection
  happens at the state transition point rather than inline in SW_FRAME.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- All 9 debuggees now pass 100% at allocation-level stress
- Document instruction-level failure root causes:
  - 3 FRAME_DAC_ONLY: missing Frame in cDAC walk (not GCRefMap)
  - 4 FRAME_DIFF: GcInfo register mismatch at QCall boundaries
- Update test results table to reflect 0 failures across all debuggees

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 3 instruction-level stress failures where the DAC reports refs from an
explicit Frame that the cDAC misses are caused by ELEMENT_TYPE_INTERNAL (0x21)
in runtime-internal method signatures. PromoteCallerStack uses the SRM
SignatureDecoder which doesn't handle this runtime-internal type code, causing
BadImageFormatException that is silently caught by the per-frame exception handler.

The native DAC handles this via MetaSig which natively understands
ELEMENT_TYPE_INTERNAL. The Legacy cDAC also handles it in SigFormat.cs.
A follow-up PR should replace the SRM SignatureDecoder usage with a custom
signature walker following the SigFormat.cs pattern.

Added catch (BadImageFormatException) in PromoteCallerStack as a workaround.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion in PromoteCallerStack

- Log register number, offset, and SP for unmatched refs in FRAME_DIFF output
  to aid diagnosis of GcInfoDecoder liveness bitmap mismatches.
- Catch BadImageFormatException in PromoteCallerStack when the SRM
  SignatureDecoder encounters ELEMENT_TYPE_INTERNAL (0x21) in runtime-internal
  method signatures. Gracefully skip the frame instead of silently swallowing
  the exception in the outer catch-all handler.

Investigation findings:
- The 4 FRAME_DIFF failures involve scratch registers (RAX=0, R8=8) that
  the DAC reports as live but the cDAC's GcInfoDecoder does not.
- For the topmost frame (AddProviderEnumKind), ActiveStackFrame should be
  set, so scratch registers should be reported. The missing slot suggests
  the GcInfoDecoder's safe point liveness bitmap differs from the native
  decoder's at this specific code offset.
- Further investigation needed to determine if the safe point index or
  the liveness bitmap data itself differs between the two decoders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Log both sides' StackPointer values in the FRAME_DIFF header to detect
unwinder divergence. Results show SP values match perfectly for all
FRAME_DIFF failures, confirming the issue is not in the unwinder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… filtering

The native StackFrameIterator does NOT modify isFirst when processing
SFITER_SKIPPED_FRAME_FUNCTION (stackwalk.cpp:2086-2128). It stays true
from Init() so the subsequent managed frame gets IsActiveFunc()=true and
scratch registers are reported. The cDAC's AdvanceIsFirst was incorrectly
updating IsFirst for SW_SKIPPED_FRAME based on the Frame's resumable
attribute, causing the managed frame to get IsActiveFrame=false.

This fix makes AdvanceIsFirst skip the IsFirst update for SW_SKIPPED_FRAME,
matching the native behavior. This fixes all 4 FRAME_DIFF instruction-level
stress failures where scratch registers (RAX, R8) were not reported.

Results: 25,512 pass / 3 fail (0 FRAME_DIFF, 3 FRAME_DAC_ONLY remaining)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instruction-level stress now: 25,512 pass / 3 fail (99.988%)
- 0 FRAME_DIFF (all fixed)
- 3 FRAME_DAC_ONLY (ELEMENT_TYPE_INTERNAL — follow-up)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix platform-specific ctx.Rip usage: use GetIP(&ctx) instead
- Fix if( style: add space after if keyword in gccover.cpp
- Add RVA bounds validation in FindGCRefMap before uint cast
- Remove stale analysis file (eh-throwhelper-report.md)
- Update AssertHighPassRate comment to reflect current state

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Run GC stress verification tests in the runtime-diagnostics pipeline by
piggybacking on the existing CdacDumpTests Checked runtime build. The
stress tests run as a second Helix submission after the dump tests,
using the testhost shared framework as CORE_ROOT.

Runs on all cdacDumpPlatforms (windows_x64, linux_x64, etc.). On
non-Windows platforms, only instruction-level stress (via DoGcStress)
is supported; allocation-level stress (VerifyAtAllocPoint) is skipped
because it requires Windows-only RtlCaptureContext/RtlVirtualUnwind.

Infrastructure:
- cdac-stress-helix.proj: Helix SDK project that sends testhost as
  correlation payload and stress test debuggees + test assembly as
  work item payload. Sets CORE_ROOT env var for the test harness.
- prepare-cdac-stress-helix-steps.yml: Pipeline template that builds
  debuggees, prepares Helix payload, and finds testhost directory.
- StressTests.targets: Added PrepareHelixPayload and BuildDebuggeesOnly
  targets for CI payload preparation.
- CdacStressTestBase.cs: Added HELIX_WORKITEM_PAYLOAD support for
  finding debuggees in Helix environment.
- runtime-diagnostics.yml: Extended CdacDumpTests buildArgs to include
  tools.cdacstresstests, added stress test Helix submission steps.
- cdacstress.cpp: Guard VerifyAtAllocPoint with TARGET_WINDOWS for
  RtlCaptureContext/RtlVirtualUnwind APIs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a custom signature decoder that handles runtime-internal type codes
(ELEMENT_TYPE_INTERNAL 0x21, ELEMENT_TYPE_CMOD_INTERNAL 0x22) which
SRM's SignatureDecoder cannot parse.

- IRuntimeSignatureTypeProvider<TType, TGenericContext>: superset of
  SRM's ISignatureTypeProvider, adding GetInternalType for resolving
  embedded TypeHandle pointers via the runtime type system.
- ISignatureReader + SpanSignatureReader: abstraction for reading
  signature bytes from different sources (spans, target memory).
- RuntimeSignatureDecoder<TType, TGenericContext, TReader>: ref struct
  decoder that handles all standard ECMA-335 types plus internal types.
- GcSignatureTypeProvider: implements IRuntimeSignatureTypeProvider to
  classify types for GC scanning, resolving internal types via
  RuntimeTypeSystem.GetSignatureCorElementType.

Key correctness details vs SRM's SignatureDecoder:
- CLASS/VALUETYPE tokens decoded as TypeDefOrRefOrSpecEncoded per
  ECMA-335 II.23.2.8 (tag in low 2 bits, RID in upper bits).
- CMOD_INTERNAL correctly skips the required/optional flag byte before
  the TypeHandle pointer, matching sigparser.h layout.
- ReadCompressedSignedInt uses ECMA sign-extension-by-width, not zigzag.
- ReadCompressedUInt rejects invalid 111xxxxx prefix.
- Unknown type codes throw BadImageFormatException instead of silently
  returning Object (which would create false-positive GC refs).
- Method signature header kind is validated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Port crossgen2's ArgIterator, TransitionBlock, and GC scanning logic
into the cDAC contracts for correct per-architecture argument placement.

- Add OffsetOfFloatArgumentRegisters to TransitionBlock data descriptor
- CallingConventionInfo: hybrid data descriptor + ABI invariant constants
  for x86, x64 (Windows/Unix), ARM32, ARM64, LoongArch64, RISC-V64
- ArgTypeInfo: pre-computed type info replacing crossgen2's TypeHandle
- ArgIteratorData: parsed method signature holder
- ArgIterator: maps each argument to register/stack offsets via
  GetNextOffset() with per-architecture register allocation
- Integrate into FrameIterator.PromoteCallerStackHelper replacing the
  simplified 1-slot-per-param approach with proper offset computation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Quote debuggee DLL path in ProcessStartInfo.Arguments (dotnet#31)
- Fix timeout: use async stdout/stderr reads so WaitForExit works (dotnet#32)
- Update stale comment about ELEMENT_TYPE_INTERNAL limitation (dotnet#35)
- Move CallingConvention types to StackWalkHelpers.CallingConvention namespace (dotnet#37)
- Simplify x86 register eligibility check in ArgIterator (dotnet#38)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 22, 2026 19:25
@max-charlamb max-charlamb force-pushed the cdac-stackreferences-5 branch from 1a92e2c to b85dbac Compare April 22, 2026 19:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 65 out of 65 changed files in this pull request and generated 5 comments.

Comment on lines +16 to +20
// FirstThreadLink is an embedded SLink struct. Read the SLink.Next pointer
// from the field's address to get the first thread link pointer.
Target.TypeInfo slinkType = target.GetTypeInfo(DataType.SLink);
TargetPointer slinkAddr = address + (ulong)type.Fields[nameof(FirstThreadLink)].Offset;
FirstThreadLink = target.ReadPointerField(slinkAddr, slinkType, "Next");
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ThreadStore now calls target.GetTypeInfo(DataType.SLink) and reads field "Next", but the CoreCLR data descriptor in this PR doesn't define a SLink type/field (could not find any CDAC_TYPE_BEGIN/FIELD for SLink). This will cause GetTypeInfo(DataType.SLink) to fail at runtime. Either add an SLink descriptor (with a "Next" pointer) on the runtime side or avoid needing type info here (e.g., treat the embedded SLink as a single pointer at offset 0).

Copilot uses AI. Check for mistakes.
Comment on lines 28 to 29
SLink,
ThreadLocalData,
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataType adds SLink, but there is no corresponding runtime data descriptor definition (no CDAC_TYPE_BEGIN/FIELD(SLink, ...) found). Any attempt to read this type via Target.GetTypeInfo(DataType.SLink) will fail. Either add the runtime descriptor entry for SLink or remove this enum value and read the embedded link without type metadata.

Suggested change
SLink,
ThreadLocalData,
ThreadLocalData = 17,

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +69
string stderr = "";
string stdout = "";
process.ErrorDataReceived += (_, e) =>
{
if (e.Data is not null)
stderr += e.Data + Environment.NewLine;
};
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These DataReceived event handlers append to strings via "+=" from background threads. That is not thread-safe and can lead to lost/garbled output under contention. Consider using a StringBuilder with locking (or ConcurrentQueue) for stderr/stdout aggregation.

Copilot uses AI. Check for mistakes.
Comment on lines +70 to +74
process.OutputDataReceived += (_, e) =>
{
if (e.Data is not null)
stdout += e.Data + Environment.NewLine;
};
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thread-safety issue as stderr: appending to stdout with += from OutputDataReceived is racy. Use a synchronized StringBuilder/collector to avoid missing output and to keep logs reliable when tests fail.

Copilot uses AI. Check for mistakes.
Comment on lines +732 to +751
/// Converts a <see cref="GcTypeKind"/> to a minimal <see cref="ArgTypeInfo"/> for
/// ArgIterator consumption. This is a bridge until we have a full type-aware provider.
/// </summary>
private static ArgTypeInfo GcTypeKindToArgTypeInfo(GcTypeKind kind, int pointerSize)
{
return kind switch
{
GcTypeKind.None => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize),
GcTypeKind.Ref => ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize),
GcTypeKind.Interior => ArgTypeInfo.ForPrimitive(CorElementType.Byref, pointerSize),
GcTypeKind.Other => new ArgTypeInfo
{
CorElementType = CorElementType.ValueType,
Size = pointerSize, // Conservative: assume pointer-sized for now
IsValueType = true,
},
_ => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize),
};
}

Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GcTypeKindToArgTypeInfo collapses all non-GC-ref types to CorElementType.I/pointer-sized and doesn't preserve float vs integer vs exact primitive sizes. ArgIterator offset calculation depends on the real signature shape (especially on Unix x64/ARM64 where float regs and argument sizing affect later argument placement), so this can produce incorrect offsets and cause missed/incorrect GC ref reporting for signatures with floats or non-pointer-sized primitives. Consider decoding into a richer type representation (e.g., CorElementType + size) and building ArgTypeInfo from that rather than from GcTypeKind.

Suggested change
/// Converts a <see cref="GcTypeKind"/> to a minimal <see cref="ArgTypeInfo"/> for
/// ArgIterator consumption. This is a bridge until we have a full type-aware provider.
/// </summary>
private static ArgTypeInfo GcTypeKindToArgTypeInfo(GcTypeKind kind, int pointerSize)
{
return kind switch
{
GcTypeKind.None => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize),
GcTypeKind.Ref => ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize),
GcTypeKind.Interior => ArgTypeInfo.ForPrimitive(CorElementType.Byref, pointerSize),
GcTypeKind.Other => new ArgTypeInfo
{
CorElementType = CorElementType.ValueType,
Size = pointerSize, // Conservative: assume pointer-sized for now
IsValueType = true,
},
_ => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize),
};
}
/// Converts a <see cref="GcTypeKind"/> to a conservative fallback <see cref="ArgTypeInfo"/>
/// when an exact signature cannot be decoded. Callers should prefer decoding the real
/// managed signature via <see cref="TryGetArgTypeInfosFromMethodSignature(ReadOnlySpan{byte}, int, out ArgTypeInfo[])"/>
/// so that floating-point and non-pointer-sized primitives retain their real layout.
/// </summary>
private static ArgTypeInfo GcTypeKindToArgTypeInfo(GcTypeKind kind, int pointerSize)
{
return kind switch
{
GcTypeKind.Ref => ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize),
GcTypeKind.Interior => ArgTypeInfo.ForPrimitive(CorElementType.Byref, pointerSize),
GcTypeKind.Other => new ArgTypeInfo
{
CorElementType = CorElementType.ValueType,
Size = pointerSize,
IsValueType = true,
},
_ => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize),
};
}
private static bool TryGetArgTypeInfosFromMethodSignature(ReadOnlySpan<byte> signatureBytes, int pointerSize, out ArgTypeInfo[] argTypes)
{
argTypes = Array.Empty<ArgTypeInfo>();
if (signatureBytes.IsEmpty)
return false;
BlobReader reader = new(signatureBytes);
SignatureHeader header = reader.ReadSignatureHeader();
if (header.IsGeneric)
reader.ReadCompressedInteger();
int parameterCount = reader.ReadCompressedInteger();
if (parameterCount < 0)
return false;
if (!TrySkipSignatureType(ref reader))
return false;
ArgTypeInfo[] decodedArgTypes = new ArgTypeInfo[parameterCount];
for (int i = 0; i < parameterCount; i++)
{
if (!TryReadArgTypeInfo(ref reader, pointerSize, out decodedArgTypes[i]))
return false;
}
argTypes = decodedArgTypes;
return true;
}
private static bool TryReadArgTypeInfo(ref BlobReader reader, int pointerSize, out ArgTypeInfo argTypeInfo)
{
while (reader.RemainingBytes > 0)
{
CorElementType elementType = (CorElementType)reader.ReadByte();
switch (elementType)
{
case CorElementType.CmodOpt:
case CorElementType.CmodReqd:
reader.ReadCompressedInteger();
continue;
case CorElementType.Pinned:
continue;
case CorElementType.Boolean:
case CorElementType.I1:
case CorElementType.U1:
argTypeInfo = ArgTypeInfo.ForPrimitive(elementType, 1);
return true;
case CorElementType.Char:
case CorElementType.I2:
case CorElementType.U2:
argTypeInfo = ArgTypeInfo.ForPrimitive(elementType, 2);
return true;
case CorElementType.I4:
case CorElementType.U4:
case CorElementType.R4:
argTypeInfo = ArgTypeInfo.ForPrimitive(elementType, 4);
return true;
case CorElementType.I8:
case CorElementType.U8:
case CorElementType.R8:
argTypeInfo = ArgTypeInfo.ForPrimitive(elementType, 8);
return true;
case CorElementType.I:
case CorElementType.U:
case CorElementType.Ptr:
case CorElementType.FnPtr:
if (elementType is CorElementType.Ptr or CorElementType.FnPtr)
{
if (!TrySkipSignatureType(ref reader))
{
argTypeInfo = default;
return false;
}
}
argTypeInfo = ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize);
return true;
case CorElementType.Byref:
if (!TrySkipSignatureType(ref reader))
{
argTypeInfo = default;
return false;
}
argTypeInfo = ArgTypeInfo.ForPrimitive(CorElementType.Byref, pointerSize);
return true;
case CorElementType.String:
case CorElementType.Object:
case CorElementType.Class:
if (elementType is CorElementType.Class)
reader.ReadCompressedInteger();
argTypeInfo = ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize);
return true;
case CorElementType.SzArray:
if (!TrySkipSignatureType(ref reader))
{
argTypeInfo = default;
return false;
}
argTypeInfo = ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize);
return true;
case CorElementType.ValueType:
reader.ReadCompressedInteger();
argTypeInfo = new ArgTypeInfo
{
CorElementType = CorElementType.ValueType,
Size = pointerSize,
IsValueType = true,
};
return true;
case CorElementType.GenericInst:
{
if (!TrySkipSignatureType(ref reader))
{
argTypeInfo = default;
return false;
}
int genericArgCount = reader.ReadCompressedInteger();
for (int i = 0; i < genericArgCount; i++)
{
if (!TrySkipSignatureType(ref reader))
{
argTypeInfo = default;
return false;
}
}
argTypeInfo = new ArgTypeInfo
{
CorElementType = CorElementType.ValueType,
Size = pointerSize,
IsValueType = true,
};
return true;
}
default:
argTypeInfo = default;
return false;
}
}
argTypeInfo = default;
return false;
}
private static bool TrySkipSignatureType(ref BlobReader reader)
{
while (reader.RemainingBytes > 0)
{
CorElementType elementType = (CorElementType)reader.ReadByte();
switch (elementType)
{
case CorElementType.CmodOpt:
case CorElementType.CmodReqd:
reader.ReadCompressedInteger();
continue;
case CorElementType.Pinned:
continue;
case CorElementType.Void:
case CorElementType.Boolean:
case CorElementType.Char:
case CorElementType.I1:
case CorElementType.U1:
case CorElementType.I2:
case CorElementType.U2:
case CorElementType.I4:
case CorElementType.U4:
case CorElementType.I8:
case CorElementType.U8:
case CorElementType.R4:
case CorElementType.R8:
case CorElementType.String:
case CorElementType.Object:
case CorElementType.TypedByref:
case CorElementType.I:
case CorElementType.U:
return true;
case CorElementType.Class:
case CorElementType.ValueType:
case CorElementType.Var:
case CorElementType.MVar:
reader.ReadCompressedInteger();
return true;
case CorElementType.Byref:
case CorElementType.Ptr:
case CorElementType.SzArray:
return TrySkipSignatureType(ref reader);
case CorElementType.GenericInst:
{
if (!TrySkipSignatureType(ref reader))
return false;
int genericArgCount = reader.ReadCompressedInteger();
for (int i = 0; i < genericArgCount; i++)
{
if (!TrySkipSignatureType(ref reader))
return false;
}
return true;
}
case CorElementType.Array:
{
if (!TrySkipSignatureType(ref reader))
return false;
int rank = reader.ReadCompressedInteger();
int sizes = reader.ReadCompressedInteger();
for (int i = 0; i < sizes; i++)
reader.ReadCompressedInteger();
int lowerBounds = reader.ReadCompressedInteger();
for (int i = 0; i < lowerBounds; i++)
reader.ReadCompressedInteger();
return rank >= 0;
}
case CorElementType.FnPtr:
{
SignatureHeader header = reader.ReadSignatureHeader();
if (header.IsGeneric)
reader.ReadCompressedInteger();
int parameterCount = reader.ReadCompressedInteger();
if (parameterCount < 0)
return false;
if (!TrySkipSignatureType(ref reader))
return false;
for (int i = 0; i < parameterCount; i++)
{
if (!TrySkipSignatureType(ref reader))
return false;
}
return true;
}
default:
return false;
}
}
return false;
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 64 out of 64 changed files in this pull request and generated 3 comments.

Comment on lines +582 to 586
static bool CollectStackRefs(ISOSDacInterface* pSosDac, DWORD osThreadId, SArray<StackRef>* pRefs,
const char* label = nullptr)
{
if (pSosDac == nullptr)
return false;
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CollectStackRefs currently appends every ref returned by ISOSStackRefEnum::Next with no upper bound. Later comparison helpers allocate fixed-size arrays sized by MAX_COLLECTED_REFS (e.g., matched[MAX_COLLECTED_REFS], aUsed[MAX_COLLECTED_REFS]), so if enumeration returns more than MAX_COLLECTED_REFS it can lead to out-of-bounds writes. Please cap collection to MAX_COLLECTED_REFS (and record an overflow/skip reason) or make the comparison logic handle arbitrarily large ref sets safely.

Copilot uses AI. Check for mistakes.
Comment on lines +1359 to +1362
if (rtOverflow && s_logFile != nullptr)
{
fprintf(s_logFile, "[SKIP] Thread=0x%x IP=0x%p - RT overflow (>%d refs)\n",
osThreadId, (void*)GetIP(regs), MAX_COLLECTED_REFS);
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When CollectRuntimeStackRefs overflows, the code logs a [SKIP] line but continues and still computes rtMatch/pass (and does not increment s_verifySkip). If CDACSTRESS_USE_DAC is not set, this can produce false failures based on a truncated runtime ref set. Consider treating runtime overflow as an actual skip (increment skip counter + return) when RT comparison is used for pass/fail, or otherwise avoid using rtMatch in the presence of overflow.

Suggested change
if (rtOverflow && s_logFile != nullptr)
{
fprintf(s_logFile, "[SKIP] Thread=0x%x IP=0x%p - RT overflow (>%d refs)\n",
osThreadId, (void*)GetIP(regs), MAX_COLLECTED_REFS);
if (rtOverflow)
{
InterlockedIncrement(&s_verifySkip);
if (s_logFile != nullptr)
{
fprintf(s_logFile, "[SKIP] Thread=0x%x IP=0x%p - RT overflow (>%d refs)\n",
osThreadId, (void*)GetIP(regs), MAX_COLLECTED_REFS);
}
return;

Copilot uses AI. Check for mistakes.
Comment on lines +823 to +824
ReportSlot(slotIndex, reportScratchSlots: true, reportFpBasedSlotsOnly, reportSlot);
}
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ReportUntrackedAndSucceed, ReportSlot is called with reportScratchSlots: true, which forces reporting scratch registers/stack slots even when CodeManagerFlags.ActiveStackFrame is not set (reportScratchSlots local is false). This changes GC root enumeration behavior and can introduce extra roots for non-leaf frames. It looks like this should pass the reportScratchSlots variable instead of always true.

Suggested change
ReportSlot(slotIndex, reportScratchSlots: true, reportFpBasedSlotsOnly, reportSlot);
}
ReportSlot(slotIndex, reportScratchSlots, reportFpBasedSlotsOnly, reportSlot);
}

Copilot uses AI. Check for mistakes.
- Fix README test filter syntax: use FullyQualifiedName~BasicAlloc (#1)
- Remove goto statements from GCInfoDecoder.EnumerateLiveSlots: extract
  ReportUntrackedAndSucceed local function (#2)
- Move CheckForSkippedFrames from Next() to UpdateState (#6)
- Add XUnitConsoleRunner package reference for Helix payload (#9)
- Support TypeSpec (tag=2) in DecodeTypeDefOrRefOrSpec matching native
  CorSigUncompressToken behavior (#10)
- Fix IsAppleArm64ABI: set to false until Apple platform detection is
  available (filed dotnet#127282) (#11)
- Fix Unix x64 float register stride: use FloatRegisterSize instead of
  hardcoded 8 (#12)
- Replace FrameIterator.OffsetFromGCRefMapPos with CallingConventionInfo
  version that handles x86 reversed register layout (dotnet#13)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the cdac-stackreferences-5 branch from 49c8435 to de5cb46 Compare April 24, 2026 15:56
- Include RuntimeInfoOperatingSystem.Apple in Unix x64 ABI check
  (macOS x64 uses SysV ABI, not Windows ABI)
- Thread MetadataReader from GetMethodSignatureBytes through
  RuntimeSignatureDecoder to provider methods (instead of null!)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 24, 2026 16:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 64 out of 64 changed files in this pull request and generated 2 comments.

Comment on lines +960 to 964
// Compare two ref sets using two-phase matching (for RT comparison where we
// don't have Source info). Returns true if all refs match.
static bool CompareRefSetsFlat(StackRef* refsA, int countA, StackRef* refsB, int countB)
{
if (countA != countB)
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompareRefSetsFlat uses a fixed-size matched[MAX_COLLECTED_REFS] buffer, but cDAC/DAC ref collection (CollectStackRefs) is unbounded. Without a guard/cap, a large ref set (>4096) can cause out-of-bounds writes during matching. Consider capping cDAC/DAC collection to MAX_COLLECTED_REFS (and treating overflow as a [SKIP]) or using dynamically sized bookkeeping here.

Copilot uses AI. Check for mistakes.
Comment on lines +61 to +76
// Read both stdout and stderr asynchronously to avoid deadlock
// when pipe buffers fill, and to allow WaitForExit timeout to work.
string stderr = "";
string stdout = "";
process.ErrorDataReceived += (_, e) =>
{
if (e.Data is not null)
stderr += e.Data + Environment.NewLine;
};
process.OutputDataReceived += (_, e) =>
{
if (e.Data is not null)
stdout += e.Data + Environment.NewLine;
};
process.BeginErrorReadLine();
process.BeginOutputReadLine();
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The async stdout/stderr collection uses stdout += ... / stderr += ... inside DataReceived event handlers. These callbacks can run concurrently, and string concatenation is not thread-safe; it can also be costly for large output. Consider buffering with a thread-safe collector (e.g., ConcurrentQueue<string> or StringBuilder with a lock) and call process.WaitForExit() (or await stream completion) after WaitForExit(timeout) to ensure all async output has been drained before asserting/logging.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants