[cDAC] Add GC stress verification infrastructure and stack walk fixes#6

Closed
max-charlamb wants to merge 6 commits into cdac-stackreferences-2-with-stress from cdac-stackreferences-4


Summary

Add comprehensive cDAC stress verification infrastructure (DOTNET_CdacStress) that compares the cDAC's stack reference enumeration and stack walk against the legacy DAC and runtime.

Changes

Stack Walk Fixes

  • PromoteCallerStack: GCRefMap + MetaSig + DynamicHelperFrame scanning for stub frame GC roots
  • GetExceptionClauses: Fix code start offset calculation and AMD64Unwinder null check
  • ParentOfFuncletStackFrame: Wire up funclet parent frame flag for GC reporting
  • SkipCurrentFrameInCheck: Fix regression from 650ffb5 that permanently lost InlinedCallFrames from the iterator
  • SW_SKIPPED_FRAME context restoration: Call UpdateContextFromFrame for skipped Frames
  • FilterContext: Read DebuggerFilterContext/ProfilerFilterContext matching native DAC behavior
  • IsAtFirstPassExceptionThrowSite: Suppress throw-site refs during EH first-pass

Dead Code Removal

CdacStress Infrastructure

  • DOTNET_CdacStress config with CdacStress<T>::MaybeVerify template pattern (compiles to no-op without HAVE_GCCOVER)
  • Bit flags: trigger points (ALLOC=0x1, INSTR=0x4), validation types (REFS=0x10, WALK=0x20, USE_DAC=0x40), modifiers (UNIQUE=0x100)
  • Three-way comparison: Load legacy DAC in-process via InProcessDataTarget, compare cDAC vs DAC vs RT
  • CompareStackWalks: Frame-by-frame IXCLRDataStackWalk IP+SP+FrameAddr comparison
  • 7 debuggee test apps: BasicAlloc, DeepStack, Generics, ExceptionHandling, PInvoke, MultiThread, Comprehensive

Test Results

| Mode | Non-EH debuggees | ExceptionHandling |
| --- | --- | --- |
| INSTR (0x14 + GCStress=0x4) | 0 failures | 0-2 failures |
| ALLOC+REFS+UNIQUE (0x111) | 0 failures | 0 failures |
| ALLOC+REFS (0x11) | 0 failures | 2-4 failures (known issue) |
| Walk comparison (0x21) | 0 mismatches | N/A |
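
The mask values in the Test Results rows are combinations of the bit flags listed under CdacStress Infrastructure. As a minimal sketch of that decoding (the constant names and `DescribeMask` are illustrative and are not identifiers from the runtime; the GC trigger is left out because its value in the restructured layout is not stated in the summary):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Flag values copied from the PR summary; the identifiers are hypothetical.
constexpr uint32_t kAlloc  = 0x1;    // trigger: allocation points
constexpr uint32_t kInstr  = 0x4;    // trigger: instruction traps (needs GCStress=0x4)
constexpr uint32_t kRefs   = 0x10;   // validation: stack reference comparison
constexpr uint32_t kWalk   = 0x20;   // validation: stack walk comparison
constexpr uint32_t kUseDac = 0x40;   // validation: also compare against the legacy DAC
constexpr uint32_t kUnique = 0x100;  // modifier: deduplicate verification points

// Render a mask as "A+B+C". Bits not in the table above are reported as "unknown".
std::string DescribeMask(uint32_t mask) {
    struct Named { uint32_t bit; const char* name; };
    const Named known[] = {
        {kAlloc, "ALLOC"}, {kInstr, "INSTR"}, {kRefs, "REFS"},
        {kWalk, "WALK"}, {kUseDac, "USE_DAC"}, {kUnique, "UNIQUE"},
    };
    std::string out;
    for (const Named& n : known) {
        if ((mask & n.bit) != 0) {
            if (!out.empty()) out += "+";
            out += n.name;
            mask &= ~n.bit;  // clear the bit so leftovers can be detected
        }
    }
    if (mask != 0) {
        if (!out.empty()) out += "+";
        out += "unknown";
    }
    return out;
}
```

With these values, `DescribeMask(0x14)` yields `INSTR+REFS` and `DescribeMask(0x21)` yields `ALLOC+WALK`, which matches the INSTR and walk-comparison rows.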

Known Issue

See known-issues.md — during EH first-pass dispatch, m_pFrame can be FRAME_TOP when the cDAC's AMD64Unwinder cannot unwind native frames (it only handles managed code ranges). The legacy DAC succeeds via OS-level VirtualUnwindToFirstManagedCallFrame.

Max Charlamb and others added 6 commits March 25, 2026 15:27
Add GCRefMap-based and MetaSig-based scanning for stub frames in the cDAC
stack walker. This implements Frame::GcScanRoots dispatch for:

- StubDispatchFrame: GCRefMap path (when cached) + MetaSig fallback
- ExternalMethodFrame: GCRefMap path
- PrestubMethodFrame / CallCountingHelperFrame: MetaSig path
- DynamicHelperFrame: Flag-based register scanning

Key components:
- GCRefMapDecoder: managed port of native gcrefmap.h bitstream decoder
- CorSigParser: ECMA-335 signature parser with GC type classification,
  including ELEMENT_TYPE_INTERNAL for dynamic method signatures
- OffsetFromGCRefMapPos: maps GCRefMap positions to TransitionBlock offsets
- Platform-guarded TransitionBlock offset globals in datadescriptor.inc
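
To illustrate what "GC type classification" over a signature can look like, here is a rough sketch keyed on the leading ECMA-335 element type byte. The element type values come from ECMA-335 and corhdr.h; the `GcKind` enum, the function name, and the coarse `NeedsTypeLookup` bucket are assumptions made for this example and do not describe the CorSigParser in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification result for one signature element.
enum class GcKind {
    None,             // primitives, pointers: nothing to report
    Ref,              // object reference
    Interior,         // managed pointer (byref)
    NeedsTypeLookup,  // cannot be decided from the leading byte alone
};

// Classify by the first element type byte only (a simplification).
GcKind ClassifyLeadingElementType(uint8_t elementType) {
    switch (elementType) {
        case 0x0e:  // ELEMENT_TYPE_STRING
        case 0x12:  // ELEMENT_TYPE_CLASS
        case 0x14:  // ELEMENT_TYPE_ARRAY
        case 0x1c:  // ELEMENT_TYPE_OBJECT
        case 0x1d:  // ELEMENT_TYPE_SZARRAY
            return GcKind::Ref;
        case 0x10:  // ELEMENT_TYPE_BYREF
            return GcKind::Interior;
        case 0x11:  // ELEMENT_TYPE_VALUETYPE (may contain references)
        case 0x13:  // ELEMENT_TYPE_VAR
        case 0x15:  // ELEMENT_TYPE_GENERICINST
        case 0x16:  // ELEMENT_TYPE_TYPEDBYREF
        case 0x1e:  // ELEMENT_TYPE_MVAR
        case 0x21:  // ELEMENT_TYPE_INTERNAL (runtime type handle follows)
            return GcKind::NeedsTypeLookup;
        default:
            return GcKind::None;
    }
}
```

A real parser has to read past the leading byte for the `NeedsTypeLookup` cases, for example to resolve the type that follows `ELEMENT_TYPE_GENERICINST` or the handle embedded after `ELEMENT_TYPE_INTERNAL`.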

Bug fixes found during implementation:
- ScanFrameRoots was passing frame address to GetFrameName instead of the
  frame's VTable identifier, causing all frames to hit the no-op default
- Added per-frame error isolation so one bad frame doesn't abort the walk

Reduces GC stress failure delta from 3 to 1 for all 55 remaining failures.
The remaining delta is from RangeList-based code heap resolution (separate issue).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix GetExceptionClauses to use code start for offset calculation.
Wire up ParentOfFuncletStackFrame and unwind-target-PC override
for catch handler GC reporting. Fix AMD64Unwinder null check.

Add GC stress verification infrastructure that compares cDAC stack
reference enumeration against the runtime at GC stress points:
- DAC-like callback for runtime stack ref collection
- xUnit test framework with 7 debuggees (BasicAlloc, DeepStack,
  Generics, ExceptionHandling, PInvoke, MultiThread, Comprehensive)
- Step throttling, allocation-point hooks, and reentrancy guard
- On-demand build subset and project exclusion from main test project

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove code referencing runtime features that were removed in PR dotnet#119863
(Move coreclr EH second pass to native code):

- ForceGcReportingStage enum and related TODO comments
- ShouldSaveFuncletInfo, ShouldParentToFuncletReportSavedFuncletSlots,
  IsFilterFunclet, IsFilterFuncletCached fields from GCFrameData
- funcletNotSeen, foundFirstFunclet variables
- Unreachable ExInfo block gated by '&& false'
- Dead PeekByte() and ClassifyElementType() from CorSigParser
- Inner try/catch around ScanFrameRoots (outer catch suffices)
- Exclude GCStressTests from main cDAC test project

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Introduce a separate DOTNET_CdacStress config with bit flags for
controlling cDAC stack reference verification independently of GCStress:
  0x1 ALLOC  - verify at allocation points (fast, no JIT overhead)
  0x2 GC     - verify at GC trigger points (future)
  0x4 UNIQUE - deduplicate by (IP, SP) hash
  0x8 INSTR  - verify at instruction traps (needs GCStress=0x4)
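
Deduplication by (IP, SP) hash could be sketched as follows. The class name, the hash mixing constant, and the choice to store only the combined hash are assumptions for illustration; the commit does not show how the runtime implements it.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical helper: remembers which (IP, SP) pairs were already verified.
class UniqueStressPoints {
public:
    // Returns true the first time a given (ip, sp) pair is seen.
    // Only the combined hash is stored, so a hash collision would
    // cause a distinct point to be skipped.
    bool ShouldVerify(uint64_t ip, uint64_t sp) {
        return seen_.insert(Combine(ip, sp)).second;
    }

private:
    // Simple hash-combine; the constant is an arbitrary mixing value.
    static uint64_t Combine(uint64_t ip, uint64_t sp) {
        return ip ^ (sp + 0x9e3779b97f4a7c15ULL + (ip << 6) + (ip >> 2));
    }

    std::unordered_set<uint64_t> seen_;
};
```

Calling `ShouldVerify` twice with the same pair returns true and then false, so a hot allocation site is verified only once.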

Follow the GCStress<T> template pattern with CdacStress<T>::MaybeVerify
that compiles to nothing when HAVE_GCCOVER is not defined, eliminating
#ifdef guards at call sites.
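
A minimal sketch of that pattern, assuming a stress-point enum and a global config word that are both hypothetical stand-ins (only `CdacStress<T>::MaybeVerify`, `VerifyAtStressPoint`, and `HAVE_GCCOVER` are names taken from this PR):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the parsed DOTNET_CdacStress value and a counter.
enum class StressPoint : uint32_t { Alloc = 0x1 };
uint32_t g_cdacStressConfig = 0;
int g_verifications = 0;

// Stand-in for the real verification routine.
void VerifyAtStressPoint() { ++g_verifications; }

#ifdef HAVE_GCCOVER
template <StressPoint P>
struct CdacStress {
    static bool IsEnabled() {
        return (g_cdacStressConfig & static_cast<uint32_t>(P)) != 0;
    }
    static void MaybeVerify() {
        if (IsEnabled()) VerifyAtStressPoint();
    }
};
#else
// Without HAVE_GCCOVER both members are empty inline functions,
// so call sites need no #ifdef and the calls optimize away.
template <StressPoint P>
struct CdacStress {
    static bool IsEnabled() { return false; }
    static void MaybeVerify() {}
};
#endif

// Example call site: identical in both build configurations.
void OnAllocation() {
    CdacStress<StressPoint::Alloc>::MaybeVerify();
}
```

The design choice is that the conditional compilation lives in one place (the template definition) instead of being repeated around every hook.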

Rename CdacGcStress -> CdacStress (class, files, config vars) to reflect
that this verifies the cDAC's stack walk, not GC behavior.

Legacy DOTNET_GCStress=0x20 continues to work (maps to CDACSTRESS_ALLOC).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Match the native DAC behavior for both ClrDataStackWalk::Init and
DacStackReferenceWalker::WalkStack: check the thread's
DebuggerFilterContext and ProfilerFilterContext before falling back
to TryGetThreadContext. During debugger breaks or profiler stack
walks, these contexts hold the correct managed frame state.
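
The fallback order described above can be sketched like this. The struct layouts and function name are hypothetical, and checking the debugger context before the profiler context is an assumption based on the order in which the commit message lists them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced register context.
struct Context {
    uint64_t ip;
    uint64_t sp;
};

// Hypothetical view of the per-thread fields involved.
struct ThreadState {
    const Context* debuggerFilterContext;  // set during debugger breaks
    const Context* profilerFilterContext;  // set during profiler stack walks
    Context threadContext;                 // what TryGetThreadContext would return
};

// Prefer a filter context when one is set; otherwise use the thread context.
const Context& SelectStartingContext(const ThreadState& thread) {
    if (thread.debuggerFilterContext != nullptr) return *thread.debuggerFilterContext;
    if (thread.profilerFilterContext != nullptr) return *thread.profilerFilterContext;
    return thread.threadContext;
}
```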

Add DebuggerFilterContext and ProfilerFilterContext fields to the
Thread data descriptor and Data.Thread class.

Add diagnostic logging for unique Source IPs in cDAC stress failures
to show which frames the cDAC actually walked.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix SkipDuplicateActiveICF regression from base branch commit 650ffb5:
restore one-shot SkipCurrentFrameInCheck behavior so InlinedCallFrames
are not permanently lost from the FrameIterator.

Fix SW_SKIPPED_FRAME context restoration: call UpdateContextFromFrame
for skipped Frames so SoftwareExceptionFrame context is restored.

Add IsAtFirstPassExceptionThrowSite to suppress throw-site refs during
exception first-pass dispatch, matching legacy DAC behavior.

Restructure CdacStress flags into trigger points (ALLOC/GC/INSTR),
validation types (REFS/WALK/USE_DAC), and modifiers (UNIQUE).

Add three-way comparison infrastructure:
- Load legacy DAC (mscordaccore.dll) in-process via InProcessDataTarget
- CompareStackWalks: frame-by-frame IXCLRDataStackWalk IP+SP+FrameAddr
- CompareRefSets: two-phase ref matching (stack + register refs)
- CollectStackRefs: merged cDAC/DAC collection into single function
- FilterAndDedup: combined interior pointer filter + dedup

Refactor VerifyAtStressPoint into clean 5-step flow:
1. Collect raw refs (cDAC always, DAC if USE_DAC, RT always)
2. Compare cDAC vs DAC raw (before filtering)
3. Filter cDAC refs and compare vs RT
4. Pass/fail based on RT match; DAC mismatch logged separately
5. Log all three ref sets on failure
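
The five steps above can be sketched roughly as follows. This is a simplified model, not the PR's code: the `StackRef` layout is hypothetical, the filter is assumed to drop interior pointers, set comparison is plain sorted equality instead of the two-phase stack/register matching, and logging (step 5) is reduced to a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical reference record.
struct StackRef {
    uint64_t slot;    // address of the slot holding the reference
    uint64_t object;  // referenced object address
    bool interior;    // true for interior pointers
};

bool RefLess(const StackRef& a, const StackRef& b) {
    return std::tie(a.slot, a.object, a.interior) < std::tie(b.slot, b.object, b.interior);
}
bool RefSame(const StackRef& a, const StackRef& b) {
    return !RefLess(a, b) && !RefLess(b, a);
}

// Assumed behavior: drop interior pointers, then remove duplicates.
std::vector<StackRef> FilterAndDedup(std::vector<StackRef> refs) {
    refs.erase(std::remove_if(refs.begin(), refs.end(),
                              [](const StackRef& r) { return r.interior; }),
               refs.end());
    std::sort(refs.begin(), refs.end(), RefLess);
    refs.erase(std::unique(refs.begin(), refs.end(), RefSame), refs.end());
    return refs;
}

// Order-insensitive equality of two reference collections.
bool SameRefSet(std::vector<StackRef> a, std::vector<StackRef> b) {
    std::sort(a.begin(), a.end(), RefLess);
    std::sort(b.begin(), b.end(), RefLess);
    return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin(), RefSame);
}

struct VerifyResult {
    bool passed;       // step 4: decided by the runtime (RT) comparison only
    bool dacMismatch;  // step 2: recorded separately, never fails the check
};

// Step 1 (collection) is assumed done by the caller; dac is null without USE_DAC.
VerifyResult VerifyAtStressPoint(const std::vector<StackRef>& cdac,
                                 const std::vector<StackRef>* dac,
                                 const std::vector<StackRef>& rt) {
    VerifyResult result{false, false};
    if (dac != nullptr) result.dacMismatch = !SameRefSet(cdac, *dac);  // step 2: raw
    result.passed = SameRefSet(FilterAndDedup(cdac), rt);              // steps 3-4
    // Step 5: on failure, a real implementation would log all three sets here.
    return result;
}
```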

Update known-issues.md with current findings: single remaining issue
is m_pFrame=FRAME_TOP during EH first-pass dispatch where the cDAC
cannot unwind through native frames.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

max-charlamb pushed a commit that referenced this pull request Apr 22, 2026
- Fix README test filter syntax: use FullyQualifiedName~BasicAlloc (#1)
- Remove goto statements from GCInfoDecoder.EnumerateLiveSlots: extract
  ReportUntrackedAndSucceed local function (#2)
- Move CheckForSkippedFrames from Next() to UpdateState (#6)
- Add XUnitConsoleRunner package reference for Helix payload (#9)
- Support TypeSpec (tag=2) in DecodeTypeDefOrRefOrSpec matching native
  CorSigUncompressToken behavior (#10)
- Fix IsAppleArm64ABI: set to false until Apple platform detection is
  available (filed dotnet#127282) (#11)
- Fix Unix x64 float register stride: use FloatRegisterSize instead of
  hardcoded 8 (#12)
- Replace FrameIterator.OffsetFromGCRefMapPos with CallingConventionInfo
  version that handles x86 reversed register layout (dotnet#13)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
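
For context on the TypeSpec (tag=2) item, here is a sketch of decoding a TypeDefOrRefOrSpec-encoded token as defined by ECMA-335 (II.23.2 compressed integers and II.23.2.8). The function names and the bool/out-parameter shape are choices made for this example, not the signatures used in the PR. Tag 3 is rejected here because ECMA-335 does not define it, whereas the native table maps it to a base-type token.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read an ECMA-335 compressed unsigned integer at data[pos]; advances pos on success.
bool ReadCompressedUInt(const uint8_t* data, size_t size, size_t& pos, uint32_t& value) {
    if (pos >= size) return false;
    const uint8_t b0 = data[pos];
    if ((b0 & 0x80) == 0) {            // 0xxxxxxx: 1 byte
        value = b0;
        pos += 1;
        return true;
    }
    if ((b0 & 0xC0) == 0x80) {         // 10xxxxxx: 2 bytes
        if (size - pos < 2) return false;
        value = (static_cast<uint32_t>(b0 & 0x3F) << 8) | data[pos + 1];
        pos += 2;
        return true;
    }
    if ((b0 & 0xE0) == 0xC0) {         // 110xxxxx: 4 bytes
        if (size - pos < 4) return false;
        value = (static_cast<uint32_t>(b0 & 0x1F) << 24) |
                (static_cast<uint32_t>(data[pos + 1]) << 16) |
                (static_cast<uint32_t>(data[pos + 2]) << 8) |
                data[pos + 3];
        pos += 4;
        return true;
    }
    return false;                      // invalid lead byte
}

// Low 2 bits select the table, remaining bits are the row id (RID).
bool DecodeTypeDefOrRefOrSpec(const uint8_t* data, size_t size, size_t& pos, uint32_t& token) {
    static const uint32_t kTables[] = {
        0x02000000,  // tag 0: TypeDef
        0x01000000,  // tag 1: TypeRef
        0x1b000000,  // tag 2: TypeSpec
    };
    uint32_t encoded = 0;
    if (!ReadCompressedUInt(data, size, pos, encoded)) return false;
    const uint32_t tag = encoded & 0x3;
    if (tag > 2) return false;         // undefined in ECMA-335
    token = kTables[tag] | (encoded >> 2);
    return true;
}
```

For example, the single byte `0x49` decodes to TypeRef token `0x01000012`, and `0x16` decodes to TypeSpec token `0x1b000005`.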