[cDAC] Add GC stress verification infrastructure and stack walk fixes #6
Closed
max-charlamb wants to merge 6 commits into cdac-stackreferences-2-with-stress from
Add GCRefMap-based and MetaSig-based scanning for stub frames in the cDAC stack walker.

This implements Frame::GcScanRoots dispatch for:
- StubDispatchFrame: GCRefMap path (when cached) + MetaSig fallback
- ExternalMethodFrame: GCRefMap path
- PrestubMethodFrame / CallCountingHelperFrame: MetaSig path
- DynamicHelperFrame: Flag-based register scanning

Key components:
- GCRefMapDecoder: managed port of native gcrefmap.h bitstream decoder
- CorSigParser: ECMA-335 signature parser with GC type classification, including ELEMENT_TYPE_INTERNAL for dynamic method signatures
- OffsetFromGCRefMapPos: maps GCRefMap positions to TransitionBlock offsets
- Platform-guarded TransitionBlock offset globals in datadescriptor.inc

Bug fixes found during implementation:
- ScanFrameRoots was passing frame address to GetFrameName instead of the frame's VTable identifier, causing all frames to hit the no-op default
- Added per-frame error isolation so one bad frame doesn't abort the walk

Reduces GC stress failure delta from 3 to 1 for all 55 remaining failures. The remaining delta is from RangeList-based code heap resolution (separate issue).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix GetExceptionClauses to use code start for offset calculation. Wire up ParentOfFuncletStackFrame and unwind-target-PC override for catch handler GC reporting. Fix AMD64Unwinder null check.

Add GC stress verification infrastructure that compares cDAC stack reference enumeration against the runtime at GC stress points:
- DAC-like callback for runtime stack ref collection
- xUnit test framework with 7 debuggees (BasicAlloc, DeepStack, Generics, ExceptionHandling, PInvoke, MultiThread, Comprehensive)
- Step throttling, allocation-point hooks, and reentrancy guard
- On-demand build subset and project exclusion from main test project

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove code referencing runtime features that were removed in PR dotnet#119863 (Move coreclr EH second pass to native code):
- ForceGcReportingStage enum and related TODO comments
- ShouldSaveFuncletInfo, ShouldParentToFuncletReportSavedFuncletSlots, IsFilterFunclet, IsFilterFuncletCached fields from GCFrameData
- funcletNotSeen, foundFirstFunclet variables
- Unreachable ExInfo block gated by '&& false'
- Dead PeekByte() and ClassifyElementType() from CorSigParser
- Inner try/catch around ScanFrameRoots (outer catch suffices)
- Exclude GCStressTests from main cDAC test project

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce a separate DOTNET_CdacStress config with bit flags for controlling cDAC stack reference verification independently of GCStress:
  0x1 ALLOC  - verify at allocation points (fast, no JIT overhead)
  0x2 GC     - verify at GC trigger points (future)
  0x4 UNIQUE - deduplicate by (IP, SP) hash
  0x8 INSTR  - verify at instruction traps (needs GCStress=0x4)

Follow the GCStress<T> template pattern with CdacStress<T>::MaybeVerify that compiles to nothing when HAVE_GCCOVER is not defined, eliminating #ifdef guards at call sites.

Rename CdacGcStress -> CdacStress (class, files, config vars) to reflect that this verifies the cDAC's stack walk, not GC behavior. Legacy DOTNET_GCStress=0x20 continues to work (maps to CDACSTRESS_ALLOC).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
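The guard-free call-site pattern described in this commit can be illustrated with a minimal sketch. The flag values are the ones listed above; everything else (the non-type template parameter, `g_cdacStressConfig`, `g_verifyCalls`, and the stub `VerifyAtStressPoint`) is a hypothetical stand-in, not the runtime's actual code. The real `CdacStress<T>` presumably takes a type parameter, as `GCStress<T>` does.

```cpp
#include <cassert>
#include <cstdint>

// Defined here only so the sketch exercises the enabled path; a real build
// would get this from the build configuration.
#define HAVE_GCCOVER 1

// Flag values as listed in the commit message.
enum CdacStressFlags : uint32_t
{
    CDACSTRESS_ALLOC  = 0x1,
    CDACSTRESS_GC     = 0x2,
    CDACSTRESS_UNIQUE = 0x4,
    CDACSTRESS_INSTR  = 0x8,
};

// Hypothetical stand-ins for the parsed DOTNET_CdacStress value and the verifier.
static uint32_t g_cdacStressConfig = 0;
static int g_verifyCalls = 0;
static void VerifyAtStressPoint() { ++g_verifyCalls; }

// Simplification: the trigger flag is a non-type template parameter.
template <uint32_t Trigger>
struct CdacStress
{
#ifdef HAVE_GCCOVER
    // Verify only when the configured flags include this trigger point.
    static void MaybeVerify()
    {
        if ((g_cdacStressConfig & Trigger) != 0)
            VerifyAtStressPoint();
    }
#else
    // Empty body: the call compiles away, so call sites need no #ifdef.
    static void MaybeVerify() {}
#endif
};
```

A call site would then read `CdacStress<CDACSTRESS_ALLOC>::MaybeVerify();` in every build flavor, with the preprocessor check confined to the template definition.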
Match the native DAC behavior for both ClrDataStackWalk::Init and DacStackReferenceWalker::WalkStack: check the thread's DebuggerFilterContext and ProfilerFilterContext before falling back to TryGetThreadContext. During debugger breaks or profiler stack walks, these contexts hold the correct managed frame state.

Add DebuggerFilterContext and ProfilerFilterContext fields to the Thread data descriptor and Data.Thread class.

Add diagnostic logging for unique Source IPs in cDAC stress failures to show which frames the cDAC actually walked.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
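The fallback order described in this commit can be sketched as follows. The `ThreadContext` and `ThreadData` types and the `SelectInitialContext` helper are hypothetical, and checking the debugger context before the profiler context is an assumption; the commit message states only that both are consulted before TryGetThreadContext.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced register context: just the fields a walk starts from.
struct ThreadContext
{
    uint64_t ip;
    uint64_t sp;
};

// Hypothetical view of the Thread fields named in the commit message.
struct ThreadData
{
    const ThreadContext* debuggerFilterContext;  // null when no debugger break is active
    const ThreadContext* profilerFilterContext;  // null when no profiler walk is active
    ThreadContext osContext;                     // stand-in for what TryGetThreadContext returns
};

// Prefer a filter context when one is set; otherwise use the thread's own context.
static const ThreadContext& SelectInitialContext(const ThreadData& thread)
{
    if (thread.debuggerFilterContext != nullptr)
        return *thread.debuggerFilterContext;
    if (thread.profilerFilterContext != nullptr)
        return *thread.profilerFilterContext;
    return thread.osContext;
}
```

Starting from the wrong context during a debugger break would begin the walk at an unrelated IP and SP, which is why both walkers need the same selection logic.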
Fix SkipDuplicateActiveICF regression from base branch commit 650ffb5: restore one-shot SkipCurrentFrameInCheck behavior so InlinedCallFrames are not permanently lost from the FrameIterator.

Fix SW_SKIPPED_FRAME context restoration: call UpdateContextFromFrame for skipped Frames so SoftwareExceptionFrame context is restored.

Add IsAtFirstPassExceptionThrowSite to suppress throw-site refs during exception first-pass dispatch, matching legacy DAC behavior.

Restructure CdacStress flags into trigger points (ALLOC/GC/INSTR), validation types (REFS/WALK/USE_DAC), and modifiers (UNIQUE).

Add three-way comparison infrastructure:
- Load legacy DAC (mscordaccore.dll) in-process via InProcessDataTarget
- CompareStackWalks: frame-by-frame IXCLRDataStackWalk IP+SP+FrameAddr
- CompareRefSets: two-phase ref matching (stack + register refs)
- CollectStackRefs: merged cDAC/DAC collection into single function
- FilterAndDedup: combined interior pointer filter + dedup

Refactor VerifyAtStressPoint into clean 5-step flow:
1. Collect raw refs (cDAC always, DAC if USE_DAC, RT always)
2. Compare cDAC vs DAC raw (before filtering)
3. Filter cDAC refs and compare vs RT
4. Pass/fail based on RT match; DAC mismatch logged separately
5. Log all three ref sets on failure

Update known-issues.md with current findings: single remaining issue is m_pFrame=FRAME_TOP during EH first-pass dispatch where the cDAC cannot unwind through native frames.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
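Step 3 of the flow above (filter the cDAC refs, then compare against the runtime's refs) might look roughly like the sketch below. This is a simplified illustration, not the PR's implementation: the `StackRef` layout is hypothetical, the filter is assumed to drop interior pointers, and the comparison is a single pass over sorted sets rather than the two-phase stack and register matching the commit describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical shape of one reported stack reference.
struct StackRef
{
    uint64_t slot;      // address of the slot holding the reference
    uint64_t object;    // object pointer stored in that slot
    bool     interior;  // true when the pointer is an interior pointer
};

static bool RefLess(const StackRef& a, const StackRef& b)
{
    return std::tie(a.slot, a.object) < std::tie(b.slot, b.object);
}

static bool RefEqual(const StackRef& a, const StackRef& b)
{
    return a.slot == b.slot && a.object == b.object;
}

// Drop interior pointers (an assumption about the filter), then sort and
// remove duplicate (slot, object) pairs.
static std::vector<StackRef> FilterAndDedup(std::vector<StackRef> refs)
{
    refs.erase(std::remove_if(refs.begin(), refs.end(),
                              [](const StackRef& r) { return r.interior; }),
               refs.end());
    std::sort(refs.begin(), refs.end(), RefLess);
    refs.erase(std::unique(refs.begin(), refs.end(), RefEqual), refs.end());
    return refs;
}

// Two normalized sets match when they hold the same (slot, object) pairs.
static bool RefSetsMatch(const std::vector<StackRef>& a, const std::vector<StackRef>& b)
{
    return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin(), RefEqual);
}
```

Normalizing both sides before comparing makes the result independent of the order in which each walker reports its refs.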
max-charlamb pushed a commit that referenced this pull request on Apr 22, 2026:

- Fix README test filter syntax: use FullyQualifiedName~BasicAlloc (#1)
- Remove goto statements from GCInfoDecoder.EnumerateLiveSlots: extract ReportUntrackedAndSucceed local function (#2)
- Move CheckForSkippedFrames from Next() to UpdateState (#6)
- Add XUnitConsoleRunner package reference for Helix payload (#9)
- Support TypeSpec (tag=2) in DecodeTypeDefOrRefOrSpec matching native CorSigUncompressToken behavior (#10)
- Fix IsAppleArm64ABI: set to false until Apple platform detection is available (filed dotnet#127282) (#11)
- Fix Unix x64 float register stride: use FloatRegisterSize instead of hardcoded 8 (#12)
- Replace FrameIterator.OffsetFromGCRefMapPos with CallingConventionInfo version that handles x86 reversed register layout (dotnet#13)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
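For context on the TypeSpec fix: ECMA-335 (II.23.2.8) encodes a TypeDefOrRefOrSpec value with a 2-bit tag in the low bits (0 = TypeDef, 1 = TypeRef, 2 = TypeSpec) and the row index in the remaining bits. The sketch below decodes an already-decompressed value into a metadata token. The function signature is hypothetical and does not reproduce the cDAC's CorSigParser, and rejecting tag 3 is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Metadata table identifiers in the top byte of a token (ECMA-335 table numbers).
constexpr uint32_t kTypeRefToken  = 0x01000000;
constexpr uint32_t kTypeDefToken  = 0x02000000;
constexpr uint32_t kTypeSpecToken = 0x1b000000;

// Decodes a TypeDefOrRefOrSpecEncoded value (after compressed-integer decoding).
// Returns false for tag 3, which ECMA-335 does not define.
static bool DecodeTypeDefOrRefOrSpec(uint32_t encoded, uint32_t* token)
{
    const uint32_t rid = encoded >> 2;  // row index within the selected table
    switch (encoded & 0x3)              // low two bits select the table
    {
        case 0: *token = kTypeDefToken  | rid; return true;
        case 1: *token = kTypeRefToken  | rid; return true;
        case 2: *token = kTypeSpecToken | rid; return true;
        default: return false;
    }
}
```

With this encoding, the value 0x49 has tag 1 and row 0x12, so it decodes to the TypeRef token 0x01000012.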
Summary
Add comprehensive cDAC stress verification infrastructure (DOTNET_CdacStress) that compares the cDAC's stack reference enumeration and stack walk against the legacy DAC and runtime.

Changes
Stack Walk Fixes
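- Fix the SkipDuplicateActiveICF regression from base branch commit 650ffb5 that permanently lost InlinedCallFrames from the iterator
- Fix SW_SKIPPED_FRAME context restoration: call UpdateContextFromFrame for skipped Frames
- Check DebuggerFilterContext/ProfilerFilterContext, matching native DAC behavior

Dead Code Removal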
- ForceGcReportingStage enum and related variables (no native counterpart; removed in PR "Move coreclr EH second pass to native code", dotnet/runtime#119863)
- ShouldSaveFuncletInfo, ShouldParentToFuncletReportSavedFuncletSlots (dead fields)
- PeekByte() and ClassifyElementType() from CorSigParser

CdacStress Infrastructure
- DOTNET_CdacStress config with CdacStress<T>::MaybeVerify template pattern (compiles to no-op without HAVE_GCCOVER)
- Trigger points (ALLOC=0x1, INSTR=0x4), validation types (REFS=0x10, WALK=0x20, USE_DAC=0x40), modifiers (UNIQUE=0x100)
- Three-way comparison: load the legacy DAC in-process via InProcessDataTarget, compare cDAC vs DAC vs RT
- Frame-by-frame IXCLRDataStackWalk IP+SP+FrameAddr comparison

Test Results
Known Issue
See known-issues.md: during EH first-pass dispatch, m_pFrame can be FRAME_TOP, and the cDAC's AMD64Unwinder cannot unwind through the native frames (it only handles managed code ranges). The legacy DAC succeeds via OS-level VirtualUnwindToFirstManagedCallFrame.