[cDAC] Stack walk GC stress verification and fixes#126408
[cDAC] Stack walk GC stress verification and fixes#126408max-charlamb wants to merge 23 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
There was a problem hiding this comment.
Pull request overview
Adds a cDAC GC-stress verification harness and extends the cDAC stack-walk / stack-GC-ref pipeline so cDAC stack reference enumeration can be compared against runtime scanning at stress points.
Changes:
- Implements/extends cDAC stack reference enumeration (including Frame-based scanning paths like PromoteCallerStack via GCRefMap / MetaSig) and wires SOS
GetStackReferencesto the cDAC contract. - Introduces a new GC stress integration test project with debuggee apps and orchestration targets.
- Extends CoreCLR cDAC stress/GC stress plumbing and data descriptors to support the new stack-walk and frame-scanning capabilities.
Reviewed changes
Copilot reviewed 46 out of 46 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.ExecutionManager.cs | Updates mock type layout for execution manager-related data. |
| src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.cs | Updates mock type layouts (ExceptionInfo/Thread) for contract tests. |
| src/native/managed/cdac/tests/Microsoft.Diagnostics.DataContractReader.Tests.csproj | Excludes new GCStressTests folder from the existing unit test project compilation. |
| src/native/managed/cdac/tests/GCStressTests/README.md | Documents how to build/run the new GC stress tests. |
| src/native/managed/cdac/tests/GCStressTests/Microsoft.Diagnostics.DataContractReader.GCStressTests.csproj | Adds a dedicated GC stress test project. |
| src/native/managed/cdac/tests/GCStressTests/GCStressTests.targets | MSBuild orchestration to discover/build debuggee projects. |
| src/native/managed/cdac/tests/GCStressTests/GCStressTestBase.cs | Test harness to run debuggees under corerun and parse verification logs. |
| src/native/managed/cdac/tests/GCStressTests/GCStressResults.cs | Parses the native verification log into structured pass/fail/skip counts. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/PInvoke/Program.cs | Adds a P/Invoke-focused debuggee workload. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/PInvoke/PInvoke.csproj | Debuggee project file for PInvoke scenario. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/MultiThread/Program.cs | Adds a multi-threaded debuggee workload. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/MultiThread/MultiThread.csproj | Debuggee project file for MultiThread scenario. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/Generics/Program.cs | Adds a generics/interface/delegate debuggee workload. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/Generics/Generics.csproj | Debuggee project file for Generics scenario. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/ExceptionHandling/Program.cs | Adds an exception-handling/funclet debuggee workload. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/ExceptionHandling/ExceptionHandling.csproj | Debuggee project file for ExceptionHandling scenario. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/Directory.Build.props | Shared build props for debuggee projects (output layout, TFM, etc.). |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/DeepStack/Program.cs | Adds deep-recursion debuggee workload. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/DeepStack/DeepStack.csproj | Debuggee project file for DeepStack scenario. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/Comprehensive/Program.cs | Adds comprehensive “all scenarios” debuggee workload. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/Comprehensive/Comprehensive.csproj | Debuggee project file for Comprehensive scenario. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/BasicAlloc/Program.cs | Adds basic allocation/live-ref debuggee workload. |
| src/native/managed/cdac/tests/GCStressTests/Debuggees/BasicAlloc/BasicAlloc.csproj | Debuggee project file for BasicAlloc scenario. |
| src/native/managed/cdac/tests/GCStressTests/BasicGCStressTests.cs | Theory-based test suite that runs the debuggees and asserts pass rate. |
| src/native/managed/cdac/tests/gcstress/known-issues.md | Captures known mismatch classes and limitations. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs | Implements GetStackReferences using the cDAC contract rather than legacy DAC fallback. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/StubDispatchFrame.cs | Extends StubDispatchFrame data with GCRefMap pointer. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/ExternalMethodFrame.cs | Adds ExternalMethodFrame contract data type (GCRefMap). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/DynamicHelperFrame.cs | Adds DynamicHelperFrame contract data type (flags). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/ExceptionInfo.cs | Adds clause-range fields used for catch-handler resumption offset selection. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/StackWalk_1.cs | Refactors stack-walk filtering and adds Frame-based GC root scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanner.cs | Adds optional relOffset override support for GC ref enumeration. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GCRefMapDecoder.cs | Implements GCRefMap decoding for transition-block scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/CorSigParser.cs | Adds minimal signature parsing to classify parameters for MetaSig-based scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs | Adds FindFirstInterruptiblePoint API to GCInfo decoders. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs | Implements FindFirstInterruptiblePoint using decoded interruptible ranges. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManagerCore.EEJitManager.cs | Fixes code-start lookup for exception clause enumeration and adds minor flow adjustments. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Constants.cs | Adds TransitionBlock-related global names used by frame scanning. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/DataType.cs | Adds new DataType enum values for new frame contracts. |
| src/native/managed/cdac/cdac.slnx | Adds the new GC stress test project to the cDAC solution. |
| src/coreclr/vm/gccover.cpp | Adds step-based skipping to reduce overhead when throttling verification. |
| src/coreclr/vm/frames.h | Exposes additional frame fields to cDAC via cdac_data<> descriptors. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Adds new contract fields/globals for frames and TransitionBlock layout. |
| src/coreclr/vm/cdacstress.cpp | Updates in-process cDAC/DAC verification logic, logging, and step behavior. |
| eng/Subsets.props | Adds an on-demand subset for running GC stress tests. |
| docs/design/datacontracts/StackWalk.md | Documents the new frame fields and TransitionBlock globals in the StackWalk contract. |
Comments suppressed due to low confidence (1)
src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.cs:228
- ThreadFields contains ProfilerFilterContext twice (lines 224 and 228). Duplicate field entries will skew offsets and make the mock descriptors inconsistent with the real data descriptor. Keep a single ProfilerFilterContext entry in the correct order.
e85dd2a to
305a905
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 50 out of 50 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (2)
src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.cs:229
ThreadFieldsnow includesDebuggerFilterContext/ProfilerFilterContexttwice (duplicate entries at the end of the list). This can cause ambiguous/incorrect field offsets in the mock type layout. Keep each field only once.
src/coreclr/vm/cdacstress.cpp:963CompareRefSetsuses a fixed-sizematched[MAX_COLLECTED_REFS]buffer but no longer validates thatcountA/countBare <=MAX_COLLECTED_REFS. SinceCollectStackRefsappends without a hard cap, this can lead to out-of-bounds access whencountAorcountBexceeds 4096. Reintroduce a guard or allocate the match state sized to the counts.
return true;
bool matched[MAX_COLLECTED_REFS] = {};
for (int i = 0; i < countA; i++)
75ae7fe to
5419d15
Compare
5419d15 to
8ef9b22
Compare
max-charlamb
left a comment
There was a problem hiding this comment.
feedback for local copilot
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 51 out of 51 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
src/coreclr/vm/cdacstress.cpp:625
- CollectStackRefs appends to pRefs without any cap. Later comparisons allocate fixed-size arrays sized MAX_COLLECTED_REFS (e.g., matched[MAX_COLLECTED_REFS], aUsed/bUsed) and assume counts fit. If the DAC returns > MAX_COLLECTED_REFS refs, this will lead to out-of-bounds writes/reads. Add an explicit limit/overflow handling in CollectStackRefs (stop at MAX_COLLECTED_REFS and mark overflow / SKIP), or change the later comparison logic to handle arbitrary counts safely.
SOSStackRefData refData;
unsigned int fetched = 0;
while (true)
{
hr = pEnum->Next(1, &refData, &fetched);
if (FAILED(hr) || fetched == 0)
break;
StackRef ref;
ref.Address = refData.Address;
ref.Object = refData.Object;
ref.Flags = refData.Flags;
ref.Source = refData.Source;
ref.SourceType = refData.SourceType;
ref.Register = refData.Register;
ref.Offset = refData.Offset;
ref.StackPointer = refData.StackPointer;
pRefs->Append(ref);
}
- Add FindReadyToRunModule to IExecutionManager to resolve R2R modules from import section addresses when ExternalMethodFrame.ZapModule is null. Uses RangeSection.Find since R2R range sections cover the full PE image. - Refactor IsInterrupted detection from Next() to UpdateState(), tracking LastProcessedFrameType on StackWalkData so the exception frame detection happens at the state transition point rather than inline in SW_FRAME. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- All 9 debuggees now pass 100% at allocation-level stress - Document instruction-level failure root causes: - 3 FRAME_DAC_ONLY: missing Frame in cDAC walk (not GCRefMap) - 4 FRAME_DIFF: GcInfo register mismatch at QCall boundaries - Update test results table to reflect 0 failures across all debuggees Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 3 instruction-level stress failures where the DAC reports refs from an explicit Frame that the cDAC misses are caused by ELEMENT_TYPE_INTERNAL (0x21) in runtime-internal method signatures. PromoteCallerStack uses the SRM SignatureDecoder which doesn't handle this runtime-internal type code, causing BadImageFormatException that is silently caught by the per-frame exception handler. The native DAC handles this via MetaSig which natively understands ELEMENT_TYPE_INTERNAL. The Legacy cDAC also handles it in SigFormat.cs. A follow-up PR should replace the SRM SignatureDecoder usage with a custom signature walker following the SigFormat.cs pattern. Added catch (BadImageFormatException) in PromoteCallerStack as a workaround. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion in PromoteCallerStack - Log register number, offset, and SP for unmatched refs in FRAME_DIFF output to aid diagnosis of GcInfoDecoder liveness bitmap mismatches. - Catch BadImageFormatException in PromoteCallerStack when the SRM SignatureDecoder encounters ELEMENT_TYPE_INTERNAL (0x21) in runtime-internal method signatures. Gracefully skip the frame instead of silently swallowing the exception in the outer catch-all handler. Investigation findings: - The 4 FRAME_DIFF failures involve scratch registers (RAX=0, R8=8) that the DAC reports as live but the cDAC's GcInfoDecoder does not. - For the topmost frame (AddProviderEnumKind), ActiveStackFrame should be set, so scratch registers should be reported. The missing slot suggests the GcInfoDecoder's safe point liveness bitmap differs from the native decoder's at this specific code offset. - Further investigation needed to determine if the safe point index or the liveness bitmap data itself differs between the two decoders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Log both sides' StackPointer values in the FRAME_DIFF header to detect unwinder divergence. Results show SP values match perfectly for all FRAME_DIFF failures, confirming the issue is not in the unwinder. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… filtering The native StackFrameIterator does NOT modify isFirst when processing SFITER_SKIPPED_FRAME_FUNCTION (stackwalk.cpp:2086-2128). It stays true from Init() so the subsequent managed frame gets IsActiveFunc()=true and scratch registers are reported. The cDAC's AdvanceIsFirst was incorrectly updating IsFirst for SW_SKIPPED_FRAME based on the Frame's resumable attribute, causing the managed frame to get IsActiveFrame=false. This fix makes AdvanceIsFirst skip the IsFirst update for SW_SKIPPED_FRAME, matching the native behavior. This fixes all 4 FRAME_DIFF instruction-level stress failures where scratch registers (RAX, R8) were not reported. Results: 25,512 pass / 3 fail (0 FRAME_DIFF, 3 FRAME_DAC_ONLY remaining) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instruction-level stress now: 25,512 pass / 3 fail (99.988%) - 0 FRAME_DIFF (all fixed) - 3 FRAME_DAC_ONLY (ELEMENT_TYPE_INTERNAL — follow-up) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix platform-specific ctx.Rip usage: use GetIP(&ctx) instead - Fix if( style: add space after if keyword in gccover.cpp - Add RVA bounds validation in FindGCRefMap before uint cast - Remove stale analysis file (eh-throwhelper-report.md) - Update AssertHighPassRate comment to reflect current state Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Run GC stress verification tests in the runtime-diagnostics pipeline by piggybacking on the existing CdacDumpTests Checked runtime build. The stress tests run as a second Helix submission after the dump tests, using the testhost shared framework as CORE_ROOT. Runs on all cdacDumpPlatforms (windows_x64, linux_x64, etc.). On non-Windows platforms, only instruction-level stress (via DoGcStress) is supported; allocation-level stress (VerifyAtAllocPoint) is skipped because it requires Windows-only RtlCaptureContext/RtlVirtualUnwind. Infrastructure: - cdac-stress-helix.proj: Helix SDK project that sends testhost as correlation payload and stress test debuggees + test assembly as work item payload. Sets CORE_ROOT env var for the test harness. - prepare-cdac-stress-helix-steps.yml: Pipeline template that builds debuggees, prepares Helix payload, and finds testhost directory. - StressTests.targets: Added PrepareHelixPayload and BuildDebuggeesOnly targets for CI payload preparation. - CdacStressTestBase.cs: Added HELIX_WORKITEM_PAYLOAD support for finding debuggees in Helix environment. - runtime-diagnostics.yml: Extended CdacDumpTests buildArgs to include tools.cdacstresstests, added stress test Helix submission steps. - cdacstress.cpp: Guard VerifyAtAllocPoint with TARGET_WINDOWS for RtlCaptureContext/RtlVirtualUnwind APIs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a custom signature decoder that handles runtime-internal type codes (ELEMENT_TYPE_INTERNAL 0x21, ELEMENT_TYPE_CMOD_INTERNAL 0x22) which SRM's SignatureDecoder cannot parse. - IRuntimeSignatureTypeProvider<TType, TGenericContext>: superset of SRM's ISignatureTypeProvider, adding GetInternalType for resolving embedded TypeHandle pointers via the runtime type system. - ISignatureReader + SpanSignatureReader: abstraction for reading signature bytes from different sources (spans, target memory). - RuntimeSignatureDecoder<TType, TGenericContext, TReader>: ref struct decoder that handles all standard ECMA-335 types plus internal types. - GcSignatureTypeProvider: implements IRuntimeSignatureTypeProvider to classify types for GC scanning, resolving internal types via RuntimeTypeSystem.GetSignatureCorElementType. Key correctness details vs SRM's SignatureDecoder: - CLASS/VALUETYPE tokens decoded as TypeDefOrRefOrSpecEncoded per ECMA-335 II.23.2.8 (tag in low 2 bits, RID in upper bits). - CMOD_INTERNAL correctly skips the required/optional flag byte before the TypeHandle pointer, matching sigparser.h layout. - ReadCompressedSignedInt uses ECMA sign-extension-by-width, not zigzag. - ReadCompressedUInt rejects invalid 111xxxxx prefix. - Unknown type codes throw BadImageFormatException instead of silently returning Object (which would create false-positive GC refs). - Method signature header kind is validated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Port crossgen2's ArgIterator, TransitionBlock, and GC scanning logic into the cDAC contracts for correct per-architecture argument placement. - Add OffsetOfFloatArgumentRegisters to TransitionBlock data descriptor - CallingConventionInfo: hybrid data descriptor + ABI invariant constants for x86, x64 (Windows/Unix), ARM32, ARM64, LoongArch64, RISC-V64 - ArgTypeInfo: pre-computed type info replacing crossgen2's TypeHandle - ArgIteratorData: parsed method signature holder - ArgIterator: maps each argument to register/stack offsets via GetNextOffset() with per-architecture register allocation - Integrate into FrameIterator.PromoteCallerStackHelper replacing the simplified 1-slot-per-param approach with proper offset computation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Quote debuggee DLL path in ProcessStartInfo.Arguments (dotnet#31) - Fix timeout: use async stdout/stderr reads so WaitForExit works (dotnet#32) - Update stale comment about ELEMENT_TYPE_INTERNAL limitation (dotnet#35) - Move CallingConvention types to StackWalkHelpers.CallingConvention namespace (dotnet#37) - Simplify x86 register eligibility check in ArgIterator (dotnet#38) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1a92e2c to
b85dbac
Compare
| // FirstThreadLink is an embedded SLink struct. Read the SLink.Next pointer | ||
| // from the field's address to get the first thread link pointer. | ||
| Target.TypeInfo slinkType = target.GetTypeInfo(DataType.SLink); | ||
| TargetPointer slinkAddr = address + (ulong)type.Fields[nameof(FirstThreadLink)].Offset; | ||
| FirstThreadLink = target.ReadPointerField(slinkAddr, slinkType, "Next"); |
There was a problem hiding this comment.
ThreadStore now calls target.GetTypeInfo(DataType.SLink) and reads field "Next", but the CoreCLR data descriptor in this PR doesn't define a SLink type/field (could not find any CDAC_TYPE_BEGIN/FIELD for SLink). This will cause GetTypeInfo(DataType.SLink) to fail at runtime. Either add an SLink descriptor (with a "Next" pointer) on the runtime side or avoid needing type info here (e.g., treat the embedded SLink as a single pointer at offset 0).
| SLink, | ||
| ThreadLocalData, |
There was a problem hiding this comment.
DataType adds SLink, but there is no corresponding runtime data descriptor definition (no CDAC_TYPE_BEGIN/FIELD(SLink, ...) found). Any attempt to read this type via Target.GetTypeInfo(DataType.SLink) will fail. Either add the runtime descriptor entry for SLink or remove this enum value and read the embedded link without type metadata.
| SLink, | |
| ThreadLocalData, | |
| ThreadLocalData = 17, |
| string stderr = ""; | ||
| string stdout = ""; | ||
| process.ErrorDataReceived += (_, e) => | ||
| { | ||
| if (e.Data is not null) | ||
| stderr += e.Data + Environment.NewLine; | ||
| }; |
There was a problem hiding this comment.
These DataReceived event handlers append to strings via "+=" from background threads. That is not thread-safe and can lead to lost/garbled output under contention. Consider using a StringBuilder with locking (or ConcurrentQueue) for stderr/stdout aggregation.
| process.OutputDataReceived += (_, e) => | ||
| { | ||
| if (e.Data is not null) | ||
| stdout += e.Data + Environment.NewLine; | ||
| }; |
There was a problem hiding this comment.
Same thread-safety issue as stderr: appending to stdout with += from OutputDataReceived is racy. Use a synchronized StringBuilder/collector to avoid missing output and to keep logs reliable when tests fail.
| /// Converts a <see cref="GcTypeKind"/> to a minimal <see cref="ArgTypeInfo"/> for | ||
| /// ArgIterator consumption. This is a bridge until we have a full type-aware provider. | ||
| /// </summary> | ||
| private static ArgTypeInfo GcTypeKindToArgTypeInfo(GcTypeKind kind, int pointerSize) | ||
| { | ||
| return kind switch | ||
| { | ||
| GcTypeKind.None => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize), | ||
| GcTypeKind.Ref => ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize), | ||
| GcTypeKind.Interior => ArgTypeInfo.ForPrimitive(CorElementType.Byref, pointerSize), | ||
| GcTypeKind.Other => new ArgTypeInfo | ||
| { | ||
| CorElementType = CorElementType.ValueType, | ||
| Size = pointerSize, // Conservative: assume pointer-sized for now | ||
| IsValueType = true, | ||
| }, | ||
| _ => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize), | ||
| }; | ||
| } | ||
|
|
There was a problem hiding this comment.
GcTypeKindToArgTypeInfo collapses all non-GC-ref types to CorElementType.I/pointer-sized and doesn't preserve float vs integer vs exact primitive sizes. ArgIterator offset calculation depends on the real signature shape (especially on Unix x64/ARM64 where float regs and argument sizing affect later argument placement), so this can produce incorrect offsets and cause missed/incorrect GC ref reporting for signatures with floats or non-pointer-sized primitives. Consider decoding into a richer type representation (e.g., CorElementType + size) and building ArgTypeInfo from that rather than from GcTypeKind.
| /// Converts a <see cref="GcTypeKind"/> to a minimal <see cref="ArgTypeInfo"/> for | |
| /// ArgIterator consumption. This is a bridge until we have a full type-aware provider. | |
| /// </summary> | |
| private static ArgTypeInfo GcTypeKindToArgTypeInfo(GcTypeKind kind, int pointerSize) | |
| { | |
| return kind switch | |
| { | |
| GcTypeKind.None => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize), | |
| GcTypeKind.Ref => ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize), | |
| GcTypeKind.Interior => ArgTypeInfo.ForPrimitive(CorElementType.Byref, pointerSize), | |
| GcTypeKind.Other => new ArgTypeInfo | |
| { | |
| CorElementType = CorElementType.ValueType, | |
| Size = pointerSize, // Conservative: assume pointer-sized for now | |
| IsValueType = true, | |
| }, | |
| _ => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize), | |
| }; | |
| } | |
| /// Converts a <see cref="GcTypeKind"/> to a conservative fallback <see cref="ArgTypeInfo"/> | |
| /// when an exact signature cannot be decoded. Callers should prefer decoding the real | |
| /// managed signature via <see cref="TryGetArgTypeInfosFromMethodSignature(ReadOnlySpan{byte}, int, out ArgTypeInfo[])"/> | |
| /// so that floating-point and non-pointer-sized primitives retain their real layout. | |
| /// </summary> | |
| private static ArgTypeInfo GcTypeKindToArgTypeInfo(GcTypeKind kind, int pointerSize) | |
| { | |
| return kind switch | |
| { | |
| GcTypeKind.Ref => ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize), | |
| GcTypeKind.Interior => ArgTypeInfo.ForPrimitive(CorElementType.Byref, pointerSize), | |
| GcTypeKind.Other => new ArgTypeInfo | |
| { | |
| CorElementType = CorElementType.ValueType, | |
| Size = pointerSize, | |
| IsValueType = true, | |
| }, | |
| _ => ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize), | |
| }; | |
| } | |
| private static bool TryGetArgTypeInfosFromMethodSignature(ReadOnlySpan<byte> signatureBytes, int pointerSize, out ArgTypeInfo[] argTypes) | |
| { | |
| argTypes = Array.Empty<ArgTypeInfo>(); | |
| if (signatureBytes.IsEmpty) | |
| return false; | |
| BlobReader reader = new(signatureBytes); | |
| SignatureHeader header = reader.ReadSignatureHeader(); | |
| if (header.IsGeneric) | |
| reader.ReadCompressedInteger(); | |
| int parameterCount = reader.ReadCompressedInteger(); | |
| if (parameterCount < 0) | |
| return false; | |
| if (!TrySkipSignatureType(ref reader)) | |
| return false; | |
| ArgTypeInfo[] decodedArgTypes = new ArgTypeInfo[parameterCount]; | |
| for (int i = 0; i < parameterCount; i++) | |
| { | |
| if (!TryReadArgTypeInfo(ref reader, pointerSize, out decodedArgTypes[i])) | |
| return false; | |
| } | |
| argTypes = decodedArgTypes; | |
| return true; | |
| } | |
| private static bool TryReadArgTypeInfo(ref BlobReader reader, int pointerSize, out ArgTypeInfo argTypeInfo) | |
| { | |
| while (reader.RemainingBytes > 0) | |
| { | |
| CorElementType elementType = (CorElementType)reader.ReadByte(); | |
| switch (elementType) | |
| { | |
| case CorElementType.CmodOpt: | |
| case CorElementType.CmodReqd: | |
| reader.ReadCompressedInteger(); | |
| continue; | |
| case CorElementType.Pinned: | |
| continue; | |
| case CorElementType.Boolean: | |
| case CorElementType.I1: | |
| case CorElementType.U1: | |
| argTypeInfo = ArgTypeInfo.ForPrimitive(elementType, 1); | |
| return true; | |
| case CorElementType.Char: | |
| case CorElementType.I2: | |
| case CorElementType.U2: | |
| argTypeInfo = ArgTypeInfo.ForPrimitive(elementType, 2); | |
| return true; | |
| case CorElementType.I4: | |
| case CorElementType.U4: | |
| case CorElementType.R4: | |
| argTypeInfo = ArgTypeInfo.ForPrimitive(elementType, 4); | |
| return true; | |
| case CorElementType.I8: | |
| case CorElementType.U8: | |
| case CorElementType.R8: | |
| argTypeInfo = ArgTypeInfo.ForPrimitive(elementType, 8); | |
| return true; | |
| case CorElementType.I: | |
| case CorElementType.U: | |
| case CorElementType.Ptr: | |
| case CorElementType.FnPtr: | |
| if (elementType is CorElementType.Ptr or CorElementType.FnPtr) | |
| { | |
| if (!TrySkipSignatureType(ref reader)) | |
| { | |
| argTypeInfo = default; | |
| return false; | |
| } | |
| } | |
| argTypeInfo = ArgTypeInfo.ForPrimitive(CorElementType.I, pointerSize); | |
| return true; | |
| case CorElementType.Byref: | |
| if (!TrySkipSignatureType(ref reader)) | |
| { | |
| argTypeInfo = default; | |
| return false; | |
| } | |
| argTypeInfo = ArgTypeInfo.ForPrimitive(CorElementType.Byref, pointerSize); | |
| return true; | |
| case CorElementType.String: | |
| case CorElementType.Object: | |
| case CorElementType.Class: | |
| if (elementType is CorElementType.Class) | |
| reader.ReadCompressedInteger(); | |
| argTypeInfo = ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize); | |
| return true; | |
| case CorElementType.SzArray: | |
| if (!TrySkipSignatureType(ref reader)) | |
| { | |
| argTypeInfo = default; | |
| return false; | |
| } | |
| argTypeInfo = ArgTypeInfo.ForPrimitive(CorElementType.Class, pointerSize); | |
| return true; | |
| case CorElementType.ValueType: | |
| reader.ReadCompressedInteger(); | |
| argTypeInfo = new ArgTypeInfo | |
| { | |
| CorElementType = CorElementType.ValueType, | |
| Size = pointerSize, | |
| IsValueType = true, | |
| }; | |
| return true; | |
| case CorElementType.GenericInst: | |
| { | |
| if (!TrySkipSignatureType(ref reader)) | |
| { | |
| argTypeInfo = default; | |
| return false; | |
| } | |
| int genericArgCount = reader.ReadCompressedInteger(); | |
| for (int i = 0; i < genericArgCount; i++) | |
| { | |
| if (!TrySkipSignatureType(ref reader)) | |
| { | |
| argTypeInfo = default; | |
| return false; | |
| } | |
| } | |
| argTypeInfo = new ArgTypeInfo | |
| { | |
| CorElementType = CorElementType.ValueType, | |
| Size = pointerSize, | |
| IsValueType = true, | |
| }; | |
| return true; | |
| } | |
| default: | |
| argTypeInfo = default; | |
| return false; | |
| } | |
| } | |
| argTypeInfo = default; | |
| return false; | |
| } | |
| private static bool TrySkipSignatureType(ref BlobReader reader) | |
| { | |
| while (reader.RemainingBytes > 0) | |
| { | |
| CorElementType elementType = (CorElementType)reader.ReadByte(); | |
| switch (elementType) | |
| { | |
| case CorElementType.CmodOpt: | |
| case CorElementType.CmodReqd: | |
| reader.ReadCompressedInteger(); | |
| continue; | |
| case CorElementType.Pinned: | |
| continue; | |
| case CorElementType.Void: | |
| case CorElementType.Boolean: | |
| case CorElementType.Char: | |
| case CorElementType.I1: | |
| case CorElementType.U1: | |
| case CorElementType.I2: | |
| case CorElementType.U2: | |
| case CorElementType.I4: | |
| case CorElementType.U4: | |
| case CorElementType.I8: | |
| case CorElementType.U8: | |
| case CorElementType.R4: | |
| case CorElementType.R8: | |
| case CorElementType.String: | |
| case CorElementType.Object: | |
| case CorElementType.TypedByref: | |
| case CorElementType.I: | |
| case CorElementType.U: | |
| return true; | |
| case CorElementType.Class: | |
| case CorElementType.ValueType: | |
| case CorElementType.Var: | |
| case CorElementType.MVar: | |
| reader.ReadCompressedInteger(); | |
| return true; | |
| case CorElementType.Byref: | |
| case CorElementType.Ptr: | |
| case CorElementType.SzArray: | |
| return TrySkipSignatureType(ref reader); | |
| case CorElementType.GenericInst: | |
| { | |
| if (!TrySkipSignatureType(ref reader)) | |
| return false; | |
| int genericArgCount = reader.ReadCompressedInteger(); | |
| for (int i = 0; i < genericArgCount; i++) | |
| { | |
| if (!TrySkipSignatureType(ref reader)) | |
| return false; | |
| } | |
| return true; | |
| } | |
| case CorElementType.Array: | |
| { | |
| if (!TrySkipSignatureType(ref reader)) | |
| return false; | |
| int rank = reader.ReadCompressedInteger(); | |
| int sizes = reader.ReadCompressedInteger(); | |
| for (int i = 0; i < sizes; i++) | |
| reader.ReadCompressedInteger(); | |
| int lowerBounds = reader.ReadCompressedInteger(); | |
| for (int i = 0; i < lowerBounds; i++) | |
| reader.ReadCompressedInteger(); | |
| return rank >= 0; | |
| } | |
| case CorElementType.FnPtr: | |
| { | |
| SignatureHeader header = reader.ReadSignatureHeader(); | |
| if (header.IsGeneric) | |
| reader.ReadCompressedInteger(); | |
| int parameterCount = reader.ReadCompressedInteger(); | |
| if (parameterCount < 0) | |
| return false; | |
| if (!TrySkipSignatureType(ref reader)) | |
| return false; | |
| for (int i = 0; i < parameterCount; i++) | |
| { | |
| if (!TrySkipSignatureType(ref reader)) | |
| return false; | |
| } | |
| return true; | |
| } | |
| default: | |
| return false; | |
| } | |
| } | |
| return false; | |
| } |
b85dbac to
8eae621
Compare
8eae621 to
49c8435
Compare
| static bool CollectStackRefs(ISOSDacInterface* pSosDac, DWORD osThreadId, SArray<StackRef>* pRefs, | ||
| const char* label = nullptr) | ||
| { | ||
| if (pSosDac == nullptr) | ||
| return false; |
There was a problem hiding this comment.
CollectStackRefs currently appends every ref returned by ISOSStackRefEnum::Next with no upper bound. Later comparison helpers allocate fixed-size arrays sized by MAX_COLLECTED_REFS (e.g., matched[MAX_COLLECTED_REFS], aUsed[MAX_COLLECTED_REFS]), so if enumeration returns more than MAX_COLLECTED_REFS it can lead to out-of-bounds writes. Please cap collection to MAX_COLLECTED_REFS (and record an overflow/skip reason) or make the comparison logic handle arbitrarily large ref sets safely.
| if (rtOverflow && s_logFile != nullptr) | ||
| { | ||
| fprintf(s_logFile, "[SKIP] Thread=0x%x IP=0x%p - RT overflow (>%d refs)\n", | ||
| osThreadId, (void*)GetIP(regs), MAX_COLLECTED_REFS); |
There was a problem hiding this comment.
When CollectRuntimeStackRefs overflows, the code logs a [SKIP] line but continues and still computes rtMatch/pass (and does not increment s_verifySkip). If CDACSTRESS_USE_DAC is not set, this can produce false failures based on a truncated runtime ref set. Consider treating runtime overflow as an actual skip (increment skip counter + return) when RT comparison is used for pass/fail, or otherwise avoid using rtMatch in the presence of overflow.
| if (rtOverflow && s_logFile != nullptr) | |
| { | |
| fprintf(s_logFile, "[SKIP] Thread=0x%x IP=0x%p - RT overflow (>%d refs)\n", | |
| osThreadId, (void*)GetIP(regs), MAX_COLLECTED_REFS); | |
| if (rtOverflow) | |
| { | |
| InterlockedIncrement(&s_verifySkip); | |
| if (s_logFile != nullptr) | |
| { | |
| fprintf(s_logFile, "[SKIP] Thread=0x%x IP=0x%p - RT overflow (>%d refs)\n", | |
| osThreadId, (void*)GetIP(regs), MAX_COLLECTED_REFS); | |
| } | |
| return; |
| ReportSlot(slotIndex, reportScratchSlots: true, reportFpBasedSlotsOnly, reportSlot); | ||
| } |
There was a problem hiding this comment.
In ReportUntrackedAndSucceed, ReportSlot is called with reportScratchSlots: true, which forces reporting scratch registers/stack slots even when CodeManagerFlags.ActiveStackFrame is not set (reportScratchSlots local is false). This changes GC root enumeration behavior and can introduce extra roots for non-leaf frames. It looks like this should pass the reportScratchSlots variable instead of always true.
| ReportSlot(slotIndex, reportScratchSlots: true, reportFpBasedSlotsOnly, reportSlot); | |
| } | |
| ReportSlot(slotIndex, reportScratchSlots, reportFpBasedSlotsOnly, reportSlot); | |
| } |
- Fix README test filter syntax: use FullyQualifiedName~BasicAlloc (#1) - Remove goto statements from GCInfoDecoder.EnumerateLiveSlots: extract ReportUntrackedAndSucceed local function (#2) - Move CheckForSkippedFrames from Next() to UpdateState (#6) - Add XUnitConsoleRunner package reference for Helix payload (#9) - Support TypeSpec (tag=2) in DecodeTypeDefOrRefOrSpec matching native CorSigUncompressToken behavior (#10) - Fix IsAppleArm64ABI: set to false until Apple platform detection is available (filed dotnet#127282) (#11) - Fix Unix x64 float register stride: use FloatRegisterSize instead of hardcoded 8 (#12) - Replace FrameIterator.OffsetFromGCRefMapPos with CallingConventionInfo version that handles x86 reversed register layout (dotnet#13) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
49c8435 to
de5cb46
Compare
- Include RuntimeInfoOperatingSystem.Apple in Unix x64 ABI check (macOS x64 uses SysV ABI, not Windows ABI) - Thread MetadataReader from GetMethodSignatureBytes through RuntimeSignatureDecoder to provider methods (instead of null!) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| // Compare two ref sets using two-phase matching (for RT comparison where we | ||
| // don't have Source info). Returns true if all refs match. | ||
| static bool CompareRefSetsFlat(StackRef* refsA, int countA, StackRef* refsB, int countB) | ||
| { | ||
| if (countA != countB) |
There was a problem hiding this comment.
CompareRefSetsFlat uses a fixed-size matched[MAX_COLLECTED_REFS] buffer, but cDAC/DAC ref collection (CollectStackRefs) is unbounded. Without a guard/cap, a large ref set (>4096) can cause out-of-bounds writes during matching. Consider capping cDAC/DAC collection to MAX_COLLECTED_REFS (and treating overflow as a [SKIP]) or using dynamically sized bookkeeping here.
| // Read both stdout and stderr asynchronously to avoid deadlock | ||
| // when pipe buffers fill, and to allow WaitForExit timeout to work. | ||
| string stderr = ""; | ||
| string stdout = ""; | ||
| process.ErrorDataReceived += (_, e) => | ||
| { | ||
| if (e.Data is not null) | ||
| stderr += e.Data + Environment.NewLine; | ||
| }; | ||
| process.OutputDataReceived += (_, e) => | ||
| { | ||
| if (e.Data is not null) | ||
| stdout += e.Data + Environment.NewLine; | ||
| }; | ||
| process.BeginErrorReadLine(); | ||
| process.BeginOutputReadLine(); |
There was a problem hiding this comment.
The async stdout/stderr collection uses stdout += ... / stderr += ... inside DataReceived event handlers. These callbacks can run concurrently, and string concatenation is not thread-safe; it can also be costly for large output. Consider buffering with a thread-safe collector (e.g., ConcurrentQueue<string> or StringBuilder with a lock) and call process.WaitForExit() (or await stream completion) after WaitForExit(timeout) to ensure all async output has been drained before asserting/logging.
Summary
cDAC GC stress verification tool (
DOTNET_CdacStress) that compares stack GC references between the cDAC and the runtime at allocation stress points. Includes stack walker fixes, GC reference scanning implementation, custom signature decoding, and calling convention argument iteration.Note
This PR description was updated with AI assistance from Copilot.
Stack walker fixes
SW_SKIPPED_FRAME: do not callUpdateContextFromFrame(matches nativeSFITER_SKIPPED_FRAME_FUNCTIONwhich does not callUpdateRegDisplay)EnumGcRefsbetween consecutive skipped frames)Filter()to driveNext()directly, matching nativeFilter()+NextRaw()integration (prevents funclet-to-parent walk cycles)SkipActiveICFOnce/SkipCurrentFrameInCheck— active ICF double-yield is natural and harmlessIsAtFirstPassExceptionThrowSite— native does not suppress first-pass refsIsFirstnot preserved for skipped frames (was causingIsActiveFrame=falsefor the topmost managed frame)GC reference scanning
PromoteCallerStackfor stub frames (GCRefMap + MetaSig paths)SOSDacImpl.GetStackReferencesusing cDAC contract (was falling back to legacy DAC)FilterContextfor stack walk starting contextInProcessDataTargetDOTNET_CdacStressbit flags: ALLOC/INSTR/REFS/WALK/USE_DAC/UNIQUERuntimeSignatureDecoder
Custom signature decoder that handles
ELEMENT_TYPE_INTERNAL(0x21) andELEMENT_TYPE_CMOD_INTERNAL(0x22) which SRM'sSignatureDecodercannot parse. These occur in IL stubs, marshalling stubs, and unsafe accessor frames.IRuntimeSignatureTypeProvider<TType, TGenericContext>: superset of SRM'sISignatureTypeProvider, addsGetInternalTypeandGetInternalModifiedTypeISignatureReader+SpanSignatureReader: endianness-aware reader abstractionArgIterator (ported from crossgen2)
Proper calling convention analysis replacing the simplified 1-slot-per-param approach.
CallingConventionInfo: hybrid data descriptor layout values + ABI invariant constants for all architectures (x86, x64 Windows/Unix, ARM32, ARM64, LoongArch64, RISC-V64)ArgIterator.GetNextOffset(): maps each argument to its actual register or stack offsetOffsetOfFloatArgumentRegistersadded to TransitionBlock data descriptorCI infrastructure
cdac-stress-helix.projruntime-diagnostics.ymlCdacDumpTests buildArgs with+tools.cdacstresstestsTest results
Allocation-level stress (9 debuggees, ~46K verifications):
Instruction-level stress (9 debuggees, ~226K verifications):
All 9 debuggees pass with zero failures across 226,452 verifications.
*Comprehensive's 2 failures are pre-existing legacy DAC issues (DAC returns 0 refs for background threads in kernel waits; cDAC and runtime agree).