[cDAC] Add infrastructure to run cDAC tests using CLRMD and dumps#124564
[cDAC] Add infrastructure to run cDAC tests using CLRMD and dumps#124564max-charlamb wants to merge 29 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive dump-based integration testing infrastructure for the cDAC (Contract-based Data Access Component) project. The infrastructure enables testing cDAC contracts against real crash dumps from different .NET runtime versions, using ClrMD to read dump files and validate contract implementations.
Changes:
- Adds DumpTests project with test infrastructure for analyzing crash dumps using cDAC contracts and ClrMD
- Implements 5 debuggee applications that crash intentionally to generate test dumps (BasicThreads, TypeHierarchy, ExceptionState, MultiModule, GCRoots)
- Provides PowerShell orchestration script and MSBuild targets for automated dump generation and test execution across multiple runtime versions (local build and .NET 10)
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/Microsoft.Diagnostics.DataContractReader.Tests.csproj | Excludes DumpTests directory from main test project |
| src/native/managed/cdac/tests/DumpTests/Microsoft.Diagnostics.DataContractReader.DumpTests.csproj | New test project configuration with ClrMD and XUnit dependencies |
| src/native/managed/cdac/tests/DumpTests/DumpTests.targets | MSBuild targets for building debuggees and generating dumps |
| src/native/managed/cdac/tests/DumpTests/RunDumpTests.ps1 | PowerShell orchestration script for dump generation and test execution |
| src/native/managed/cdac/tests/DumpTests/DumpTestBase.cs | Base class providing dump loading, cDAC target creation, and version-based test skipping |
| src/native/managed/cdac/tests/DumpTests/ClrMdDumpHost.cs | ClrMD wrapper for loading dumps and finding contract descriptor symbols |
| src/native/managed/cdac/tests/DumpTests/SkipOnRuntimeVersionAttribute.cs | Attribute for conditionally skipping tests based on runtime version |
| src/native/managed/cdac/tests/DumpTests/ThreadDumpTests.cs | Tests for Thread contract using BasicThreads dump |
| src/native/managed/cdac/tests/DumpTests/RuntimeTypeSystemDumpTests.cs | Tests for RuntimeTypeSystem contract using TypeHierarchy dump |
| src/native/managed/cdac/tests/DumpTests/LoaderDumpTests.cs | Tests for Loader contract using MultiModule dump |
| src/native/managed/cdac/tests/DumpTests/ExceptionDumpTests.cs | Tests for Exception contract using ExceptionState dump |
| src/native/managed/cdac/tests/DumpTests/ObjectDumpTests.cs | Tests for Object and GC contracts using GCRoots dump |
| src/native/managed/cdac/tests/DumpTests/Debuggees/BasicThreads/* | Debuggee app that spawns named threads before crashing |
| src/native/managed/cdac/tests/DumpTests/Debuggees/TypeHierarchy/* | Debuggee app with type hierarchies, generics, and arrays |
| src/native/managed/cdac/tests/DumpTests/Debuggees/MultiModule/* | Debuggee app with multiple AssemblyLoadContexts |
| src/native/managed/cdac/tests/DumpTests/Debuggees/ExceptionState/* | Debuggee app with nested exception chain |
| src/native/managed/cdac/tests/DumpTests/Debuggees/GCRoots/* | Debuggee app with pinned objects and GC handles |
| src/native/managed/cdac/tests/DumpTests/Debuggees/Directory.Build.props | Shared configuration for debuggee projects |
| eng/Versions.props | Adds Microsoft.Diagnostics.Runtime package version |
src/native/managed/cdac/tests/DumpTests/RuntimeTypeSystemDumpTests.cs
Outdated
Show resolved
Hide resolved
The artifactFileName used $(archiveExtension) which resolves to the downloading platform's archive format (.zip on Windows, .tar.gz on Linux). When downloading cross-platform dumps, this produces the wrong filename — e.g., a Linux agent looks for CdacDumps_windows_x64.tar.gz but the artifact was uploaded from Windows as CdacDumps_windows_x64.zip. Hardcode the correct extension for each platform's artifact so the ExtractFiles glob matches regardless of which platform downloads it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dump artifacts are consumed cross-platform, but the upload used the platform-specific archive format (zip on Windows, tar.gz on Linux). This caused the Linux agent to fail extracting Windows .zip artifacts because unzip is not installed on the build image. Force tar.gz for all dump artifact uploads so both platforms can extract them without additional tooling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
src/native/managed/cdac/tests/DumpTests/SkipOnRuntimeVersionAttribute.cs
Outdated
Show resolved
Hide resolved
src/native/managed/cdac/tests/DumpTests/SkipOnRuntimeVersionAttribute.cs
Outdated
Show resolved
Hide resolved
| // We just assert there are multiple frames visible. | ||
| Assert.True(frameList.Count >= 1, | ||
| $"Expected multiple stack frames from the crashing thread, got {frameList.Count}"); |
There was a problem hiding this comment.
The comment states the debuggee has "Main → MethodA → MethodB → MethodC → FailFast" which suggests at least 4-5 managed frames (Main, MethodA, MethodB, MethodC), but the test only verifies >= 1 frame and the comment explains "the stack walk may include runtime helper frames and native transitions." Consider using a more appropriate minimum value (like >= 3 or >= 4) to better validate the expected call stack structure, or update the comment to clarify why only 1 frame is required.
| // We just assert there are multiple frames visible. | |
| Assert.True(frameList.Count >= 1, | |
| $"Expected multiple stack frames from the crashing thread, got {frameList.Count}"); | |
| // We assert there are at least three frames visible to allow for helper/native frames. | |
| Assert.True(frameList.Count >= 3, | |
| $"Expected at least three stack frames from the crashing thread, got {frameList.Count}"); |
| # --- Resolve paths --- | ||
| $repoRoot = Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent $PSScriptRoot))))) |
There was a problem hiding this comment.
The path calculation uses six nested Split-Path calls to go up from DumpTests directory to the repo root. This is fragile and difficult to maintain. Consider using a more robust method like searching for global.json or using $PSScriptRoot with a relative path calculation, similar to how FindRepoRoot() works in DumpTestBase.cs.
| # --- Resolve paths --- | |
| $repoRoot = Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent $PSScriptRoot))))) | |
| function Find-RepoRoot { | |
| param( | |
| [string]$StartDirectory | |
| ) | |
| $current = $StartDirectory | |
| while ($true) { | |
| if (Test-Path (Join-Path $current "global.json")) { | |
| return $current | |
| } | |
| $parent = Split-Path -Parent $current | |
| if ([string]::IsNullOrEmpty($parent) -or ($parent -eq $current)) { | |
| throw "Failed to locate repo root starting from '$StartDirectory'. global.json not found." | |
| } | |
| $current = $parent | |
| } | |
| } | |
| # --- Resolve paths --- | |
| $repoRoot = Find-RepoRoot -StartDirectory $PSScriptRoot |
| $testHostDir = Join-Path $repoRoot "artifacts\bin\testhost\$tfm-windows-$TestHostConfiguration-x64" | ||
|
|
||
| if (-not (Test-Path "$testHostDir\dotnet.exe")) { |
There was a problem hiding this comment.
The script hardcodes the assumption that local dumps are on Windows (x64) by using "windows" in the testHostDir path and ".exe" extension. This won't work on Linux or macOS. While the script documentation says "This script is Windows-only" at line 14, the MSBuild targets are designed to be cross-platform. Consider either making the script cross-platform to match the MSBuild targets or documenting more prominently that this is Windows-only and users should use the MSBuild targets on other platforms.
| // so split on both separators to extract the file name correctly when | ||
| // analyzing cross-platform dumps. | ||
| int lastSep = Math.Max(fileName.LastIndexOf('/'), fileName.LastIndexOf('\\')); | ||
| string name = lastSep >= 0 ? fileName.Substring(lastSep + 1) : fileName; |
There was a problem hiding this comment.
The string manipulation using Substring is not safe when the file name is exactly at the end of the path. Consider using Path.GetFileName with appropriate handling, or use the newer string slicing syntax [Range] which is more idiomatic in modern C#. The current code assumes lastSep >= 0 is checked before calling Substring, which is correct, but the fallback case fileName when lastSep < 0 means there's no directory separator - you could simplify this to:
string name = lastSep >= 0 ? fileName[(lastSep + 1)..] : fileName;This uses the range operator which is more modern and clearer.
| string name = lastSep >= 0 ? fileName.Substring(lastSep + 1) : fileName; | |
| string name = lastSep >= 0 ? fileName[(lastSep + 1)..] : fileName; |
| public void Dispose() | ||
| { | ||
| _dataTarget.Dispose(); | ||
| GC.SuppressFinalize(this); |
There was a problem hiding this comment.
The GC.SuppressFinalize(this) call is unnecessary here because the class doesn't define a finalizer. This pattern is typically used when a class has both a finalizer and a Dispose method to prevent the finalizer from running after Dispose has been called. Since ClrMdDumpHost doesn't have a finalizer, this line can be removed.
| GC.SuppressFinalize(this); |
| exit 1 | ||
| } | ||
| foreach ($f in $dumpFiles) { | ||
| $rel = $f.FullName.Substring($extractDir.Length + 1) |
There was a problem hiding this comment.
String substring operation can fail if $extractDir.Length + 1 exceeds the length of $f.FullName. This can happen if the file path is exactly at the root of $extractDir (though unlikely). Consider using -replace or checking the length before calling Substring. A safer approach:
$rel = $f.FullName.Replace("$extractDir\", "")or use the -replace operator with proper escaping.
| $rel = $f.FullName.Substring($extractDir.Length + 1) | |
| $rel = $f.FullName.Replace("$extractDir\", "") |
| # Download and test against dumps from each platform | ||
| - ${{ each dumpPlatform in parameters.cdacDumpPlatforms }}: | ||
| - template: /eng/pipelines/common/download-artifact-step.yml | ||
| parameters: |
There was a problem hiding this comment.
The platform naming convention appears inconsistent. The artifact name uses underscores (CdacDumps_${{ dumpPlatform }}), but typically Azure Pipelines artifact names should avoid special characters where possible. Consider using a hyphen instead of underscore for better compatibility, or document why the underscore is intentional. Additionally, ensure that ${{ dumpPlatform }} values like "windows_x64" won't cause issues when used in artifact names - though this appears to be following existing conventions in the repo.
| parameters: | |
| parameters: | |
| # NOTE: Underscores in this artifact name are intentional to match the | |
| # producing job's artifact naming and existing CdacDumps_* conventions. |
…s which does not exist on all platforms
| --logger "trx;LogFileName=CdacDumpTests_${{ dumpPlatform }}.trx" | ||
| --results-directory $(Build.SourcesDirectory)/artifacts/TestResults/$(_BuildConfig)/${{ dumpPlatform }} | ||
| displayName: 'Run cDAC Dump Tests (${{ dumpPlatform }} dumps)' | ||
| continueOnError: true |
There was a problem hiding this comment.
The dotnet test step is marked continueOnError: true, so dump-test failures won't fail the job immediately. If these tests are intended to gate the pipeline, remove continueOnError (or capture $LASTEXITCODE and fail at the end).
| continueOnError: true |
| failTaskOnFailedTests: true | ||
| publishRunAttachments: true | ||
| buildConfiguration: $(_BuildConfig) | ||
| continueOnError: true |
There was a problem hiding this comment.
With failTaskOnFailedTests: true, setting continueOnError: true on PublishTestResults@2 can allow failed dump tests to not fail the job. If failures should be enforced, remove continueOnError (or handle failures explicitly).
| continueOnError: true |
| bool created = ContractDescriptorTarget.TryCreate( | ||
| contractDescriptor, | ||
| _host.ReadFromTarget, | ||
| writeToTarget: null!, |
There was a problem hiding this comment.
ContractDescriptorTarget.TryCreate requires a non-null writeToTarget delegate. Passing null! will compile but can trigger a NullReferenceException if any contract code (or future tests) uses Target.Write* APIs. Consider passing a stub delegate that returns an error code (e.g., -1) to make this safely read-only.
| writeToTarget: null!, | |
| writeToTarget: static (_, _) => -1, |
|
|
||
| /// <summary> | ||
| /// The target operating system of the dump, resolved from the RuntimeInfo contract. | ||
| /// May be <c>null</c> if the contract is unavailable. |
There was a problem hiding this comment.
The XML docs say TargetOS “may be null”, but the property currently returns string.Empty when unknown. Either return string? to match the docs, or update the docs to say it returns an empty string when the OS can't be determined.
| /// May be <c>null</c> if the contract is unavailable. | |
| /// Returns <see cref="string.Empty"/> if the operating system cannot be determined. |
| // The debuggee has Main → MethodA → MethodB → MethodC → FailFast, | ||
| // but the stack walk may include runtime helper frames and native transitions. | ||
| // We just assert there are multiple frames visible. | ||
| Assert.True(frameList.Count >= 1, | ||
| $"Expected at least 1 stack frame from the crashing thread, got {frameList.Count}"); | ||
| } |
There was a problem hiding this comment.
This test is named "HasMultipleFrames" but only asserts frameList.Count >= 1, which doesn't validate “multiple” frames (and duplicates the earlier test that already checks for > 0). Update the assertion (and/or test name) so the expectation matches the intent (e.g., require at least 2 frames).
| # --- Helper: set dump env vars --- | ||
| function Set-DumpEnvVars($dumpFilePath) { | ||
| $env:DOTNET_DbgEnableMiniDump = "1" | ||
| $env:DOTNET_DbgMiniDumpType = "4" |
There was a problem hiding this comment.
DOTNET_DbgMiniDumpType is set to "4", which corresponds to MiniDumpWithHandleData (not full memory) and can produce dumps missing heap/memory required for analysis. Use MiniDumpWithFullMemory (string form used elsewhere in the repo) or the numeric value for full memory (2) so dump-based tests have the needed data.
| $env:DOTNET_DbgMiniDumpType = "4" | |
| $env:DOTNET_DbgMiniDumpType = "MiniDumpWithFullMemory" |
| } | ||
|
|
||
| # --- Debuggees and versions --- | ||
| $allDebugees = @("BasicThreads", "TypeHierarchy", "ExceptionState", "MultiModule", "GCRoots", "ServerGC", "StackWalk") |
There was a problem hiding this comment.
Variable name $allDebugees is misspelled (missing a “g”), which makes the script harder to read/grep and easy to mistype later. Rename to $allDebuggees (and update all references) for clarity.
| $allDebugees = @("BasicThreads", "TypeHierarchy", "ExceptionState", "MultiModule", "GCRoots", "ServerGC", "StackWalk") | |
| $allDebuggees = @("BasicThreads", "TypeHierarchy", "ExceptionState", "MultiModule", "GCRoots", "ServerGC", "StackWalk") |
| [ConditionalFact] | ||
| public void Exception_ThreadHasCurrentException() | ||
| { | ||
| IThread threadContract = Target.Contracts.Thread; | ||
| Assert.NotNull(threadContract); | ||
|
|
||
| ThreadStoreData storeData = threadContract.GetThreadStoreData(); | ||
| Assert.True(storeData.ThreadCount > 0); | ||
| } |
There was a problem hiding this comment.
The test name suggests it validates a current exception, but the body only checks that ThreadCount > 0. Either assert something exception-related (e.g., presence of LastThrownObjectHandle / nested exception data) or rename the test to reflect what it actually verifies.
| </PropertyGroup> | ||
|
|
||
| <!-- Skip if dump already exists --> | ||
| <Message Text="Dump already exists: $(_DumpFile). Skipping." Condition="Exists('$(_DumpFile)')" Importance="high" /> |
There was a problem hiding this comment.
I am not sure this should be happening as a build step - but it will also need this https://github.com/dotnet/diagnostics/blob/main/src/tests/SOS.UnitTests/DumpGenerationFixture.cs or https://github.com/dotnet/diagnostics/blob/main/eng/DisableSignatureCheck.ps1. It just depends - since this is server 2022 the latter might work. But the win10 queue has different requirements and we might need to change diff reg keys (use the legacy ones).
| buildConfig: release | ||
| platforms: ${{ parameters.cdacDumpPlatforms }} | ||
| jobParameters: | ||
| buildArgs: -s clr+libs+tools.cdac -c $(_BuildConfig) -rc $(_BuildConfig) -lc $(_BuildConfig) |
There was a problem hiding this comment.
I'm concerned this would add a fair amount of load (and potentially instability) if we are running these in every CI. Wasn't sure if that was the intent.
I think something like this might work out well:
- For validating the runtime is generating the right contract data, create the dumps during a normal test leg and immediately consume them (no persistence, no cross-plat). Ideally I'd love to see that testing running in the runtime repo CI assuming its pretty low cost/reliable.
- For validating that cDAC is correctly implemented in inner dev loops and normal CI runs, use MockDataContracts to supply the data rather than dumps. It should execute very fast, its fully deterministic, no dependencies, and can simulate all architectures X all contract versions
- Infrequently (once a week?) do this cross-arch dump verification. If we did a decent job in the first two bullets then we probably wouldn't see many issues here at all.
No description provided.