Skip to content

[cDAC] Add infrastructure to run cDAC tests using CLRMD and dumps#124564

Draft
max-charlamb wants to merge 29 commits intodotnet:mainfrom
max-charlamb:cdac-dumptests
Draft

[cDAC] Add infrastructure to run cDAC tests using CLRMD and dumps#124564
max-charlamb wants to merge 29 commits intodotnet:mainfrom
max-charlamb:cdac-dumptests

Conversation

@max-charlamb
Copy link
Member

No description provided.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive dump-based integration testing infrastructure for the cDAC (Contract-based Data Access Component) project. The infrastructure enables testing cDAC contracts against real crash dumps from different .NET runtime versions, using ClrMD to read dump files and validate contract implementations.

Changes:

  • Adds DumpTests project with test infrastructure for analyzing crash dumps using cDAC contracts and ClrMD
  • Implements 5 debuggee applications that crash intentionally to generate test dumps (BasicThreads, TypeHierarchy, ExceptionState, MultiModule, GCRoots)
  • Provides PowerShell orchestration script and MSBuild targets for automated dump generation and test execution across multiple runtime versions (local build and .NET 10)

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/native/managed/cdac/tests/Microsoft.Diagnostics.DataContractReader.Tests.csproj Excludes DumpTests directory from main test project
src/native/managed/cdac/tests/DumpTests/Microsoft.Diagnostics.DataContractReader.DumpTests.csproj New test project configuration with ClrMD and XUnit dependencies
src/native/managed/cdac/tests/DumpTests/DumpTests.targets MSBuild targets for building debuggees and generating dumps
src/native/managed/cdac/tests/DumpTests/RunDumpTests.ps1 PowerShell orchestration script for dump generation and test execution
src/native/managed/cdac/tests/DumpTests/DumpTestBase.cs Base class providing dump loading, cDAC target creation, and version-based test skipping
src/native/managed/cdac/tests/DumpTests/ClrMdDumpHost.cs ClrMD wrapper for loading dumps and finding contract descriptor symbols
src/native/managed/cdac/tests/DumpTests/SkipOnRuntimeVersionAttribute.cs Attribute for conditionally skipping tests based on runtime version
src/native/managed/cdac/tests/DumpTests/ThreadDumpTests.cs Tests for Thread contract using BasicThreads dump
src/native/managed/cdac/tests/DumpTests/RuntimeTypeSystemDumpTests.cs Tests for RuntimeTypeSystem contract using TypeHierarchy dump
src/native/managed/cdac/tests/DumpTests/LoaderDumpTests.cs Tests for Loader contract using MultiModule dump
src/native/managed/cdac/tests/DumpTests/ExceptionDumpTests.cs Tests for Exception contract using ExceptionState dump
src/native/managed/cdac/tests/DumpTests/ObjectDumpTests.cs Tests for Object and GC contracts using GCRoots dump
src/native/managed/cdac/tests/DumpTests/Debuggees/BasicThreads/* Debuggee app that spawns named threads before crashing
src/native/managed/cdac/tests/DumpTests/Debuggees/TypeHierarchy/* Debuggee app with type hierarchies, generics, and arrays
src/native/managed/cdac/tests/DumpTests/Debuggees/MultiModule/* Debuggee app with multiple AssemblyLoadContexts
src/native/managed/cdac/tests/DumpTests/Debuggees/ExceptionState/* Debuggee app with nested exception chain
src/native/managed/cdac/tests/DumpTests/Debuggees/GCRoots/* Debuggee app with pinned objects and GC handles
src/native/managed/cdac/tests/DumpTests/Debuggees/Directory.Build.props Shared configuration for debuggee projects
eng/Versions.props Adds Microsoft.Diagnostics.Runtime package version

Copilot AI review requested due to automatic review settings February 18, 2026 19:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.

Copilot AI review requested due to automatic review settings February 18, 2026 20:27
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.

Copilot AI review requested due to automatic review settings February 18, 2026 21:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.

The artifactFileName used $(archiveExtension) which resolves to the
downloading platform's archive format (.zip on Windows, .tar.gz on
Linux). When downloading cross-platform dumps, this produces the wrong
filename — e.g., a Linux agent looks for CdacDumps_windows_x64.tar.gz
but the artifact was uploaded from Windows as CdacDumps_windows_x64.zip.

Hardcode the correct extension for each platform's artifact so the
ExtractFiles glob matches regardless of which platform downloads it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@steveisok steveisok self-requested a review February 19, 2026 00:05
The dump artifacts are consumed cross-platform, but the upload used
the platform-specific archive format (zip on Windows, tar.gz on Linux).
This caused the Linux agent to fail extracting Windows .zip artifacts
because unzip is not installed on the build image.

Force tar.gz for all dump artifact uploads so both platforms can
extract them without additional tooling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 5 comments.

Copilot AI review requested due to automatic review settings February 19, 2026 16:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

Comment on lines 62 to 64
// We just assert there are multiple frames visible.
Assert.True(frameList.Count >= 1,
$"Expected multiple stack frames from the crashing thread, got {frameList.Count}");
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states the debuggee has "Main → MethodA → MethodB → MethodC → FailFast" which suggests at least 4-5 managed frames (Main, MethodA, MethodB, MethodC), but the test only verifies >= 1 frame and the comment explains "the stack walk may include runtime helper frames and native transitions." Consider using a more appropriate minimum value (like >= 3 or >= 4) to better validate the expected call stack structure, or update the comment to clarify why only 1 frame is required.

Suggested change
// We just assert there are multiple frames visible.
Assert.True(frameList.Count >= 1,
$"Expected multiple stack frames from the crashing thread, got {frameList.Count}");
// We assert there are at least three frames visible to allow for helper/native frames.
Assert.True(frameList.Count >= 3,
$"Expected at least three stack frames from the crashing thread, got {frameList.Count}");

Copilot uses AI. Check for mistakes.
Comment on lines 86 to 87
# --- Resolve paths ---
$repoRoot = Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent $PSScriptRoot)))))
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path calculation uses six nested Split-Path calls to go up from DumpTests directory to the repo root. This is fragile and difficult to maintain. Consider using a more robust method like searching for global.json or using $PSScriptRoot with a relative path calculation, similar to how FindRepoRoot() works in DumpTestBase.cs.

Suggested change
# --- Resolve paths ---
$repoRoot = Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent (Split-Path -Parent $PSScriptRoot)))))
function Find-RepoRoot {
param(
[string]$StartDirectory
)
$current = $StartDirectory
while ($true) {
if (Test-Path (Join-Path $current "global.json")) {
return $current
}
$parent = Split-Path -Parent $current
if ([string]::IsNullOrEmpty($parent) -or ($parent -eq $current)) {
throw "Failed to locate repo root starting from '$StartDirectory'. global.json not found."
}
$current = $parent
}
}
# --- Resolve paths ---
$repoRoot = Find-RepoRoot -StartDirectory $PSScriptRoot

Copilot uses AI. Check for mistakes.
Comment on lines +272 to +274
$testHostDir = Join-Path $repoRoot "artifacts\bin\testhost\$tfm-windows-$TestHostConfiguration-x64"

if (-not (Test-Path "$testHostDir\dotnet.exe")) {
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script hardcodes the assumption that local dumps are on Windows (x64) by using "windows" in the testHostDir path and ".exe" extension. This won't work on Linux or macOS. While the script documentation says "This script is Windows-only" at line 14, the MSBuild targets are designed to be cross-platform. Consider either making the script cross-platform to match the MSBuild targets or documenting more prominently that this is Windows-only and users should use the MSBuild targets on other platforms.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 19, 2026 16:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.

// so split on both separators to extract the file name correctly when
// analyzing cross-platform dumps.
int lastSep = Math.Max(fileName.LastIndexOf('/'), fileName.LastIndexOf('\\'));
string name = lastSep >= 0 ? fileName.Substring(lastSep + 1) : fileName;
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string manipulation using Substring is not safe when the file name is exactly at the end of the path. Consider using Path.GetFileName with appropriate handling, or use the newer string slicing syntax [Range] which is more idiomatic in modern C#. The current code assumes lastSep >= 0 is checked before calling Substring, which is correct, but the fallback case fileName when lastSep < 0 means there's no directory separator - you could simplify this to:

string name = lastSep >= 0 ? fileName[(lastSep + 1)..] : fileName;

This uses the range operator which is more modern and clearer.

Suggested change
string name = lastSep >= 0 ? fileName.Substring(lastSep + 1) : fileName;
string name = lastSep >= 0 ? fileName[(lastSep + 1)..] : fileName;

Copilot uses AI. Check for mistakes.
public void Dispose()
{
_dataTarget.Dispose();
GC.SuppressFinalize(this);
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GC.SuppressFinalize(this) call is unnecessary here because the class doesn't define a finalizer. This pattern is typically used when a class has both a finalizer and a Dispose method to prevent the finalizer from running after Dispose has been called. Since ClrMdDumpHost doesn't have a finalizer, this line can be removed.

Suggested change
GC.SuppressFinalize(this);

Copilot uses AI. Check for mistakes.
exit 1
}
foreach ($f in $dumpFiles) {
$rel = $f.FullName.Substring($extractDir.Length + 1)
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String substring operation can fail if $extractDir.Length + 1 exceeds the length of $f.FullName. This can happen if the file path is exactly at the root of $extractDir (though unlikely). Consider using -replace or checking the length before calling Substring. A safer approach:

$rel = $f.FullName.Replace("$extractDir\", "")

or use the -replace operator with proper escaping.

Suggested change
$rel = $f.FullName.Substring($extractDir.Length + 1)
$rel = $f.FullName.Replace("$extractDir\", "")

Copilot uses AI. Check for mistakes.
# Download and test against dumps from each platform
- ${{ each dumpPlatform in parameters.cdacDumpPlatforms }}:
- template: /eng/pipelines/common/download-artifact-step.yml
parameters:
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The platform naming convention appears inconsistent. The artifact name uses underscores (CdacDumps_${{ dumpPlatform }}), but typically Azure Pipelines artifact names should avoid special characters where possible. Consider using a hyphen instead of underscore for better compatibility, or document why the underscore is intentional. Additionally, ensure that ${{ dumpPlatform }} values like "windows_x64" won't cause issues when used in artifact names - though this appears to be following existing conventions in the repo.

Suggested change
parameters:
parameters:
# NOTE: Underscores in this artifact name are intentional to match the
# producing job's artifact naming and existing CdacDumps_* conventions.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 19, 2026 19:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 8 comments.

--logger "trx;LogFileName=CdacDumpTests_${{ dumpPlatform }}.trx"
--results-directory $(Build.SourcesDirectory)/artifacts/TestResults/$(_BuildConfig)/${{ dumpPlatform }}
displayName: 'Run cDAC Dump Tests (${{ dumpPlatform }} dumps)'
continueOnError: true
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dotnet test step is marked continueOnError: true, so dump-test failures won't fail the job immediately. If these tests are intended to gate the pipeline, remove continueOnError (or capture $LASTEXITCODE and fail at the end).

Suggested change
continueOnError: true

Copilot uses AI. Check for mistakes.
failTaskOnFailedTests: true
publishRunAttachments: true
buildConfiguration: $(_BuildConfig)
continueOnError: true
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With failTaskOnFailedTests: true, setting continueOnError: true on PublishTestResults@2 can allow failed dump tests to not fail the job. If failures should be enforced, remove continueOnError (or handle failures explicitly).

Suggested change
continueOnError: true

Copilot uses AI. Check for mistakes.
bool created = ContractDescriptorTarget.TryCreate(
contractDescriptor,
_host.ReadFromTarget,
writeToTarget: null!,
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ContractDescriptorTarget.TryCreate requires a non-null writeToTarget delegate. Passing null! will compile but can trigger a NullReferenceException if any contract code (or future tests) uses Target.Write* APIs. Consider passing a stub delegate that returns an error code (e.g., -1) to make this safely read-only.

Suggested change
writeToTarget: null!,
writeToTarget: static (_, _) => -1,

Copilot uses AI. Check for mistakes.

/// <summary>
/// The target operating system of the dump, resolved from the RuntimeInfo contract.
/// May be <c>null</c> if the contract is unavailable.
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The XML docs say TargetOS “may be null”, but the property currently returns string.Empty when unknown. Either return string? to match the docs, or update the docs to say it returns an empty string when the OS can't be determined.

Suggested change
/// May be <c>null</c> if the contract is unavailable.
/// Returns <see cref="string.Empty"/> if the operating system cannot be determined.

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +59
// The debuggee has Main → MethodA → MethodB → MethodC → FailFast,
// but the stack walk may include runtime helper frames and native transitions.
// We just assert there are multiple frames visible.
Assert.True(frameList.Count >= 1,
$"Expected at least 1 stack frame from the crashing thread, got {frameList.Count}");
}
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is named "HasMultipleFrames" but only asserts frameList.Count >= 1, which doesn't validate “multiple” frames (and duplicates the earlier test that already checks for > 0). Update the assertion (and/or test name) so the expectation matches the intent (e.g., require at least 2 frames).

Copilot uses AI. Check for mistakes.
# --- Helper: set dump env vars ---
function Set-DumpEnvVars($dumpFilePath) {
$env:DOTNET_DbgEnableMiniDump = "1"
$env:DOTNET_DbgMiniDumpType = "4"
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOTNET_DbgMiniDumpType is set to "4", which corresponds to MiniDumpWithHandleData (not full memory) and can produce dumps missing heap/memory required for analysis. Use MiniDumpWithFullMemory (string form used elsewhere in the repo) or the numeric value for full memory (2) so dump-based tests have the needed data.

Suggested change
$env:DOTNET_DbgMiniDumpType = "4"
$env:DOTNET_DbgMiniDumpType = "MiniDumpWithFullMemory"

Copilot uses AI. Check for mistakes.
}

# --- Debuggees and versions ---
$allDebugees = @("BasicThreads", "TypeHierarchy", "ExceptionState", "MultiModule", "GCRoots", "ServerGC", "StackWalk")
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable name $allDebugees is misspelled (missing a “g”), which makes the script harder to read/grep and easy to mistype later. Rename to $allDebuggees (and update all references) for clarity.

Suggested change
$allDebugees = @("BasicThreads", "TypeHierarchy", "ExceptionState", "MultiModule", "GCRoots", "ServerGC", "StackWalk")
$allDebuggees = @("BasicThreads", "TypeHierarchy", "ExceptionState", "MultiModule", "GCRoots", "ServerGC", "StackWalk")

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +25
[ConditionalFact]
public void Exception_ThreadHasCurrentException()
{
IThread threadContract = Target.Contracts.Thread;
Assert.NotNull(threadContract);

ThreadStoreData storeData = threadContract.GetThreadStoreData();
Assert.True(storeData.ThreadCount > 0);
}
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name suggests it validates a current exception, but the body only checks that ThreadCount > 0. Either assert something exception-related (e.g., presence of LastThrownObjectHandle / nested exception data) or rename the test to reflect what it actually verifies.

Copilot uses AI. Check for mistakes.
</PropertyGroup>

<!-- Skip if dump already exists -->
<Message Text="Dump already exists: $(_DumpFile). Skipping." Condition="Exists('$(_DumpFile)')" Importance="high" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this should be happening as a build step - but it will also need this https://github.com/dotnet/diagnostics/blob/main/src/tests/SOS.UnitTests/DumpGenerationFixture.cs or https://github.com/dotnet/diagnostics/blob/main/eng/DisableSignatureCheck.ps1. It just depends - since this is server 2022 the latter might work. But the win10 queue has different requirements and we might need to change diff reg keys (use the legacy ones).

buildConfig: release
platforms: ${{ parameters.cdacDumpPlatforms }}
jobParameters:
buildArgs: -s clr+libs+tools.cdac -c $(_BuildConfig) -rc $(_BuildConfig) -lc $(_BuildConfig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned this would add a fair amount of load (and potentially instability) if we are running these in every CI. Wasn't sure if that was the intent.

I think something like this might work out well:

  • For validating the runtime is generating the right contract data, create the dumps during a normal test leg and immediately consume them (no persistence, no cross-plat). Ideally I'd love to see that testing running in the runtime repo CI assuming its pretty low cost/reliable.
  • For validating that cDAC is correctly implemented in inner dev loops and normal CI runs, use MockDataContracts to supply the data rather than dumps. It should execute very fast, its fully deterministic, no dependencies, and can simulate all architectures X all contract versions
  • Infrequently (once a week?) do this cross-arch dump verification. If we did a decent job in the first two bullets then we probably wouldn't see many issues here at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments