[cDAC] Fix EEClass validation corner case by max-charlamb · Pull Request #124780 · dotnet/runtime

max-charlamb · 2026-02-24T02:18:39Z

Looked into the persistent CI failure and think I found the issue. It looks like SOS is calling GetMethodTableData on a random address that happens to pass validation because it has a pointer going back to the MethodTable. However, when we try to read the full EEClass it isn't available and we throw a different error.

This change should make sure the EEClass is validated and readable. Added unit test to verify.
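
To make the failure mode concrete, here is a minimal Python model of the two validation strategies. It is a sketch, not the cDAC code: `FakeTarget`, the offsets, and both function names are hypothetical, and the layout is illustrative only. It borrows just two ideas from the description above: the MethodTable -> EEClass -> MethodTable round trip, and reading the whole EEClass as part of validation.

```python
class ReadError(Exception):
    """Raised when the modeled target memory is not readable."""


class FakeTarget:
    """Toy debuggee memory: a dict of readable pointer-sized slots."""

    def __init__(self, slots):
        self.slots = dict(slots)

    def read_pointer(self, address):
        if address not in self.slots:
            raise ReadError(hex(address))
        return self.slots[address]


# Hypothetical layout; these offsets are NOT the real runtime layout.
MT_EECLASS_OFFSET = 0               # MethodTable -> EEClass pointer
EECLASS_MT_OFFSET = 0               # EEClass -> MethodTable back pointer
EECLASS_FIELD_OFFSETS = (0, 8, 16)  # every slot a full EEClass read touches


def loop_only_validate(target, mt):
    """Check only the MT -> EEClass -> MT round trip."""
    try:
        eeclass = target.read_pointer(mt + MT_EECLASS_OFFSET)
        return target.read_pointer(eeclass + EECLASS_MT_OFFSET) == mt
    except ReadError:
        return False


def eager_validate(target, mt):
    """Also require every EEClass slot to be readable before accepting."""
    try:
        eeclass = target.read_pointer(mt + MT_EECLASS_OFFSET)
        if target.read_pointer(eeclass + EECLASS_MT_OFFSET) != mt:
            return False
        for offset in EECLASS_FIELD_OFFSETS:
            target.read_pointer(eeclass + offset)
        return True
    except ReadError:
        return False


# A "random" address whose first EEClass slot happens to point back,
# while the remaining EEClass slots are unmapped.
partial = FakeTarget({0x1000: 0x2000, 0x2000: 0x1000})
print(loop_only_validate(partial, 0x1000))  # True
print(eager_validate(partial, 0x1000))      # False
```

In this model the loop-only check accepts the address and leaves the later full read to fail, whereas the eager check rejects it up front.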

CI Failure

        STDIN: 00:00.374: !runcommand !clrstack
        00:00.683: OS Thread Id: 0xb08 (0)
        00:00.692:         Child SP               IP Call Site
        00:00.692: 0000002EEDD7E9E0 00007ff99863d280 [InlinedCallFrame: 0000002eedd7e9e0] VarargPInvokeInteropMD.Interop.printf(System.String, ...)
        00:00.697: 0000002EEDD7E9E0 00007ff8e620021a [InlinedCallFrame: 0000002eedd7e9e0] VarargPInvokeInteropMD.Interop.printf(System.String, ...)
        00:00.697: 0000002EEDD7E9B0 00007FF8E620021A ILStubClass.IL_STUB_PInvoke(System.String, Int32, Double, ...)
        00:00.745: 0000002EEDD7EAD0 00007FF8E61218B0 VarargPInvokeInteropMD.Program.Main() [/_/src/tests/SOS.UnitTests/Debuggees/VarargPInvokeInteropMD/Program.cs @ 16]
        00:00.751: <END_COMMAND_OUTPUT>
        00:00.751: 0:000> 
        STDIN: 00:00.752: !runcommand !IP2MD 00007FF8E620021A
        00:00.754: MethodDesc:   00007ff8e61e7b38
        00:00.754: Method Name:          ILStubClass.IL_STUB_PInvoke(System.String, Int32, Double, ...)
        00:00.754: Class:                00007ff8e61e7ac8
        00:00.754: MethodTable:          00007ff8e61e7ac8
        00:00.754: mdToken:              0000000006000000
        00:00.754: Module:               00007ff8e61e1b00
        00:00.754: IsJitted:             yes
        00:00.754: Current CodeAddr:     00007ff8e6200040
        00:00.754: Version History:
        00:00.755:   ILCodeVersion:      0000000000000000
        00:00.755:   ReJIT ID:           0
        00:00.755:   IL Addr:            0000000000000000
        00:00.755:      CodeAddr:           00007ff8e6200040  (MinOptJitted)
        00:00.755:      NativeCodeVersion:  0000000000000000
        00:00.757: <END_COMMAND_OUTPUT>
        00:00.757: 0:000> 
        STDIN: 00:00.757: !runcommand !clru 00007ff8e61e7b38
        00:00.758: Normal JIT generated code
        00:00.758: ILStubClass.IL_STUB_PInvoke(System.String, Int32, Double, ...)
        00:00.758: Begin 00007FF8E6200040, size 279
        00:00.759: 00007ff8`e6200040 48894c2408      mov     qword ptr [rsp+8],rcx
        00:00.761: 00007ff8`e6200045 4889542410      mov     qword ptr [rsp+10h],rdx
        00:00.762: 00007ff8`e620004a 4c89442418      mov     qword ptr [rsp+18h],r8
        00:00.763: 00007ff8`e620004f 4c894c2420      mov     qword ptr [rsp+20h],r9
        00:00.764: 00007ff8`e6200054 55              push    rbp
        00:00.766: 00007ff8`e6200055 4157            push    r15
        00:00.767: 00007ff8`e6200057 4156            push    r14
        00:00.768: 00007ff8`e6200059 4155            push    r13
        00:00.769: 00007ff8`e620005b 4154            push    r12
        00:00.770: 00007ff8`e620005d 57              push    rdi
        00:00.771: 00007ff8`e620005e 56              push    rsi
        00:00.773: 00007ff8`e620005f 53              push    rbx
        00:00.774: 00007ff8`e6200060 4881ecd8000000  sub     rsp,0D8h
        00:00.775: 00007ff8`e6200067 488d6c2420      lea     rbp,[rsp+20h]
        STDERROR: 00:00.787: Process terminated. Assertion failed.
        STDERROR: 00:00.788: cDAC: 80131c49, DAC: 80070057
        STDERROR: 00:00.788:    at System.Diagnostics.DebugProvider.Fail(String, String)
        STDERROR: 00:00.788:    at System.Diagnostics.Debug.Fail(String, String)
        STDERROR: 00:00.788:    at System.Diagnostics.Debug.Assert(Boolean, String, String)
        STDERROR: 00:00.788:    at System.Diagnostics.Debug.Assert(Boolean, String)
        STDERROR: 00:00.788:    at System.Diagnostics.Debug.Assert(Boolean, Debug.AssertInterpolatedStringHandler&)
        STDERROR: 00:00.788:    at Microsoft.Diagnostics.DataContractReader.Legacy.SOSDacImpl.Microsoft.Diagnostics.DataContractReader.Legacy.ISOSDacInterface.GetMethodTableData(ClrDataAddress, DacpMethodTableData*)
        STDERROR: 00:00.788:    at <Microsoft_Diagnostics_DataContractReader_Legacy_ISOSDacInterface>F7D08DFA63EEFD39A651C932BEE9B168F60916DB84778D32AACF3004D988BD863__InterfaceImplementation.ABI_GetMethodTableData(ComWrappers.ComInterfaceDispatch*, UInt64, DacpMethodTableData*)

dotnet-policy-service · 2026-02-24T02:19:26Z

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR addresses a cDAC/legacy DAC HRESULT mismatch when SOS queries GetMethodTableData for a MethodTable whose EEClass pointer relationship superficially validates but whose EEClass memory is not actually readable (observed as a persistent CI failure). The fix makes EEClass readability part of MethodTable validation, and adds a regression test to ensure E_INVALIDARG is returned (matching legacy DAC behavior) instead of CORDBG_E_READVIRTUAL_FAILURE.

Changes:

Update MethodTable validation to eagerly construct/read Data.EEClass during validation so unreadable EEClass memory fails validation early.
Add a unit test that reproduces the “partially readable EEClass” scenario and asserts GetMethodTableData returns E_INVALIDARG.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/TypeValidation.cs | Make EEClass validation eagerly read all EEClass fields so unreadable EEClass memory causes validation failure (and thus `E_INVALIDARG`). |
| src/native/managed/cdac/tests/MethodTableTests.cs | Add regression test covering the unreadable/partial EEClass scenario for `GetMethodTableData`. |

src/native/managed/cdac/tests/MethodTableTests.cs

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

...icrosoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/TypeValidation.cs

src/native/managed/cdac/tests/MethodTableTests.cs

jkotas · 2026-02-24T04:14:23Z

> SOS is calling GetMethodTableData on a random address that happens to pass validation

This is like the 123rd time we are trying to patch some hole in this validation to fix intermittent failures. The current scheme is going to produce false positives by design.

I am wondering whether we can do better and implement 100% reliable validation: get the module, token and instantiation from the type, and look up the type using those. If we get back the type we started with, it is a valid type. If not, it is a random pointer that looks like a valid type.
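
The lookup-based idea above can be sketched as a toy model. Everything here is hypothetical (`FakeRuntime`, `TypeKey`, `is_valid_type`); it only illustrates the "derive the identity, look it up, compare with the starting pointer" round trip, not how the runtime's type tables are actually organized.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class TypeKey(NamedTuple):
    """Identity of a type: module, metadata token, generic instantiation."""
    module: int
    token: int
    instantiation: Tuple[int, ...]


class FakeRuntime:
    """Toy stand-in for the runtime's type tables (hypothetical)."""

    def __init__(self) -> None:
        self.loaded: Dict[TypeKey, int] = {}  # identity -> MethodTable address
        self.memory: Dict[int, TypeKey] = {}  # address -> identity stored there

    def load_type(self, address: int, key: TypeKey) -> None:
        self.loaded[key] = address
        self.memory[address] = key

    def read_key(self, address: int) -> Optional[TypeKey]:
        return self.memory.get(address)


def is_valid_type(runtime: FakeRuntime, candidate: int) -> bool:
    """Accept the pointer only if looking up its identity returns the same pointer."""
    key = runtime.read_key(candidate)
    if key is None:
        return False
    return runtime.loaded.get(key) == candidate


runtime = FakeRuntime()
key = TypeKey(module=0x7000, token=0x02000001, instantiation=())
runtime.load_type(0x1000, key)
# A lookalike: memory that decodes to the same identity but was never loaded.
runtime.memory[0x5000] = key

print(is_valid_type(runtime, 0x1000))  # True
print(is_valid_type(runtime, 0x5000))  # False
```

The lookalike at `0x5000` decodes to a plausible identity but is rejected because the lookup returns a different address.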

noahfalk · 2026-02-24T08:27:41Z

> get module, token and instantiation from type, and look up the type using those

This sounds like it would be reliable at detecting whether the pointer was originally allocated in the debuggee as a MethodTable. It wouldn't catch memory corruption to any portion of the data structure that wasn't directly used in the lookup. To me it sounds complementary, but it wouldn't necessarily catch the kinds of issues Max's validation would detect.

As for feasibility, triage dumps today don't contain the EETypeHashTables and there may be other gaps. I'd guess we need to add at least 50 bytes per MethodTable to capture all the data structures the validation algorithm would need to touch. I wouldn't expect a ton of types in a triage dump (1 per stack frame), so maybe tens of KB on a 2 MB dump? Put a big margin of error on that until someone explores in more detail.

I think we'd get a good return on doing a little more validation of the immediate MethodTable/EEClass fields and stopping there. If you think it's important that we go further we can, I'm just not sure it will give us much return on the dev time and extra dump memory.

> This is like the 123rd time we are trying to patch some hole in this validation to fix intermittent failures

Maybe I'm missing some history. My understanding is that DAC's approach to MethodTable validation has been reasonably stable over a long period of time. We check the MethodTable -> EEClass -> MethodTable loop and assume any data structure satisfying that constraint is valid. I wasn't aware of the history of validation changes you mentioned. Any breadcrumbs I should be following?

...icrosoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/TypeValidation.cs

jkotas · 2026-02-24T15:54:05Z

> My understanding is that DAC's approach to MethodTable validation has been reasonably stable over a long period of time.

I have personally fought with it a number of times, mostly in the .NET Framework days when we ran the SOS tests in the inner loop and the non-deterministic failures were a problem. We are not running the SOS tests in the inner loop these days. If we started running them again with high frequency, I expect we would start seeing the instability again.

> It wouldn't catch memory corruption to any portion of the data structure that wasn't directly used in the lookup.

For investigation of crash dumps with corrupted data structures, this sort of validation is about as harmful as it is useful. For example, a few months ago I investigated a crash where the EEClass pointer was corrupted: #119761 (comment). This validation was not helping with the investigation.

> triage dumps

Do we really need this sort of validation for triage dumps? Can the workflows for investigating triage dumps avoid throwing random pointers at DAC APIs and hoping they return a semi-accurate answer? Most SOS commands do not work well in triage dumps. I do not think we would lose much if we stopped doing this validation in triage dumps.

> we'd get a good return on doing a little more validation of the immediate MethodTable/EEClass fields and stopping there.

I expect we will want to investigate creating EEClass/MethodDesc/FieldDesc lazily at some point to further improve startup performance by making CoreCLR w/ R2R characteristics more similar to NativeAOT. Doubling down on using EEClass/MethodDesc/FieldDesc for validation of random pointers would go against that.

I do not expect that this will be solved in this PR. I wanted to mention this since I do not think the current "design" of these validations is good. Maybe create an issue about this?

max-charlamb · 2026-02-24T16:19:06Z

> SOS is calling GetMethodTableData on a random address that happens to pass validation
>
> This is like the 123rd time we are trying to patch some hole in this validation to fix intermittent failures. The current scheme is going to produce false positives by design.
>
> I am wondering whether we can do better and implement 100% reliable validation: get module, token and instantiation from type, and look up the type using those. If we get back the type we started with, it is a valid type. If not, it is a random pointer that looks like a valid type.

I'm not trying to modify the DAC MethodTable validation, I'm attempting to make the cDAC follow the same scheme to prevent failures in the runtime-diagnostic pipeline.

This error occurs because the cDAC validation logic does not check that the entire method table is readable until after validation occurs. This results in a virtual read exception rather than an argument exception.
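
The mismatch in the CI log (`cDAC: 80131c49, DAC: 80070057`) can be reproduced with a small model. This is a sketch with hypothetical function names; the only facts it takes from this thread are the two HRESULT values and the ordering problem (the full read happening after validation instead of inside it).

```python
S_OK = 0x00000000
E_INVALIDARG = 0x80070057
CORDBG_E_READVIRTUAL_FAILURE = 0x80131C49


class VirtualReadError(Exception):
    """Modeled failure to read debuggee memory."""


def get_method_table_data(validate, read_eeclass):
    """Hypothetical API wrapper: validate, then read, mapping errors to HRESULTs."""
    try:
        if not validate():
            raise ValueError("not a valid MethodTable")
        read_eeclass()
        return S_OK
    except ValueError:
        return E_INVALIDARG
    except VirtualReadError:
        return CORDBG_E_READVIRTUAL_FAILURE


def unreadable_eeclass():
    """The EEClass cannot be read in full."""
    raise VirtualReadError("EEClass tail is not mapped")


def loop_check_only():
    """Validation that only saw the pointer loop, which happened to match."""
    return True


def loop_check_plus_read():
    """Validation that also attempts the full EEClass read."""
    try:
        unreadable_eeclass()
        return True
    except VirtualReadError:
        return False


before = get_method_table_data(loop_check_only, unreadable_eeclass)
after = get_method_table_data(loop_check_plus_read, unreadable_eeclass)
print(f"{before:08x} {after:08x}")  # 80131c49 80070057
```

Moving the read into validation turns the read failure into an argument error, which is the result the legacy DAC reported in the log.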

jkotas · 2026-02-24T16:37:31Z

Right, I understand you are trying to reimplement the quirks of the legacy DAC in this PR. My point was that I do not think it is the best forward-looking approach.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

noahfalk · 2026-02-24T21:31:32Z

@jkotas - thanks for all the extra info. I read your concerns as being at least as much about having more control over where validation occurs in the workflow and what the UX of the validation is. Thus far, I'd say SOS's approach is ad hoc and leans towards eager validation + errors rather than lazy validation + non-blocking warnings. I can see advantages for both in different circumstances, but I'm certainly open to changing defaults or giving more control that could be used by sophisticated devs to get the behavior they want. I opened: #124829

In terms of triage dumps, we could certainly skip doing the validation you proposed if the various type hashtables are missing. I don't believe we have any direct info about whether a dump is or isn't a triage dump but we can make decisions based on what memory blocks we find. Depending on the scenario SOS may or may not be in control of what pointers are being analyzed as MethodTables.

...icrosoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/TypeValidation.cs

noahfalk

👍

## Summary

Add IsContinuation to the cDAC RuntimeTypeSystem contract, enabling the cDAC to identify and validate continuation MethodTables created by the async continuation feature.

Continuations are dynamically-created MethodTables (similar to arrays) whose parent is the base `Continuation` class stored in `g_pContinuationClassIfSubTypeCreated`. Without this change, the cDAC's MT→EEClass→MT validation roundtrip would reject valid continuation MTs.

Related discussion: #124780 (comment)

## Changes

- **`datadescriptor.inc`**: Expose `g_pContinuationClassIfSubTypeCreated` as `ContinuationMethodTable` global pointer
- **`IRuntimeTypeSystem.cs`**: Add `IsContinuation(TypeHandle)` to the contract interface
- **`RuntimeTypeSystem_1.cs`**: Implement `IsContinuation` by checking `ParentMethodTable == continuationMethodTablePointer`
- **`RuntimeTypeSystemFactory.cs`**: Read the continuation MT global (gracefully handles missing global via `TryReadGlobalPointer`)
- **`TypeValidation.cs`**: Fix MT→EEClass→MT validation to allow continuations (like arrays/generics)
- **`Constants.cs`**: Add `ContinuationMethodTable` constant name
- **Tests**: 4 test methods (8 cases across architectures): true positive, true negative, null global, and CanonMT validation

---------

Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com>
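
The parent-pointer comparison described in that summary can be sketched as follows. This is a hypothetical Python model, not the contract implementation: the function name and the use of `None` for a missing global are assumptions made for illustration.

```python
from typing import Optional

NULL = 0


def is_continuation(parent_method_table: int, continuation_mt: Optional[int]) -> bool:
    """Treat a type as a continuation when its parent is the base Continuation MT.

    continuation_mt is None when the global is missing from the target and
    NULL when no continuation subtype has been created; both yield False.
    """
    if continuation_mt is None or continuation_mt == NULL:
        return False
    return parent_method_table == continuation_mt


print(is_continuation(0x4000, 0x4000))  # True: parent matches the global
print(is_continuation(0x3000, 0x4000))  # False: some other parent
print(is_continuation(0x4000, None))    # False: global not available
```

The three calls correspond to the true positive, true negative, and null global cases listed among the tests.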

max-charlamb requested review from noahfalk and rcj1 February 24, 2026 02:18

max-charlamb added the area-Diagnostics-coreclr label Feb 24, 2026

max-charlamb requested review from barosiak and Copilot February 24, 2026 02:18

dotnet-policy-service bot assigned max-charlamb Feb 24, 2026

Copilot started reviewing on behalf of max-charlamb February 24, 2026 02:19

Copilot AI reviewed Feb 24, 2026

src/native/managed/cdac/tests/MethodTableTests.cs Outdated

src/native/managed/cdac/tests/MethodTableTests.cs Outdated

Copilot AI review requested due to automatic review settings February 24, 2026 03:47

Copilot started reviewing on behalf of max-charlamb February 24, 2026 03:48

Copilot AI reviewed Feb 24, 2026

...icrosoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/TypeValidation.cs

src/native/managed/cdac/tests/MethodTableTests.cs Outdated

src/native/managed/cdac/tests/MethodTableTests.cs Outdated

max-charlamb marked this pull request as draft February 24, 2026 03:56

noahfalk reviewed Feb 24, 2026

...icrosoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/TypeValidation.cs Outdated

max-charlamb force-pushed the cdac-fix-eeclass-validation branch from 646f1e8 to 7bf5e94 Compare February 24, 2026 16:44

max-charlamb marked this pull request as ready for review February 24, 2026 17:30

Copilot AI review requested due to automatic review settings February 24, 2026 17:30

Copilot started reviewing on behalf of max-charlamb February 24, 2026 17:31

Copilot AI reviewed Feb 24, 2026

rcj1 mentioned this pull request Feb 24, 2026

Add GetStackLimits cDAC API #124682

Merged

noahfalk mentioned this pull request Feb 24, 2026

cDAC + SOS validation behavior #124829

Open

max-charlamb mentioned this pull request Feb 25, 2026

Add cDAC tests and implementation for GetGenerationTable and GetFinalizationFillPointers #124674

Merged

update type validation

f43b229

max-charlamb force-pushed the cdac-fix-eeclass-validation branch from 7bf5e94 to f43b229 Compare February 26, 2026 16:52

max-charlamb requested a review from noahfalk February 26, 2026 16:52

jkotas reviewed Feb 26, 2026

...icrosoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/TypeValidation.cs

max-charlamb mentioned this pull request Feb 26, 2026

[cDAC] Add continuation support to RuntimeTypeSystem #124918

Merged

build-analysis bot mentioned this pull request Feb 26, 2026

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

3 tasks

noahfalk approved these changes Feb 26, 2026

hoyosjs approved these changes Feb 27, 2026

max-charlamb merged commit c69c476 into dotnet:main Feb 27, 2026
48 of 52 checks passed

max-charlamb deleted the cdac-fix-eeclass-validation branch February 27, 2026 15:05

dotnet-maestro bot mentioned this pull request Feb 28, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#5145

Merged

5 participants