Race in PInvoke stub generation

### Summary

Multiple threads can race in P/Invoke ILStub creation and produce different `PCODE` values for the same `MethodDesc`, violating the assert in `DoPrestub`.

### Failing Assert

[prestub.cpp#L2340](https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/prestub.cpp#L2340):
```cpp
_ASSERTE(pCode == (PCODE)NULL || GetNativeCode() == (PCODE)NULL || pCode == GetNativeCode());
```

Thread A calls `GetStubForInteropMethod` and gets back `pCode` = X, then thread B does the same and gets `pCode` = Y. Thread A's `SetCodeEntryPoint(X)` publishes X as `GetNativeCode()`. Thread B then hits the assert because `Y != X`.

### Why different stubs are produced

The existing comment at [dllimport.cpp#L5689-L5699](https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/dllimport.cpp#L5689-L5699) claims this can't happen:

```cpp
// NOTE: there is a race in updating this MethodDesc.  We depend on all
// threads getting back the same DynamicMethodDesc for a particular
// PInvokeMethodDesc, in that case, the locking around the actual JIT
// operation will prevent the code from being jitted more than once.
// By the time we get here, all threads get the same address of code
// back from the JIT operation and they all just fill in the same value
// here.
```

This invariant depends on all threads resolving to the **same** `DynamicMethodDesc` (ILStub) via the `ILStubCache`. However, the cache lookup and insert in `ILStubCache::GetStubMethodDesc` ([ilstubcache.cpp#L514-L586](https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/ilstubcache.cpp#L514-L586)) is **guarded by `SF_IsSharedStub`**:

```cpp
if (SF_IsSharedStub(dwStubFlags))
{
    CrstHolder ch(&m_crst);
    const ILStubCacheEntry* phe = m_hashMap.LookupPtr(pHashBlob);
    if (phe)
    {
        pMD = phe->m_pMethodDesc;
    }
}
```

For forward P/Invokes (non-CALLI, non-vararg), `IsSharedStubScenario` returns **`false`** ([dllimport.cpp#L232-L235](https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/dllimport.cpp#L232-L235)):

```cpp
if (SF_IsForwardPInvokeStub(dwStubFlags) && !SF_IsCALLIStub(dwStubFlags) && !SF_IsVarArgStub(dwStubFlags))
{
    return false;
}
```

So `SF_IsSharedStub` is never set for ordinary P/Invokes, which means:
1. The hash-based cache lookup is **skipped entirely**
2. `CreateNewMethodDesc` is called unconditionally — each racing thread gets a **distinct** `MethodDesc*`
3. The subsequent `JitILStub` produces different `PCODE` values (same code, different addresses)
4. `SetCodeEntryPoint` in `DoPrestub` sees a mismatch and asserts

### Root Cause

PR [#117901](https://github.com/dotnet/runtime/pull/117901) removed shared IL stubs for P/Invokes by making `IsSharedStubScenario` return `false` for forward P/Invokes. This was previously the mechanism that ensured all threads resolved to the same `DynamicMethodDesc`. With it gone, concurrent threads each create their own stub and JIT it independently, producing distinct `PCODE` values — invalidating the comment at [dllimport.cpp#L5689](https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/dllimport.cpp#L5689).

### Relevant Call Path

```
MethodDesc::DoPrestub                                       // prestub.cpp
  → GetStubForInteropMethod(this)                           // prestub.cpp:2283
    → PInvoke::GetStubForILStub(pNMD, &pStubMD, dwStubFlags)  // dllimport.cpp:5779
      → GetILStubMethodDesc(pNMD, &sigInfo, dwStubFlags)      // dllimport.cpp:5653
        → PInvoke::CreateCLRToNativeILStub(...)                // dllimport.cpp:5528
          → CreateInteropILStub(...)                           // dllimport.cpp:4878
            → ILStubCreatorHelper::GetStubMethodDesc()         // dllimport.cpp:4975
              → ILStubCache::GetStubMethodDesc(...)            // ilstubcache.cpp:486
                // SF_IsSharedStub is FALSE → cache lookup skipped
                // → CreateNewMethodDesc (new stub every time)
      → JitILStub(*ppStubMD)                                  // dllimport.cpp:5660 — JITs the new (unique) stub
  // returns distinct PCODE per thread
  → _ASSERTE(... pCode == GetNativeCode())                  // prestub.cpp:2341 — FAILS
```

### Questions

1. Should P/Invoke ILStub creation re-enable cache lookup/insert (i.e., revert the `IsSharedStubScenario` change for the cache path, even if the stubs are no longer "shared" in the old sense)?
2. Or should `DoPrestub` handle `pCode != GetNativeCode()` for P/Invokes by discarding the duplicate and using the already-published value?
3. Or should synchronization be added in `PInvoke::GetStubForILStub` so only one thread creates the stub per `MethodDesc`?

### Repro Notes

1. Add `ClrSleepEx(100, FALSE)` after `GetStubForInteropMethod` at [prestub.cpp#L2284](https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/prestub.cpp#L2284) to make repro more consistent
2. Build runtime on a branch with https://github.com/dotnet/runtime/pull/124400
3. Run System.Globalization.Tests with `dotnet build src\libraries\System.Runtime\tests\System.Globalization.Tests\System.Globalization.Tests.csproj /p:TestReadyToRun=true /t:test`
4. Failure looks like the following:
```
Assert failure(PID 46220 [0x0000b48c], Thread: 88832 [0x15b00]): pCode == (PCODE)NULL || GetNativeCode() == (PCODE)NULL || pCode == GetNativeCode()
      File: D:\runtime-main\src\coreclr\vm\prestub.cpp:2341
      Image: D:\runtime-main\artifacts\bin\System.Globalization.Tests\Debug\net11.0\publish\System.Globalization.Tests.exe
```

Note, this only reproduces with ReadyToRun enabled. When running with ReadyToRun, HasPrecode is false, so GetNativeCode() returns the code pointed to by the method. However, without ReadyToRun, HasPrecode returns true for the PInvoke that causes failures. Then GetNativeCode() short circuits to return NULL, and the Assert doesn't fire. I'm still not sure where or why CoreCLR sets HasPrecode while loading the method for a fixup doesn't, but this seems to be why the failure doesn't repro without R2R. Applying the following diff (with the delay) demonstrates that without R2R, we are still jitting the stub multiple times and have multiple entrypoints that we try to set the entry point to - though it does look like the final set is atomic. We probably don't _need_ a backport for the fix.

<details>

```diff
--- a/src/coreclr/vm/method.cpp
+++ b/src/coreclr/vm/method.cpp
@@ -3254,7 +3254,16 @@ void MethodDesc::SetCodeEntryPoint(PCODE entryPoint)
         // Use this path if there already exists a Precode, OR if RequiresStableEntryPoint is set.
         //
         // RequiresStableEntryPoint currently requires that the entrypoint must be a Precode
-        GetOrCreatePrecode()->SetTargetInterlocked(entryPoint);
+        Precode* pPrecode = GetOrCreatePrecode();
+        BOOL wonSetTargetRace = pPrecode->SetTargetInterlocked(entryPoint);
+#ifdef _DEBUG
+        if (!wonSetTargetRace && IsPInvoke())
+        {
+            // For P/Invoke, losing the precode publication race should still mean that all racing
+            // threads computed the same target entry point.
+            _ASSERTE(pPrecode->GetTarget() == entryPoint);
+        }
+#endif // _DEBUG
         return;
     }
```

</details>

Summary generated with AI

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race in PInvoke stub generation #124530

Summary

Failing Assert

Why different stubs are produced

Root Cause

Relevant Call Path

Questions

Repro Notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Race in PInvoke stub generation #124530

Description

Summary

Failing Assert

Why different stubs are produced

Root Cause

Relevant Call Path

Questions

Repro Notes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions