Add support for type preinitialization in crossgen2#124568
Add support for type preinitialization in crossgen2#124568hez2010 wants to merge 10 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR brings NativeAOT-style type preinitialization to ReadyToRun (crossgen2), enabling eligible .cctor bodies to be interpreted at compile time and the resulting static state to be serialized into the R2R image. At runtime, validated preinitialized types can skip normal class-init and materialize precomputed non-GC/GC static state directly.
Changes:
- Introduces a new R2R section (
TypePreinitializationMap, format v18.2) and corresponding compiler emission (map + serialized static payload/object templates). - Adds runtime support to detect preinitialized types, fetch serialized static payloads, and materialize GC graphs during static allocation.
- Adds crossgen2 switches to enable/disable preinit statics and adds r2rdump support to display the new section.
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/vm/readytoruninfo.h | Declares ReadyToRun_TypePreinitializationMap for runtime map queries. |
| src/coreclr/vm/readytoruninfo.cpp | Loads the new section and implements map lookup helpers. |
| src/coreclr/vm/methodtable.h | Adds MethodTable helpers for applying/materializing preinitialized static data. |
| src/coreclr/vm/methodtable.cpp | Implements copying/fixups for non-GC statics and GC graph materialization from templates. |
| src/coreclr/vm/ceeload.h | Adds Module APIs for preinit queries and payload retrieval. |
| src/coreclr/vm/ceeload.cpp | Implements runtime payload resolution and bounds validation for preinit static data. |
| src/coreclr/tools/r2rdump/TextDumper.cs | Adds dumping/diagnostics for the TypePreinitializationMap section. |
| src/coreclr/tools/aot/crossgen2/Properties/Resources.resx | Adds help text for new crossgen2 switches. |
| src/coreclr/tools/aot/crossgen2/Program.cs | Wires up preinitialization manager based on optimization/flags and logs stats. |
| src/coreclr/tools/aot/crossgen2/Crossgen2RootCommand.cs | Adds --preinitstatics / --nopreinitstatics switches. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs | Skips compiling .cctor for types proven preinitialized; treats them as pre-inited. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/ILCompiler.ReadyToRun.csproj | Includes new compiler nodes and preinit-related sources. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilationBuilder.cs | Plumbs PreinitializationManager into the R2R compilation pipeline. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs | Passes preinit manager through component rewrite path. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/Preinitialization/ReadyToRunPreinitializationManager.cs | New: caches per-type preinit records and manages imports/object templates. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/Preinitialization/PreinitializationExtensions.cs | New: R2R-specific symbol/dependency hooks for serialized templates and delegate targets. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/Preinitialization/DelegateCreationInfo.ReadyToRun.cs | New: R2R-specific delegate creation/target node resolution for serialization. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRunSymbolNodeFactory.cs | Triggers preinit record materialization when emitting static base helpers. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRunCodegenNodeFactory.cs | Adds preinit import section and emits the new header table section. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/TypePreinitializedStaticsDataNode.cs | New: emits per-type serialized non-GC + GC static payload and boxed-static templates. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/TypePreinitializationMapNode.cs | New: emits the new map section (typedef + instantiation table + signature blob). |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/SerializedPreinitializationObjectDataNode.cs | New: emits serialized object templates with reloc-driven dependencies. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/MethodFixupSignature.cs | Adjusts method entry fixup optimization eligibility (accounts for unboxing). |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/Dataflow/FlowAnnotations.cs | New: R2R-local FlowAnnotations stub for preinit plumbing under READYTORUN. |
| src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/TypePreinit.cs | Adds READYTORUN-specific delegate/object serialization layout and post-scan failure downgrade. |
| src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/PreinitializationManager.cs | Adds READYTORUN FlowAnnotations aliasing. |
| src/coreclr/tools/Common/Internal/Runtime/ReadyToRunConstants.cs | Adds ReadyToRunTypePreinitializationFlags. |
| src/coreclr/tools/Common/Internal/Runtime/ModuleHeaders.cs | Bumps R2R minor version and adds new section enum value. |
| src/coreclr/tools/Common/Compiler/ObjectWriter/ObjectWriter.cs | Tweaks progress reporting behavior. |
| src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h | Bumps R2R minor version to match tools/runtime. |
| src/coreclr/jit/helperexpansion.cpp | Loosens assert to allow additional import address types in static init expansion. |
| src/coreclr/inc/readytorun.h | Bumps R2R minor version and defines the new section + map entry structs. |
| docs/design/features/readytorun-preinitialized-statics.md | New design doc describing format, enabling, and runtime behavior. |
...s/aot/ILCompiler.ReadyToRun/Compiler/Preinitialization/ReadyToRunPreinitializationManager.cs
Show resolved
Hide resolved
|
|
||
| using System; | ||
| using System.Collections.Generic; | ||
| using System.Drawing; |
There was a problem hiding this comment.
using System.Drawing; appears unused in this file. With TreatWarningsAsErrors enabled in the repo, the unnecessary using will fail the build (CS8019). Remove it unless you’re about to use System.Drawing types here.
| using System.Drawing; |
| uint computedGCStart = nonGCRva + alignedNonGCSize; | ||
| uint nextPayloadStart = 0; | ||
| for (int i = 0; i < sortedPayloadStarts.Count; i++) | ||
| { | ||
| uint payloadStart = sortedPayloadStarts[i]; | ||
| if (payloadStart > nonGCRva) | ||
| { | ||
| nextPayloadStart = payloadStart; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| uint size = nextPayloadStart == 0 ? (uint)_r2r.TargetPointerSize : nextPayloadStart - computedGCStart; | ||
| if ((size % pointerSize) != 0) | ||
| { | ||
| reason = $"size {size} is not pointer-size aligned"; | ||
| return false; | ||
| } | ||
|
|
There was a problem hiding this comment.
TryComputeGCDataRange picks nextPayloadStart based on payloadStart > nonGCRva, but the GC payload starts at computedGCStart (nonGCRva + alignedNonGCSize). If the next payload start is between nonGCRva and computedGCStart (possible in malformed/overlapping images), nextPayloadStart - computedGCStart will underflow and produce a huge gcSize, potentially causing r2rdump to read out of bounds. Consider selecting the first payloadStart strictly greater than computedGCStart and validating nextPayloadStart >= computedGCStart before subtracting.
| PTR_TADDR pPointerField = dac_cast<PTR_TADDR>(pFieldAddress); | ||
| *pPointerField = ResolveReadyToRunEncodedPointer(*pPointerField); |
There was a problem hiding this comment.
Pointer-like fields in value types can be unaligned (e.g., explicit layout / packing). This code reads and writes them via PTR_TADDR dereference, which can fault on architectures that require alignment. Use unaligned access helpers (e.g., GET_UNALIGNED_PTR/SET_UNALIGNED_PTR or equivalent) when loading/storing pointer-sized values from raw value-type data.
| PTR_TADDR pPointerField = dac_cast<PTR_TADDR>(pFieldAddress); | |
| *pPointerField = ResolveReadyToRunEncodedPointer(*pPointerField); | |
| UNALIGNED TADDR* pPointerField = reinterpret_cast<UNALIGNED TADDR*>(pFieldAddress); | |
| TADDR pointerValue = *pPointerField; | |
| pointerValue = ResolveReadyToRunEncodedPointer(pointerValue); | |
| *pPointerField = pointerValue; |
|
|
||
| if (pField->IsObjRef()) | ||
| { | ||
| TADDR encodedFieldReference = VolatileLoadWithoutBarrier(dac_cast<PTR_SIZE_T>(pTemplateFieldAddress)); |
There was a problem hiding this comment.
This reads an encoded object reference from pTemplateFieldAddress via dac_cast<PTR_SIZE_T> and dereference. For value types with packing/explicit layout, reference fields may be unaligned, so this should use an unaligned read helper (e.g., GET_UNALIGNED_PTR) to avoid alignment faults on some architectures.
| TADDR encodedFieldReference = VolatileLoadWithoutBarrier(dac_cast<PTR_SIZE_T>(pTemplateFieldAddress)); | |
| TADDR encodedFieldReference = VolatileLoadWithoutBarrier(GET_UNALIGNED_PTR(TADDR, pTemplateFieldAddress)); |
| PTR_TADDR pPointerFieldAddress = dac_cast<PTR_TADDR>(pFieldAddress); | ||
| *pPointerFieldAddress = ResolveReadyToRunEncodedPointer(*pPointerFieldAddress); |
There was a problem hiding this comment.
Non-GC static data fixup writes pointer-like statics via PTR_TADDR dereference. If the static field layout includes unaligned pointer-sized fields (possible with explicit layout/pack), this can cause unaligned stores. Consider using unaligned load/store helpers for pointer-sized fields here as well.
| PTR_TADDR pPointerFieldAddress = dac_cast<PTR_TADDR>(pFieldAddress); | |
| *pPointerFieldAddress = ResolveReadyToRunEncodedPointer(*pPointerFieldAddress); | |
| TADDR pointerFieldValue; | |
| memcpy(&pointerFieldValue, pFieldAddress, sizeof(TADDR)); | |
| pointerFieldValue = ResolveReadyToRunEncodedPointer(pointerFieldValue); | |
| memcpy(pFieldAddress, &pointerFieldValue, sizeof(TADDR)); |
This sounds too good to be true. Can you share more detailed breakdown where this startup time reduction actually came from? |
...piler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/TypePreinitializedStaticsDataNode.cs
Outdated
Show resolved
Hide resolved
| private bool TargetMethodIsUnboxingThunk | ||
| => PossiblyUnresolvedTargetMethod.OwningType.IsValueType && !PossiblyUnresolvedTargetMethod.Signature.IsStatic; |
There was a problem hiding this comment.
This seems like a heuristic. Can we get a comment here about why this is always true or do a type check against the unboxing method desc type(s)?
There was a problem hiding this comment.
This is mirrored from ilc:
| private void RootStaticsDataNode(TypePreinitializationRecord record) | ||
| { | ||
| if (record?.StaticsDataNode != null) | ||
| { | ||
| _factory.AddCompilationRoot(record.StaticsDataNode, "ReadyToRun preinitialized statics data"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Instead of rooting the statics data node as a compilation root here, can we get it added as a dependency of the class constructor's method node?
For cases where we'd skip compilation in our various root providers, we can root the statics data there as well.
Alternatively, you can make this type an ICompilationRootProvider (and extend IRootingServiceProvider with a void AddCompilationRoot(object o, string reason) method like ILC has).
There was a problem hiding this comment.
I don't see an obvious way to refactor this. Either seems to complicate things because instantiated preinit types can be discovered lazily after the dependency analysis done, that's why we need to manually root those static data nodes (for instantiations).
| internal void AddCompilationRoot(DependencyNodeCore<NodeFactory> node, string reason) | ||
| { | ||
| _graph?.AddRoot(node, reason); | ||
| } | ||
|
|
There was a problem hiding this comment.
Let's look at ways to avoid adding this method (and storing the graph in the node factory as a field).
src/coreclr/tools/aot/ILCompiler.ReadyToRun/ILCompiler.ReadyToRun.csproj
Outdated
Show resolved
Hide resolved
Meh... I have to awkwardly admit that I just realized that I was using corerun to run the baseline without regenerating the image, so actually prejit code was actually skipped due to r2r version mismatch. Update the new numbers: Avalonia:
ASP.NET Core Blazor SSR:
Sorry for the hype... The above numbers are from i7-13700K. I think there will be more diff on low-end devices and apps that do heavy things in cctor like cswinrt 3.0, but I don't have the environment to test those cases. |
This is still on the high side of what I would expect from this change. How accurate are these numbers? Can you share more detailed breakdown of where the improvement came from? |
As for Avalonia, I record a DateTime at the entrypoint From the crossgen2 verbose log I see 4482 out of 7957 types were preinitialized. Maybe I need to switch to a Linux environment for more accurate measurement? Do we have any performance test suite for startup time? |
|
Also, there're still some room need to explore. For example, the preinitialized objects can be allocated on FOH, also the JIT can take advantage of it to skip classIsInit check etc. |
|
I have doubts that the pre-initialization would be a material win for R2R in its current state. It is a tradeoff between the cost of pre-initialized data lookup and application vs. cost of. finding and running the static constructor code (+ saving some static constructor probes). These costs are very small compared to the total R2R overhead (loading all types, etc.). I think that the best way to optimize R2R is by fixing bugs and local inefficiencies (#124519 is a good example). My guess is that we have quite a few of those. For larger new features, focus on the most expensive things (likely type loading). |
|
I think this may be an interesting win for our mobile platforms but we definitely need numbers for those platforms before taking the change. |
Why would it be more interesting for mobile platforms? Overall R2R perf characteristics on mobile platforms should not be significantly different. |
|
We're still lagging behind Mono's startup performance as of the last measurements I saw and this would be an optimization that doesn't exist in Mono's AOT story. Additionally, I expect that the interop tooling and MAUI code will be more similar to the Avalonia case and provide a more consistent win across the experiences our customers care about for those platforms than platforms that have more server-focused experiences. Once again, this depends on actual numbers. |
|
If we have things like extensive vtbl population inside the cctor (eg. a bunch of code that takes function pointers and save them into a struct) like what CsWinRT 3.0 does, I think preinitializing them would be a great improvement. |
I would like to understand where the saving is actually coming from in the Avalonia case. I suspect that the saving is caused by a second order effect and we can achieve the same saving by bug-fix level changes, without adding a new complicated structure to R2R format.
The problem is that R2R does type loading at runtime, and so these function pointers cost a very non-trivial number of instructions to hydrate at runtime. You are saving close to nothing by avoiding the few instructions that deal with writing the function pointer into the vtbl. |
Co-authored-by: Jeremy Koritzinsky <jkoritzinsky@gmail.com>
FWIW, the last numbers that I have seen show that CoreCLR is better than Mono (for real apps, on iOS). |
| PCCOR_SIGNATURE pSignature = reinterpret_cast<PCCOR_SIGNATURE>(pSignatureBlob + pCurrentEntry->TypeSignatureOffset); | ||
| SigPointer sp(pSignature, pCurrentEntry->TypeSignatureLength); | ||
| if (FAILED(sp.SkipExactlyOne()) || (sp.GetPtr() != (pSignature + pCurrentEntry->TypeSignatureLength))) | ||
| continue; | ||
|
|
||
| if (!ZapSig::CompareSignatureToTypeHandle(pSignature, pModule, TypeHandle(pMT), &zapSigContext)) | ||
| continue; |
There was a problem hiding this comment.
The instantiation entry lookup uses TypeSignatureOffset/TypeSignatureLength to compute pSignature = pSignatureBlob + offset without validating that the computed range stays within the ReadyToRun image/section. A malformed/corrupted R2R image could cause out-of-bounds reads during signature parsing/comparison. Add bounds checks (including overflow-safe offset+length validation) before constructing SigPointer/calling CompareSignatureToTypeHandle.
| <data name="PreinitStaticsOption" xml:space="preserve"> | ||
| <value>Interpret static constructors at compile time if possible (implied by -O)</value> | ||
| </data> | ||
| <data name="NoPreinitStaticsOption" xml:space="preserve"> | ||
| <value>Do not interpret static constructors at compile time</value> | ||
| </data> |
There was a problem hiding this comment.
Help text says preinit statics is "implied by -O", but the implementation enables it for any non-None optimization mode (including --Os/--Ot, and potentially default optimization selection). Update the option description to reflect the actual enabling conditions so the CLI help is accurate.
| if (!GetModule()->IsReadyToRunTypePreinitialized(this)) | ||
| return false; | ||
|
|
||
| PTR_BYTE pPreinitializedData = NULL; | ||
| DWORD cbPreinitializedData = 0; | ||
|
|
||
| DWORD cbExpectedNonGCStatics = GetClass()->GetNonGCRegularStaticFieldBytes(); | ||
| if (cbExpectedNonGCStatics != 0 && | ||
| (!GetModule()->TryGetReadyToRunPreinitializedNonGCStaticsData(this, &pPreinitializedData, &cbPreinitializedData) || | ||
| (cbPreinitializedData != cbExpectedNonGCStatics))) | ||
| { | ||
| return false; | ||
| } | ||
|
|
||
| DWORD cExpectedGCStatics = GetClass()->GetNumHandleRegularStatics(); | ||
| if (cExpectedGCStatics != 0) | ||
| { | ||
| DWORD cbExpectedGCStatics = cExpectedGCStatics * sizeof(TADDR); | ||
| if (!GetModule()->TryGetReadyToRunPreinitializedGCStaticsData(this, &pPreinitializedData, &cbPreinitializedData) || | ||
| (cbPreinitializedData != cbExpectedGCStatics)) | ||
| { | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| return true; |
There was a problem hiding this comment.
New R2R preinitialized statics behavior changes class-init semantics (e.g., types with a .cctor can now be treated as pre-inited when the preinit map/data matches). There should be targeted ReadyToRun tests exercising: (1) preinitialized non-GC statics, (2) GC statics materialization (including object graphs), and (3) generic instantiation entries. Consider adding coverage under src/tests/readytorun/ to validate correctness and guard against regressions.
|
|
||
| using System; | ||
| using System.Collections.Generic; | ||
| using System.Drawing; |
There was a problem hiding this comment.
Remove the unused using directive 'using System.Drawing;'. This namespace is not used anywhere in the file and adds an unnecessary dependency.
| using System.Drawing; |
| SIZE_T componentSize = pObjectType->RawGetComponentSize(); | ||
| TypeHandle elementTypeHandle = pObjectType->GetArrayElementTypeHandle(); |
There was a problem hiding this comment.
Redundant variable declaration: 'componentSize' is already computed at line 4237 and then redeclared at line 4273. Remove the redundant declaration and reuse the existing variable.
| SIZE_T componentSize = pObjectType->RawGetComponentSize(); | |
| TypeHandle elementTypeHandle = pObjectType->GetArrayElementTypeHandle(); |
| DWORD cbGCStatics = pMT->GetClass()->GetNumHandleRegularStatics() * sizeof(TADDR); | ||
| if (cbGCStatics == 0) | ||
| return false; | ||
|
|
There was a problem hiding this comment.
Potential integer overflow: The multiplication pMT->GetClass()->GetNumHandleRegularStatics() * sizeof(TADDR) could overflow if GetNumHandleRegularStatics() returns a very large value. Consider using checked arithmetic or validating the result fits within a DWORD before the multiplication.
| DWORD cbGCStatics = pMT->GetClass()->GetNumHandleRegularStatics() * sizeof(TADDR); | |
| if (cbGCStatics == 0) | |
| return false; | |
| SIZE_T numHandleStatics = pMT->GetClass()->GetNumHandleRegularStatics(); | |
| if (numHandleStatics == 0) | |
| return false; | |
| if (numHandleStatics > (MAXDWORD / sizeof(TADDR))) | |
| return false; | |
| DWORD cbGCStatics = static_cast<DWORD>(numHandleStatics * sizeof(TADDR)); |
| TADDR gcDataAddress = dataAddress + alignedNonGCDataSize; | ||
| ReadyToRunLoadedImage *pImage = GetReadyToRunImage(); | ||
| TADDR imageBase = pImage->GetBase(); | ||
| uint32_t imageSize = pImage->GetVirtualSize(); | ||
|
|
||
| TADDR imageEnd = imageBase + imageSize; |
There was a problem hiding this comment.
Potential integer overflow: The addition dataAddress + alignedNonGCDataSize could overflow if both values are large. Consider checking for overflow before the addition or using saturating arithmetic to prevent wraparound.
| TADDR gcDataAddress = dataAddress + alignedNonGCDataSize; | |
| ReadyToRunLoadedImage *pImage = GetReadyToRunImage(); | |
| TADDR imageBase = pImage->GetBase(); | |
| uint32_t imageSize = pImage->GetVirtualSize(); | |
| TADDR imageEnd = imageBase + imageSize; | |
| ReadyToRunLoadedImage *pImage = GetReadyToRunImage(); | |
| TADDR imageBase = pImage->GetBase(); | |
| uint32_t imageSize = pImage->GetVirtualSize(); | |
| TADDR imageEnd = imageBase + imageSize; | |
| if (imageEnd < imageBase) | |
| return false; | |
| if ((dataAddress < imageBase) || (dataAddress > imageEnd)) | |
| return false; | |
| if (alignedNonGCDataSize > static_cast<DWORD>(imageEnd - dataAddress)) | |
| return false; | |
| TADDR gcDataAddress = dataAddress + alignedNonGCDataSize; |
| ReadyToRunLoadedImage *pImage = pModule->GetReadyToRunImage(); | ||
| TADDR imageBase = pImage->GetBase(); | ||
| uint32_t imageSize = pImage->GetVirtualSize(); | ||
|
|
There was a problem hiding this comment.
Potential integer overflow: The addition imageBase + imageSize could overflow if both values are near the maximum address space. Consider checking for overflow before the addition.
| if (imageBase > (TADDR)-1 - static_cast<TADDR>(imageSize)) | |
| return false; |
| if (pCurrentEntry->TypeSignatureLength == 0) | ||
| continue; | ||
|
|
||
| PCCOR_SIGNATURE pSignature = reinterpret_cast<PCCOR_SIGNATURE>(pSignatureBlob + pCurrentEntry->TypeSignatureOffset); |
There was a problem hiding this comment.
Missing bounds check: Before computing pSignature = pSignatureBlob + pCurrentEntry->TypeSignatureOffset, validate that TypeSignatureOffset and TypeSignatureLength don't exceed the bounds of the signature blob region. An out-of-bounds offset could lead to reading arbitrary memory.
| SIZE_T cbData = (SIZE_T)arrayLength * componentSize; | ||
|
|
There was a problem hiding this comment.
Potential integer overflow: The multiplication (SIZE_T)arrayLength * componentSize at line 4238 could overflow if arrayLength is very large. Although TryAllocateFrozenSzArray validates the allocation, this computation should be checked for overflow before use, as cbData is used to validate memory accesses later.
| SIZE_T cbData = (SIZE_T)arrayLength * componentSize; | |
| S_SIZE_T cbDataSafe = S_SIZE_T(arrayLength) * S_SIZE_T(componentSize); | |
| if (cbDataSafe.IsOverflow()) | |
| { | |
| COMPlusThrowOM(); | |
| } | |
| SIZE_T cbData = cbDataSafe.Value(); |
| if (!elementTypeHandle.IsTypeDesc() && !elementTypeHandle.AsMethodTable()->IsValueType()) | ||
| { | ||
| PTR_OBJECTREF pArrayRefs = dac_cast<PTR_OBJECTREF>(pArrayData); | ||
| PTR_TADDR pTemplateRefs = dac_cast<PTR_TADDR>(templateDataAddress + dataOffset); |
There was a problem hiding this comment.
Missing bounds check: Before dereferencing templateDataAddress + dataOffset, validate that this address and the subsequent data (up to templateDataAddress + dataOffset + cbData) are within the bounds of the R2R image. A corrupted or malicious image could specify offsets that point outside valid memory.
src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/TypePreinit.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 32 out of 34 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (2)
src/coreclr/tools/r2rdump/TextDumper.cs:6
- The
using System.Drawing;directive is not used anywhere in this file and should be removed.
using System.IO;
src/coreclr/tools/Common/Compiler/TypePreinit.cs:3274
- The code at lines 3266-3274 handles open static delegates (when
_firstParameter == null), butDelegateCreationInfo.CreatethrowsNotSupportedException("Open static delegates")(DelegateCreationInfo.ReadyToRun.cs:55) before this code can be reached. Either the check in DelegateCreationInfo should be removed to allow open static delegates, or this code path is unreachable dead code and should be removed or replaced with an assertion that throws. Additionally, line 3271 asserts thatcreationInfo.Thunk != null, but the Thunk is always set to null in DelegateCreationInfo (line 65), so this assertion will always fail if the code is reached.
| uint32_t instantiationEntryCount = *reinterpret_cast<const uint32_t*>(pAfterTypeDefEntries); | ||
| const READYTORUN_TYPE_PREINITIALIZATION_MAP_INSTANTIATION_ENTRY *pInstantiationEntries = | ||
| reinterpret_cast<const READYTORUN_TYPE_PREINITIALIZATION_MAP_INSTANTIATION_ENTRY*>(pAfterTypeDefEntries + sizeof(uint32_t)); | ||
|
|
||
| const uint8_t *pSignatureBlob = reinterpret_cast<const uint8_t*>(pInstantiationEntries + instantiationEntryCount); | ||
|
|
||
| if (instantiationStartIndex > instantiationEntryCount) | ||
| return false; | ||
|
|
||
| if (instantiationCountForType > (instantiationEntryCount - instantiationStartIndex)) | ||
| return false; | ||
|
|
||
| ZapSig::Context zapSigContext(pModule, pModule); | ||
| for (uint32_t index = 0; index < instantiationCountForType; index++) | ||
| { | ||
| const READYTORUN_TYPE_PREINITIALIZATION_MAP_INSTANTIATION_ENTRY *pCurrentEntry = &pInstantiationEntries[instantiationStartIndex + index]; | ||
|
|
||
| if (pCurrentEntry->TypeSignatureLength == 0) | ||
| continue; | ||
|
|
||
| PCCOR_SIGNATURE pSignature = reinterpret_cast<PCCOR_SIGNATURE>(pSignatureBlob + pCurrentEntry->TypeSignatureOffset); | ||
| SigPointer sp(pSignature, pCurrentEntry->TypeSignatureLength); | ||
| if (FAILED(sp.SkipExactlyOne()) || (sp.GetPtr() != (pSignature + pCurrentEntry->TypeSignatureLength))) | ||
| continue; |
There was a problem hiding this comment.
TryGetInstantiationEntry computes pSignatureBlob + TypeSignatureOffset and then parses TypeSignatureLength bytes without validating that the offset/length are within the TypePreinitializationMap section (or at least within the mapped R2R image). A malformed R2R image could cause out-of-bounds reads/AVs here. Add bounds checks for TypeSignatureOffset + TypeSignatureLength against the end of the R2R image (and similarly validate the instantiation table read at pAfterTypeDefEntries).
| if (!IsAddressInReadyToRunImage(GetModule(), templateDataAddress)) | ||
| COMPlusThrow(kInvalidProgramException); | ||
|
|
||
| TADDR encodedMethodTableAddress = VolatileLoadWithoutBarrier(dac_cast<PTR_SIZE_T>(templateDataAddress)); | ||
| BYTE methodTableFixupKind = 0; | ||
| TADDR methodTableAddress = ResolveReadyToRunEncodedPointer(encodedMethodTableAddress, &methodTableFixupKind); | ||
| if (methodTableFixupKind != READYTORUN_FIXUP_TypeHandle) | ||
| COMPlusThrow(kInvalidProgramException); | ||
|
|
||
| TypeHandle th = TypeHandle::FromTAddr(methodTableAddress); | ||
| if (th.IsNull() || th.IsTypeDesc()) | ||
| COMPlusThrow(kInvalidProgramException); | ||
|
|
||
| MethodTable *pObjectType = th.AsMethodTable(); | ||
| pObjectType->EnsureInstanceActive(); | ||
|
|
||
| if (pObjectType->IsArray()) | ||
| { | ||
| _ASSERTE(!pObjectType->IsMultiDimArray()); | ||
|
|
||
| INT32 arrayLength = GET_UNALIGNED_VAL32((PTR_BYTE)(templateDataAddress + TARGET_POINTER_SIZE)); | ||
| OBJECTREF arrayRef = TryAllocateFrozenSzArray(pObjectType, arrayLength); | ||
| if (arrayRef == NULL) | ||
| arrayRef = AllocateSzArray(pObjectType, arrayLength); | ||
|
|
There was a problem hiding this comment.
MaterializeReadyToRunPreinitializedClassData validates that templateDataAddress is within the R2R image, but it then reads variable-sized data (method table pointer, array length, field payloads) without checking that the full template range is within image bounds. A malformed image could cause out-of-bounds reads/AVs during static materialization. Consider validating the minimum required range before each read (e.g., header + array length, and then header + dataOffset + cbData for arrays; and for objects validate up to the last instance field end).
|
|
||
| TADDR imageEnd = imageBase + imageSize; | ||
| TADDR dataAddress = pImage->GetRvaData(dataRva); | ||
| if ((dataAddress < imageBase) || (dataAddress > imageEnd)) |
There was a problem hiding this comment.
In TryGetReadyToRunPreinitializedStaticsDataInfo, the bounds check allows dataAddress == imageEnd (dataAddress > imageEnd), which is not a valid start address (it is one-past-the-end). Tighten the check to reject dataAddress >= imageEnd to avoid accepting a non-dereferenceable pointer when dataSize == 0 and to match the usual half-open range conventions used elsewhere.
| if ((dataAddress < imageBase) || (dataAddress > imageEnd)) | |
| if ((dataAddress < imageBase) || (dataAddress >= imageEnd)) |
| uint size = nextPayloadStart == 0 ? (uint)_r2r.TargetPointerSize : nextPayloadStart - computedGCStart; | ||
| if ((size % pointerSize) != 0) | ||
| { | ||
| reason = $"size {size} is not pointer-size aligned"; | ||
| return false; | ||
| } | ||
|
|
||
| gcRva = computedGCStart; | ||
| gcSize = size; | ||
| return true; | ||
| } |
There was a problem hiding this comment.
TryComputeGCDataRange falls back to pointerSize when there is no next payload start (nextPayloadStart == 0), which will almost always under-report GC static payload size for the last entry and can make r2rdump output misleading. Consider printing GC data as "unknown" in that case, or compute the expected GC handle count from metadata (static fields) when metadata is available.
@jkotas FYI I experimented implementing In my test app (still Avalonia but more complicated). Before (no-preinit): The current state of PR so far (preinit, no jit constant folding): After (preinit + jit constant folding): This achieved a 8% reduction of startup time. I'm still verifying some implementation details so not committed into this PR yet. |

Type preinitialization for R2R bringing up
This PR brings type preinitialization support (already used by NativeAOT) to ReadyToRun.
With this change, eligible type static constructors can be interpreted at compile time, and the resulting static state is serialized into the R2R native format. At runtime, preinitialized types can skip normal class-init execution when validation succeeds, and materialize the preinitialized data directly without needs of running the cctor.
For easier context and details, please read the design doc:https://github.com/hez2010/runtime/blob/bcc30a083c8f7a2df504cf37d2079bec049c36de/docs/design/features/readytorun-preinitialized-statics.md
Benchmarks
I have tested several scenarios with composite R2R on self-contained apps, without trimming.
Note that I forgot to regenerate r2r images for the baseline so that the r2r code was skipped for baseline. I'll update the benchmark result with new numbers.
Avalonia app
I measured the time from startup to the window loaded. Startup time goes from 523ms to 299ms on my machine, which is a 43% startup time reduction. WithTC=0, it boosts the startup time from 927ms to 340ms, which is a 63% reduction.The size regression is about 0.8% (144mb vs 146mb).
Blazor SSR
I also measured the time from startup to the first page rendered. Without preinitializer it took 1050ms withTC=1, and 1733ms withTC=0, while with preinitializer, it reduced to 901ms and 1299ms, respectively.The size regression is about 0.9% (252mb vs 254mb).
cc: @jkoritzinsky