[Experiment] allocate static readonly objects on the frozen heap #127693
EgorBo wants to merge 4 commits into dotnet:main
Add a new JIT phase, fgPromoteCctorAllocsToFrozenHeap, that runs after PHASE_OPTIMIZE_INDEX_CHECKS in class constructors and rewrites allocator helper calls whose result is stored to a static readonly field to use the *_MAYBEFROZEN allocator helpers. This lets the runtime place the object on the frozen heap (when possible), which in turn lets later JIT compilations bake the frozen-object pointer directly into call sites that read the field.

Also force cctors to FullOpts (gated by JitOptimizeCctors, default 1) so the new phase can use SSA/VN. R2R and JIT-only paths are both supported.

The old newarr; stsfld; ret peephole in the importer is removed -- the new phase subsumes it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR introduces a new CoreCLR JIT optimization for class constructors (.cctor) that detects allocations stored into static readonly fields and promotes those allocation helper calls to *_MAYBEFROZEN variants so the VM can allocate on the frozen object heap (FOH) when allowed. This enables later JIT compilations to embed direct frozen object pointers at use sites, reducing static field load indirections.
Changes:
- Add a new JIT phase (PHASE_PROMOTE_CCTOR_ALLOCS) that pattern-matches stsfld-style stores in .cctor and rewrites eligible allocation helpers to *_MAYBEFROZEN.
- Ensure .cctor methods can be compiled with FullOpts (via JitOptimizeCctors) when frozen allocation is allowed, so SSA/VN-based analysis is available.
- Remove the previous importer special-casing for newarr; stsfld; ret and unify promotion under the new phase (including R2R scenarios via call rebuild + fgMorphArgs).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/promotefrozenstaticalloc.cpp | New phase implementation: candidate detection (SSA-aware) and helper rewrite/rebuild logic. |
| src/coreclr/jit/objectalloc.cpp | Preserve allocation class handle on helper calls to support later promotion. |
| src/coreclr/jit/jitconfigvalues.h | Add JitOptimizeCctors config to allow FullOpts compilation for eligible cctors. |
| src/coreclr/jit/importer.cpp | Track static readonly fields written by cctors; remove prior narrow newarr/stsfld/ret frozen emission. |
| src/coreclr/jit/compphases.h | Register new phase name. |
| src/coreclr/jit/compiler.h | Declare new phase and add m_cctorFinalStaticFields tracking set. |
| src/coreclr/jit/compiler.cpp | Invoke the new phase after range-check optimization and enable Tier0→optimized switching for cctors (config-gated). |
| src/coreclr/jit/CMakeLists.txt | Add new source file to the JIT build. |
Address PR feedback: GetStaticReadonlyFieldFromStoreInd documented the 'cctor's own class' restriction but didn't enforce it. Verifiable IL forbids cross-class stsfld of initonly fields, but unverifiable IL can do it; promoting such an allocation could orphan a frozen object if another writer overwrites the field later. Add the getFieldClass == compClassHnd check at the importer's registration site. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
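The registration rule described in this commit can be sketched in isolation. This is only a toy model: `FieldDesc`, `ShouldTrackCctorField`, and the integer class ids are hypothetical stand-ins, not the JIT's real types (the real check compares the result of getFieldClass against compClassHnd).

```cpp
// Hypothetical description of a field; the real JIT works with opaque
// CORINFO_FIELD_HANDLE / CORINFO_CLASS_HANDLE values instead.
struct FieldDesc
{
    int  owningClass; // id of the class that declares the field
    bool isStatic;
    bool isInitOnly;  // "readonly" in C#, "initonly" in IL
};

// A store is tracked only when it targets a static readonly field declared by
// the same class whose .cctor is being compiled. A cross-class store (possible
// in unverifiable IL) is rejected, because another writer could later
// overwrite the field and orphan the frozen object.
bool ShouldTrackCctorField(const FieldDesc& field, int cctorClass)
{
    return field.isStatic && field.isInitOnly && (field.owningClass == cctorClass);
}
```

With this shape, a store from class 1's cctor into a static readonly field of class 2 is simply never registered, so the later promotion pass never sees it as a candidate.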
This needs to account for the fact that cctors run multiple times on exception and can be called an unlimited number of times from IL and reflection. The latter would be fine to forbid in an ECMA addendum, I believe, but the former needs to be accounted for. cc @jkotas
It's not something we need to do for this PR if it lands (note: it's just a draft experiment), as we already do this for very limited cases.
Capture the parent store node (GT_STOREIND or GT_STORE_LCL_VAR) and the containing statement at registration time so promotion can directly mutate the parent's Data() slot (which returns GenTree*&) and re-thread the statement, without a tree-walk lookup. For SSA-traced cases the def lives in a different statement than the stsfld; build a small defNode -> stmt side map during pass 1 (which already visits every statement) so the SSA def's statement is recoverable in O(1). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
            JITDUMP("cctor on shared generic instance -- bailing out\n");
            return PhaseStatus::MODIFIED_NOTHING;
        }
        LclSsaVarDsc* ssaDef = lclDsc->GetPerSsaData(ssaNum);
        if (ssaDef == nullptr)
        {
            return result;
        }

        GenTreeLclVarCommon* defNode = ssaDef->GetDefNode();
        if ((defNode == nullptr) || !defNode->OperIs(GT_STORE_LCL_VAR))
        {
            return result;
        }

        GenTree* defValue = defNode->Data();
        if (defValue == nullptr)
        {
            return result;
        }

        defValue = defValue->gtEffectiveVal();
        if (!defValue->IsCall())
        {
            return result;
        }

        Statement* defStmt = nullptr;
        if (!defStmtMap->Lookup(defNode, &defStmt))
        {
            return result;
        }

        result.call   = defValue->AsCall();
        result.parent = defNode;
        result.stmt   = defStmt;
        return result;
    }
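The early-return chain above, together with the defNode -> stmt side map built during pass 1, can be modeled outside the JIT. The following is a minimal sketch under stated assumptions: `Node`, `Oper`, `Candidate`, `DefStmtMap`, and `TraceSsaDef` are hypothetical simplifications, statements are plain integer ids, and the real code additionally skips wrappers via gtEffectiveVal.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified stand-ins for JIT IR nodes.
enum class Oper { Call, StoreLclVar, LclVar, Other };

struct Node
{
    Oper  oper;
    Node* data;   // value operand (meaningful for StoreLclVar)
    int   ssaNum; // SSA number (meaningful for LclVar uses)

    explicit Node(Oper o, Node* d = nullptr, int s = -1) : oper(o), data(d), ssaNum(s) {}
};

struct Candidate
{
    Node* call   = nullptr; // the allocator call, if one was found
    Node* parent = nullptr; // the store whose data operand is the call
    int   stmt   = -1;      // id of the statement containing that store
};

// Side map filled while visiting every statement once (pass 1), so the
// statement of an SSA def can be recovered in O(1) without a tree walk.
using DefStmtMap = std::unordered_map<const Node*, int>;

// Follow a local use to its single SSA def and accept it only when that def is
// a StoreLclVar whose value is a call. Any failed check yields an empty result.
Candidate TraceSsaDef(const Node& use, const std::vector<Node*>& ssaDefs, const DefStmtMap& defStmtMap)
{
    Candidate result;
    if ((use.oper != Oper::LclVar) || (use.ssaNum < 0) ||
        (static_cast<std::size_t>(use.ssaNum) >= ssaDefs.size()))
    {
        return result;
    }

    Node* def = ssaDefs[use.ssaNum];
    if ((def == nullptr) || (def->oper != Oper::StoreLclVar))
    {
        return result;
    }

    Node* value = def->data;
    if ((value == nullptr) || (value->oper != Oper::Call))
    {
        return result;
    }

    auto it = defStmtMap.find(def);
    if (it == defStmtMap.end())
    {
        return result;
    }

    result.call   = value;
    result.parent = def;
    result.stmt   = it->second;
    return result;
}
```

Returning the parent store and its statement alongside the call is what lets a later pass replace the call in place and re-thread only the affected statement.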
    // Force class constructors (.cctor) to be JITed at the FullOpts optimization level so that
    // the frozen-heap-promotion phase (which relies on SSA / VN) can run against them. Has no
    // effect when the VM has not set CORJIT_FLAG_FROZEN_ALLOC_ALLOWED (e.g. collectible loader
    // contexts).
    RELEASE_CONFIG_INTEGER(JitOptimizeCctors, "JitOptimizeCctors", 1) // enabled for CI testing
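The gating described in that comment amounts to a three-way conjunction. A minimal sketch, assuming a hypothetical `CompileRequest` struct and treating any non-zero knob value as "enabled" (the diff shown here does not confirm how the value is interpreted):

```cpp
// Hypothetical summary of what the JIT knows about the method being compiled.
struct CompileRequest
{
    bool isClassConstructor; // the method is a .cctor
    bool frozenAllocAllowed; // VM set CORJIT_FLAG_FROZEN_ALLOC_ALLOWED
};

// Decide whether to switch a method from Tier0 to full optimization.
// Assumption: the knob is a simple on/off switch where 0 means "off".
bool ShouldForceFullOpts(const CompileRequest& req, int jitOptimizeCctors)
{
    if (!req.isClassConstructor)
    {
        return false; // the knob only concerns class constructors
    }
    if (!req.frozenAllocAllowed)
    {
        return false; // e.g. collectible loader contexts: promotion is impossible anyway
    }
    return jitOptimizeCctors != 0;
}
```

Checking the VM flag before the knob mirrors the stated intent: when frozen allocation is not allowed, paying for FullOpts would buy nothing.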
…t via VN for JIT

Per PR feedback: compileTimeHelperArgumentHandle is only needed for R2R alloc helpers (READYTORUN_NEW, READYTORUN_NEWARR_1) where the type handle lives in the R2R indirection cell rather than a user arg. For JIT helpers (NEWFAST*, NEWARR_1_*) the type handle is always arg 0, so we can recover it from the arg's VN -- robust against CSE/copy-prop replacing the embedded icon.

Changes:
* objectalloc.cpp: only populate compileTimeHelperArgumentHandle for CORINFO_HELP_READYTORUN_NEW (instead of every alloc helper). Field is preserved through gtCloneExprCallHelper via the union, and Equals intentionally doesn't compare it (annotation, not semantics).
* New TryGetAllocClsHnd helper: R2R helpers -> use the field; JIT helpers -> extract from arg 0 via vnStore->IsVNTypeHandle / CoercedConstantValue.
* Pass 1 uses TryGetAllocClsHnd for candidate validation.
* Pass 2 only calls classMustBeLoadedBeforeCodeIsRun + extracts clsHnd in the R2R rebuild branch (the JIT in-place mutation doesn't need it).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
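The two recovery paths this commit describes can be illustrated with a toy model. Everything below is a hypothetical simplification: `AllocHelper`, `CallArg`, `AllocCall`, and this standalone `TryGetAllocClsHnd` only mimic the shape of the logic, the `isConstantTypeHandle` flag stands in for the value-number query, and both R2R helper kinds are assumed to read the side annotation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper kinds; the real JIT uses CORINFO_HELP_* ids.
enum class AllocHelper { JitNewFast, JitNewArr1, R2RNew, R2RNewArr1, NotAnAllocator };

using ClassHandle = std::uintptr_t; // 0 means "not recorded"

struct CallArg
{
    bool        isConstantTypeHandle; // stand-in for "the arg's VN is a type-handle constant"
    ClassHandle value;
};

struct AllocCall
{
    AllocHelper          helper;
    ClassHandle          annotatedHandle; // side annotation kept on the call node
    std::vector<CallArg> args;
};

// R2R helpers: the type handle is not a user arg, so read the annotation.
// JIT helpers: the type handle is arg 0, so recover it from that arg's value.
bool TryGetAllocClsHnd(const AllocCall& call, ClassHandle* clsHnd)
{
    switch (call.helper)
    {
        case AllocHelper::R2RNew:
        case AllocHelper::R2RNewArr1:
            if (call.annotatedHandle == 0)
            {
                return false;
            }
            *clsHnd = call.annotatedHandle;
            return true;

        case AllocHelper::JitNewFast:
        case AllocHelper::JitNewArr1:
            if (call.args.empty() || !call.args[0].isConstantTypeHandle)
            {
                return false;
            }
            *clsHnd = call.args[0].value;
            return true;

        default:
            return false;
    }
}
```

Keying the JIT path on the argument's known value rather than on the syntactic form of the operand is what makes it tolerant of earlier passes rewriting that operand.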
They don't... I would be more worried about "Force class constructors (.cctor) to be JITed at the FullOpts optimization level". You have to save a lot to compensate for the cost of compiling cctors with full opts.
I do not plan to enable that by default; it will rely on R2R, so if you want your static ctor to produce frozen objects, prejit it. For better coverage, I introduced a knob to enable them for normal jitting as well.
This PR adds a new JIT phase that promotes object/array allocations stored to static readonly fields in class constructors to the frozen object heap (presumably, only useful for CoreCLR as NAOT does a good job preinitializing cctors as is). When a static readonly reference field holds a frozen object, later JIT compilations can bake the frozen pointer directly into call sites instead of going through an indirect load.

When allocators get promoted to frozen-heap variants:
- … .cctor) and the VM must allow frozen allocation (CORJIT_FLAG_FROZEN_ALLOC_ALLOWED).
- … static readonly field of the cctor's own class.
- … GT_BOX / commas) be either a direct allocator helper call, or a single-SSA-def LCL_VAR whose SSA def is STORE_LCL_VAR(allocator-helper-call).

Open question: is it fine to do:

Cctors are normally Tier-0 only, so a small companion change (gated by JitOptimizeCctors, default 1, "enabled for CI testing") flips them to FullOpts when frozen allocation is allowed, so SSA/VN are available for the analysis.

Repro:

Codegen diff for Test:

Each field load drops from 2 instructions (load address + indirect load) to 1 (direct frozen-pointer immediate). Same shape applies in R2R (verified via crossgen2 --inputbubble).