[Wasm RyuJIT] Wire up GC info encoding/decoding for Wasm#126932
[Wasm RyuJIT] Wire up GC info encoding/decoding for Wasm#126932kg wants to merge 2 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR enables GC info encoding/decoding support for the CoreCLR WASM RyuJIT backend by turning on GCInfo generation, adding the necessary encoder/decoder sources to the WASM JIT build, and introducing a WASM-specific GC info encoding definition.
Changes:
- Enable
EMIT_GENERATE_GCINFOfor WASM and wire upCodeGen::genCreateAndStoreGCInfoto useGcInfoEncoder. - Add/adjust conditional compilation to skip register-GC-tracking paths on WASM while still enabling GC info generation.
- Update CMake wiring to link WASM standalone JITs against the appropriate
gcinfo_*library and add a universal wasm gcinfo target.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/targetwasm.h | Enables GC info generation for WASM (but comment needs updating). |
| src/coreclr/jit/gcinfo.cpp | Disables register pointer marking path for WASM. |
| src/coreclr/jit/gcencode.cpp | Disables register-state-change recording for WASM. |
| src/coreclr/jit/emit.cpp | Disables register-GC live update emission for WASM. |
| src/coreclr/jit/codegenwasm.cpp | Implements GCInfo creation/emission via GcInfoEncoder for WASM. |
| src/coreclr/jit/codegenlinear.cpp | Skips GC register update/debug validation paths for WASM. |
| src/coreclr/jit/codegencommon.cpp | Skips register-based GC liveness updates/validation for WASM. |
| src/coreclr/jit/CMakeLists.txt | Ensures standalone JIT links a gcinfo lib and builds gcencode/gcdecode for WASM JIT. |
| src/coreclr/inc/gcinfotypes.h | Adds Wasm32GcInfoEncoding and sets it as the WASM target encoding. |
| src/coreclr/gcinfo/CMakeLists.txt | Adds gcinfo_universal_wasm target (currently under the wrong build condition for the failing scenario). |
|
Latest problem: with We call genEmitUnwindDebugGCandEH multiple times on some methods, and the second call attempts to overwrite existing debug information and (rightly, as far as I can tell) fails. I can't figure out why we're doing this, maybe something to do with funclets? |
This was intentional behavior - we were failing the compilation of a method as R2R unsupported late enough in compilation that debug info had already been generated, then hit an assertion the second time we tried to generate debug info (on the retried compilation). I think the asserts are just wrong in this case. Now failing with block(s) missing a bbEmitCookie in particular methods, i.e.: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/coreclr/jit/codegencommon.cpp:1081
- This TARGET_WASM guard skips all handling of locals going live in registers. If GC-tracked locals can be assigned to Wasm locals (lvIsInReg()==true), they will bypass both the register path and the stack path here, so their liveness may never be reflected in GC info. Either ensure GC refs are never enregistered for Wasm, or add a Wasm-specific strategy to report/enforce where GC refs live at safe points (stack home or encoded register slots).
if (varDsc->lvIsInReg())
{
#ifndef TARGET_WASM
// If this variable is going live in a register, it is no longer live on the stack,
// unless it is an EH/"spill at single-def" var, which always remains live on the stack.
if (!varDsc->IsAlwaysAliveInMemory())
{
| #endif // !TARGET_WASM | ||
| // Update the gcVarPtrSetCur if it is in memory. | ||
| if (isInMemory && (isGCRef || isByRef)) | ||
| { | ||
| VarSetOps::RemoveElemD(this, codeGen->gcInfo.gcVarPtrSetCur, deadVarIndex); |
There was a problem hiding this comment.
On TARGET_WASM the register-handling block above is compiled out, but the logic below only updates gcVarPtrSetCur when isInMemory is true. Since the Wasm reg allocator can enregister TYP_REF/TYP_BYREF locals (see src/coreclr/jit/regalloc.cpp:434-446), isInMemory can be false for GC-tracked locals and they won't be removed from (or added to) gcVarPtrSetCur correctly, producing incorrect stack-root reporting in the encoded GC info. Consider either (1) preventing GC-tracked locals from being enregistered on Wasm, or (2) keeping a real stack home for GC refs and ensuring gcVarPtrSetCur reflects it, or (3) implementing register-slot tracking/encoding for Wasm instead of compiling it out.
There was a problem hiding this comment.
This sounds like it could be a real problem, thoughts @AndyAyersMS @SingleAccretion ? I think out of its proposed options, 3 doesn't make sense and 1 would make our performance terrible. I think the plan was that we'd spill any enregistered gc refs to memory before safe points, yeah? Which is roughly 2. Wondering if this indicates a change I need to make.
There was a problem hiding this comment.
I don't think we should be enabling this liveness tracking (EMIT_GENERATE_GCINFO) infrastructure back in any case. It's useless - all the GC locals should be described by untracked (thus live for the entire method) slots anyhow.
As for the spilling and such, yes, we'll need to spill. For now you can disable enregistration of locals that have GC pointers to make it correct (not fully correct though, since SDSDu need to be handled too, but that can be added separately).
There was a problem hiding this comment.
Yeah I agree, we aren't going find a use for any of the emitter-driven info for GC tracking.
For an initial PR I think a good goal would be to just enable emission of the GC Info data to the host, without any GC slot reporting of any kind. This should be sufficient for conservative scanning and provide other information the runtime needs for various purposes.
It might also unearth some interesting questions that we'll need to come to terms with sooner or later.
There was a problem hiding this comment.
OK, I'll switch emit_generate_gcinfo back to 0, thank you both!
There was a problem hiding this comment.
The problem is that with it set to 0, various parts of gcencode etc don't build. I don't think this is a configuration we actually support.
| // FIXME: This can erroneously trigger when a method fails compilation late and we | ||
| // successfully retry compilation after the failure. | ||
| // Debug.Assert(_debugVarInfos == null); | ||
|
|
||
| if (cVars == 0) | ||
| { | ||
| _debugVarInfos = null; | ||
| return; | ||
| } |
There was a problem hiding this comment.
Commenting out these asserts makes it harder to detect real contract violations (setVars/setBoundaries called more than once per compilation). Since CompileMethodInternal can request CompilationRetryRequested without running CompileMethodCleanup (see src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs:322-433), a better fix is to explicitly reset per-attempt state (e.g., _debugVarInfos/_debugLocInfos) when a retry is requested or at the start of each compilation attempt, and keep the asserts enabled.
| ) | ||
|
|
||
| set( JIT_WASM_SOURCES | ||
| gcdecode.cpp |
There was a problem hiding this comment.
The more proper way to add these files is by moving them from JIT_NATIVE_TARGET_SOURCES to JIT_SOURCES.
Checkpoint Checkpoint Comment out some overly sensitive asserts that can be hit when things are working as designed Funclet start blocks need labels on Wasm for some reason. Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Cleanup jit-format Compile the gc info decoder too, since we'll need it Add comment
This set of changes is sufficient to R2R release S.P.CoreLib like before but with gc info encoding wired up and working. Still figuring out how to clean some of it up, feedback welcome.