Separate GC WKS and SVR compilation units #126720
janvorli wants to merge 5 commits into dotnet:main
Move the GC sources away from the wrapper-file model that text-included gc.cpp and gcee.cpp under `SERVER_GC` and instead compile the shared sources directly as separate WKS and SVR objects.
This change introduces gcinternal.h as the shared compilation context for the gc.cpp split, converts the former tail-included GC implementation fragments into separately compiled translation units, and updates the GC, VM, NativeAOT, and GC sample build surfaces to consume the new object layout.
It also removes the gcsvr.cpp/gcwks.cpp and gceesvr.cpp/gceewks.cpp wrappers, compiles gcee.cpp through the same dual-build WKS/SVR source lists as gc.cpp, deduplicates the repeated WKS/SVR source lists in the relevant CMake files, and renames the shared GC header from gc_common.h to gcinternal.h to avoid confusion with gccommon.cpp.
During the split, cross-translation-unit declarations and inline helpers needed by multiple GC source files were moved into the shared header, while local-only inline helpers were moved back into their owning .cpp files to avoid keeping unnecessary bodies in the shared header.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
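
The split between shared-header helpers and file-local helpers described above can be illustrated with a minimal single-file C++ sketch. The names `align_up`, `card_index` and `cards_needed`, and the 256-byte card size, are hypothetical and chosen only for illustration; they are not taken from the PR or from gcinternal.h.

```cpp
#include <cassert>
#include <cstddef>

// ---- Would live in the shared header (hypothetical contents) ----
// Needed by several translation units, so it is declared `inline`:
// every .cpp that includes the header may carry an identical definition.
inline std::size_t align_up(std::size_t value, std::size_t alignment) {
    // Precondition for the bit trick below: alignment is a power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// ---- Would live in exactly one .cpp (hypothetical contents) ----
// Used by this file only, so it stays out of the shared header and gets
// internal linkage through an unnamed namespace.
namespace {
constexpr std::size_t kCardSize = 256; // assumed value, for illustration only

std::size_t card_index(std::size_t address) {
    return align_up(address, kCardSize) / kCardSize;
}
} // namespace

// Externally visible entry point of this hypothetical translation unit.
std::size_t cards_needed(std::size_t bytes) {
    return card_index(bytes);
}
```

Under these assumptions, only `align_up` would need to be visible to other GC source files; `card_index` can change freely without touching the shared header.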
Tagging subscribers to this area: @agocke, @dotnet/gc
Pull request overview
This PR refactors CoreCLR GC build plumbing to stop using wrapper translation units (gcwks.cpp/gcsvr.cpp and gceewks.cpp/gceesvr.cpp) and instead compile the shared GC implementation sources directly into separate WKS/SVR object sets, using a new shared compilation-context header gcinternal.h.
Changes:
- Introduces `gcinternal.h` and updates many GC `.cpp` files to include it and wrap code in `WKS`/`SVR` namespaces based on `SERVER_GC`.
- Updates CoreCLR VM and standalone GC CMake build graphs to build GC sources as object libraries for WKS/SVR and consume them from coreclr/clrgc targets.
- Updates NativeAOT runtime and the GC sample project build surfaces to consume the new GC source layout.
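
As a rough sketch of the namespace wrapping mentioned in the first bullet, a shared source file can select its namespace from `SERVER_GC` so that the same file yields distinct symbols when compiled twice. The helper names `gc_flavor_name` and `built_flavor` are hypothetical and exist only for this example; the real GC sources contain far more than this.

```cpp
#include <string>

// The same source is meant to be compiled twice: once as-is (workstation)
// and once with SERVER_GC defined (server). Only the namespace differs.
#ifdef SERVER_GC
namespace SVR {
#else
namespace WKS {
#endif

// Hypothetical helper standing in for the GC implementation code.
inline std::string gc_flavor_name() {
#ifdef SERVER_GC
    return "SVR";
#else
    return "WKS";
#endif
}

} // namespace WKS or SVR

// Convenience for the sketch: report which flavor this file was built as.
inline std::string built_flavor() {
#ifdef SERVER_GC
    return SVR::gc_flavor_name();
#else
    return WKS::gc_flavor_name();
#endif
}
```

Compiling this file once without and once with `-DSERVER_GC` would produce two objects whose symbols live in `WKS` and `SVR` respectively, so both can be linked into one binary without name clashes.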
Reviewed changes
Copilot reviewed 31 out of 33 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/coreclr/vm/wks/CMakeLists.txt | Adds GC object files into cee_wks_core build inputs. |
| src/coreclr/vm/CMakeLists.txt | Defines shared GC source list and builds vm_gc_wks/vm_gc_svr object libraries. |
| src/coreclr/dlls/mscoree/coreclr/CMakeLists.txt | Links vm_gc_wks/vm_gc_svr into coreclr and coreclr_static. |
| src/coreclr/nativeaot/Runtime/CMakeLists.txt | Replaces wrapper sources with direct GC .cpp compilation for NativeAOT. |
| src/coreclr/gc/gcinternal.h | New shared GC compilation-context header; centralizes includes and inlines. |
| src/coreclr/gc/*.cpp | Switches individual GC implementation files to include gcinternal.h and wrap in WKS/SVR namespaces. |
| src/coreclr/gc/CMakeLists.txt | Builds standalone GC (clrgc/clrgcexp) with WKS/SVR object libraries. |
| src/coreclr/gc/sample/* | Updates GC sample to compile split GC .cpp files directly. |
What is the motivation for this change? Does it improve build times?
It doesn't affect build time in any way. The main reason is the code editing experience. The individual files into which gc.cpp was split in my recent change didn't include the headers for the symbols they use, so when editing one of them, e.g. in VS Code, the editor showed a lot of red squiggles and code navigation didn't work well.
Build breaks....
2% size savings on Hello World on Linux, nice!
Size statistics: pull request #126720
Do we understand why this makes the code smaller? It is likely making it both smaller and slower (less inlining) ... not something we necessarily want for the GC. It may be a good idea to measure the impact on GC throughput, on both Windows and Linux. This type of refactoring tends to depend on good PGO data and whole program optimizations for good perf:
I definitely want to run the GC perf runs and understand where the size improvement comes from before we merge this.
@jkotas performance and functionality tests didn't show any regressions.
Have you measured it on a binary that showed the large code size reduction? (Also, there is a merge conflict that needs to be resolved.)
It was measured using the dotnet/performance GC tests by vendors; that's the way GC changes have been tested in the past. I am not sure how to perf test the apps that @MichalStrehovsky has mentioned.
I am not very surprised. GC perf is typically dominated by memory accesses (or, more precisely, by cache misses since access patterns could be cache-unfriendly). Code quality may still matter, e.g. inlining of tiny methods in tight per-object loops, but the compiler generally knows that too.
Compilers do that reasonably well if they can see the whole program. This change breaks the GC down into multiple compilation units, and the compiler won't see the whole GC anymore (when compiling for NAOT at least).
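
To make the inlining concern concrete, here is a small C++ sketch; it is not code from the GC. `ObjHeader`, `is_marked_visible`, `is_marked_opaque` and the two counting functions are hypothetical names. The noinline attribute is only a stand-in for a helper whose body lives in another compilation unit and is built without link-time optimization, so the compiler has to emit a real call inside the per-object loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Hypothetical object header; not a CoreCLR type.
struct ObjHeader {
    std::uint32_t flags;
};

// Body is visible at the call site, so the compiler is free to inline it.
inline bool is_marked_visible(const ObjHeader& o) {
    return (o.flags & 1u) != 0;
}

// Stand-in for a helper defined in a different compilation unit:
// the loop below cannot see through this call.
SKETCH_NOINLINE bool is_marked_opaque(const ObjHeader& o) {
    return (o.flags & 1u) != 0;
}

// Tight per-object loop where the helper can be inlined.
inline std::size_t count_marked_visible(const std::vector<ObjHeader>& objs) {
    std::size_t n = 0;
    for (const ObjHeader& o : objs) {
        if (is_marked_visible(o)) ++n;
    }
    return n;
}

// Same loop, but every iteration pays for a function call.
inline std::size_t count_marked_opaque(const std::vector<ObjHeader>& objs) {
    std::size_t n = 0;
    for (const ObjHeader& o : objs) {
        if (is_marked_opaque(o)) ++n;
    }
    return n;
}
```

Both loops compute the same result; what may differ is the generated code, which is why comparing disassembly and running throughput tests are the ways to tell whether a split like this matters in practice.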
What do they run to measure? Is it possible to publish the same test w/ NAOT and run it locally?
These are the tests we run in the ASP.NET perf lab: https://aka.ms/aspnet/nativeaot/benchmarks. TodosApi from the table above is the Stage2 app on the benchmarks page. The crank command line used to trigger the run is shown at the bottom of the dashboard. There is a way to send it a custom toolchain or just a custom binary, but I have not used it in years and have no memory of how it was done. https://github.com/aspnet/benchmarks is the entry point to all the docs. We could also just commit and wait for the result. The only gotcha is that if there are build breaks or the flow from dotnet/runtime to dotnet/dotnet is stuck, this could be bunched up with a week or two worth of changes and then we'll need to prove it's not caused by this after the fact.
They run aspnetbenchmarks (https://github.com/dotnet/performance/blob/main/src/benchmarks/gc/GC.Infrastructure/Configurations/ASPNetBenchmarks/ASPNetBenchmarks.csv), GCPerfSim benchmarks (https://github.com/dotnet/performance/tree/main/src/benchmarks/gc/GC.Infrastructure/Configurations/GCPerfSim) and a bunch of hand-picked microbenchmarks (https://github.com/dotnet/performance/blob/main/src/benchmarks/gc/GC.Infrastructure/Configurations/Microbenchmark/MicrobenchmarksToRun.txt) from the dotnet/performance repo on Windows.
I can see that on Windows, the clrgcexp.dll is about 5kB smaller and coreclr.dll about 3.5kB smaller.
I have just noticed that the list of sizes that @MichalStrehovsky has shared contains much larger size changes on Linux for the same apps. I am going to do some Linux testing and also compare the disassembly of some of the binaries from Michal's list.
Linux shows substantial changes in inlining and also in loop unrolling. I need to run the perf tests on Linux too to see the impact.
I've made a size comparison for the new clrgc.dll, clrgcexp.dll and coreclr.dll. The changes in the GC DLLs were very minor, about 1.5kB of growth due to slightly different decisions by the linker / compiler w.r.t. cold / hot code. coreclr.dll even became about 1.5kB smaller.