Separate GC WKS and SVR compilation units #126720

Open
janvorli wants to merge 5 commits into dotnet:main from janvorli:gc-separate-compilation

@janvorli
Member

@janvorli janvorli commented Apr 9, 2026

Move the GC sources away from the wrapper-file model that text-included gc.cpp and gcee.cpp under SERVER_GC and instead compile the shared sources directly as separate WKS and SVR objects.

This change introduces gcinternal.h as the shared compilation context for the gc.cpp split, converts the former tail-included GC implementation fragments into separately compiled translation units, and updates the GC, VM, NativeAOT, and GC sample build surfaces to consume the new object layout.

It also removes the gcsvr.cpp/gcwks.cpp and gceesvr.cpp/gceewks.cpp wrappers, compiles gcee.cpp through the same dual-build WKS/SVR source lists as gc.cpp, deduplicates the repeated WKS/SVR source lists in the relevant CMake files, and renames the shared GC header from gc_common.h to gcinternal.h to avoid confusion with gccommon.cpp.

During the split, cross-translation-unit declarations and inline helpers needed by multiple GC source files were moved into the shared header, while local-only inline helpers were moved back into their owning .cpp files to avoid keeping unnecessary bodies in the shared header.
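The split between shared-header helpers and file-local helpers can be pictured with a minimal single-file sketch. The "file" boundaries are only marked with comments, and every name here (`align_up`, `chunks_needed`, `count_chunks`) is hypothetical rather than taken from the GC sources.

```cpp
// Single-file sketch of the header/.cpp split; not the actual gcinternal.h.
// All function names below are hypothetical.
#include <cstddef>

// ---- would live in the shared header ----
// `inline` allows every translation unit that includes the header to carry
// the same definition without violating the one-definition rule.
inline std::size_t align_up(std::size_t value, std::size_t alignment)
{
    // alignment is assumed to be a power of two
    return (value + alignment - 1) & ~(alignment - 1);
}

// ---- would live in a single .cpp file ----
namespace
{
    // Internal linkage: only this file uses the helper, so its body does not
    // need to be kept in the shared header.
    std::size_t chunks_needed(std::size_t bytes, std::size_t chunk_size)
    {
        return align_up(bytes, chunk_size) / chunk_size;
    }
}

// Externally visible entry point of this (hypothetical) translation unit.
std::size_t count_chunks(std::size_t bytes)
{
    return chunks_needed(bytes, 4096);
}
```

Keeping `chunks_needed` in an unnamed namespace means other translation units never see its body, which is the trade-off the paragraph above describes.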

I compared the sizes of clrgc.dll, clrgcexp.dll and coreclr.dll before and after the change. The changes in the GC DLLs were very minor, about 1.5 kB of growth caused by slightly different compiler/linker decisions about hot and cold code. coreclr.dll even became about 1.5 kB smaller.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@janvorli janvorli added this to the 11.0.0 milestone Apr 9, 2026
@janvorli janvorli self-assigned this Apr 9, 2026
Copilot AI review requested due to automatic review settings April 9, 2026 16:53
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke, @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors CoreCLR GC build plumbing to stop using wrapper translation units (gcwks.cpp/gcsvr.cpp and gceewks.cpp/gceesvr.cpp) and instead compile the shared GC implementation sources directly into separate WKS/SVR object sets, using a new shared compilation-context header gcinternal.h.

Changes:

  • Introduces gcinternal.h and updates many GC .cpp files to include it and wrap code in WKS/SVR namespaces based on SERVER_GC.
  • Updates CoreCLR VM and standalone GC CMake build graphs to build GC sources as object libraries for WKS/SVR and consume them from coreclr/clrgc targets.
  • Updates NativeAOT runtime and the GC sample project build surfaces to consume the new GC source layout.
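The namespace wrapping mentioned in the first bullet can be sketched roughly as follows. This is a minimal illustration that assumes only that `SERVER_GC` selects the flavor. The macro `GC_FLAVOR_NS` and the function `flavor_name` are hypothetical and do not come from the PR.

```cpp
// Minimal sketch of per-flavor namespace wrapping; not the actual GC code.
// GC_FLAVOR_NS and flavor_name() are hypothetical names.
#include <string>

#ifdef SERVER_GC
#define GC_FLAVOR_NS SVR   // built with SERVER_GC defined: symbols land in SVR::
#else
#define GC_FLAVOR_NS WKS   // built without it: symbols land in WKS::
#endif

namespace GC_FLAVOR_NS
{
    // The same source text produces WKS::flavor_name or SVR::flavor_name,
    // so both objects can be linked into one binary without symbol clashes.
    std::string flavor_name()
    {
#ifdef SERVER_GC
        return "SVR";
#else
        return "WKS";
#endif
    }
}
```

Compiling this file once as-is and once with `SERVER_GC` defined would give two objects whose symbols differ only by namespace, which is the general idea behind building the shared sources twice.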

Reviewed changes

Copilot reviewed 31 out of 33 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/coreclr/vm/wks/CMakeLists.txt | Adds GC object files into cee_wks_core build inputs. |
| src/coreclr/vm/CMakeLists.txt | Defines shared GC source list and builds vm_gc_wks/vm_gc_svr object libraries. |
| src/coreclr/dlls/mscoree/coreclr/CMakeLists.txt | Links vm_gc_wks/vm_gc_svr into coreclr and coreclr_static. |
| src/coreclr/nativeaot/Runtime/CMakeLists.txt | Replaces wrapper sources with direct GC .cpp compilation for NativeAOT. |
| src/coreclr/gc/gcinternal.h | New shared GC compilation-context header; centralizes includes and inlines. |
| src/coreclr/gc/*.cpp | Switches individual GC implementation files to include gcinternal.h and wrap in WKS/SVR namespaces. |
| src/coreclr/gc/CMakeLists.txt | Builds standalone GC (clrgc/clrgcexp) with WKS/SVR object libraries. |
| src/coreclr/gc/sample/* | Updates GC sample to compile split GC .cpp files directly. |

Comment thread src/coreclr/gc/gcinternal.h
Comment thread src/coreclr/vm/wks/CMakeLists.txt
Comment thread src/coreclr/nativeaot/Runtime/CMakeLists.txt Outdated
@mangod9
Member

mangod9 commented Apr 9, 2026

what is the motivation for this change? Does it improve build times?

@janvorli
Member Author

janvorli commented Apr 9, 2026

> what is the motivation for this change? Does it improve build times?

It doesn't affect build time in any way. The main reason is the code editing experience. The individual files that gc.cpp was split into in my recent change didn't include the headers for the symbols they use, so editing one of them, e.g. in VS Code, showed a lot of red squiggles and code navigation didn't work well.

@jkotas
Member

jkotas commented Apr 10, 2026

Build breaks....

Copilot AI review requested due to automatic review settings April 10, 2026 12:31
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 33 changed files in this pull request and generated 1 comment.

Comment thread src/coreclr/nativeaot/Runtime/CMakeLists.txt Outdated
Copilot AI review requested due to automatic review settings April 10, 2026 19:53
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 33 changed files in this pull request and generated 1 comment.

Comment thread src/coreclr/gc/plan_phase.cpp
Member

@VSadov VSadov left a comment

LGTM!

@MichalStrehovsky
Member

2% size savings on Hello World on Linux, nice!

Size statistics

Pull request #126720

| Project | Size before | Size after | Difference |
| --- | ---: | ---: | ---: |
| TodosApi-linux | 24545416 | 24488072 | -57344 |
| TodosApi-windows | 25923584 | 25909760 | -13824 |
| avalonia.app-linux | 18912136 | 18887560 | -24576 |
| avalonia.app-windows | 19439616 | 19433472 | -6144 |
| hello-linux | 1246088 | 1221512 | -24576 |
| hello-minimal-linux | 1098528 | 1073952 | -24576 |
| hello-minimal-windows | 773632 | 766464 | -7168 |
| hello-windows | 933376 | 927232 | -6144 |
| kestrel-minimal-linux | 5354976 | 5293536 | -61440 |
| kestrel-minimal-windows | 4854784 | 4840960 | -13824 |
| reflection-linux | 1832784 | 1808208 | -24576 |
| reflection-windows | 1696256 | 1689600 | -6656 |
| webapiaot-linux | 9721456 | 9660016 | -61440 |
| webapiaot-windows | 10246144 | 10232320 | -13824 |
| winrt-component-minimal-windows | 721408 | 714240 | -7168 |
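
The "2%" figure can be checked against the hello-linux row with a small calculation. The sketch below uses only the two sizes from the table; the `SizeChange` type and the helper names are hypothetical.

```cpp
// Arithmetic check of the hello-linux row; the type and function names
// are hypothetical, only the two byte counts come from the table.
#include <cstdint>

struct SizeChange
{
    std::int64_t before; // size in bytes before the PR
    std::int64_t after;  // size in bytes after the PR
};

// Signed difference, matching the "Difference" column (negative = smaller).
std::int64_t difference(const SizeChange& c)
{
    return c.after - c.before;
}

// Savings as a percentage of the original size.
double percent_saved(const SizeChange& c)
{
    return 100.0 * static_cast<double>(c.before - c.after) / static_cast<double>(c.before);
}

// hello-linux: 1246088 bytes before, 1221512 bytes after.
const SizeChange hello_linux{1246088, 1221512};
```

For hello-linux this gives a difference of -24576 bytes, which is a little under 2% of the original size.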

@jkotas
Member

jkotas commented Apr 10, 2026

> 2% size savings on Hello World on Linux, nice!

Do we understand why this makes the code smaller? It is likely making it both smaller and slower (less inlining) ... not something we necessarily want for the GC. It may be a good idea to measure the impact on GC throughput, on both Windows and Linux.

This type of refactoring tends to depend on good PGO data and whole program optimizations for good perf:

  • On Windows, we do not use LTCG and PGO for NAOT to avoid compiler version fragility.
  • On Linux, the infrastructure for collecting PGO data is broken for libcoreclr.so. It is one of the check boxes in Startup time of small workloads #120407 and there was a chat on Teams about that. We may want to frontload fixing it to avoid regressing GC perf on Linux.

@janvorli
Member Author

janvorli commented Apr 13, 2026

> Do we understand why this makes the code smaller? It is likely making it both smaller and slower (less inlining) ... not something we necessarily want for the GC. It may be a good idea to measure the impact on GC throughput, on both Windows and Linux.

I definitely want to run the GC perf runs and understand where the size improvement comes from before we merge this.
There were two large functions that were previously inlined and that I stopped marking as inline, because they seemed too large for inlining to make sense and I wanted to verify the perf impact of that. Those were gc_heap::mark_through_cards_helper and gc_heap::set_region_gen_num. Edit: and gc_heap::get_promoted_bytes.

@janvorli
Member Author

@jkotas performance and functionality tests didn't show any regressions.

@jkotas
Member

jkotas commented Apr 16, 2026

Have you measured it on a binary that showed the large code size reduction?

(Also, there is a merge conflict that needs to be resolved.)

@janvorli
Member Author

> Have you measured it on a binary that showed the large code size reduction?

It was measured using the dotnet/performance GC tests by vendors, that's the way GC changes have been tested in the past. I am not sure how to perf test the apps that @MichalStrehovsky has mentioned.

@VSadov
Member

VSadov commented Apr 16, 2026

> performance and functionality tests didn't show any regressions.

I am not very surprised. GC perf is typically dominated by memory accesses (or, more precisely, by cache misses, since access patterns can be cache-unfriendly). Code quality may still matter, like inlining of tiny methods in tight per-object loops, but the compiler generally knows that too.
This refactoring moves pretty large chunks of code around, so it most likely does not affect perf in a meaningful way. There is always a chance of some subtle but consequential change, but the likelihood seems low.

@jkotas
Member

jkotas commented Apr 16, 2026

> Code quality may still matter, like inlining of tiny methods in tight per-object loops, but the compiler generally knows that too.

Compilers do that reasonably well if they can see the whole program. This change breaks the GC down into multiple compilation units, and the compiler won't see the whole GC anymore (when compiling for NAOT at least).
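
The visibility concern can be illustrated with a single-file sketch. Here a `noinline` attribute is used purely as a stand-in for a helper whose body lives in another compilation unit and is therefore invisible at the call site without whole-program optimization. The names (`is_marked_visible`, `is_marked_opaque`, `SKETCH_NOINLINE`) are hypothetical, and the low-bit "mark" encoding is an assumption made for illustration only.

```cpp
// Sketch only: noinline emulates a call into another compilation unit.
// All names and the low-bit "mark" encoding are hypothetical.
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Body visible at the call site (e.g. defined in a shared header):
// the optimizer is free to fold it into the loop below.
inline bool is_marked_visible(std::uint32_t header)
{
    return (header & 1u) != 0;
}

// Stand-in for a helper defined elsewhere: the loop must make a real call
// for every element, roughly what happens without LTCG/LTO.
SKETCH_NOINLINE bool is_marked_opaque(std::uint32_t header)
{
    return (header & 1u) != 0;
}

// Tight per-object loop using the visible helper.
std::size_t count_marked_visible(const std::uint32_t* headers, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (is_marked_visible(headers[i])) ++count;
    return count;
}

// Same loop using the opaque helper; same result, different code generation.
std::size_t count_marked_opaque(const std::uint32_t* headers, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (is_marked_opaque(headers[i])) ++count;
    return count;
}
```

Both loops return the same count. Whether the extra calls matter in practice depends on the workload, which is why the thread asks for measurements rather than assuming either outcome.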

@jkotas
Member

jkotas commented Apr 16, 2026

What do they run to measure? Is it possible to publish the same test w/ NAOT and run it locally?

@MichalStrehovsky
Member

> It was measured using the dotnet/performance GC tests by vendors, that's the way GC changes have been tested in the past. I am not sure how to perf test the apps that @MichalStrehovsky has mentioned.

These are the tests we run in the ASP.NET perf lab: https://aka.ms/aspnet/nativeaot/benchmarks. TodosApi from the table above is the Stage2 app on the benchmarks page. The crank command line used to trigger the run is shown at the bottom of the dashboard. There is a way to send it a custom toolchain or just a custom binary, but I have not used it in years and have no memory of how it was done. https://github.com/aspnet/benchmarks is the entrypoint to all the docs.

We could also just commit and wait for the results. The only gotcha is that if there are build breaks or the flow from dotnet/runtime to dotnet/dotnet is stuck, this change could be bunched up with a week or two worth of other changes, and then we would need to prove after the fact that a regression was not caused by it.

@janvorli
Member Author

I can see that on Windows, the clrgcexp.dll is about 5kB smaller and coreclr.dll about 3.5kB smaller.

@janvorli
Member Author

I have just noticed that the list of sizes that @MichalStrehovsky shared contains much larger size changes on Linux for the same apps. I am going to do some Linux testing and also compare the disassembly of some of the binaries from Michal's list.

@janvorli
Member Author

Linux shows substantial changes in inlining and also in loop unrolling. I need to run the perf tests on Linux too to see the impact.


6 participants