Port 126929 to 10.0 #126977
Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/gc
Pull request overview
Ports the fix for #126903 to the 10.0 branch by ensuring memory that is logically decommitted while GCLargePages is enabled is also cleared, preventing stale object references from being observed when the region is later reused.
Changes:
- Extends `gc_heap::virtual_decommit` with an optional `end_of_data` parameter to allow clearing memory when OS decommit is a no-op under large pages.
- In the aggressive-induced GC path, passes `heap_segment_used(region)` so the GC clears stale reference-containing bytes when shrinking `heap_segment_committed`.
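The shape of this change can be illustrated with a minimal sketch. This is not the code from `gc.cpp`: `virtual_decommit_sketch`, `use_large_pages_p` as a plain global, and `os_decommit` are hypothetical stand-ins, and the clamping of `end_of_data` to the decommitted range is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the GC's large-pages setting.
static bool use_large_pages_p = true;

// Hypothetical stand-in for the OS decommit call; always "succeeds" here.
static bool os_decommit(uint8_t* /*address*/, size_t /*size*/) { return true; }

// Sketch: decommit [address, address + size).
// Under large pages the OS call is skipped, so when the caller supplies
// end_of_data, the bytes in [address, end_of_data) are zeroed by hand.
bool virtual_decommit_sketch(uint8_t* address, size_t size,
                             uint8_t* end_of_data = nullptr)
{
    if (use_large_pages_p)
    {
        if (end_of_data != nullptr && end_of_data > address)
        {
            // Assumption: never clear past the range being decommitted.
            uint8_t* range_end = address + size;
            uint8_t* clear_end = (end_of_data < range_end) ? end_of_data : range_end;
            std::memset(address, 0, static_cast<size_t>(clear_end - address));
        }
        // Memory stays committed; the caller may still shrink its bookkeeping.
        return true;
    }
    return os_decommit(address, size);
}
```

Callers that omit `end_of_data` keep the old behavior, which is why the parameter is optional; only a caller that knows how far object data extends (here, the aggressive-induced GC path) opts in to the clearing.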
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/gc/gcpriv.h | Updates the virtual_decommit declaration to accept an optional end_of_data pointer. |
| src/coreclr/gc/gc.cpp | Implements large-page clearing in virtual_decommit and wires end_of_data from distribute_free_regions() for aggressive-induced decommits. |
Yes, so far so good on the production test, will keep you posted if we run into anything on that front.
ok will work on getting this approved for back port. Thanks!
@neiljohari, @janvorli, @mangod9, please resolve code review comments.
@BenV, this is now merged so should be included in June servicing. |
Excellent, thanks again for all the help! |
Fixes #126903
Customer Impact
GC heap corruption when `DOTNET_GCLargePages=1` is enabled on Linux (#126903). Reproducible by calling `GC.Collect(2, GCCollectionMode.Aggressive, true, true)` with large pages enabled, but also occurs in normal production workloads without aggressive GC.
Regression
This is a pre-existing bug in the GC's large-page decommit logic. When `GCLargePages` is enabled, the GC skips OS-level decommits but still updates bookkeeping as if the decommit succeeded. This causes regions to be reused without being zeroed, leading to heap corruption. The bug has existed since Regions was enabled.
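A toy model can make the failure mode concrete. Everything below is hypothetical (`ToyRegion`, `toy_decommit`, `stale_bytes`); it only mimics the relationship between the committed mark and the memory contents and does not reflect the GC's real data structures.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy region; all names are illustrative, not the GC's real structures.
struct ToyRegion
{
    std::vector<uint8_t> mem;  // backing store, never returned to the OS (large pages)
    size_t used = 0;           // bytes that have held object data
    size_t committed = 0;      // bytes the bookkeeping believes are committed

    explicit ToyRegion(size_t n) : mem(n, 0), committed(n) {}
};

// Shrink the committed mark. With clear == false this models the bug:
// the bookkeeping changes but the memory contents do not.
void toy_decommit(ToyRegion& r, size_t new_committed, bool clear)
{
    if (clear && r.used > new_committed)
        std::memset(r.mem.data() + new_committed, 0, r.used - new_committed);
    r.committed = new_committed;
}

// Count nonzero bytes in the range a later reuse would assume is zeroed.
size_t stale_bytes(const ToyRegion& r)
{
    size_t n = 0;
    for (size_t i = r.committed; i < r.mem.size(); ++i)
        if (r.mem[i] != 0) ++n;
    return n;
}
```

With `clear == false`, any bytes between the new committed mark and the old `used` mark survive, so a later reuse that treats that range as freshly committed would see old object references.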
Testing
The fix was validated by the customer against their production workload.
Risk
Low. The fix clears decommitted memory in the large-pages scenario to ensure regions are properly zeroed before reuse. This is a targeted change to the GC's decommit path that only affects `GCLargePages=1` configurations. The larger fix, #127290, is made in .NET 11.