fix for largepages with aggressive decommit logic by mangod9 · Pull Request #126929 · dotnet/runtime

mangod9 · 2026-04-15T01:09:23Z

Clear decommitted memory in the large pages scenario. Fixes #126903

dotnet-policy-service · 2026-04-15T01:10:21Z

Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

Fixes a GC heap-corruption scenario when GCLargePages is enabled and an induced Aggressive GC triggers “decommit” bookkeeping that doesn’t actually decommit at the OS level for large pages. The change ensures the memory that is treated as decommitted is explicitly cleared so stale references can’t be observed later.

Changes:

In the induced-aggressive path of gc_heap::distribute_free_regions, clear the region tail that would normally be decommitted/zeroed by the OS.
Gate the clearing to use_large_pages_p, since only large pages make virtual_decommit a no-op while still updating GC bookkeeping.
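
The failure mode and the fix summarized above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not the runtime's code: `ToyRegion`, `toy_os_decommit`, `toy_decommit_tail`, and `toy_tail_has_stale_data` are hypothetical names, and the zeroing done for regular pages merely stands in for the OS handing back zeroed pages on recommit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model of a GC region. Field names are illustrative, not the runtime's.
struct ToyRegion
{
    std::vector<uint8_t> memory; // backing store standing in for the region's pages
    size_t allocated;            // bytes currently holding live objects
    size_t used;                 // high-water mark of bytes ever written
    size_t committed;            // bytes the GC bookkeeping treats as committed
};

// Stand-in for an OS decommit. For regular pages the range reads as zero
// afterwards (modeled with memset). For large pages nothing happens.
inline void toy_os_decommit(ToyRegion& r, size_t from, size_t to, bool large_pages)
{
    if (!large_pages)
        std::memset(r.memory.data() + from, 0, to - from);
}

// "Decommit" the tail past `allocated`. With large pages and apply_fix set,
// the stale bytes in [allocated, used) are cleared explicitly first.
inline void toy_decommit_tail(ToyRegion& r, bool large_pages, bool apply_fix)
{
    const size_t tail_start = r.allocated;
    if (large_pages && apply_fix && r.used > tail_start)
        std::memset(r.memory.data() + tail_start, 0, r.used - tail_start);

    toy_os_decommit(r, tail_start, r.committed, large_pages);

    // Bookkeeping is lowered in every case, which is what makes stale
    // bytes dangerous when the OS call was a no-op.
    r.committed = tail_start;
    r.used = tail_start;
}

// Returns true if any byte past the new `committed` mark is non-zero.
inline bool toy_tail_has_stale_data(const ToyRegion& r)
{
    for (size_t i = r.committed; i < r.memory.size(); ++i)
        if (r.memory[i] != 0)
            return true;
    return false;
}
```

In this model, calling `toy_decommit_tail(r, true, false)` on a region whose tail was previously written leaves non-zero bytes behind a lowered `used` mark, while passing `apply_fix = true` clears them.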

janvorli

LGTM, thank you!

janvorli · 2026-04-15T10:39:36Z

@mangod9 I believe this change should go in as is. But I wonder whether, in the future, it would be better to integrate clearing the used part of the large page into virtual_decommit (by adding an "end of used data" argument), so that future GC changes cannot reintroduce similar issues. I also wonder whether all the other usages of virtual_decommit are fine for large pages, given that the memory is not cleared.

mangod9 · 2026-04-15T14:51:11Z

Yeah, moved it centrally into virtual_decommit now. I have looked through the other large pages code flows, and this looks to be the only case.

mangod9 · 2026-04-16T00:04:26Z

/ba-g downloading artifacts is constantly stuck on macOS

jkotas · 2026-04-16T01:39:35Z

+    // observes leftover object references after the region is reused.
+    if (use_large_pages_p && (end_of_data != nullptr) && (end_of_data > address))
+    {
+        memclr ((uint8_t*)address, (uint8_t*)end_of_data - (uint8_t*)address);


In other paths, the GC just keeps track of the fact that the memory is dirty and clears it right before it is used for allocations again, in gc_heap::adjust_limit_clr. Would that be a better option here?
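
The lazy alternative raised in this comment can be sketched as follows. This is a loose model of the idea only, not what gc_heap::adjust_limit_clr does line by line: `LazyRegion`, `lazy_allocate`, and `lazy_release_tail` are hypothetical names, and the single `used` high-water mark is an assumption about how "dirty" memory is remembered.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy region that remembers how far it has ever been written.
struct LazyRegion
{
    std::vector<uint8_t> memory; // backing store, zero-initialized
    size_t alloc_ptr;            // next free byte
    size_t used;                 // bytes at or above this offset are known to be zero
};

// Hand out [alloc_ptr, alloc_ptr + size), clearing only the part that lies
// below `used`, i.e. the part that may still hold old object data.
inline uint8_t* lazy_allocate(LazyRegion& r, size_t size)
{
    if (size > r.memory.size() - r.alloc_ptr)
        return nullptr; // does not fit

    const size_t start = r.alloc_ptr;
    const size_t end = start + size;

    if (start < r.used)
    {
        const size_t dirty_end = std::min(end, r.used);
        std::memset(r.memory.data() + start, 0, dirty_end - start);
    }

    r.used = std::max(r.used, end);
    r.alloc_ptr = end;
    return r.memory.data() + start;
}

// Giving back the tail only moves the allocation pointer. `used` is left
// untouched, so the dirtiness of the tail is not forgotten.
inline void lazy_release_tail(LazyRegion& r, size_t new_alloc_ptr)
{
    r.alloc_ptr = new_alloc_ptr;
}
```

The design point is that no clearing cost is paid when the tail is released; the cost moves to the next allocation, and only for the bytes that were actually dirtied.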

mangod9 · 2026-04-16

The fix followed the same pattern as this code in decommit_region:

runtime/src/coreclr/gc/memory.cpp, lines 358 to 373 in 75d3e60:

    if (require_clearing_memory_p)
    {
        uint8_t* clear_end = use_large_pages_p ? heap_segment_used (region) : heap_segment_committed (region);
        size_t clear_size = clear_end - page_start;
        memclr (page_start, clear_size);
        heap_segment_used (region) = heap_segment_mem (region);
        dprintf(REGIONS_LOG, ("cleared region %p(%p-%p) (%zu bytes)",
            region,
            page_start,
            clear_end,
            clear_size));
    }
    else
    {
        heap_segment_committed (region) = heap_segment_mem (region);
    }

where memclr clears the full region for large_pages. Similar cleanup was missing during aggressive decommitting of tail regions.

With large pages, VirtualDecommit is a no-op since large pages cannot be partially decommitted. PR dotnet#126929 fixed the resulting stale data corruption by adding memclr in virtual_decommit, but this approach has downsides: the memory is never returned to the OS, yet we pay for the clearing and produce misleading committed/used bookkeeping.

Instead, skip the decommit entirely for large pages:

1. distribute_free_regions: skip the aggressive tail-region decommit (the committed-but-unallocated tail of in-use regions). This was the path that caused the heap corruption in dotnet#126903.
2. decommit_heap_segment: skip the whole-segment decommit used for segment hoarding and BGC segment deletion. Same class of issue: committed/used are lowered but physical memory retains stale data.
3. decommit_region: bypass virtual_decommit and call reduce_committed_bytes directly, since decommit_region already handles large pages correctly by clearing memory itself.
4. virtual_decommit: add an assert that it is never called for heap memory when large pages are on. This catches any future caller that forgets to handle the large pages case. The end_of_data parameter and no-op ternary added by dotnet#126929 are removed.

Add GCLargePages=2 mode that simulates large pages using small pages: sets use_large_pages_p=true but reserves with normal pages and commits everything upfront. This exercises all large page GC code paths without requiring OS large page setup or privileges, enabling CI testing.

Fix dotnet#126903
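
The "skip the decommit and assert in the central helper" approach from this follow-up description might look roughly like the sketch below. It is a simplified model under assumptions, not the code from that change: `ToyHeap`, `toy_virtual_decommit`, and `toy_decommit_tail_if_possible` are hypothetical names, and the model covers only the caller-side skip and the assert, not the decommit_region path that clears memory itself.

```cpp
#include <cassert>
#include <cstddef>

// Minimal heap state for the model.
struct ToyHeap
{
    bool use_large_pages;   // mirrors the idea of use_large_pages_p
    size_t committed_bytes; // committed-size bookkeeping
    int os_decommit_calls;  // counts calls that reached the "OS"
};

// Stand-in for the central decommit helper. It must never be reached when
// large pages are on, so a forgotten caller-side check fails loudly in
// debug builds instead of silently corrupting bookkeeping.
inline void toy_virtual_decommit(ToyHeap& h, size_t size)
{
    assert(!h.use_large_pages && "decommit must be skipped when large pages are on");
    ++h.os_decommit_calls;
    h.committed_bytes -= size;
}

// Stand-in for a caller such as the tail decommit. With large pages the
// memory and the bookkeeping are both left untouched. Returns the number
// of bytes actually decommitted.
inline size_t toy_decommit_tail_if_possible(ToyHeap& h, size_t size)
{
    if (h.use_large_pages)
        return 0;

    toy_virtual_decommit(h, size);
    return size;
}
```

Because the bookkeeping is never lowered for large pages in this model, there is no window in which committed/used claim less than what the physical memory still holds, so no explicit clearing is needed on this path.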

fix for largepages with agressive decommit

adc1262

mangod9 requested a review from janvorli April 15, 2026 01:09

mangod9 added the area-GC-coreclr label Apr 15, 2026

Copilot AI review requested due to automatic review settings April 15, 2026 01:09

dotnet-policy-service Bot assigned mangod9 Apr 15, 2026

Copilot started reviewing on behalf of mangod9 April 15, 2026 01:10

mangod9 mentioned this pull request Apr 15, 2026

GC heap corruption with GCLargePages #126903

Open

Copilot AI reviewed Apr 15, 2026

build-analysis Bot mentioned this pull request Apr 15, 2026

System.Net.NameResolution.Tests DNS failures: Name or service not known #126641

Open

janvorli approved these changes Apr 15, 2026

mangod9 added 2 commits April 15, 2026 07:27

CR Feedback

3a643c5

adding null check for clarity

009ad4d

janvorli approved these changes Apr 15, 2026

mangod9 merged commit 830b6fe into dotnet:main Apr 16, 2026
109 of 113 checks passed

jkotas reviewed Apr 16, 2026

dotnet-maestro Bot mentioned this pull request Apr 16, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#6035

Merged

cshung mentioned this pull request Apr 22, 2026

Skip decommit for large pages and add fake large pages test mode #127290

Merged

mangod9 merged 3 commits into dotnet:main from mangod9:fix/gc-largepages
