Replace HashMap COOP transitions with Epoch-Based Reclamation (EBR) #124307

Open
AaronRobinsonMSFT wants to merge 24 commits into dotnet:main from AaronRobinsonMSFT:ebr-hashmap

AaronRobinsonMSFT (Member) commented Feb 12, 2026

Supersedes #123492

This pull request introduces Epoch-Based Reclamation (EBR) to the CoreCLR runtime for safe, low-overhead deferred deletion in concurrent data structures such as HashMap. The EBR mechanism enables memory reclamation without requiring garbage collection suspension or cooperative mode transitions, improving performance and safety in async scenarios. The main changes include adding new EBR lock types, integrating the EBR collector, updating HashMap memory management, and ensuring proper cleanup of deferred deletions.

EBR integration and lock management:

  • Added two new Crst lock types, CrstEbrPending and CrstEbrThreadList, specifically for EBR's internal thread list and pending deletion queues. These are leaf locks and must not be held across HashMap operations or GC transitions. (src/coreclr/inc/CrstTypes.def, src/coreclr/inc/crsttypes_generated.h) [1] [2] [3] [4]

EBR collector implementation and initialization:

  • Introduced the new ebr.h and ebr.cpp files, defining the EbrCollector class and related functions for managing critical regions, deferred deletion, and epoch advancement. The global collector g_HashMapEbr is initialized at EE startup and used throughout the runtime. (src/coreclr/vm/CMakeLists.txt, src/coreclr/vm/ceemain.cpp, src/coreclr/vm/ebr.h) [1] [2] [3] [4] [5]

Deferred deletion and cleanup integration:

  • Modified the finalizer thread to trigger EBR cleanup when necessary, ensuring that deferred deletions are reclaimed safely and efficiently. (src/coreclr/vm/finalizerthread.cpp) [1] [2] [3]

HashMap memory management and async safety:

  • Refactored HashMap bucket allocation and deletion to use new helper functions (AllocateBuckets, FreeBuckets, DeleteObsoleteBuckets) and integrated EBR critical region holders to protect async operations from concurrent memory reclamation. (src/coreclr/vm/hash.cpp) [1] [2] [3] [4] [5] [6] [7]

Build and include updates:

  • Updated build files and includes to ensure EBR is compiled and linked as part of the runtime, and properly referenced in affected components. (src/coreclr/vm/CMakeLists.txt, src/coreclr/vm/corhost.cpp, src/coreclr/vm/hash.cpp, src/coreclr/vm/ceemain.cpp, src/coreclr/vm/finalizerthread.cpp) [1] [2] [3] [4] [5] [6]

HashMap's async mode used GCX_MAYBE_COOP_NO_THREAD_BROKEN to transition
into cooperative GC mode on every operation, preventing the GC from
freeing obsolete bucket arrays mid-read. Old bucket arrays were queued
via SyncClean::AddHashMap and freed during GC pauses.

This caused a deadlock: when HashMap::LookupValue() was called while
holding the DebuggerController lock, the COOP transition (which is
level-equivalent to taking the ThreadStore lock) violated lock ordering
constraints, since ThreadStore must be acquired before DebuggerController.

Replace both mechanisms with Epoch-Based Reclamation (EBR), based on
Fraser's algorithm from 'Practical Lock-Freedom' (UCAM-CL-TR-579):

- EnterCriticalRegion/ExitCriticalRegion are simple atomic flag stores
  with memory barriers -- they never block or trigger GC transitions
- Obsolete bucket arrays are queued for deferred deletion and freed
  once all threads have passed through a quiescent state
- An RAII holder (EbrCriticalRegionHolder) replaces GCX_MAYBE_COOP
  at all 6 call sites in hash.cpp
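
The three points above can be illustrated with a minimal, self-contained sketch. It is not the code in ebr.h: every name prefixed with `Sketch`, the fixed slot table, and the use of default sequentially consistent `std::atomic` operations in place of explicit memory barriers are assumptions made for illustration. Readers announce themselves with plain atomic stores, writers queue deleters, and a deleter runs only after the global epoch has advanced twice past the epoch in which it was queued.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified collector for illustration; not the code in ebr.h.
class SketchEbrCollector final {
public:
    static constexpr std::size_t kMaxThreads = 4; // assumed fixed slot count

    // Announce that the owner of `slot` may read shared data. Only atomic
    // loads and stores (sequentially consistent by default); never blocks.
    void EnterCriticalRegion(std::size_t slot) {
        assert(slot < kMaxThreads);
        m_slots[slot].active.store(true);
        m_slots[slot].epoch.store(m_globalEpoch.load());
    }

    void ExitCriticalRegion(std::size_t slot) {
        assert(slot < kMaxThreads && m_slots[slot].active.load());
        m_slots[slot].active.store(false);
    }

    // Queue a deleter for an object already unlinked from the shared structure.
    void QueueForDeletion(std::function<void()> deleter) {
        std::lock_guard<std::mutex> lock(m_pendingLock);
        m_pending.push_back(Pending{m_globalEpoch.load(), std::move(deleter)});
    }

    // Advance the epoch if every active slot has observed it, then run the
    // deleters queued at least two epochs ago. Returns how many ran.
    std::size_t TryAdvanceAndReclaim() {
        std::uint64_t current = m_globalEpoch.load();
        bool canAdvance = true;
        for (const Slot& s : m_slots) {
            if (s.active.load() && s.epoch.load() != current) canAdvance = false;
        }
        if (canAdvance) m_globalEpoch.compare_exchange_strong(current, current + 1);

        const std::uint64_t now = m_globalEpoch.load();
        std::vector<Pending> ready;
        {
            std::lock_guard<std::mutex> lock(m_pendingLock);
            for (auto it = m_pending.begin(); it != m_pending.end();) {
                if (it->epoch + 2 <= now) {
                    ready.push_back(std::move(*it));
                    it = m_pending.erase(it);
                } else {
                    ++it;
                }
            }
        }
        for (Pending& p : ready) p.deleter(); // run outside the lock
        return ready.size();
    }

private:
    struct Slot {
        std::atomic<bool> active{false};
        std::atomic<std::uint64_t> epoch{0};
    };
    struct Pending {
        std::uint64_t epoch;
        std::function<void()> deleter;
    };
    std::array<Slot, kMaxThreads> m_slots;
    std::atomic<std::uint64_t> m_globalEpoch{0};
    std::mutex m_pendingLock;
    std::vector<Pending> m_pending;
};

// RAII holder in the spirit of EbrCriticalRegionHolder (hypothetical shape).
class SketchCriticalRegionHolder final {
public:
    SketchCriticalRegionHolder(SketchEbrCollector& collector, std::size_t slot)
        : m_collector(collector), m_slot(slot) { m_collector.EnterCriticalRegion(m_slot); }
    ~SketchCriticalRegionHolder() { m_collector.ExitCriticalRegion(m_slot); }
    SketchCriticalRegionHolder(const SketchCriticalRegionHolder&) = delete;
    SketchCriticalRegionHolder& operator=(const SketchCriticalRegionHolder&) = delete;

private:
    SketchEbrCollector& m_collector;
    std::size_t m_slot;
};
```

In this sketch a reader that stays inside its critical region lets the epoch advance at most once, so anything queued while it is reading cannot be reclaimed until it leaves.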

Changes:
- New: src/coreclr/vm/ebr.h, ebr.cpp (EbrCollector, ~340 lines)
- hash.cpp: Replace 6 GCX_MAYBE_COOP_NO_THREAD_BROKEN with EBR holders,
  replace SyncClean::AddHashMap with QueueForDeletion
- syncclean.hpp/cpp: Remove HashMap-related members and cleanup code
- ceemain.cpp: Init g_HashMapEbr at startup, shutdown at EE shutdown
- CrstTypes.def: Add CrstEbrThreadList, CrstEbrPending
- crsttypes_generated.h: Regenerated with new Crst types
- CMakeLists.txt: Add ebr.cpp, ebr.h to build
- Rename memoryBudget/m_pendingSize to memoryBudgetInBytes/m_pendingSizeInBytes
- Mark EbrCollector and EbrCriticalRegionHolder as final
- Delete move constructors/assignment operators
- Move NextObsolete from hash.h (public) to hash.cpp (file-static)
- Reuse DeleteObsoleteBuckets for sync-mode path in Rehash
- Trim redundant backstory comments at EBR call sites
- Remove unused forward decls from syncclean.hpp

Copilot AI (Contributor) left a comment

Pull request overview

This PR replaces HashMap async-mode protection that relied on per-operation COOP GC transitions and GC-time cleanup with an Epoch-Based Reclamation (EBR) mechanism to avoid lock-ordering deadlocks (notably involving DebuggerController vs ThreadStore/GC transitions).

Changes:

  • Introduces a new EBR implementation (EbrCollector + EbrCriticalRegionHolder) and a global collector for HashMap async mode (g_HashMapEbr).
  • Updates HashMap async call sites to use EBR critical regions and queues obsolete bucket arrays for deferred deletion via EBR.
  • Removes the HashMap-specific deferred cleanup path from SyncClean and adds new Crst types for EBR internal locks.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/coreclr/vm/syncclean.hpp Removes HashMap cleanup surface from SyncClean.
src/coreclr/vm/syncclean.cpp Removes HashMap obsolete-bucket list tracking and GC-time deletion.
src/coreclr/vm/hash.h Removes NextObsolete helper from the header.
src/coreclr/vm/hash.cpp Adds EBR critical region usage and EBR-based deferred deletion for obsolete buckets.
src/coreclr/vm/ebr.h Adds public EBR APIs (EbrCollector, EbrCriticalRegionHolder) and global collector declaration.
src/coreclr/vm/ebr.cpp Implements the EBR collector, per-thread tracking, and deferred deletion queues.
src/coreclr/vm/ceemain.cpp Initializes/shuts down the global HashMap EBR collector during runtime startup/shutdown.
src/coreclr/vm/CMakeLists.txt Adds EBR sources/headers to the VM build.
src/coreclr/inc/crsttypes_generated.h Adds new CrstEbrPending / CrstEbrThreadList types and metadata.
src/coreclr/inc/CrstTypes.def Declares new EBR Crst types.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 12, 2026 01:37

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

- QueueForDeletion: leak object on OOM instead of immediate deletion,
  which could cause use-after-free for concurrent EBR readers. Track
  leaked count via InterlockedIncrement counter.
- Rehash: read obsolete bucket size directly from allocation base
  instead of calling GetSize with wrong pointer (undefined behavior).
- Shutdown: early-return if !m_initialized instead of asserting
- Buckets()/Rehash(): simplify assert to !m_fAsyncMode || InCriticalRegion()
- LookupValue: remove GC thread exclusion from EBR critical region
- Comment fixes in InsertValue and Rehash deferred deletion

Add EbrCollector::ThreadDetach() to unlink and free per-thread EBR
data. Call it from ThreadDetaching() in corhost.cpp, following the
existing StressLog::ThreadDetach() pattern. This prevents unbounded
growth of the EBR thread list in processes with short-lived threads.

Replace thread_local EbrThreadData* with thread_local EbrThreadData
value, eliminating the OOM failure path in GetOrCreateThreadData().
This removes the risk of null dereference in ExitCriticalRegion()
when the RAII holder unwinds after a failed EnterCriticalRegion().
Shutdown and ThreadDetach now clear the data with = {} instead of
deleting heap memory.

Copilot AI review requested due to automatic review settings February 13, 2026 22:58

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Introduce static AllocateBuckets and FreeBuckets helpers to ensure
consistent BYTE[] allocation and deallocation of bucket arrays. Move
GetSize/SetSize from HashMap members to file-static functions. Remove
vestigial NextObsolete and chain-traversal loop from DeleteObsoleteBuckets
since EBR queues each array independently.
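
A minimal sketch of what a matched allocate/free pair with a size header could look like follows. It assumes the element count is stored in the first slot of the allocation and that usable buckets start one slot later; that layout, the `SketchBucket` type, and `FirstBucket` are illustrative assumptions rather than the definitions in hash.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical bucket layout; the real Bucket type in hash.h differs.
struct SketchBucket {
    std::size_t keys[4];
    std::size_t values[4];
};

// The element count is stored in the first bytes of the allocation base.
inline void SetSize(SketchBucket* base, std::size_t count) {
    std::memcpy(base, &count, sizeof(count));
}

inline std::size_t GetSize(const SketchBucket* base) {
    std::size_t count = 0;
    std::memcpy(&count, base, sizeof(count));
    return count;
}

// Usable buckets start one slot past the size header.
inline SketchBucket* FirstBucket(SketchBucket* base) {
    return base + 1;
}

// Allocate as a byte array so the matching free is always delete[] on bytes.
inline SketchBucket* AllocateBuckets(std::size_t count) {
    const std::size_t bytes = (count + 1) * sizeof(SketchBucket);
    unsigned char* raw = new (std::nothrow) unsigned char[bytes];
    if (raw == nullptr) {
        return nullptr;
    }
    std::memset(raw, 0, bytes);
    SketchBucket* base = reinterpret_cast<SketchBucket*>(raw);
    SetSize(base, count);
    return base;
}

// Must receive the allocation base, never the first usable bucket.
inline void FreeBuckets(SketchBucket* base) {
    delete[] reinterpret_cast<unsigned char*>(base);
}
```

With a layout like this, passing anything other than the allocation base to `GetSize` or `FreeBuckets` reads or frees the wrong memory, which is the kind of mistake a single pair of helpers makes harder to write.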

Split DrainQueue into DetachQueue (under lock) and DeletePendingEntries
(outside lock) so CRT free calls don't hold m_pendingLock. Add
EbrPendingEntry constructor to initialize fields at allocation time.
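
The detach-then-delete split can be sketched as below. The types are hypothetical (`SketchPendingEntry`, `SketchPendingQueue`) and `std::mutex` stands in for whatever lock guards the real queue; the point is only that the lock covers a pointer swap while the frees happen on a list no other thread can see.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical entry type; the constructor initializes every field up front.
struct SketchPendingEntry {
    SketchPendingEntry(void* object, void (*deleter)(void*))
        : m_object(object), m_deleter(deleter), m_next(nullptr) {}

    void* m_object;
    void (*m_deleter)(void*);
    SketchPendingEntry* m_next;
};

class SketchPendingQueue final {
public:
    void Push(SketchPendingEntry* entry) {
        assert(entry != nullptr);
        std::lock_guard<std::mutex> lock(m_pendingLock);
        entry->m_next = m_head;
        m_head = entry;
    }

    // Under the lock: only a pointer swap, so the lock is held briefly.
    SketchPendingEntry* DetachQueue() {
        std::lock_guard<std::mutex> lock(m_pendingLock);
        SketchPendingEntry* head = m_head;
        m_head = nullptr;
        return head;
    }

    // Outside the lock: the detached list is private to the caller, so the
    // deleters and the entry frees cannot stall other threads on the lock.
    static std::size_t DeletePendingEntries(SketchPendingEntry* head) {
        std::size_t count = 0;
        while (head != nullptr) {
            SketchPendingEntry* next = head->m_next;
            head->m_deleter(head->m_object);
            delete head;
            head = next;
            ++count;
        }
        return count;
    }

private:
    std::mutex m_pendingLock;
    SketchPendingEntry* m_head = nullptr;
};
```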
AaronRobinsonMSFT marked this pull request as draft February 18, 2026 20:49

EbrThreadData was stored directly as a thread_local struct, but the node
remains linked in the collector's thread list after the thread exits.
When C++ TLS tears down the storage, CanAdvanceEpoch/TryAdvanceEpoch
would chase a dangling pointer.

Additionally, ThreadDetaching() in corhost.cpp only fires for threads
with a runtime Thread object. Threads that used EBR without one would
never get cleaned up.

- Heap-allocate EbrThreadData so the node outlives the thread
- Add EbrTlsDestructor to call ThreadDetach for all threads
- TryAdvanceEpoch now deletes pruned nodes
- Remove g_HashMapEbr.ThreadDetach() from corhost.cpp
- Inline UnlinkThreadData logic into ThreadDetach

TODO: The heap allocation in GetOrCreateThreadData (new EbrThreadData)
runs on the GC_NOTRIGGER + MODE_COOPERATIVE hot path. This allocation
should be moved to a safer location (e.g. thread setup) to avoid
potential OOM on the critical dispatch path.
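
The lifetime problem described in this commit can be sketched as follows, under stated assumptions: the node is heap-allocated so it outlives the thread's TLS storage, a thread_local object with a destructor only marks the node as detached, and whichever thread later scans the list unlinks and deletes it. All `Sketch*` names and this particular division of work between detach and prune are hypothetical, not the layout in ebr.cpp.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical per-thread node; heap-allocated so it outlives thread exit.
struct SketchThreadNode {
    std::atomic<bool> detached{false};
    SketchThreadNode* next = nullptr;
};

class SketchThreadList final {
public:
    // Allocate a node and link it into the list.
    SketchThreadNode* Register() {
        SketchThreadNode* node = new SketchThreadNode();
        std::lock_guard<std::mutex> lock(m_lock);
        node->next = m_head;
        m_head = node;
        return node;
    }

    // Called as the owning thread exits: mark only, never free here,
    // because a concurrent scan may still be reading the node.
    static void ThreadDetach(SketchThreadNode* node) {
        node->detached.store(true);
    }

    // Called by the scanning thread: unlink and delete detached nodes.
    std::size_t PruneDetached() {
        std::lock_guard<std::mutex> lock(m_lock);
        std::size_t pruned = 0;
        SketchThreadNode** link = &m_head;
        while (*link != nullptr) {
            SketchThreadNode* node = *link;
            if (node->detached.load()) {
                *link = node->next;
                delete node;
                ++pruned;
            } else {
                link = &node->next;
            }
        }
        return pruned;
    }

    std::size_t Count() {
        std::lock_guard<std::mutex> lock(m_lock);
        std::size_t count = 0;
        for (SketchThreadNode* n = m_head; n != nullptr; n = n->next) ++count;
        return count;
    }

private:
    std::mutex m_lock;
    SketchThreadNode* m_head = nullptr;
};

SketchThreadList g_sketchList;

// The destructor runs at thread exit for any thread that touched the list,
// whether or not the runtime created a Thread object for it.
struct SketchTlsDestructor {
    SketchThreadNode* node = nullptr;
    ~SketchTlsDestructor() {
        if (node != nullptr) SketchThreadList::ThreadDetach(node);
    }
};

thread_local SketchTlsDestructor t_sketchTls;

SketchThreadNode* GetOrCreateThreadNode() {
    if (t_sketchTls.node == nullptr) t_sketchTls.node = g_sketchList.Register();
    return t_sketchTls.node;
}
```

A thread that calls `GetOrCreateThreadNode()` and then exits leaves a detached node behind, and the next `PruneDetached()` call reclaims it without touching the exited thread's TLS storage.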

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 18, 2026 20:52

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

AaronRobinsonMSFT marked this pull request as ready for review February 18, 2026 22:14
Copilot AI review requested due to automatic review settings February 18, 2026 22:14

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings February 19, 2026 00:27

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

- Fix comment on t_pThreadData: it is a thread_local value, not a
  heap-allocated pointer, and ThreadDetach prunes it (not TryAdvanceEpoch).
- Fix m_threadListLock comment: used for pruning and epoch scanning,
  not only pruning.
- Fix fence comments: MemoryBarrier() is a full fence, not an acquire fence.
- Remove CrstEbrPending and CrstEbrThreadList from CrstTypes.def and
  regenerate crsttypes_generated.h.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 19, 2026 23:19

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings February 20, 2026 03:50

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.

jkotas (Member) left a comment

Thanks!

}
}

if (m_EEHashTable)

Do you plan to fix this one in a follow-up as well?
