[clr-interp] Implement cached virtual/interface dispatch #123815
BrzVlad merged 4 commits into dotnet:main
Conversation
Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
Pull request overview
This pull request implements cached virtual and interface method dispatch for the CoreCLR interpreter to improve performance of virtual method calls. The implementation adds a simple hash table (InterpDispatchCache) that caches the mapping from (DispatchToken, MethodTable*) to the resolved MethodDesc*, similar to the DispatchCache used by Virtual Stub Dispatch (VSD).
Changes:
- Implements a new InterpDispatchCache structure that holds 4096 entries (12-bit cache) with a replace-on-collision strategy (a sketch follows this list)
- Refactors DispatchToken::GetHash() to be shared between VSD and interpreter caches
- Integrates cache cleanup into GC sync points
- Adds dispatch token caching slots to INTOP_CALLVIRT and INTOP_CALLVIRT_TAIL instructions
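To make the shape of the cache concrete, here is a minimal C++ sketch of a fixed-size, direct-mapped cache like the one described. Only the 4096-entry/12-bit sizing, the (DispatchToken, MethodTable*) -> MethodDesc* mapping, and the replace-on-collision policy come from this PR; the type and field names below are stand-ins for illustration.

```cpp
// Stand-in declarations so the sketch compiles outside the runtime; in the
// VM these would be the real MethodTable/MethodDesc/DispatchToken types.
class MethodTable;
class MethodDesc;
typedef size_t DispatchTokenValue;

// One cached resolution: (token, type) -> resolved method.
struct InterpDispatchCacheEntry
{
    DispatchTokenValue        token;        // identifies the virtual/interface slot
    MethodTable*              pMT;          // dynamic type of the receiver
    MethodDesc*               pMD;          // resolved target method
    InterpDispatchCacheEntry* pNextReclaim; // links replaced entries until GC frees them
};

// Direct-mapped table: one entry per 12-bit index, replaced on collision.
struct InterpDispatchCache
{
    static const size_t CACHE_SIZE = 4096; // 2^12 slots
    InterpDispatchCacheEntry* m_table[CACHE_SIZE];
};
```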
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/coreclr/vm/interpexec.cpp | Adds InterpDispatchCache implementation with lookup/insert/reclaim logic, CreateDispatchTokenForMethod helper, and integrates cache into virtual call execution path |
| src/coreclr/vm/virtualcallstub.cpp | Refactors hash function by moving token hashing logic into DispatchToken::GetHash() method to enable sharing with interpreter cache |
| src/coreclr/vm/contractimpl.h | Adds GetHash() method declaration to DispatchToken struct for computing the 12-bit cache hash (illustrated below) |
| src/coreclr/vm/syncclean.cpp | Adds call to InterpDispatchCache_ReclaimAll() during GC sync point to clean up dead cache entries |
| src/coreclr/interpreter/inc/intops.def | Changes instruction size from 4 to 5 bytes for INTOP_CALLVIRT and INTOP_CALLVIRT_TAIL to accommodate dispatch token cache slot |
| src/coreclr/interpreter/compiler.h | Adds GetNewDataItemIndex() helper method to allocate non-shared data item slots for runtime caching |
| src/coreclr/interpreter/compiler.cpp | Updates EmitCall to allocate dispatch token cache slots for virtual call instructions |
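The table above says DispatchToken::GetHash() now produces the 12-bit index shared by both caches, but the exact mixing is not shown in this summary. The function below is only an assumed illustration of folding a token value and a MethodTable pointer into 12 bits; the real GetHash() in contractimpl.h may differ.

```cpp
#include <cstddef>

// Illustrative only: one plausible way to derive a 12-bit cache index from a
// dispatch token and a type pointer. Not the actual DispatchToken::GetHash().
inline size_t ComputeCacheIndex(size_t tokenValue, const void* pMT)
{
    // Drop the low alignment bits of the pointer, then mix with the token.
    size_t h = tokenValue ^ (reinterpret_cast<size_t>(pMT) >> 3);
    h ^= h >> 12;      // fold higher bits into the low 12
    return h & 0xFFF;  // 4096-entry table => 12-bit index
}
```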
@jkotas @davidwrighton Simplistic implementation from the discussion the other day. Let me know if this approach seems ok.
davidwrighton left a comment:
I want to look at this more on Monday, but this is about what I thought we should build here.
Also this needs to handle collectibility correctly. There is a callback, fired when an assembly is in the process of being collected, where we clear the VSD cache. We should do the same here.
@davidwrighton Added a callback for when an assembly is being collected.
@davidwrighton Tested 2 benchmarks in full interpreted mode (json and blazor). With the default cache size of 4k there were ~40 collisions on json and 740 on blazor. Heavily increasing the cache size seems to reduce collisions, but not in a deterministic or incremental way. I believe we should keep the default 4k until we have more widespread performance-measurement capability.
Hmm, that looks like there might be real value in increasing the bucket size a bit or tweaking the hash function. Or maybe making the hash size a prime number. In any case, let's do that after merging the general concept in.
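For context on that suggestion: with a power-of-two table the index is a single mask, while a prime-sized table costs a modulo but lets every bit of the hash influence the index, which can reduce clustering. A hypothetical comparison:

```cpp
#include <cstddef>

// Power-of-two table: indexing is one AND, but only the low 12 bits of the
// hash matter, so keys differing only in higher bits collide.
inline size_t IndexPow2(size_t hash)  { return hash & (4096 - 1); }

// Prime-sized table: a modulo is slightly more expensive, but all bits of
// the hash affect the index. 4093 is the largest prime below 4096.
inline size_t IndexPrime(size_t hash) { return hash % 4093; }
```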
Force-pushed 105a086 to 0bfbcdf
We create a simple hashtable (InterpDispatchCache) that maps a DispatchToken plus the target MethodTable to the target method to be called. The cache is similar to the `DispatchCache` used by VSD. It holds a single mapping per index; when a collision happens, the entry is replaced with the new one. Replaced entries are freed during GC. The expectation is that there will be few collisions, given that only a subset of methods is being interpreted.
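A minimal, self-contained sketch of the lookup/insert/reclaim flow described above, assuming displaced entries are parked on a list and freed only at a GC sync point when no thread can be mid-lookup. Everything except the name InterpDispatchCache_ReclaimAll (which the file summary mentions) is hypothetical:

```cpp
#include <atomic>
#include <cstddef>

class MethodTable;   // stand-ins for the real VM types
class MethodDesc;

struct Entry
{
    size_t       token;        // dispatch token value
    MethodTable* pMT;          // receiver type
    MethodDesc*  pMD;          // resolved target
    Entry*       pNextReclaim; // chains displaced entries until GC
};

static std::atomic<Entry*> g_table[4096];          // direct-mapped, 12-bit index
static std::atomic<Entry*> g_reclaimList{nullptr};

static size_t Index(size_t token, MethodTable* pMT)
{
    size_t h = token ^ (reinterpret_cast<size_t>(pMT) >> 3);
    return (h ^ (h >> 12)) & 0xFFF;
}

// Fast path on every interpreted virtual call: one load, two compares.
MethodDesc* LookupOrNull(size_t token, MethodTable* pMT)
{
    Entry* e = g_table[Index(token, pMT)].load(std::memory_order_acquire);
    if (e != nullptr && e->token == token && e->pMT == pMT)
        return e->pMD;  // hit: reuse the previously resolved method
    return nullptr;     // miss: caller resolves the slow way, then inserts
}

// Replace-on-collision: a racing reader may still be using the displaced
// entry, so park it on the reclaim list instead of freeing it immediately.
void Insert(Entry* pNew)
{
    Entry* pOld = g_table[Index(pNew->token, pNew->pMT)]
                      .exchange(pNew, std::memory_order_acq_rel);
    if (pOld != nullptr)
    {
        pOld->pNextReclaim = g_reclaimList.load(std::memory_order_relaxed);
        while (!g_reclaimList.compare_exchange_weak(pOld->pNextReclaim, pOld)) { }
    }
}

// Invoked from the GC sync point (see syncclean.cpp): all threads are
// stopped, so no lookup can be in flight and parked entries are safe to free.
void InterpDispatchCache_ReclaimAll()
{
    Entry* e = g_reclaimList.exchange(nullptr);
    while (e != nullptr)
    {
        Entry* next = e->pNextReclaim;
        delete e;
        e = next;
    }
}
```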
Force-pushed 0bfbcdf to 608b7c1
This makes a microbenchmark from the suite 4x faster.