-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Fix Thread.GetCurrentProcessorId for > 64 CPUs on Windows. #581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| #ifndef FEATURE_PAL | ||
| PROCESSOR_NUMBER proc_no_cpu_group; | ||
| GetCurrentProcessorNumberEx(&proc_no_cpu_group); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK if ThreadNative::GetCurrentProcessorNumber might return an index that is larger than total number of active processors on the system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. What we return to the user is technically “ID correlated with last core we ran on”. We may even default to ThreadID if OS API is not functional. (PAL may return -1).
Are process groups contiguous? (1,2,3, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer my question - yes, OS "packs" cores into as few process groups as possible and considers topology when assigning.
https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups
|
/cc: @janvorli |
It is not capped, it returns number of processor within the current processor group. Processor groups cannot have more than 64 CPUs. |
|
The goal here is to form an integer that user can use to softly affinitize data to cores. Returning proc number within a group is clearly wrong since it maps all cores into 0-64 range. If there is a better way to produce a CoreID - how? |
|
Any difference in perf compared to what we have today? |
|
@jkotas - perf difference in API itself due to The latter really depends on the app. |
|
on machines with fast The following are measurements as reported by printf-instrumented #467 when it is forced to calibrate FCALL against standalone access to ================= Main machine (12 logical cores, Coffeelake 4.2 GHz) 100ns tick ( times (in ticks for one iteration) === considering cpu group (calling and folding adds 6.5% to ID call ================== Older machine (8 logical cores, Kabylake 4.0 GHz) times (in ticks) === considering cpu group adds 4% to ID call ================== Rome (256 logical cores, Zen2, 2.4 GHz, has RDPID) times (in ticks) === considering cpu group adds 9% per ID call |
|
Sounds reasonable. |
GetCurrentProcessorNumberis capped to 64 on Windows and that results in unexpected sharing when having 64+ cores. In particular if the processor groups are in different NUMA nodes.We need to use
GetCurrentProcessorNumberEx.GCToOSInterface::GetCurrentProcessorNumberhas another implementation, which looks correct. This is basically a short version of that.