
Conversation

@VSadov (Member) commented Dec 5, 2019

GetCurrentProcessorNumber is capped to 64 on Windows, and that results in unexpected sharing when there are 64+ cores - in particular when the processor groups are in different NUMA nodes.
We need to use GetCurrentProcessorNumberEx instead.

GCToOSInterface::GetCurrentProcessorNumber has a separate implementation that looks correct; this is basically a short version of it.


#ifndef FEATURE_PAL
PROCESSOR_NUMBER proc_no_cpu_group;
GetCurrentProcessorNumberEx(&proc_no_cpu_group);
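For context, a minimal sketch of the overall shape being discussed here - folding the group and the within-group number into one integer - under the assumption that the fold is a simple shift-and-or (the actual FCALL may differ in details):

```cpp
#include <windows.h>

// Sketch only: return an identifier correlated with the processor the
// current thread last ran on, with the processor group folded in so that
// cores in different groups do not collide.
static int GetFlatProcessorId()
{
    PROCESSOR_NUMBER proc_no_cpu_group;
    GetCurrentProcessorNumberEx(&proc_no_cpu_group);

    // A processor group holds at most 64 logical processors, so 6 bits
    // are enough for the within-group number.
    return (proc_no_cpu_group.Group << 6) | proc_no_cpu_group.Number;
}
```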
Contributor commented:

Is it OK if ThreadNative::GetCurrentProcessorNumber returns an index larger than the total number of active processors on the system?

@VSadov (Member, Author) commented Dec 5, 2019

Yes. What we return to the user is technically an "ID correlated with the last core we ran on". We may even default to the ThreadID if the OS API is not functional (the PAL may return -1).
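
(A sketch of that fallback shape, purely illustrative; the -1 check stands in for "OS API is not functional" and is not the actual ThreadNative code.)

```cpp
// Assumed fallback: if the OS cannot tell us which processor we are on
// (the PAL implementation may report -1), degrade to a per-thread value
// so callers still get some usable affinity hint.
int id = (int)::GetCurrentProcessorNumber();
if (id < 0)
    id = (int)::GetCurrentThreadId();
```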

Are processor groups contiguous? (1, 2, 3, ...)

@VSadov (Member, Author) commented:

To answer my own question: yes, the OS "packs" cores into as few processor groups as possible and considers topology when assigning them.
https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups
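
The group layout can be inspected with the Win32 group APIs; a small sketch that prints how full each group actually is (illustrative only):

```cpp
#include <windows.h>
#include <stdio.h>

int main()
{
    // The OS packs logical processors into as few groups as possible,
    // but a group is not necessarily filled to its 64-processor limit.
    WORD groups = GetActiveProcessorGroupCount();
    for (WORD g = 0; g < groups; g++)
        printf("group %u: %lu active processors\n",
               (unsigned)g, GetActiveProcessorCount(g));

    printf("total: %lu\n", GetActiveProcessorCount(ALL_PROCESSOR_GROUPS));
    return 0;
}
```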

@sergiy-k (Contributor) commented Dec 5, 2019

/cc: @janvorli

@janvorli (Member) commented Dec 5, 2019

> GetCurrentProcessorNumber is capped to 64 on Windows, and that results in unexpected sharing when there are 64+ cores.

It is not capped - it returns the number of the processor within the current processor group, and a processor group cannot have more than 64 CPUs.
I am not sure that returning a combined processor number the way you've done it (group * 64 + in_group_cpu_index) is something we should make public, since Windows has no concept of a CPU index larger than 64 and groups don't necessarily have to be completely filled. That means the range of CPU indices would not necessarily be contiguous.
In the GC code it is actually used as an internal encoding of group / index; we always decode it back to group / index before using it.
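
(The round trip being described, sketched here for clarity; the real GC helpers have their own names and details.)

```cpp
#include <stdint.h>

// group * 64 + in-group index, used purely as an internal encoding.
inline uint32_t EncodeProcNo(uint16_t group, uint8_t numberInGroup)
{
    return ((uint32_t)group << 6) | numberInGroup;
}

// Always decoded back to group / index before use.
inline void DecodeProcNo(uint32_t encoded, uint16_t* group, uint8_t* numberInGroup)
{
    *group         = (uint16_t)(encoded >> 6);
    *numberInGroup = (uint8_t)(encoded & 63);
}
```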

@VSadov (Member, Author) commented Dec 5, 2019

The goal here is to form an integer that the user can use to softly affinitize data to cores.
It is not meant to describe topology; if that is interesting, we may need another API for it.

Returning the processor number within a group is clearly wrong, since it maps all cores into the 0-63 range.

If there is a better way to produce a CoreID - what is it?
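
To make "softly affinitize data to cores" concrete, here is a sketch of the intended kind of use; the striped-counter scheme, the array bound, and the inline fold are assumptions for illustration, not a proposed API:

```cpp
#include <windows.h>
#include <atomic>

// Each core (identified by the flat processor id) gets its own
// cache-line-sized slot, so threads running on different cores rarely
// contend. With only the within-group number, core N of every group
// would land on the same slot.
struct alignas(64) PaddedCounter { std::atomic<long long> value{0}; };

static PaddedCounter g_counters[4 * 64];   // room for 4 groups of 64 (illustrative bound)

void Increment()
{
    PROCESSOR_NUMBER proc;
    GetCurrentProcessorNumberEx(&proc);
    size_t id = ((size_t)proc.Group << 6) | proc.Number;
    g_counters[id % (sizeof(g_counters) / sizeof(g_counters[0]))]
        .value.fetch_add(1, std::memory_order_relaxed);
}
```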

@jkotas (Member) commented Dec 5, 2019

Any difference in perf compared to what we have today?

@VSadov (Member, Author) commented Dec 5, 2019

@jkotas - do you mean the perf difference in the API itself due to the extra << and |, or the perf difference on a 256-core machine where every proc #N shares data with proc #N in the 3 other NUMA nodes?

The latter really depends on the app.
The sharing is very unintuitive, though. Even if we wanted to compress the ProcID range to 64, I'd prefer to share within nodes, not across them.

@VSadov (Member, Author) commented Dec 5, 2019

On machines with a fast GetCurrentProcessorNumber this change adds 5-10% to the cost of the FCALL.

The following are measurements as reported by the printf-instrumented #467 when it is forced to calibrate the FCALL against a standalone access to a ThreadStatic.
I picked just the best measurements, since there is some noise.

================= Main machine (12 logical cores, Coffeelake 4.2 GHz)

100ns tick (Stopwatch)
4096 iters

times (in ticks for one iteration)
=== ignoring cpu group (just calling GetCurrentProcessorNumber)
ID: 0.08984375 TLS: 0.04443359375

=== considering cpu group (calling and folding GetCurrentProcessorNumberEx)
ID: 0.095703125 TLS: 0.04443359375

adds 6.5% to ID call

================== Older machine (8 logical cores, Kabylake 4.0 GHz)
284ns tick
2048 iters

times (in ticks)
=== ignoring cpu group
ID: 0.03466796875 TLS: 0.015625

=== considering cpu group
ID: 0.0361328125 TLS: 0.0146484375

adds 4% to ID call

================== Rome (256 logical cores, Zen2, 2.4 GHz, has RDPID)
100ns tick
4096 iters

times (in ticks)
=== ignoring cpu group
ID: 0.0455322265625 TLS: 0.074462890625

=== considering cpu group
ID: 0.0498046875 TLS: 0.074462890625

adds 9% per ID call
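
The numbers above come from the instrumented build of #467 calibrating the managed FCALL against a ThreadStatic access. For a rough feel of the native-side difference alone, a harness along these lines could be used (illustrative only, not the harness that produced the numbers above):

```cpp
#include <windows.h>
#include <stdio.h>

int main()
{
    const int iters = 1 << 24;
    LARGE_INTEGER freq, t0, t1, t2;
    QueryPerformanceFrequency(&freq);

    unsigned sink = 0;

    QueryPerformanceCounter(&t0);
    for (int i = 0; i < iters; i++)
        sink += GetCurrentProcessorNumber();            // ignores the group

    QueryPerformanceCounter(&t1);
    for (int i = 0; i < iters; i++)
    {
        PROCESSOR_NUMBER p;
        GetCurrentProcessorNumberEx(&p);
        sink += (p.Group << 6) | p.Number;              // folds the group in
    }
    QueryPerformanceCounter(&t2);

    printf("plain: %.2f ns/call   ex+fold: %.2f ns/call   (sink=%u)\n",
           (t1.QuadPart - t0.QuadPart) * 1e9 / freq.QuadPart / iters,
           (t2.QuadPart - t1.QuadPart) * 1e9 / freq.QuadPart / iters,
           sink);
    return 0;
}
```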

@jkotas (Member) commented Dec 5, 2019

Sounds reasonable.

@VSadov merged commit a6d62d6 into dotnet:master on Dec 6, 2019
@VSadov deleted the BigProcNum branch on Dec 6, 2019 at 02:34
The conversation was locked as resolved and limited to collaborators on Dec 11, 2020