CPU utilization computation fixes #2196
- On Unix, move scaling by the total number of processors from the PAL to the managed side, so that it can use the container-limit-aware ProcessorCount
- Delete asserts for CPU utilization being between 0 and 100. These asserts can fail due to races or rounding errors.
- Convert a few classes to structs
Fixes #2195
-  * compute the current process's CPU utilization instead.
+  * compute the current process's CPU utilization instead. The CPU utilization
+  * returned is the sum of utilization across all processors, e.g. this function will
+  * return 200 when two cores are running at 100%.
I'm a little confused by removing the multiplication by number of processors, but then this saying it's larger for more processors. I'm sure you're right, but what's my misunderstanding?
The System.Native shim does not have easy access to the processor count adjusted for container limits. The processor count on the managed side is adjusted, so it is the right number to use when computing how many spare cycles extra threads could take advantage of.
In other words, this change should give us the equivalent of this piece of logic from the CoreCLR PAL.
It's essentially removing division, not multiplication. It's not obvious from reading the unified diff because GitHub hides the cpuUtilization = (int32_t)(cpuBusyTime * 100 / cpuTotalTime); line. The multiplication in cpuTotalTime is affecting the divisor.
Ah, yeah, I missed that, thanks.
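The exchange above can be illustrated with a minimal sketch in C. It assumes the shim reports utilization summed across processors and that the caller then divides by a container-aware processor count. The function names, the nanosecond units, and the final clamp to 0..100 are hypothetical choices for illustration, not code from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shim after this change: CPU time consumed by
 * the process over an interval, as a percentage of ONE processor's time.
 * Two cores running at 100% therefore yield 200. */
int32_t summed_utilization(int64_t busy_ns, int64_t elapsed_ns)
{
    if (elapsed_ns <= 0)
        return 0; /* no time elapsed, nothing meaningful to report */
    return (int32_t)(busy_ns * 100 / elapsed_ns);
}

/* Hypothetical caller-side step: scale by the processor count the caller
 * trusts (for example, one adjusted for container limits). The clamp is an
 * assumption of this sketch; races or rounding can push the raw value
 * slightly outside 0..100. */
int32_t scale_to_machine(int32_t summed, int32_t processor_count)
{
    assert(processor_count > 0); /* precondition of this sketch */
    int32_t scaled = summed / processor_count;
    if (scaled < 0)
        scaled = 0;
    if (scaled > 100)
        scaled = 100;
    return scaled;
}
```

With a summed value of 200, a caller that sees two processors gets 100, while a caller limited to four processors gets 50.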
- long cpuTotalTime = ((long)userTime - _processCpuInfo.userTime) + ((long)kernelTime - _processCpuInfo.kernelTime);
- long cpuBusyTime = cpuTotalTime - ((long)idleTime - _processCpuInfo.idleTime);
+ long cpuTotalTime = ((long)userTime - _userTime) + ((long)kernelTime - _kernelTime);
+ long cpuBusyTime = cpuTotalTime - ((long)idleTime - _idleTime);
Do we have any tests that would fail if we got something wrong here?
Currently, this is only used for Mono on Windows, which gets very light testing. If we got this wrong, it would show up as a performance bug in thread pool scaling.
Hopefully, we will find time to switch to the managed threadpool for CoreCLR too. Validating details like this would be the bulk of the work for that.
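For readers following the hunk above, here is a rough C sketch of the same delta computation. It assumes, as with Win32 `GetSystemTimes`, that kernel time already includes idle time, which is why idle is subtracted from the user-plus-kernel total. The `cpu_sample` type, the function name, and the guard and clamp are hypothetical additions for illustration and are not taken from the PR.

```c
#include <stdint.h>

/* Hypothetical snapshot of cumulative system times in arbitrary ticks.
 * Assumption: kernel_time includes idle_time. */
typedef struct {
    uint64_t idle_time;
    uint64_t kernel_time;
    uint64_t user_time;
} cpu_sample;

/* Utilization in percent between two snapshots. */
int32_t utilization_between(const cpu_sample *prev, const cpu_sample *cur)
{
    /* Total elapsed CPU time: user delta plus kernel delta. */
    int64_t total = ((int64_t)cur->user_time - (int64_t)prev->user_time)
                  + ((int64_t)cur->kernel_time - (int64_t)prev->kernel_time);
    /* Busy time: total minus the idle delta. */
    int64_t busy = total - ((int64_t)cur->idle_time - (int64_t)prev->idle_time);

    /* Guard against a zero divisor and against samples that raced. */
    if (total <= 0 || busy <= 0)
        return 0;

    int64_t pct = busy * 100 / total;
    return pct > 100 ? 100 : (int32_t)pct;
}
```

The guard and clamp replace hard asserts here because two samples taken at slightly different moments can yield values just outside the expected range.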