Skip to content

Linux: handle offline CPUs when getting frequencies#650

Closed
mwahlroos wants to merge 3 commits intohtop-dev:masterfrom
mwahlroos:linux-ignore-offline-cpu-freqs
Closed

Linux: handle offline CPUs when getting frequencies#650
mwahlroos wants to merge 3 commits intohtop-dev:masterfrom
mwahlroos:linux-ignore-offline-cpu-freqs

Conversation

@mwahlroos
Copy link
Copy Markdown

Fixes: #648

When getting frequencies from CPUFreq, the CPU nodes in sysfs are iterated from 0 to M-1 where M is the number of configured CPUs in the system and which should include offline ones as well, thus allowing for iterating through the full number of possible CPUs. The ones that are offline according to sysfs are ignored, and frequencies are fetched for up to N others where N is the number for which CPUData entries have been allocated.

When getting frequencies from /proc/cpuinfo, the CPU nodes and their frequencies are just taken in the order in which they appear. Offline CPUs do not seem to appear in cpuinfo, so they're automatically skipped. The CPU IDs from cpuinfo are ignored, apart from checking the output line format.

Possible issues that I can think of:

  • The sysconf variable _SC_NPROCESSORS_CONF used for getting the total number of CPUs here (in the CPUFreq code path) isn't standard, but it seems to have been mentioned on the Linux man page sysconf(3) since circa 2011.
  • It might not be the best idea to check the online status of every CPU node separately from sysfs. I don't know how fast or slow reading that information is, or if it has performance implications if there's a large number of CPUs or cores. It might be possible to parse e.g. /sys/devices/system/cpu/online for a list of online CPUs instead, which would require a bit of work to do properly but would allow for a single read.
  • The /proc/cpuinfo code path in its form suggested here assumes that offline CPUs do not appear and that the online nodes appear in the order of their CPU IDs. The author of the previous code didn't seem to want to make that assumption, and I'm not sure if it's valid.

Any feedback is welcome!

Fixes: htop-dev#648

If some CPUs have been turned offline, those CPUs will not appear in the
active CPU count parsed from /proc/stat.

Offline CPUs that are still present may still count in terms of sequential
CPU IDs in both /proc/cpuinfo and the CPU entries in /sys/devices/system/cpu/.
Reading the frequencies for the N active CPUs from CPUFreq information in
/sys/devices/system/cpu/cpu[0..N-1] may thus give incorrect results if some
of the CPUs [0..N-1] are actually offline.

If e.g. CPUs 1 and 3 have been turned offline on a system with four CPU cores,
the code would attempt to read the frequencies for the remaining two online
CPUs from /sys/devices/system/cpu/cpu{0,1}, while the correct sysfs CPU
entries from which to read the frequencies would be {0,2}.

The code path that reads the frequencies from /proc/cpuinfo instead of CPUfreq
suffers from a similar problem.

This fixes the reporting of frequencies in case of offline CPUs being mixed
within the range of online CPUs.
Copy link
Copy Markdown
Member

@BenBE BenBE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the style guide regarding indentation.

Comment thread linux/LinuxProcessList.c Outdated
Comment on lines +1834 to +1836
xSnprintf(pathBuffer, sizeof(pathBuffer), "/sys/devices/system/cpu/cpu%u/online", cpuId);

FILE* file = fopen(pathBuffer, "r");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the wrapper function for openat or readfile to reduce the number of FS level operations.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also avoid the FILE* APIs as they have quite a bit of overhead.

Comment thread linux/LinuxProcessList.c Outdated
Comment thread linux/LinuxProcessList.c
Comment on lines +1848 to +1850
static int getConfiguredCPUCount(void) {
return (int) sysconf(_SC_NPROCESSORS_CONF);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be available globally already IIRC.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried looking for that but couldn't find it.

ProcessList has cpuCount, which is updated on Linux by LinuxProcessList_updateCPUcount, but that number only includes the number of active CPUs, and using it as the upper limit for the CPU id leads to precisely the problem that I'm trying to solve.

Any pointers to where this would be available globally?

Comment thread linux/LinuxProcessList.c

static int scanCPUFreqencyFromSysCPUFreq(LinuxProcessList* this) {
unsigned int cpus = this->super.cpuCount;
int configuredCPUs = getConfiguredCPUCount();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to cache that value and avoid requesting it on every run; occasional refresh is no problem though.

Comment thread linux/LinuxProcessList.c
if (onlineCPUIndex == 0)
clock_gettime(CLOCK_MONOTONIC, &start);

FILE* file = fopen(pathBuffer, "r");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid fopen in favour of openat; opendir the path /sys/devices/system/cpu and only scan from there. May combine this with the above and opendir cpu%d additionally.

Comment thread linux/LinuxProcessList.c
@@ -1859,14 +1897,14 @@ static int scanCPUFreqencyFromSysCPUFreq(LinuxProcessList* this) {
if (fscanf(file, "%lu", &frequency) == 1) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use readfile API

@BenBE
Copy link
Copy Markdown
Member

BenBE commented Jun 9, 2021

@mwahlroos Feel free to rebase/squash commits on your branch as you see fit. The final PR should preferably only contain a (minimal) set of linear changes to implement the change without all the intermediate steps from typos and bug fixes.

cgzones added a commit to cgzones/htop that referenced this pull request Jun 12, 2021
Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
cgzones added a commit to cgzones/htop that referenced this pull request Jun 12, 2021
Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
Related: htop-dev#580
@cgzones cgzones mentioned this pull request Jun 12, 2021
8 tasks
cgzones added a commit to cgzones/htop that referenced this pull request Jun 12, 2021
Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
Related: htop-dev#580
cgzones added a commit to cgzones/htop that referenced this pull request Jun 12, 2021
Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
Related: htop-dev#580
cgzones added a commit to cgzones/htop that referenced this pull request Jun 12, 2021
Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
Related: htop-dev#580
cgzones added a commit to cgzones/htop that referenced this pull request Jun 13, 2021
Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
Related: htop-dev#580
BenBE pushed a commit to cgzones/htop that referenced this pull request Jul 18, 2021
Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
Related: htop-dev#580
@BenBE BenBE added enhancement Extension or improvement to existing feature Linux 🐧 Linux related issues needs-rebase Pull request needs to be rebased and conflicts to be resolved labels Aug 6, 2021
@cgzones
Copy link
Copy Markdown
Member

cgzones commented Aug 8, 2021

@mwahlroos can this be closed as #656 has been merged?

@mwahlroos mwahlroos closed this Aug 8, 2021
@mwahlroos
Copy link
Copy Markdown
Author

Closing this PR as #656 does this better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Extension or improvement to existing feature Linux 🐧 Linux related issues needs-rebase Pull request needs to be rebased and conflicts to be resolved

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Linux: incorrect CPU frequencies when some CPUs are offline

3 participants