Skip to content

Rework CPU counting#656

Merged
fasterit merged 12 commits intohtop-dev:masterfrom
cgzones:cpu_count
Aug 2, 2021
Merged

Rework CPU counting#656
fasterit merged 12 commits intohtop-dev:masterfrom
cgzones:cpu_count

Conversation

@cgzones
Copy link
Copy Markdown
Member

@cgzones cgzones commented Jun 12, 2021

Currently htop does not support offline CPUs and hot-swapping, e.g. via
echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and existingCPUs.

Supersedes: #650
Related: #580

/cc @mwahlroos @sthen

@natoscott please take a look at the PCP code

TODO:

  • Check on Darwin
  • Check on DragonFlyBSD
  • Check on FreeBSD
  • Check on NetBSD (Add NetBSD support in htop(1) #603)
  • Check on OpenBSD
  • Check on Solaris
  • Check with PCP
  • Hide offline CPUs in affinity panel

@cgzones cgzones force-pushed the cpu_count branch 4 times, most recently from a3b2ac6 to 5408fd7 Compare June 12, 2021 21:07
@BenBE BenBE added the enhancement Extension or improvement to existing feature label Jun 13, 2021
@cgzones cgzones force-pushed the cpu_count branch 12 times, most recently from 37fd370 to 5ff6d27 Compare June 13, 2021 12:41
@cgzones cgzones marked this pull request as ready for review June 13, 2021 12:42
Comment thread solaris/SolarisProcessList.c
Comment thread ProcessList.h

unsigned int cpuCount;
unsigned int activeCPUs;
unsigned int existingCPUs;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal but I find the naming slightly unclear - AIUI 'active' means 'online' and 'existing' means the CPU exists but is either online or offline? An alternative might be to use 'onlineCPUs' and something like 'actualCPUs' or 'realCPUs' here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer actualCPUs over realCPUs to not clash with virtualization stuff …

Comment thread linux/LinuxProcessList.c
#endif

static void LinuxProcessList_updateCPUcount(ProcessList* super, FILE* stream) {
static void LinuxProcessList_updateCPUcount(ProcessList* super) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This sysfs-walk looks like a relatively expensive approach - going from what is currently a handful of syscalls at most, per sample, to potentially many thousands of additional syscalls with each sample on large CPU count machines.

Does an offline CPU appear at all in /proc/stat output? If not, we should be able to do this operation using a combination of a/ one-time call to sysconf(_SC_NPROCESSORS_CONF) at startup and, b/ CPU state-management based on presence (or zero values?) in /proc/stat (while reading stats) at sample time... i.e. if cpu22 of 32 is missing in /proc/stat mark it as !online ... and update activeCPU count accordingly - would that work?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reporter of the issue that probably got this started here. I have no other authority on the matter. :)

Another alternative in case of Linux would be to parse the list of online CPU nodes at /sys/devices/system/cpu/online on update. That would be single read from sysfs rather than one read per CPU.

The sysfs files are listed in the Linux sysfs documentation. The format for the CPU-listing files seems to be e.g. "0-3,5,7", which in case of the aggregate online file would mean that CPU nodes 0 to 3, and additionally 5 and 7 are online.

That would give explicit and authoritative information on which CPUs are currently being scheduled. It might of course be simpler to just assume that the ones not appearing in /proc/stat are offline. That seems to be the case on my machine but I have no broader knowledge on how the file lists the CPUs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original proposal used sysconf(_SC_NPROCESSORS_CONF), which is implemented as a sysfs-walk, see https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/getsysstats.c.html#236.
By doing the walk on our own we can search specific for the data we need.
Maybe at RC stage we can take a more serious look on performance.

Comment thread linux/LinuxProcessList.c
unsigned int cpuid;
(void) sscanf(buffer, "cpu%4u %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu", &cpuid, &usertime, &nicetime, &systemtime, &idletime, &ioWait, &irq, &softIrq, &steal, &guest, &guestnice);
assert(cpuid == i - 1);
adjCpuId = cpuid + 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is what I was looking for in that previous comment. :) So an alternate approach to finding offline CPUs might be to mark all offline at start of this function, and then mark each individually as 'online' as we parse the file and find a CPU (and update counts as we go if we need total online vs actual).

cgzones added 6 commits July 18, 2021 07:44
Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
Related: htop-dev#580
Bogus uptime measurements can result in wrap-arounds, leading to
negative garbage values printed.
Kernel threads do not have an executable and the check can result in
garbage values as unprivileged user.
Wait until it has been decided what kind of task the entry actually is.
cgzones added 6 commits July 18, 2021 07:50
sched_getaffinity() and sched_setaffinity() are also available on BSDs.
Remove the Linux restraint.
Wait until it has been decided what kind of task the entry actually is.
Example hot-swapping:
    psradm -F -f 2
openbsd/OpenBSDProcessList.c:176:56: error: no member named 'ki_pid' in 'struct kinfo_proc'; did you mean 'p_pid'?
   const int mib[] = { CTL_KERN, KERN_PROC_CWD, kproc->ki_pid };
                                                       ^~~~~~
                                                       p_pid
/usr/include/sys/sysctl.h:375:10: note: 'p_pid' declared here
        int32_t p_pid;                  /* PID_T: Process identifier. */
                ^
openbsd/OpenBSDProcessList.c:458:33: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
      if (opl->cpus[i].cpuIndex == id)
          ~~~~~~~~~~~~~~~~~~~~~ ^  ~~
@BenBE BenBE added this to the 3.1.0 milestone Jul 18, 2021
@fasterit
Copy link
Copy Markdown
Member

fasterit commented Aug 2, 2021

Works on MacOSX 11.3.1 (Intel CPU)
https://discussions.apple.com/thread/250559322 on how to offline CPUs / Hypertreading on MacOS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Extension or improvement to existing feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants