Description
Host operating system: output of uname -a
# uname -a
Linux haswell 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
node_exporter version: output of node_exporter --version
$ ./node_exporter --version
node_exporter, version 0.14.0 (branch: package_throttles_total, revision: 60ee361e86cb1457151753f0aa8c0da976c6bc26)
node_exporter command line flags
./node_exporter --collectors.enabled=cpu --log.level="debug"
Are you running node_exporter in Docker?
No, on physical multi-core systems.
What did you do that produced an error?
I am testing the node_cpu_core_throttles_total metric. As the metric name indicates, this is a per (physical) core metric and not a per (logical) cpu metric.
However, node_exporter currently creates two identical time series for each physical core if Hyper-Threading is enabled.
# HELP node_cpu_core_throttles_total Number of times this cpu core has been throttled.
# TYPE node_cpu_core_throttles_total counter
node_cpu_core_throttles_total{cpu="cpu1"} 61
node_cpu_core_throttles_total{cpu="cpu2"} 3
node_cpu_core_throttles_total{cpu="cpu4"} 108
node_cpu_core_throttles_total{cpu="cpu9"} 49
node_cpu_core_throttles_total{cpu="cpu25"} 61
node_cpu_core_throttles_total{cpu="cpu26"} 3
node_cpu_core_throttles_total{cpu="cpu28"} 108
node_cpu_core_throttles_total{cpu="cpu33"} 49
(I've omitted the metrics with value 0.)
What did you expect to see?
I expected one node_cpu_core_throttles_total time series for each physical core (24 in my case), not one for each logical cpu (48).
The current behavior creates many redundant time series on multi-core systems.
What did you see instead?
The /sys file system of one of my test machines looks like this:
# for i in /sys/bus/cpu/devices/cpu{0..47}/thermal_throttle/core_throttle_count; do\
echo "$i : $(cat $i)"; done | grep -vw 0
/sys/bus/cpu/devices/cpu1/thermal_throttle/core_throttle_count : 61
/sys/bus/cpu/devices/cpu2/thermal_throttle/core_throttle_count : 3
/sys/bus/cpu/devices/cpu4/thermal_throttle/core_throttle_count : 108
/sys/bus/cpu/devices/cpu9/thermal_throttle/core_throttle_count : 49
/sys/bus/cpu/devices/cpu25/thermal_throttle/core_throttle_count : 61
/sys/bus/cpu/devices/cpu26/thermal_throttle/core_throttle_count : 3
/sys/bus/cpu/devices/cpu28/thermal_throttle/core_throttle_count : 108
/sys/bus/cpu/devices/cpu33/thermal_throttle/core_throttle_count : 49
# lscpu | grep ^NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Notice how each core count is present twice in the fs (once for each HT sibling; for example, cpu1 and cpu25 both report 61), and the cpu collector replicates this in its node_cpu_core_throttles_total metrics.
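One possible way to report each count only once is to key it by physical core rather than by logical cpu. The sketch below assumes that HT siblings share the same topology/physical_package_id and topology/core_id values in sysfs. The function name unique_core_throttles and its output format are hypothetical and only meant to illustrate the idea; this is not the collector's actual code.

```shell
# Print one throttle count per physical core instead of one per logical cpu.
# Assumption: (physical_package_id, core_id) identifies a physical core.
# The body runs in a subshell, so its variables do not leak to the caller.
unique_core_throttles() (
  root="${1:-/sys/bus/cpu/devices}"
  seen=""
  for dir in "$root"/cpu[0-9]*; do
    # Skip cpus that do not expose a throttle counter.
    [ -r "$dir/thermal_throttle/core_throttle_count" ] || continue
    pkg=$(cat "$dir/topology/physical_package_id")
    core=$(cat "$dir/topology/core_id")
    key="$pkg:$core"
    # Skip HT siblings of a core that has already been reported.
    case " $seen " in
      *" $key "*) continue ;;
    esac
    seen="$seen $key"
    count=$(cat "$dir/thermal_throttle/core_throttle_count")
    echo "package=$pkg core=$core count=$count"
  done
)
```

Running `unique_core_throttles | grep -vw 'count=0'` on the affected machine should then list each throttled core once, which is the granularity I would expect from the metric.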
See my PR #657 for another, similar problem.