Description
While trying out the node exporter on my machine, I noticed that the rapl collector fills the logs with the following error on every scrape:
level=error ts=2021-07-07T12:08:10.427Z caller=stdlib.go:105 caller="error gathering metrics: [from Gatherer #2] collected metric \"node_rapl_package_joules_total\" { label:<name:\"index\" value:\"0\" > counter:<value:16574" msg=".328968 > } was collected before with the same name and label values"
After dumping out some more info about the zones in the node exporter's rapl collector, I noticed that the package is being reported twice with the same index:
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=package value=16574328968
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=dram value=4388422054
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=package value=16574328968
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=core value=12677608414
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=uncore value=815362353
level=error ts=2021-07-07T12:08:10.319Z caller=rapl_linux.go:78 collector=rapl index=1 name=dram value=4388424861
level=error ts=2021-07-07T12:08:10.319Z caller=rapl_linux.go:78 collector=rapl index=0 name=psys value=36848287443
After looking at the powercap class, it appears to give priority to the index embedded in the name of the RAPL zone. In my case, however, two zones have identical names and indices. Checking /sys/class/powercap on my machine shows two package-0 zones:
~ ❯ find -L /sys/class/powercap -maxdepth 2 -name 'name' -exec ls {} \; -exec cat {} \; 2>/dev/null
/sys/class/powercap/intel-rapl:1/name
psys
/sys/class/powercap/intel-rapl:0:2/name
dram
/sys/class/powercap/intel-rapl:0:0/name
core
/sys/class/powercap/intel-rapl-mmio:0:0/name
dram
/sys/class/powercap/intel-rapl:0/name
package-0
/sys/class/powercap/intel-rapl:0:1/name
uncore
/sys/class/powercap/intel-rapl-mmio:0/name
package-0
It seems that both are parsed as the name "package" with index 0, which produces the duplicate metric and the error above in node_exporter.
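To make the collision concrete, here is a minimal Go sketch. It is not the actual procfs or node_exporter code: `zoneKey`, `parseZoneName`, `collisions`, and `observedZones` are hypothetical names, and the per-name counter used for zones without an index suffix is an assumption, chosen because it is consistent with the dram index=0 / index=1 lines in the log above. The zone list is copied from the find output.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// zoneKey is the (name, index) pair that would end up as metric name and label.
type zoneKey struct {
	Name  string
	Index int
}

// zone pairs a powercap directory with the contents of its "name" file.
type zone struct{ Dir, RawName string }

// observedZones mirrors the find output from this report, in the same order.
var observedZones = []zone{
	{"intel-rapl:1", "psys"},
	{"intel-rapl:0:2", "dram"},
	{"intel-rapl:0:0", "core"},
	{"intel-rapl-mmio:0:0", "dram"},
	{"intel-rapl:0", "package-0"},
	{"intel-rapl:0:1", "uncore"},
	{"intel-rapl-mmio:0", "package-0"},
}

// nameIndexRE matches names that end in "-<digits>", e.g. "package-0".
var nameIndexRE = regexp.MustCompile(`^(.+)-(\d+)$`)

// parseZoneName splits a trailing "-<n>" off the zone name.
// The boolean reports whether the name carried an index.
func parseZoneName(raw string) (zoneKey, bool) {
	if m := nameIndexRE.FindStringSubmatch(raw); m != nil {
		if idx, err := strconv.Atoi(m[2]); err == nil {
			return zoneKey{Name: m[1], Index: idx}, true
		}
	}
	return zoneKey{Name: raw}, false
}

// collisions returns every key that more than one directory maps to.
func collisions(zones []zone) map[zoneKey][]string {
	seen := make(map[zoneKey][]string)
	used := make(map[string]int) // assumed fallback: count zones without an embedded index
	for _, z := range zones {
		key, hasIndex := parseZoneName(z.RawName)
		if !hasIndex {
			key.Index = used[key.Name]
			used[key.Name]++
		}
		seen[key] = append(seen[key], z.Dir)
	}
	dup := make(map[zoneKey][]string)
	for key, dirs := range seen {
		if len(dirs) > 1 {
			dup[key] = dirs
		}
	}
	return dup
}

func main() {
	for key, dirs := range collisions(observedZones) {
		fmt.Printf("name=%s index=%d is claimed by %v\n", key.Name, key.Index, dirs)
	}
}
```

Under these assumptions the two dram zones get distinct indices from the counter, while both package-0 zones keep the index embedded in their name and so map to the same key. That suggests the index taken from the name would need to be combined with something unique per zone, such as the directory name, to avoid the duplicate.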
Output of uname -a, for completeness' sake:
Linux <machine-name> 4.18.0-305.3.1.el8_4.x86_64 #1 SMP Mon May 17 10:08:25 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux