Add a new ethtool stats collector#1832
Add a new ethtool stats collector#1832SuperQ merged 5 commits intoprometheus:masterfrom ventifus:master
Conversation
|
Interesting, last time I looked into ethtool metrics, everything required root access. We don't allow collectors that require privileged system access in the node_exporter. It's expected to work fine To answer your other questions:
|
|
Thanks for the feedback, I'll modify it so to include a dependency on safchain/ethtool. Likewise switch to UntypedValue. I tested non-root access and the only system I found where it didn't work was CentOS 6. On CentOS 7 and 8 it works fine unprivileged. I'll see if I can find out specifically in which kernel version this changed. |
|
Thanks, documenting where it does/doesn't work is probably good enough. When you think this is ready for more testing, let me know. One thing that would be nice is if we could have tests for this. |
|
The removal of CAP_NET_ADMIN from querying ethtool was in commit: 0fdc100bdc4b7ab61ed632962c76dfe539047296 ethtool: allow non-netadmin to query settings The mentioned commit landed first in the 2.6.37, so CentOS 6 just missed out. It's only going to be with us until November, and no other still supported LTS distro I checked (Ubuntu, Debian, SUSE) still runs a 2.x kernel. I'll look into writing tests next. |
|
This would be a nice feature to have indeed! Would be really nice if it was integrated directly into node_exporter. |
|
Thinking about this some more, I don't like the use of dumping random stats as different labels. Labels are meant to have aggregation meanings, not just tag random things. Similar to other collectors, I think we need to parse the list of returned stats and convert them into meaningful metric names. |
|
Thank you for the feedback! Would just concatenating metric names like node_ethtool_stats_d_rx_extra_frags_ring2 be acceptable? It would be challenging to make them conform to the OpenMetrics style suffixes (_count, _sum, etc) since they are driver-dependent. |
|
I've changed the metric names to now be a concatenation of "node_ethtool" plus ethtool's stat name. Metrics now look like: How does it look to you now? |
|
How do all metrics look like now? Even better, can you add tests for this? |
|
And now there are tests! You can see what the metrics look like in |
|
Almost there folks, lets make it happen! |
…htool -S") Signed-off-by: W. Andrew Denton <git@flying-snail.net>
|
I've refactored the tests to use an interface to ethtool to allow swapping out the real tool for the test fixture. What do you think now? The bulk of the fixture stuff does still need to live with the main code to allow it to be activated at runtime for |
|
FYI, kernel 5.13 will support standardized ethtool statistics: It makes use of the new ethtool netlink interface, not the ioctl interface used here. |
|
This still leaks test stuff into the main binary which I'd like to avoid. I don't have time to provide more detailed guides though. Maybe best to ask on the mailinglist for help. |
|
Hi @discordianfish , including test stuff into the main binary is required for "end-to-end-test.sh" support. I can remove that code and make the tests entirely separate if that's preferable. |
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
|
@idosch thanks for the note, I'll see about getting support for that into mdlayher/ethtool which will let me add them here. |
|
Just realized we have prior art for this. So guess that's fine then for now: node_exporter/collector/wifi_linux.go Line 52 in 46cdf61 |
|
Right, the wifi and qdisc collectors are like that. Anything that wants to be runtime testable and needs data outside the /proc and /sys filesystems will need something similar. |
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
|
Any ETA this can be pulled into master? |
Signed-off-by: W. Andrew Denton <git@flying-snail.net> Co-authored-by: Manuel Rüger <manuel@rueg.eu>
|
Getting this error when I'm trying to build it: |
NOTE: Ignoring invalid network speed will be the default in 2.x NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x. * [CHANGE] Rename filesystem collector flags to match other collectors #2012 * [CHANGE] Make node_exporter print usage to STDOUT #2039 * [FEATURE] Add conntrack statistics metrics #1155 * [FEATURE] Add ethtool stats collector #1832 * [FEATURE] Add flag to ignore network speed if it is unknown #1989 * [FEATURE] Add tapestats collector for Linux #2044 * [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887 * [ENHANCEMENT] Add time zone offset metric #2060 * [BUGFIX] Add ErrorLog plumbing to promhttp #1887 * [BUGFIX] Handle errors from disabled PSI subsystem #1983 * [BUGFIX] Fix panic when using backwards compatible flags #2000 * [BUGFIX] Only initiate collectors once #2048 * [BUGFIX] Handle small backwards jumps in CPU idle #2067 Signed-off-by: Ben Kochie <superq@gmail.com>
NOTE: Ignoring invalid network speed will be the default in 2.x NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x. * [CHANGE] Rename filesystem collector flags to match other collectors #2012 * [CHANGE] Make node_exporter print usage to STDOUT #2039 * [FEATURE] Add conntrack statistics metrics #1155 * [FEATURE] Add ethtool stats collector #1832 * [FEATURE] Add flag to ignore network speed if it is unknown #1989 * [FEATURE] Add tapestats collector for Linux #2044 * [FEATURE] Add nvme collector #2062 * [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887 * [ENHANCEMENT] Add more Infiniband counters #2019 * [ENHANCEMENT] netclass: retrieve interface names and filter before parsing #2033 * [ENHANCEMENT] Add time zone offset metric #2060 * [BUGFIX] Handle errors from disabled PSI subsystem #1983 * [BUGFIX] Fix panic when using backwards compatible flags #2000 * [BUGFIX] Fix wrong value for OpenBSD memory buffer cache #2015 * [BUGFIX] Only initiate collectors once #2048 * [BUGFIX] Handle small backwards jumps in CPU idle #2067 Signed-off-by: Ben Kochie <superq@gmail.com>
NOTE: Ignoring invalid network speed will be the default in 2.x NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x. * [CHANGE] Rename filesystem collector flags to match other collectors prometheus#2012 * [CHANGE] Make node_exporter print usage to STDOUT prometheus#2039 * [FEATURE] Add conntrack statistics metrics prometheus#1155 * [FEATURE] Add ethtool stats collector prometheus#1832 * [FEATURE] Add flag to ignore network speed if it is unknown prometheus#1989 * [FEATURE] Add tapestats collector for Linux prometheus#2044 * [FEATURE] Add nvme collector prometheus#2062 * [ENHANCEMENT] Add ErrorLog plumbing to promhttp prometheus#1887 * [ENHANCEMENT] Add more Infiniband counters prometheus#2019 * [ENHANCEMENT] netclass: retrieve interface names and filter before parsing prometheus#2033 * [ENHANCEMENT] Add time zone offset metric prometheus#2060 * [BUGFIX] Handle errors from disabled PSI subsystem prometheus#1983 * [BUGFIX] Fix panic when using backwards compatible flags prometheus#2000 * [BUGFIX] Fix wrong value for OpenBSD memory buffer cache prometheus#2015 * [BUGFIX] Only initiate collectors once prometheus#2048 * [BUGFIX] Handle small backwards jumps in CPU idle prometheus#2067 Signed-off-by: Ben Kochie <superq@gmail.com>
NOTE: Ignoring invalid network speed will be the default in 2.x NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x. * [CHANGE] Rename filesystem collector flags to match other collectors prometheus#2012 * [CHANGE] Make node_exporter print usage to STDOUT prometheus#2039 * [FEATURE] Add conntrack statistics metrics prometheus#1155 * [FEATURE] Add ethtool stats collector prometheus#1832 * [FEATURE] Add flag to ignore network speed if it is unknown prometheus#1989 * [FEATURE] Add tapestats collector for Linux prometheus#2044 * [FEATURE] Add nvme collector prometheus#2062 * [ENHANCEMENT] Add ErrorLog plumbing to promhttp prometheus#1887 * [ENHANCEMENT] Add more Infiniband counters prometheus#2019 * [ENHANCEMENT] netclass: retrieve interface names and filter before parsing prometheus#2033 * [ENHANCEMENT] Add time zone offset metric prometheus#2060 * [BUGFIX] Handle errors from disabled PSI subsystem prometheus#1983 * [BUGFIX] Fix panic when using backwards compatible flags prometheus#2000 * [BUGFIX] Fix wrong value for OpenBSD memory buffer cache prometheus#2015 * [BUGFIX] Only initiate collectors once prometheus#2048 * [BUGFIX] Handle small backwards jumps in CPU idle prometheus#2067 Signed-off-by: Ben Kochie <superq@gmail.com>
This adds a new collector for Linux to gather "ethtool" stats, the same as "ethtool -S" provides. This is collected by using the
SIOCETHTOOLioctl.Some questions and comments:
Here is some example output:
Signed-off-by: W Andrew Denton ventifus@flying-snail.net