Adds metrics and tests for UDP receive and send buffer errors #1534
SuperQ merged 6 commits into prometheus:master
Signed-off-by: Phil Porada <philporada@gmail.com>
Force-pushed from e59a2ef to f5147ba
|
Per Circle CI output, I'm not entirely sure why the following test broke. |
|
The test is failing because you need to update |
SuperQ left a comment
LGTM, thanks. I will merge as soon as the end-to-end is fixed up.
|
Thank you for the pointers! That seems to have done the trick. |
Signed-off-by: Phil Porada <philporada@gmail.com>
Force-pushed from 66fc60f to 6d9063b
|
It's not obvious in the end-to-end test, but please add the same lines to |
Signed-off-by: Phil Porada <philporada@gmail.com>
|
Hrm, looking at some of my metrics, it seems like there's some overlap and now you've changed it to do UdpLite. The |
|
I wonder if |
|
Is there a reason not to gather UdpLite metrics? I see that I had messed up the regex instead of making it better. Switching it to
This post sheds some light on UdpRcvbufErrors vs UdpInErrors: netdata/netdata#4086 (comment) |
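For readers following the UdpRcvbufErrors vs UdpInErrors question, here is a minimal Go sketch of how such counters can be read from text laid out like `/proc/net/snmp`, where each protocol has a header line of field names followed by a line of values. The function name `parseSNMP`, the `Proto_Field` key naming, and the sample numbers are assumptions made for illustration; this is not the exporter's actual collector code.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample mimics the two-line layout of /proc/net/snmp for UDP.
// The numbers are made up for illustration.
const sample = `Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 100 2 7 90 5 1
`

// parseSNMP (hypothetical helper) pairs each header line with the value
// line that follows it and returns counters keyed as "Proto_Field".
func parseSNMP(text string) (map[string]float64, error) {
	out := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		names := strings.Fields(sc.Text())
		if len(names) == 0 {
			continue // skip blank lines
		}
		if !sc.Scan() {
			return nil, fmt.Errorf("missing value line for %q", names[0])
		}
		values := strings.Fields(sc.Text())
		if len(values) != len(names) || values[0] != names[0] {
			return nil, fmt.Errorf("header and value lines differ for %q", names[0])
		}
		proto := strings.TrimSuffix(names[0], ":")
		for i := 1; i < len(names); i++ {
			v, err := strconv.ParseFloat(values[i], 64)
			if err != nil {
				return nil, err
			}
			out[proto+"_"+names[i]] = v
		}
	}
	return out, sc.Err()
}

func main() {
	m, err := parseSNMP(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("Udp_InErrors:", m["Udp_InErrors"])
	fmt.Println("Udp_RcvbufErrors:", m["Udp_RcvbufErrors"])
	fmt.Println("Udp_SndbufErrors:", m["Udp_SndbufErrors"])
}
```

With counters keyed this way, the receive-buffer and send-buffer errors can be compared directly against the broader InErrors count on the same host.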
|
We're trying to avoid ingesting too many metrics, so we prefer to take the highest top level covering metric for some of these cases. If X is Y+Z, we prefer to just have X and leave it up to the user to figure out if it's Y or Z for as many cases as possible. |
|
That makes sense, I'll remove the UdpLite portion of the regex. |
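To illustrate what dropping the UdpLite portion of a field filter looks like, here is a small Go sketch. The pattern and the `Proto_Field` names below are hypothetical and chosen only for this example; they are not the exporter's actual default regex, which is not shown in this conversation.

```go
package main

import (
	"fmt"
	"regexp"
)

// udpFields is a hypothetical filter: it accepts Udp and Udp6 counters
// but, because "_" must follow "Udp" or "Udp6", rejects UdpLite and UdpLite6.
var udpFields = regexp.MustCompile(`^Udp6?_(InErrors|RcvbufErrors|SndbufErrors)$`)

// keep reports whether a field name passes the filter.
func keep(name string) bool {
	return udpFields.MatchString(name)
}

func main() {
	names := []string{
		"Udp_RcvbufErrors",
		"Udp6_SndbufErrors",
		"UdpLite_RcvbufErrors",
		"UdpLite6_InErrors",
	}
	for _, n := range names {
		fmt.Printf("%-22s keep=%v\n", n, keep(n))
	}
}
```

Anchoring the pattern with `^` and `$` matters here: an unanchored `Udp.*_RcvbufErrors` would also pick up the UdpLite variants, which is the kind of overlap the reviewers wanted to avoid.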
|
I'd like to find some more docs or some kernel source that explains these things better. I'm not saying no to this PR, but I want to do some more detailed investigation before we finish things. |
Signed-off-by: Phil Porada <philporada@gmail.com>
Signed-off-by: Phil Porada <philporada@gmail.com>
Signed-off-by: Phil Porada <philporada@gmail.com>
|
Here are the cases where
Cases where
There are also several occurrences where |
|
@SuperQ Leaving this to you, I'm fine either way. |
|
Sorry for the long delay in review. This fell out of my TODO list. |
|
Thank you! |
* The netdev collector CLI argument `--collector.netdev.ignored-devices` was renamed to `--collector.netdev.device-blacklist` in order to conform with the systemd collector. #1279
* The label named `state` on `node_systemd_service_restart_total` metrics was changed to `name` to better describe the metric. #1393
* Refactoring of the mdadm collector changes several metrics
  - `node_md_disks_active` is removed
  - `node_md_disks` now has a `state` label for "fail", "spare", "active" disks.
  - `node_md_is_active` is replaced by `node_md_state` with a state set of "active", "inactive", "recovering", "resync".
* Additional label `mountaddr` added to NFS device metrics to distinguish mounts from the same URL, but different IP addresses. #1417
* Metrics node_cpu_scaling_frequency_min_hrts and node_cpu_scaling_frequency_max_hrts of the cpufreq collector were renamed to node_cpu_scaling_frequency_min_hertz and node_cpu_scaling_frequency_max_hertz. #1510
* Collectors that are enabled, but are unable to find data to collect, now return 0 for `node_scrape_collector_success`.

* [CHANGE] Add `--collector.netdev.device-whitelist`. #1279
* [CHANGE] Ignore iso9660 filesystem on Linux #1355
* [CHANGE] Refactor mdadm collector #1403
* [CHANGE] Add `mountaddr` label to NFS metrics. #1417
* [CHANGE] Don't count empty collectors as success. #1613
* [FEATURE] New flag to disable default collectors #1276
* [FEATURE] Add experimental TLS support #1277, #1687, #1695
* [FEATURE] Add collector for Power Supply Class #1280
* [FEATURE] Add new schedstat collector #1389
* [FEATURE] Add FreeBSD zfs support #1394
* [FEATURE] Add uname support for Darwin and OpenBSD #1433
* [FEATURE] Add new metric node_cpu_info #1489
* [FEATURE] Add new thermal_zone collector #1425
* [FEATURE] Add new cooling_device metrics to thermal zone collector #1445
* [FEATURE] Add swap usage on darwin #1508
* [FEATURE] Add Btrfs collector #1512
* [FEATURE] Add RAPL collector #1523
* [FEATURE] Add new softnet collector #1576
* [FEATURE] Add new udp_queues collector #1503
* [FEATURE] Add basic authentication #1673
* [ENHANCEMENT] Log pid when there is a problem reading the process stats #1341
* [ENHANCEMENT] Collect InfiniBand port state and physical state #1357
* [ENHANCEMENT] Include additional XFS runtime statistics. #1423
* [ENHANCEMENT] Report non-fatal collection errors in the exporter metric. #1439
* [ENHANCEMENT] Expose IPVS firewall mark as a label #1455
* [ENHANCEMENT] Add check for systemd version before attempting to query certain metrics. #1413
* [ENHANCEMENT] Add a flag to adjust mount timeout #1486
* [ENHANCEMENT] Add new counters for flush requests in Linux 5.5 #1548
* [ENHANCEMENT] Add metrics and tests for UDP receive and send buffer errors #1534
* [ENHANCEMENT] The sockstat collector now exposes IPv6 statistics in addition to the existing IPv4 support. #1552
* [ENHANCEMENT] Add infiniband info metric #1563
* [ENHANCEMENT] Add unix socket support for supervisord collector #1592
* [ENHANCEMENT] Implement loadavg on all BSDs without cgo #1584
* [ENHANCEMENT] Add model_name and stepping to node_cpu_info metric #1617
* [ENHANCEMENT] Add `--collector.perf.cpus` to allow setting the CPU list for perf stats. #1561
* [ENHANCEMENT] Add metrics for IO errors and retries on Darwin. #1636
* [ENHANCEMENT] Add perf tracepoint collection flag #1664
* [ENHANCEMENT] ZFS: read contents of objset file #1632
* [ENHANCEMENT] Linux CPU: Cache CPU metrics to make them monotonically increasing #1711
* [BUGFIX] Read /proc/net files with a single read syscall #1380
* [BUGFIX] Renamed label `state` to `name` on `node_systemd_service_restart_total`. #1393
* [BUGFIX] Fix netdev nil reference on Darwin #1414
* [BUGFIX] Strip path.rootfs from mountpoint labels #1421
* [BUGFIX] Fix seconds reported by schedstat #1426
* [BUGFIX] Fix empty string in path.rootfs #1464
* [BUGFIX] Fix typo in cpufreq metric names #1510
* [BUGFIX] Read /proc/stat in one syscall #1538
* [BUGFIX] Fix OpenBSD cache memory information #1542
* [BUGFIX] Refactor textfile collector to avoid looping defer #1549
* [BUGFIX] Fix network speed math #1580
* [BUGFIX] collector/systemd: use regexp to extract systemd version #1647
* [BUGFIX] Fix initialization in perf collector when using multiple CPUs #1665
* [BUGFIX] Fix accidentally empty lines in meminfo_linux #1671

Signed-off-by: Ben Kochie <superq@gmail.com>
…heus#1534) * Adds metrics for UDP receive and send buffer errors Signed-off-by: Phil Porada <philporada@gmail.com>

Context: https://jvns.ca/blog/2016/08/24/find-out-where-youre-dropping-packets/