Add gauges for allocated memory for queued UDP and TCP packages #1503
|
One hint, you can always |
|
Yes, thank you. That was also my intention, but I think I did something wrong with the reset and rebase, so the changes from the infiniband commits also ended up in my first PR. I couldn't get rid of them...
|
Thanks! So all the parsing should eventually be moved to https://github.com/prometheus/procfs |
…ngths. @pgier this belongs to the change requested by @discordianfish in prometheus/node_exporter#1503. Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
c57aa5b to 2fa2286
|
Currently waiting for a release of https://github.com/prometheus/procfs (esp. https://github.com/prometheus/procfs/tree/496ec92a4c3ca16a9902b10e44fc80628ed72745). BTW, I tested it locally:
2fa2286 to 5f2fe80
|
@SuperQ and @discordianfish, all the discussed changes are in now. Could you please have a look and check whether everything looks good to you?
|
Hi @discordianfish, Hi @SuperQ, anything else you need, so that this PR can be merged? |
|
Two small things I forgot to ask about.
* Please add a changelog entry.
* Please update the enabled collectors in the end-to-end.test.sh script, and update the test result fixtures.
…d_bytes and tx_queued_bytes. For UDP datagrams, an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.

@SuperQ and @discordianfish, this change gives us the option to check for overloaded UDP + TCP processing. The names of the new TCP states and the UDP metric can be discussed. The current reasons are just:
* I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
* I chose the name 'udp_queues' instead of 'udpstat' as UDP has no state.

Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
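To illustrate what "total lengths of the tx_queue and rx_queue" means, here is a minimal sketch in Python. It is not the exporter's code (that is written in Go and relies on prometheus/procfs for parsing). It assumes the usual `/proc/net/udp` and `/proc/net/tcp` layout, where the fifth whitespace-separated column holds `tx_queue:rx_queue` as hexadecimal values. The function name `sum_queues` and the sample socket rows are made up for the example.

```python
def sum_queues(proc_net_text):
    """Sum the tx_queue and rx_queue columns over all sockets.

    Expects text in the /proc/net/udp (or /proc/net/tcp) layout and
    returns a (tx_total, rx_total) tuple of integers.
    """
    tx_total = 0
    rx_total = 0
    # The first line is the column header, so skip it.
    for line in proc_net_text.splitlines()[1:]:
        fields = line.split()
        # Column 4 (0-based) is "tx_queue:rx_queue", both in hex.
        if len(fields) < 5 or ":" not in fields[4]:
            continue
        tx_hex, rx_hex = fields[4].split(":")
        tx_total += int(tx_hex, 16)
        rx_total += int(rx_hex, 16)
    return tx_total, rx_total


# Illustrative sample only; real data would come from /proc/net/udp.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
    0: 00000000:0016 00000000:0000 0A 00000015:00000000 00:00000000 00000000     0        0 2740 1 ffff88003d3af3c0 100 0 0 10 0
    1: 0F02000A:0016 0202000A:8B6B 01 00000015:00000001 02:000AC99B 00000000     0        0 3652 4 ffff88003d3ae040 21 4 31 47 46
"""

if __name__ == "__main__":
    print(sum_queues(SAMPLE))  # (42, 1)
```

A persistently large receive total suggests that applications are not reading datagrams fast enough, which is the kind of overload the new gauges are meant to make visible.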
…cfs master branch as the dependency for net_udp is not released yet. Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
…llection of ipv4 based queues or ipv6 based queues. Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
…gelog. Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
ab0b102 to b77425c
|
Hi @discordianfish or @SuperQ, something else needed to merge this? |
SuperQ
left a comment
Sorry, just busy. LGTM, Thanks!
* The netdev collector CLI argument `--collector.netdev.ignored-devices` was renamed to `--collector.netdev.device-blacklist` in order to conform with the systemd collector. #1279
* The label named `state` on `node_systemd_service_restart_total` metrics was changed to `name` to better describe the metric. #1393
* Refactoring of the mdadm collector changes several metrics
  - `node_md_disks_active` is removed
  - `node_md_disks` now has a `state` label for "fail", "spare", "active" disks.
  - `node_md_is_active` is replaced by `node_md_state` with a state set of "active", "inactive", "recovering", "resync".
* Additional label `mountaddr` added to NFS device metrics to distinguish mounts from the same URL, but different IP addresses. #1417
* Metrics node_cpu_scaling_frequency_min_hrts and node_cpu_scaling_frequency_max_hrts of the cpufreq collector were renamed to node_cpu_scaling_frequency_min_hertz and node_cpu_scaling_frequency_max_hertz. #1510
* Collectors that are enabled, but are unable to find data to collect, now return 0 for `node_scrape_collector_success`.

* [CHANGE] Add `--collector.netdev.device-whitelist`. #1279
* [CHANGE] Ignore iso9660 filesystem on Linux #1355
* [CHANGE] Refactor mdadm collector #1403
* [CHANGE] Add `mountaddr` label to NFS metrics. #1417
* [CHANGE] Don't count empty collectors as success. #1613
* [FEATURE] New flag to disable default collectors #1276
* [FEATURE] Add experimental TLS support #1277, #1687, #1695
* [FEATURE] Add collector for Power Supply Class #1280
* [FEATURE] Add new schedstat collector #1389
* [FEATURE] Add FreeBSD zfs support #1394
* [FEATURE] Add uname support for Darwin and OpenBSD #1433
* [FEATURE] Add new metric node_cpu_info #1489
* [FEATURE] Add new thermal_zone collector #1425
* [FEATURE] Add new cooling_device metrics to thermal zone collector #1445
* [FEATURE] Add swap usage on darwin #1508
* [FEATURE] Add Btrfs collector #1512
* [FEATURE] Add RAPL collector #1523
* [FEATURE] Add new softnet collector #1576
* [FEATURE] Add new udp_queues collector #1503
* [FEATURE] Add basic authentication #1673
* [ENHANCEMENT] Log pid when there is a problem reading the process stats #1341
* [ENHANCEMENT] Collect InfiniBand port state and physical state #1357
* [ENHANCEMENT] Include additional XFS runtime statistics. #1423
* [ENHANCEMENT] Report non-fatal collection errors in the exporter metric. #1439
* [ENHANCEMENT] Expose IPVS firewall mark as a label #1455
* [ENHANCEMENT] Add check for systemd version before attempting to query certain metrics. #1413
* [ENHANCEMENT] Add a flag to adjust mount timeout #1486
* [ENHANCEMENT] Add new counters for flush requests in Linux 5.5 #1548
* [ENHANCEMENT] Add metrics and tests for UDP receive and send buffer errors #1534
* [ENHANCEMENT] The sockstat collector now exposes IPv6 statistics in addition to the existing IPv4 support. #1552
* [ENHANCEMENT] Add infiniband info metric #1563
* [ENHANCEMENT] Add unix socket support for supervisord collector #1592
* [ENHANCEMENT] Implement loadavg on all BSDs without cgo #1584
* [ENHANCEMENT] Add model_name and stepping to node_cpu_info metric #1617
* [ENHANCEMENT] Add `--collector.perf.cpus` to allow setting the CPU list for perf stats. #1561
* [ENHANCEMENT] Add metrics for IO errors and retries on Darwin. #1636
* [ENHANCEMENT] Add perf tracepoint collection flag #1664
* [ENHANCEMENT] ZFS: read contents of objset file #1632
* [ENHANCEMENT] Linux CPU: Cache CPU metrics to make them monotonically increasing #1711
* [BUGFIX] Read /proc/net files with a single read syscall #1380
* [BUGFIX] Renamed label `state` to `name` on `node_systemd_service_restart_total`. #1393
* [BUGFIX] Fix netdev nil reference on Darwin #1414
* [BUGFIX] Strip path.rootfs from mountpoint labels #1421
* [BUGFIX] Fix seconds reported by schedstat #1426
* [BUGFIX] Fix empty string in path.rootfs #1464
* [BUGFIX] Fix typo in cpufreq metric names #1510
* [BUGFIX] Read /proc/stat in one syscall #1538
* [BUGFIX] Fix OpenBSD cache memory information #1542
* [BUGFIX] Refactor textfile collector to avoid looping defer #1549
* [BUGFIX] Fix network speed math #1580
* [BUGFIX] collector/systemd: use regexp to extract systemd version #1647
* [BUGFIX] Fix initialization in perf collector when using multiple CPUs #1665
* [BUGFIX] Fix accidentally empty lines in meminfo_linux #1671

Signed-off-by: Ben Kochie <superq@gmail.com>
…etheus#1503)

* Two new states will be added to the tcpstat collector called rx_queued_bytes and tx_queued_bytes. For UDP datagrams, an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.

@SuperQ and @discordianfish, this change gives us the option to check for overloaded UDP + TCP processing. The names of the new TCP states and the UDP metric can be discussed. The current reasons are just:
* I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
* I chose the name 'udp_queues' instead of 'udpstat' as UDP has no state.

Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>


