An Icinga check plugin to check Prometheus.
Usage:
check_prometheus [flags]
check_prometheus [command]
Available Commands:
alert Checks the status of a Prometheus alert
health Checks the health or readiness status of the Prometheus server
query Checks the status of a Prometheus query
Flags:
-H, --hostname string Hostname of the Prometheus server (CHECK_PROMETHEUS_HOSTNAME) (default "localhost")
-p, --port int Port of the Prometheus server (default 9090)
-U, --url string URL/Path to append to the Prometheus Hostname (CHECK_PROMETHEUS_URL) (default "/")
-s, --secure Use an HTTPS connection
-i, --insecure Skip the verification of the server's TLS certificate
-b, --bearer string Specify the Bearer Token for server authentication (CHECK_PROMETHEUS_BEARER)
-u, --user string Specify the user name and password for server authentication <user:password> (CHECK_PROMETHEUS_BASICAUTH)
--ca-file string Specify the CA File for TLS authentication (CHECK_PROMETHEUS_CA_FILE)
--cert-file string Specify the Certificate File for TLS authentication (CHECK_PROMETHEUS_CERT_FILE)
--key-file string Specify the Key File for TLS authentication (CHECK_PROMETHEUS_KEY_FILE)
-t, --timeout int Timeout in seconds for the check plugin (default 30)
--header strings Additional HTTP header to include in the request. Can be used multiple times.
Keys and values are separated by a colon (--header "X-Custom: example").
-h, --help help for check_prometheus
-v, --version version for check_prometheus

The check plugin respects the environment variables HTTP_PROXY, HTTPS_PROXY and NO_PROXY.
Various flags can be set with environment variables; refer to the help to see which ones.
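For example (a sketch reusing the environment variable names listed above; the hostname and token are placeholders), the connection settings can come from the environment instead of flags:

$ export CHECK_PROMETHEUS_HOSTNAME='monitoring.example.com'
$ export CHECK_PROMETHEUS_BEARER='my-token'
$ check_prometheus health --secure --port 443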
If Prometheus runs behind a reverse proxy, the --url parameter can be used:
# https://monitoring.example.com:443/subpath
$ check_prometheus health -H 'monitoring.example.com' --port 443 --secure --url /subpath
OK - Prometheus Server is Healthy. | statuscode=200

Checks the health or readiness status of the Prometheus server.
Health: Checks the health endpoint, which returns OK if the Prometheus server is healthy.
Ready: Checks the readiness endpoint, which returns OK if the Prometheus server is ready to serve traffic (i.e. respond to queries).
Usage:
check_prometheus health [flags]
Examples:
$ check_prometheus health --hostname 'localhost' --port 9090 --insecure
OK - Prometheus Server is Healthy. | statuscode=200
Flags:
-r, --ready Checks the readiness of an endpoint
-I, --info Displays various build information properties about the Prometheus server
-h, --help help for health

$ check_prometheus health --hostname 'localhost' --port 9090 --insecure
OK - Prometheus Server is Healthy. | statuscode=200
$ check_prometheus health --ready
OK - Prometheus Server is Ready. | statuscode=200

Checks the status of a Prometheus query and evaluates the result.
The warning and critical flags support thresholds in the common Nagios format (e.g. ~:10).
Note: For time range values, e.g. 'go_memstats_alloc_bytes_total[10s]', only the latest value will be evaluated; other values will be ignored.
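As a sketch of the threshold format (the metric is the one used in the examples below; the ranges are illustrative): a range such as 0:200 is OK between 0 and 200, and ~:10 is OK for any value up to 10.

# Illustrative thresholds: warn above 10, go critical below 0 or above 200
$ check_prometheus query -q 'go_goroutines' -w '~:10' -c '0:200'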
Usage:
check_prometheus query [flags]
Examples:
$ check_prometheus query -q 'go_gc_duration_seconds_count' -c 5000 -w 2000
CRITICAL - 2 Metrics: 1 Critical - 0 Warning - 1 Ok
\_[OK] go_gc_duration_seconds_count{instance="localhost:9090", job="prometheus"} - value: 1599
\_[CRITICAL] go_gc_duration_seconds_count{instance="node-exporter:9100", job="node-exporter"} - value: 79610
| value_go_gc_duration_seconds_count_localhost:9090_prometheus=1599 value_go_gc_duration_seconds_count_node-exporter:9100_node-exporter=79610
Flags:
-q, --query string A Prometheus query which will be performed; the resulting value will be evaluated
-w, --warning string The warning threshold for a value (default "10")
-c, --critical string The critical threshold for a value (default "20")
-h, --help help for query

$ check_prometheus query -q 'go_goroutines{job="prometheus"}' -c 40 -w 27
WARNING - 1 Metrics: 0 Critical - 1 Warning - 0 Ok
\_[WARNING] go_goroutines{instance="localhost:9090", job="prometheus"} - value: 37
| value_go_goroutines_localhost:9090_prometheus=37

$ check_prometheus query -q 'go_goroutines' -c 40 -w 27
WARNING - 2 Metrics: 0 Critical - 1 Warning - 1 Ok
\_[WARNING] go_goroutines{instance="localhost:9090", job="prometheus"} - value: 37
\_[OK] go_goroutines{instance="node-exporter:9100", job="node-exporter"} - value: 7
| value_go_goroutines_localhost:9090_prometheus=37 value_go_goroutines_node-exporter:9100_node-exporter=7

Hint: Currently only the latest value will be evaluated; other values will be ignored.
$ check_prometheus query -q 'go_goroutines{job="prometheus"}[10s]' -c 5 -w 10
CRITICAL - 1 Metrics: 1 Critical - 0 Warning - 0 Ok
\_[CRITICAL] go_goroutines{instance="localhost:9090", job="prometheus"} - value: 37
| value_go_goroutines_localhost:9090_prometheus=37
$ check_prometheus query -q 'go_goroutines[10s]' -c 50 -w 40
OK - 2 Metrics OK | value_go_goroutines_localhost:9090_prometheus=37 value_go_goroutines_node-exporter:9100_node-exporter=7

Checks the status of a Prometheus alert and evaluates its state.
The plugin honors the severity label when determining the exit status. For firing alerts, the label is mapped to exit codes as follows:
| Severity Label | Exit Status |
|---|---|
| critical (default) | CRITICAL (2) |
| warning, warn | WARNING (1) |
| info, informational | OK (0) |
| (no label) | Default based on state |
The plugin checks alert-level labels first, then falls back to rule-level labels.
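A hypothetical illustration of the mapping (the alert name and its severity label are assumed, not taken from a real setup):

# "DiskWillFillSoon" is firing and carries the label severity=warning,
# so the plugin is expected to exit with code 1 (WARNING) instead of 2 (CRITICAL)
$ check_prometheus alert --name "DiskWillFillSoon"
$ echo $?
1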
Note: An alternative flag-based approach (e.g., --honor-severity) was considered, but since our organization's use case always requires severity-aware exit codes, this behavior is enabled by default without additional flags.
The plugin includes summary and description annotations in the alert output when available, providing additional context for on-call engineers to quickly understand and triage alerts.
Usage:
check_prometheus alert [flags]
Examples:
$ check_prometheus alert --name "PrometheusAlertmanagerJobMissing"
CRITICAL - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
\_[CRITICAL] [PrometheusAlertmanagerJobMissing] - Job: [alertmanager] is firing - value: 1.00
| firing=1 pending=0 inactive=0
$ check_prometheus alert --name "PrometheusAlertmanagerJobMissing" --name "PrometheusTargetMissing"
CRITICAL - 2 Alerts: 1 Firing - 0 Pending - 1 Inactive
\_[OK] [PrometheusTargetMissing] is inactive
\_[CRITICAL] [PrometheusAlertmanagerJobMissing] - Job: [alertmanager] is firing - value: 1.00
| total=2 firing=1 pending=0 inactive=1
Flags:
--exclude-alert stringArray Alerts to ignore. Can be used multiple times and supports regex.
-h, --help help for alert
--label stringArray Filter alerts by label (key=value format). Only alerts with matching labels will be shown.
This parameter can be repeated, e.g. '--label hostname=server1 --label severity=critical'
If no labels are given, all alerts will be evaluated
-n, --name strings The name of one or more specific alerts to check.
This parameter can be repeated, e.g. '--name alert1 --name alert2'
If no name is given, all alerts will be evaluated
-g, --group strings The name of one or more specific groups to check.
This parameter can be repeated, e.g. '--group group1 --group group2'
If no group is given, all groups will be scanned for alerts
-T, --no-alerts-state string State to assign when no alerts are found (0, 1, 2, 3, OK, WARNING, CRITICAL, UNKNOWN). If not set this defaults to OK (default "OK")
-P, --problems Display only alerts whose status is not inactive/OK. Note that in combination with the --name flag this might result in no alerts being displayed
--fetch-probe Always fetch probe response body from blackbox_exporter debug endpoint
--blackbox-url string Blackbox exporter URL (e.g., http://blackbox:9115). Required if --fetch-probe is set
--probe-module string Blackbox module name (e.g., http_2xx). Required if --fetch-probe is set
--probe-target string Target URL for probe (e.g., http://target:8080/health)

$ check_prometheus alert
CRITICAL - 6 Alerts: 3 Firing - 0 Pending - 3 Inactive
\_[OK] [PrometheusTargetMissing] is inactive
\_[CRITICAL] [PrometheusAlertmanagerJobMissing] - Job: [alertmanager] is firing - value: 1.00
\_[OK] [HostOutOfMemory] - Job: [alertmanager]
\_[OK] [HostHighCpuLoad] - Job: [alertmanager]
\_[CRITICAL] [HighResultLatency] - Job: [prometheus] on Instance: [localhost:9090] is firing - value: 11.00
\_[CRITICAL] [HighResultLatency] - Job: [node-exporter] on Instance: [node-exporter:9100] is firing - value: 10.00
| total=6 firing=3 pending=0 inactive=3
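Because --exclude-alert supports regular expressions, noisy alerts can be filtered out; the alert names below are hypothetical:

# Ignore the Watchdog heartbeat and any alert whose name starts with "Test"
$ check_prometheus alert --exclude-alert "Watchdog" --exclude-alert "Test.*"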
$ check_prometheus alert --name "HostHighCpuLoad" --name "HighResultLatency"
CRITICAL - 3 Alerts: 2 Firing - 0 Pending - 1 Inactive
\_[OK] [HostHighCpuLoad] is inactive
\_[CRITICAL] [HighResultLatency] - Job: [prometheus] on Instance: [localhost:9090] is firing - value: 11.00
\_[CRITICAL] [HighResultLatency] - Job: [node-exporter] on Instance: [node-exporter:9100] is firing - value: 10.00
| total=3 firing=2 pending=0 inactive=1

$ check_prometheus alert --name "HostHighCpuLoad" --name "PrometheusTargetMissing"
OK - Alerts inactive | total=2 firing=0 pending=0 inactive=2

Use the --label flag to filter alerts by label values. This is useful when you have alerts that fire per node and want to show each alert on its corresponding host in Icinga.
# Filter by hostname
$ check_prometheus alert --name "HighCpuUtilization" --label "hostname=server1"
CRITICAL - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
\_[CRITICAL] [HighCpuUtilization] - Job: [node-exporter] on Instance: [server1:9100] is firing - value: 95.00
| total=1 firing=1 pending=0 inactive=0
# Multiple label filters (AND logic)
$ check_prometheus alert --name "ConsulServiceCritical" --label "node=server1" --label "service_name=api"Notes:
- Multiple
--labelflags use AND logic - the alert must match ALL specified labels - When label filters are specified, inactive alerts are not shown (they don't have instance-level labels)
- Label matching is exact and case-sensitive
Use the --fetch-probe flag along with blackbox configuration to include the HTTP response body in the output. This is useful for displaying detailed health information from endpoints.
$ check_prometheus alert --name "Livesegmenter" --label "hostname=server1" \
--fetch-probe \
--blackbox-url http://blackbox:9115 \
--probe-module http_2xx \
--probe-target http://server1:4999/health
OK - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
\_No alerts retrieved
| total=0 firing=0 pending=0 inactive=0
--- Health Details ---
Live Segmenter
Channel 1: ok
Channel 2: ok

Notes:
- Requires blackbox_exporter configured with include_response_body: true in the module
- All four flags (--fetch-probe, --blackbox-url, --probe-module, --probe-target) must be set for probe fetching to work
- See docs/PROBE_BODY_FEATURE.md for detailed configuration examples
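For troubleshooting, the blackbox_exporter probe endpoint can be queried directly (a sketch using the host, module and target from the example above; debug=true is a standard blackbox_exporter probe parameter):

$ curl 'http://blackbox:9115/probe?module=http_2xx&target=http://server1:4999/health&debug=true'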
Copyright (c) 2022 NETWAYS GmbH
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.