
[Bug]: ‘alert -P’ returns UNKNOWN if everything is fine #65

@wattebausch

Please try to fill out as much of the information below as you can. Thank you!

  • Yes, I've searched similar issues on GitHub and didn't find any.

Which version contains the bug?

0.2.0 & development

Describe the bug

Hello there

We run check_prometheus in ‘alert -P’ mode because we only want problems in the output. Without ‘-P’, the SMS/email becomes too long when an error occurs, since it lists over 100 checks.

We compared version ‘v0.2.0’ with ‘development’ and noticed that their ‘alert’ output differs. In one respect, however, both behave the same: if every alert in our Prometheus is ‘Inactive’, i.e. everything is OK, then ‘alert -P’ returns ‘UNKNOWN’.

Can you check whether you see the same behaviour?
Thanks for your great work!
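To make the expectation concrete, here is a minimal sketch of the result we would expect. It is not the check_prometheus implementation; the `Alert` class, the `evaluate` function, and the mapping of pending alerts to WARNING are assumptions made only for illustration. The point is that an empty list after filtering out inactive alerts should still produce OK.

```python
from dataclasses import dataclass

# Conventional monitoring-plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass
class Alert:
    # Hypothetical container, not a type from check_prometheus.
    name: str
    state: str  # "firing", "pending" or "inactive"


def evaluate(alerts, problems_only=False):
    """Return (exit_code, summary) for a list of alerts.

    With problems_only=True (our stand-in for '-P'), inactive alerts are
    dropped before counting, so a clean cluster leaves an empty list.
    """
    shown = [a for a in alerts if not problems_only or a.state != "inactive"]
    firing = sum(a.state == "firing" for a in shown)
    pending = sum(a.state == "pending" for a in shown)
    inactive = sum(a.state == "inactive" for a in shown)

    if firing:
        code = CRITICAL
    elif pending:
        code = WARNING  # assumption: pending maps to WARNING
    else:
        code = OK  # nothing firing or pending, including an empty list

    summary = (
        f"{len(shown)} Alerts: {firing} Firing - "
        f"{pending} Pending - {inactive} Inactive"
    )
    return code, summary
```

With this logic, a cluster whose alerts are all inactive yields exit code 0 both with and without the problems-only filter.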

How to recreate the bug?

Version: 0.2.0 (Check against a cluster without alerts)

./check_prometheus_v0.2.0_Linux_x86_64 -H metrics.clean-cluster.example.com -p 80 alert 
[OK] - Alerts inactive | total=123 firing=0 pending=0 inactive=123
./check_prometheus_v0.2.0_Linux_x86_64 -H metrics.clean-cluster.example.com -p 80 alert -P
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive
 | 

# echo $?
3

Version: 0.2.0 (Check against a cluster with a problem)

./check_prometheus_v0.2.0_Linux_x86_64 -H metrics.some-probelm-cluster.example.com -p 80 alert 
[CRITICAL] - 126 Alerts: 1 Firing - 0 Pending - 125 Inactive
 \_[OK] [FluxHelmReleaseNotReady] is inactive
 \_[OK] [FluxGitRepositorySyncFailed] is inactive
...
 \_[CRITICAL] [Watchdog] is firing - value: 1.00
 \_[OK] [InfoInhibitor] is inactive
..
 \_[OK] [K3sCertificateExpiration] is inactive
 | total=126 firing=1 pending=0 inactive=125
./check_prometheus_v0.2.0_Linux_x86_64 -H metrics.some-probelm-cluster.example.com -p 80 alert -P
[CRITICAL] - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
 \_[CRITICAL] [Watchdog] is firing - value: 1.00
 | 

Version: development (Check against a cluster without alerts)

./check_prometheus_develop -H metrics.clean-cluster.example.com -p 80 alert 
[OK] - 125 Alerts: 0 Firing - 0 Pending - 125 Inactive
\_ [OK] [FluxHelmReleaseNotReady] is inactive
\_ [OK] [FluxGitRepositorySyncFailed] is inactive
\_ [OK] [InfoInhibitor] is inactive
..
\_ [OK] [K3sCertificateExpiration] is inactive
|total=125 firing=0 pending=0 inactive=125
./check_prometheus_develop -H metrics.clean-cluster.example.com -p 80 alert -P
[UNKNOWN] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive

# echo $?
3

Version: development (Check against a cluster with a problem)

./check_prometheus_develop -H metrics.some-probelm-cluster.example.com  -p 80 alert 
[CRITICAL] - 126 Alerts: 1 Firing - 0 Pending - 125 Inactive
\_ [OK] [FluxHelmReleaseNotReady] is inactive
\_ [OK] [FluxGitRepositorySyncFailed] is inactive
..
\_ [CRITICAL] [Watchdog] is firing - value: 1.00
\_ [OK] [InfoInhibitor] is inactive
..
\_ [OK] [K3sCertificateExpiration] is inactive
|total=126 firing=1 pending=0 inactive=125
./check_prometheus_develop -H metrics.some-probelm-cluster.example.com  -p 80 alert -P
[CRITICAL] - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
\_ [CRITICAL] [Watchdog] is firing - value: 1.00
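Until this is resolved, a wrapper could remap the result. The following is only a sketch of that idea: `remap` is a hypothetical helper, and it relies solely on the exit code 3 and the summary line shown in the output above. Note the trade-off: a Prometheus that genuinely has no alert rules at all would presumably print the same summary, so this would also report it as OK.

```python
import re

# Conventional monitoring-plugin exit codes.
OK, UNKNOWN = 0, 3

# The summary line printed for a clean cluster when '-P' is used.
EMPTY_SUMMARY = re.compile(
    r"^\[UNKNOWN\] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive\s*$"
)


def remap(exit_code, output):
    """Turn the 'UNKNOWN with zero alerts' result into OK.

    Any other combination of exit code and output is passed through
    unchanged, so real UNKNOWN results (for example connection errors)
    are still reported.
    """
    lines = output.splitlines()
    first = lines[0] if lines else ""
    if exit_code == UNKNOWN and EMPTY_SUMMARY.match(first):
        return OK, "[OK] - 0 Alerts: 0 Firing - 0 Pending - 0 Inactive"
    return exit_code, output
```

A wrapper script would run the check, pass its exit code and stdout through `remap`, then print the returned text and exit with the returned code.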

Labels: bug
