Main IDR monitoring upgrade #281
Also fix Ansible deprecated syntax
joshmoore
left a comment
In general, my worry when I hear "breaking" is that we won't be able to ask historical questions. Can you explain why these changes happened?
Otherwise, 👍 for keeping monitoring happy & healthy.
| "value": [ | ||
| "omero", | ||
| "database", | ||
| "docker" |
It's been gone for years, ever since we separated the deployment of the VAE from IDR.
But isn't it coming back with microservices (and idr-ftp)?
These are used for selecting a subset of hostnames; they have nothing to do with what's running on the system. We don't have nodes named .*docker.*. The current microservice work runs the Docker microservices on the omero* servers, since some of them require filesystem access anyway.
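To illustrate the point about hostname subsets, here is a minimal sketch. It assumes each variable value is applied as a substring pattern against node hostnames; the hostnames below are hypothetical and invented purely for illustration, and the helper `hosts_in_group` is not part of this PR.

```python
import re

# Values taken from the dashboard variable shown in the diff above.
GROUPS = ["omero", "database", "docker"]

# Hypothetical hostnames, invented for illustration only.
HOSTNAMES = [
    "example-omeroreadwrite",
    "example-omeroreadonly-1",
    "example-database",
]


def hosts_in_group(group, hostnames):
    """Return the hostnames that match the pattern .*<group>.*"""
    pattern = re.compile(f".*{re.escape(group)}.*")
    return [h for h in hostnames if pattern.fullmatch(h)]


# Map each group value to the hostnames it selects.
matches = {g: hosts_in_group(g, HOSTNAMES) for g in GROUPS}

for group, hosts in matches.items():
    print(group, hosts)
```

With no host named `.*docker.*`, the `docker` value selects nothing, regardless of whether Docker containers run on the `omero*` hosts.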
| "targets": [ | ||
| { | ||
| "expr": "(1 - node_filesystem_free{fstype!~\"(nfs|nfs4|overlay|rootfs|rpc_pipefs|tmpfs)\", instance=\"$hostname\"} / node_filesystem_size{fstype!~\"(nfs|nfs4|overlay|rootfs|rpc_pipefs|tmpfs)\", instance=\"$hostname\"}) * 100", | ||
| "expr": "(1 - node_filesystem_free_bytes{fstype!~\"(nfs|nfs4|overlay|rootfs|rpc_pipefs|tmpfs)\", instance=\"$hostname\"} / node_filesystem_size_bytes{fstype!~\"(nfs|nfs4|overlay|rootfs|rpc_pipefs|tmpfs)\", instance=\"$hostname\"}) * 100", |
What's the impact of the change? Does that mean that previous values are no longer chartable?
You'd have to create two charts, one for each metric.
The breaking changes are due to upstream standardisation of the metric names.
Any more comments, or shall we merge this?
Upgrades Prometheus and Grafana.
Partly related to #264
The main breaking changes are renamed metrics, which is why some of the Grafana dashboards are modified.
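As a rough sketch of the kind of dashboard edit involved, the snippet below rewrites the two metric names that appear in the diff in this PR. The helper `rename_metrics` and the `RENAMES` table are hypothetical and only cover those two names; other renamed metrics would need to be added by hand.

```python
import re

# Old -> new names, taken from the expression changed in this PR's diff.
RENAMES = {
    "node_filesystem_free": "node_filesystem_free_bytes",
    "node_filesystem_size": "node_filesystem_size_bytes",
}


def rename_metrics(expr, renames=RENAMES):
    """Replace whole-word occurrences of old metric names in a query string.

    The \\b anchors stop an already-renamed metric from being matched again,
    because "_" counts as a word character, so the function is idempotent.
    """
    for old, new in renames.items():
        expr = re.sub(rf"\b{re.escape(old)}\b", new, expr)
    return expr


old_expr = (
    '(1 - node_filesystem_free{instance="$hostname"} '
    '/ node_filesystem_size{instance="$hostname"}) * 100'
)
new_expr = rename_metrics(old_expr)
print(new_expr)
```

This only rewrites the query text. Data recorded under the old names stays under the old names, so historical values still need a query against the old metric.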