inhibit: update inhibition cache when alerts resolve #1309

Closed
simonpasquier wants to merge 4 commits into prometheus:master from simonpasquier:fix-inhibition-cache

@simonpasquier
Member

@stuartnelson3 this is the fix for #1153.

I've added another acceptance scenario on inhibition where inhibiting and inhibited alerts fire and resolve together.

@brancz
Member

brancz commented Apr 3, 2018

/cc @fabxc

Contributor

@stuartnelson3 stuartnelson3 left a comment

The test definitely reliably fails when removing the code you added, which is great. However, the only code change in this PR that has an effect on the test (from running locally on my machine) is the check for a.Resolved(). All of the other code changes can be removed, and the acceptance tests still pass.

I'll admit I'm confused. Looking at the minimal change described above that still passes the test, I'm not sure what actually happens when the only change is removing the continue for a.Resolved() == true in the run() method.
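
To make the question above concrete, here is a minimal sketch of a source-alert cache with and without the continue on resolved alerts. The alert and rule types, the update and inhibits methods, and staleAfterResolve are hypothetical simplifications for illustration, not Alertmanager's actual code. The sketch assumes that source alerts are cached by fingerprint and that a lookup only counts entries that are still firing.

```go
package main

import "fmt"

// alert is a hypothetical, simplified stand-in for an Alertmanager alert.
type alert struct {
	fingerprint uint64
	resolved    bool
}

// rule is a hypothetical inhibition rule that caches its source alerts
// by fingerprint.
type rule struct {
	cache map[uint64]alert
}

// update processes one incoming source alert. When skipResolved is true,
// resolved alerts are ignored (the `continue` discussed above), so an
// earlier firing entry is never overwritten.
func (r *rule) update(a alert, skipResolved bool) {
	if skipResolved && a.resolved {
		return
	}
	r.cache[a.fingerprint] = a
}

// inhibits reports whether the cache holds a source alert with the given
// fingerprint that is still marked as firing.
func (r *rule) inhibits(fp uint64) bool {
	a, ok := r.cache[fp]
	return ok && !a.resolved
}

// staleAfterResolve feeds a firing alert followed by its resolution and
// reports whether the rule still inhibits afterwards.
func staleAfterResolve(skipResolved bool) bool {
	r := &rule{cache: map[uint64]alert{}}
	r.update(alert{fingerprint: 1}, skipResolved)
	r.update(alert{fingerprint: 1, resolved: true}, skipResolved)
	return r.inhibits(1)
}

func main() {
	fmt.Println("skip resolved alerts: ", staleAfterResolve(true))
	fmt.Println("store resolved alerts:", staleAfterResolve(false))
}
```

Under these assumptions, skipping the resolved alert leaves the firing entry in the cache, while storing it lets the lookup see that the source alert no longer fires.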

  // Update the inhibition rules' cache.
  for _, r := range ih.rules {
-     if r.SourceMatchers.Match(a.Labels) {
+     if r.exists(a) || r.SourceMatchers.Match(a.Labels) {
Contributor

What is the purpose of r.exists(a)? From the code, it seems that if the alert already exists, it simply gets set again, which doesn't accomplish anything?

Running the tests without this code still passes.

Member Author

IIUC the only case where r.exists(a) would be false is when Alertmanager processes a resolved alert that hasn't been seen as firing previously (e.g. after a restart). I'm not sure it is worth keeping then.
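
One way to check whether the r.exists(a) guard changes anything is to run the same sequence of alerts through both conditions and compare the outcomes. The sketch below is hypothetical: rule, event, updateWithGuard, updateWithoutGuard, and sameOutcome are illustrative names, and matchesSource stands in for r.SourceMatchers.Match(a.Labels). It assumes the cache is only ever written through this branch and that a fingerprint always maps to the same labels.

```go
package main

import "fmt"

// rule is a hypothetical inhibition rule; cache records the fingerprints
// stored by earlier updates.
type rule struct {
	cache map[uint64]bool
}

// exists plays the role of r.exists(a): was this alert cached before?
func (r *rule) exists(fp uint64) bool { return r.cache[fp] }

// updateWithGuard caches the alert when it already exists or matches the
// source matchers, and reports whether it wrote to the cache.
func (r *rule) updateWithGuard(fp uint64, matchesSource bool) bool {
	if r.exists(fp) || matchesSource {
		r.cache[fp] = true
		return true
	}
	return false
}

// updateWithoutGuard caches the alert only when it matches the source
// matchers, and reports whether it wrote to the cache.
func (r *rule) updateWithoutGuard(fp uint64, matchesSource bool) bool {
	if matchesSource {
		r.cache[fp] = true
		return true
	}
	return false
}

// event is one incoming alert, reduced to what the condition looks at.
type event struct {
	fp            uint64
	matchesSource bool
}

// sameOutcome replays the events through both variants and reports
// whether they made the same decision at every step.
func sameOutcome(events []event) bool {
	guarded := &rule{cache: map[uint64]bool{}}
	plain := &rule{cache: map[uint64]bool{}}
	for _, e := range events {
		if guarded.updateWithGuard(e.fp, e.matchesSource) != plain.updateWithoutGuard(e.fp, e.matchesSource) {
			return false
		}
	}
	return true
}

func main() {
	// Fingerprint 1 always matches the source matchers, fingerprint 2 never does.
	consistent := []event{
		{fp: 1, matchesSource: true},
		{fp: 2, matchesSource: false},
		{fp: 1, matchesSource: true},
		{fp: 2, matchesSource: false},
	}
	fmt.Println("same outcome:", sameOutcome(consistent))
}
```

The two variants would only diverge if an alert with a cached fingerprint stopped matching the source matchers, which the stated assumptions rule out.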

  // Only inhibit if target matchers match but source matchers don't.
  if inhibitedByFP, eq := r.hasEqual(lset); !r.SourceMatchers.Match(lset) && r.TargetMatchers.Match(lset) && eq {
-     ih.marker.SetInhibited(fp, fmt.Sprintf("%d", inhibitedByFP))
+     ih.marker.SetInhibited(fp, fmt.Sprintf("%s", inhibitedByFP.String()))
Contributor

This can just be ih.marker.SetInhibited(fp, inhibitedByFP.String())

Contributor

It appears that the original code works as well?

Member Author

Done. Yes, the original code works, but the API returns different fingerprint formats for silenced and inhibited alerts:

      ...snip...
      "status": {
        "state": "suppressed",
        "silencedBy": [
          "95c9789c-f7f6-4d57-99d3-66c9eb25a4b8"
        ],
        "inhibitedBy": []
      },
      ...snip...

      ...snip...
      "status": {
        "state": "suppressed",
        "silencedBy": [],
        "inhibitedBy": [
          "12124264373735012735"
        ]
      },
      ...snip...
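
The difference between the two format calls in the diff can be sketched as follows. The Fingerprint type here is a local stand-in that mirrors how github.com/prometheus/common/model.Fingerprint renders itself (16 zero-padded hex digits); decimalID, hexID, and parseHex are hypothetical helper names used only for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Fingerprint is a local stand-in for model.Fingerprint from
// github.com/prometheus/common: a 64-bit hash of a label set.
type Fingerprint uint64

// String renders the fingerprint as 16 zero-padded hexadecimal digits.
func (f Fingerprint) String() string {
	return fmt.Sprintf("%016x", uint64(f))
}

// decimalID formats the fingerprint as the original line did. The %d
// verb prints the underlying integer and does not call String.
func decimalID(f Fingerprint) string {
	return fmt.Sprintf("%d", f)
}

// hexID formats the fingerprint via its String method. Wrapping the
// result in fmt.Sprintf("%s", ...) would return the same string.
func hexID(f Fingerprint) string {
	return f.String()
}

// parseHex converts the hexadecimal form back into a Fingerprint.
func parseHex(s string) (Fingerprint, error) {
	n, err := strconv.ParseUint(s, 16, 64)
	return Fingerprint(n), err
}

func main() {
	// The decimal value shown in the inhibitedBy field above.
	fp := Fingerprint(12124264373735012735)
	fmt.Println("decimal:", decimalID(fp))
	fmt.Println("hex:    ", hexID(fp))
}
```

Both strings identify the same fingerprint; only the hexadecimal form has a fixed width of 16 characters.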

Member Author

@simonpasquier simonpasquier left a comment

@stuartnelson3 thanks for the review! I've replied to your remarks.

I'm also extending the unit tests for the inhibit package so that we don't solely rely on acceptance testing for this code.

// NewInhibitor returns a new Inhibitor.
func NewInhibitor(ap provider.Alerts, rs []*config.InhibitRule, mk types.Marker, logger log.Logger) *Inhibitor {
if logger == nil {
Member

@simonpasquier As far as I can tell, logger is only nil for tests, right? I would prefer to create a proper logger in the test code and pass it to NewInhibitor instead of bloating the real code here. What do you think?

Member Author

"logger is only nil for tests"

Yes. I agree with your recommendation.
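
The recommendation above could look roughly like the sketch below. It is self-contained and therefore uses hypothetical stand-ins: the Logger interface mirrors the single-method shape of go-kit's log.Logger, nopLogger is a test-side logger that discards everything, and this trimmed-down Inhibitor and NewInhibitor take only the logger, unlike the real constructor. In an actual test, go-kit's log.NewNopLogger() would likely serve the same purpose.

```go
package main

import "fmt"

// Logger mirrors the shape of go-kit's log.Logger interface.
type Logger interface {
	Log(keyvals ...interface{}) error
}

// nopLogger is a test-side logger that discards every record.
type nopLogger struct{}

// Log implements Logger and does nothing.
func (nopLogger) Log(...interface{}) error { return nil }

// Inhibitor is a hypothetical, trimmed-down stand-in that keeps only
// the logger field.
type Inhibitor struct {
	logger Logger
}

// NewInhibitor stores the logger as given. There is no nil check here;
// callers, including tests, are expected to pass a usable logger.
func NewInhibitor(logger Logger) *Inhibitor {
	return &Inhibitor{logger: logger}
}

// noteResolved is a hypothetical method that logs through the injected logger.
func (ih *Inhibitor) noteResolved(name string) error {
	return ih.logger.Log("msg", "source alert resolved", "alert", name)
}

func main() {
	// A test would construct the inhibitor with its own logger.
	ih := NewInhibitor(nopLogger{})
	fmt.Println("log error:", ih.noteResolved("JobDown"))
}
```

Keeping the nil handling out of the constructor means the production path carries no branch that exists only for tests.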

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
@stuartnelson3
Contributor

Fixed in #1331.

@simonpasquier simonpasquier deleted the fix-inhibition-cache branch April 19, 2018 13:06
hh pushed a commit to ii/alertmanager that referenced this pull request Apr 11, 2019
* Add nvme_metrics.sh text collector example

Signed-off-by: Henk <henk@wearespindle.com>