Skip to content

Fix resolved alerts still inhibiting#1331

Merged
stuartnelson3 merged 6 commits intomasterfrom
stn/inhibition-cache
Apr 18, 2018
Merged

Fix resolved alerts still inhibiting#1331
stuartnelson3 merged 6 commits intomasterfrom
stn/inhibition-cache

Conversation

@stuartnelson3
Copy link
Contributor

@stuartnelson3 stuartnelson3 commented Apr 17, 2018

Update old alert with result of merge with new

On ingest, alerts with matching fingerprints are
merged if the new alert's start and end times
overlap with the old alert's.

The merge creates a new alert, which is then
updated in the internal alert store.

The original alert is not updated (because merge
creates a copy), so it is never marked as resolved
in the inhibitor's reference to it.

The code within the inhibitor relies on skipping
over resolved alerts, but because the old alert is
never updated it is never marked as resolved. Thus
it continues to inhibit other alerts until it is
cleaned up by the internal GC.

This commit updates the struct of the old alert
with the result of the merge with the new alert.

An alternative would be to always update the
inhibitor's internal cache of alerts regardless of
an alert's resolve status.

This fixes #1153. Thanks to @simonpasquier for doing the bulk of the work :)

EDIT:
This PR has been modified to always update the inhibitors internal cache, whether the incoming alert is resolved or not.

simonpasquier and others added 5 commits April 17, 2018 16:23
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
On ingest, alerts with matching fingerprints are
merged if the new alert's start and end times
overlap with the old alert's.

The merge creates a new alert, which is then
updated in the internal alert store.

The original alert is not updated (because merge
creates a copy), so it is never marked as resolved
in the inhibitor's reference to it.

The code within the inhibitor relies on skipping
over resolved alerts, but because the old alert is
never updated it is never marked as resolved. Thus
it continues to inhibit other alerts until it is
cleaned up by the internal GC.

This commit updates the struct of the old alert
with the result of the merge with the new alert.

An alternative would be to always update the
inhibitor's internal cache of alerts regardless of
an alert's resolve status.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
@stuartnelson3 stuartnelson3 requested review from brancz and fabxc April 17, 2018 16:19
// points to to equal the newly merged alert.
// This is necessary as old may be stored in
// the inhibitor's rules cache.
*old = *alert
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the description, this line could be removed if the inhibitor's internal scache is updated even if an alert is resolved.

I think I would prefer that, as opposed to updating a pointer's value and hoping that it's being referred to elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a chat with @grobie and he agreed. The latest commit reflects these changes.

This seems like a better choice than the previous
commit. I think it is more sane to have the
inhibitor update its own cache, rather than having
one of its pointers updated externally.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
Copy link
Member

@grobie grobie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Awesome test coverage!

Looks like a very small optimization we lose here.

@simonpasquier
Copy link
Member

👍

@stuartnelson3 stuartnelson3 merged commit 80f2eeb into master Apr 18, 2018
@stuartnelson3 stuartnelson3 deleted the stn/inhibition-cache branch April 18, 2018 14:26
mxinden pushed a commit to mxinden/alertmanager that referenced this pull request May 5, 2018
* inhibit: update inhibition cache when alerts resolve

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: remove unnecessary fmt.Sprintf

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: add unit tests

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: use NopLogger in tests

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Update old alert with result of merge with new

On ingest, alerts with matching fingerprints are
merged if the new alert's start and end times
overlap with the old alert's.

The merge creates a new alert, which is then
updated in the internal alert store.

The original alert is not updated (because merge
creates a copy), so it is never marked as resolved
in the inhibitor's reference to it.

The code within the inhibitor relies on skipping
over resolved alerts, but because the old alert is
never updated it is never marked as resolved. Thus
it continues to inhibit other alerts until it is
cleaned up by the internal GC.

This commit updates the struct of the old alert
with the result of the merge with the new alert.

An alternative would be to always update the
inhibitor's internal cache of alerts regardless of
an alert's resolve status.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Update inhibitor cache even if alert is resolved

This seems like a better choice than the previous
commit. I think it is more sane to have the
inhibitor update its own cache, rather than having
one of its pointers updated externally.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Alertmanager not disinhibiting alert

3 participants