Skip to content

Catch unknown matcher types in gossipped silences#2479

Closed
beorn7 wants to merge 1 commit intorelease-0.21from
beorn7/proto
Closed

Catch unknown matcher types in gossipped silences#2479
beorn7 wants to merge 1 commit intorelease-0.21from
beorn7/proto

Conversation

@beorn7
Copy link
Member

@beorn7 beorn7 commented Feb 5, 2021

I believe that gossibping between AM instances with different versions
of the silence protobuf could create a situation where an unknown
matcher type won't be recognized, thereby possibly inverting the
behavior of the silence.

This makes the matching more robust by error'ing out when facing an
unknown matcher type.

This has come to my attention during the recent work on silences with
negative matching. To make cluster upgrades more robust, this fix is
needed, and therefore I'm proposing it for a bugfix release of v0.21.

Signed-off-by: beorn7 beorn@grafana.com

I believe that gossibping between AM instances with different versions
of the silence protobuf could create a situation where an unknown
matcher type won't be recognized, thereby possibly inverting the
behavior of the silence.

This makes the matching more robust by error'ing out when facing an
unknown matcher type.

This has come to my attention during the recent work on silences with
negative matching. To make cluster upgrades more robust, this fix is
needed, and therefore I'm proposing it for a bugfix release of v0.21.

Signed-off-by: beorn7 <beorn@grafana.com>
@beorn7
Copy link
Member Author

beorn7 commented Feb 5, 2021

@beorn7
Copy link
Member Author

beorn7 commented Feb 9, 2021

Thanks everyone for your review.

I have played with this, the unmodified AM, and the new silences with negative matchers. There is good and bad news:

The good news is that even without this change, the AM isn't just inverting the meaning of a silence. A silence with a negative matcher will make it into the storage via gossipping, but it will not silence anything.

The bad news is that with or without this change, the UI just breaks because new silences with negative matching cannot be unmarshaled from protobuf, so the API returns a 500, which breaks the display of silences as a whole.

We could certainly tweak v0.21 to treat the new silences more gracefully, but I think the important part is that the new silences don't silence anything silently (no pun intended) they are not supposed to silence. Even if we created a more graceful degradation, the solution would still be to update all instances in the cluster. So arguably, breaking less gracefully is even better because it is easier to notice.

Therefore, I'll close this and won't cut a v0.21.1. (Conveniently, the protobuf CVE-2021-3121 has just turned out to not affect Alertmanager.).

If you disagree with my verdict, please follow up.

@beorn7 beorn7 closed this Feb 9, 2021
@beorn7 beorn7 deleted the beorn7/proto branch February 9, 2021 20:33
beorn7 added a commit that referenced this pull request Feb 9, 2021
This has been discussed in #2479. Even if the conclusion there was
that we don't need this in a bugfix release, it's still better to have
this kind of robustness. So this introduces the same check into the
main branch.

Signed-off-by: beorn7 <beorn@grafana.com>
simonpasquier pushed a commit that referenced this pull request Feb 10, 2021
This has been discussed in #2479. Even if the conclusion there was
that we don't need this in a bugfix release, it's still better to have
this kind of robustness. So this introduces the same check into the
main branch.

Signed-off-by: beorn7 <beorn@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants