doc: Add 'Secure Alertmanager cluster traffic' design document #1763
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,117 @@ | ||
| # Secure Alertmanager cluster traffic | ||
|
|
||
| Type: Design document | ||
|
|
||
| Date: 2019-02-21 | ||
|
|
||
| Author: Max Inden <IndenML@gmail.com> | ||
|
|
||
|
|
||
| ## Status Quo | ||
|
|
||
| Alertmanager supports [high | ||
| availability](https://github.com/prometheus/alertmanager/blob/master/README.md#high-availability) | ||
| by interconnecting multiple Alertmanager instances building an Alertmanager | ||
| cluster. Instances of a cluster communicate on top of a gossip protocol managed | ||
| via HashiCorp's [_Memberlist_](https://github.com/hashicorp/memberlist) library. | ||
| _Memberlist_ uses two channels to communicate: TCP for reliable and UDP for | ||
| best-effort communication. | ||
|
|
||
| Alertmanager instances use the gossip layer to: | ||
|
|
||
| - Keep track of membership | ||
| - Replicate silence creation, update and deletion | ||
| - Replicate notification log | ||
|
|
||
| As of today the communication between Alertmanager instances in a cluster is | ||
| sent in clear-text. | ||
|
|
||
|
|
||
| ## Goal | ||
|
|
||
| Instances in a cluster should communicate among each other in a secure fashion. | ||
| Alertmanager should guarantee confidentiality, integrity and client authenticity | ||
| for each message touching the wire. While this would improve the security of | ||
| single datacenter deployments, one could see this as a necessity for | ||
| wide-area-network deployments. | ||
|
|
||
|
|
||
| ## Non-Goal | ||
|
|
||
| Even though solutions might also be applicable to the API endpoints exposed by | ||
| Alertmanager, it is not the goal of this design document to secure the API | ||
| endpoints. | ||
|
|
||
|
|
||
| ## Proposed Solution - TLS Memberlist | ||
|
|
||
| _Memberlist_ enables users to implement their own [transport | ||
| layer](https://godoc.org/github.com/hashicorp/memberlist#Transport) without the | ||
| need of forking the library itself. That transport layer needs to support | ||
| reliable as well as best-effort communication. Instead of using TCP and UDP like | ||
| the default transport layer of _Memberlist_, the suggestion is to only use TCP | ||
| for both reliable as well as best-effort communication. On top of that TCP | ||
| layer, one can use mutual TLS to secure all communication. A proof-of-concept | ||
| implementation can be found here: | ||
| https://github.com/mxinden/memberlist-tls-transport. | ||
|
|
||
| The data gossiped between instances does not have a low-latency requirement that | ||
| TCP could not fulfill; the same applies to the relatively low data throughput | ||
| requirements of Alertmanager. | ||
|
|
||
| TCP connections could be kept alive beyond a single message to reduce latency as | ||
| well as handshake overhead costs. While this is feasible in a 3-instance | ||
| Alertmanager cluster, the discussed custom implementation would need to limit | ||
| the number of open connections for clusters with many instances (#connections = | ||
| n*(n-1)/2). | ||
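To make the growth of the full-mesh connection count concrete, here is a small sketch that evaluates n*(n-1)/2 for a 3-instance cluster and for the 100-instance cluster size mentioned in the review discussion. The function name `fullMeshConnections` is purely illustrative.

```go
package main

import "fmt"

// fullMeshConnections returns the number of connections needed when every
// one of n instances keeps exactly one connection to every other instance.
func fullMeshConnections(n int) int {
	return n * (n - 1) / 2
}

func main() {
	for _, n := range []int{3, 100} {
		fmt.Printf("%d instances: %d connections\n", n, fullMeshConnections(n))
	}
}
```

For 3 instances this gives 3 connections cluster-wide, for 100 instances 4950. If each pair needed a separate reliable and pseudo-best-effort connection, those numbers would double.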
|
|
||
| As of today, Alertmanager already forces _Memberlist_ to use the reliable TCP | ||
| instead of the best-effort UDP connection to gossip large notification logs and | ||
| silences between instances. The reason is that those packets would otherwise | ||
| exceed the [MTU](https://en.wikipedia.org/wiki/Maximum_transmission_unit) of | ||
| most UDP setups. Splitting packets is not supported by _Memberlist_ and was not | ||
| considered worth the effort to be implemented in Alertmanager either. For more | ||
| info see this [GitHub | ||
| issue](https://github.com/prometheus/alertmanager/issues/1412). | ||
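The size-based routing described above can be sketched as follows. This is not Alertmanager's actual code: the threshold `maxUDPPayload`, the `channel` type, and the `pickChannel` function are assumptions chosen for illustration only.

```go
package main

import "fmt"

// maxUDPPayload is an assumed safe datagram payload size in bytes.
// The real limit depends on the network's MTU.
const maxUDPPayload = 1400

// channel names the transport a message is sent over (illustrative type).
type channel string

const (
	bestEffort channel = "udp"
	reliable   channel = "tcp"
)

// pickChannel routes messages that would not fit into a single datagram
// over the reliable channel, since packets are never split.
func pickChannel(msg []byte) channel {
	if len(msg) > maxUDPPayload {
		return reliable
	}
	return bestEffort
}

func main() {
	fmt.Println(pickChannel(make([]byte, 200)))     // small gossip message
	fmt.Println(pickChannel(make([]byte, 64*1024))) // large notification log
}
```

With a TCP-only transport as proposed, this distinction disappears at the wire level, because both kinds of messages travel over TCP.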
|
|
||
| With the last [Prometheus developer | ||
| summit](https://docs.google.com/document/d/1-C5PycocOZEVIPrmM1hn8fBelShqtqiAmFptoG4yK70/edit) | ||
| in mind, the Prometheus project's preferred security mechanism seems to be mutual | ||
| TLS. Having Alertmanager use the same mechanism would ease deployment with the | ||
| rest of the Prometheus stack. | ||
|
|
||
| As a side effect (benefit) Alertmanager would only need a single open port (TCP | ||
|
||
| traffic) instead of two open ports (TCP and UDP traffic) for cluster | ||
| communication. This does not affect the API endpoint which remains a separate | ||
| TCP port. | ||
|
|
||
|
|
||
| ## Alternative Solutions | ||
|
|
||
| ### Symmetric Memberlist | ||
|
|
||
| _Memberlist_ supports [symmetric key | ||
| encryption](https://godoc.org/github.com/hashicorp/memberlist#Keyring) via | ||
| AES-128, AES-192 or AES-256 ciphers. One can specify multiple keys for rolling | ||
| updates. Securing the cluster traffic via symmetric encryption would just | ||
| involve small configuration changes in the Alertmanager code base. | ||
|
Contributor
If both methods require generating a key, what is the downside of this method vs. the proposed method?
Member
I think that this would be a valid approach -- but we would need to
Contributor
And it'd be a different way of doing auth than we're going to use elsewhere.
Member
Could we contribute our approach upstream?
Member
Author
@stuartnelson3 sorry for not covering that properly in the document:
What are your thoughts @stuartnelson3?
Member
Author
Which one?
Does that answer the question @roidelapluie?
Contributor
My initial thought was wondering about the cost of developing and maintaining our own transport (and being consistent within the Prometheus org) vs. using the keyring (and being inconsistent). The points you list here seem like enough to warrant creating our own Transport.
Member
Author
Re-reading the DTLS RFC, it does prevent replay attacks via an epoch and sequence number. I am sorry for the confusion. |
||
|
|
||
|
|
||
| ### Replace Memberlist | ||
|
|
||
| Coordinating membership might not be required by the Alertmanager cluster | ||
| component. Instead this could be bound to static configuration or e.g. DNS | ||
| service discovery. On the other hand, gossiping silences and notifications is | ||
| ideally done in an eventually consistent gossip fashion, given that Alertmanager | ||
| is supposed to scale beyond a 3-instance cluster and beyond local-area-network | ||
| deployments. With these requirements in mind, replacing _Memberlist_ with an | ||
| entirely self-built communication layer would be a large undertaking. | ||
|
|
||
|
|
||
| ### TLS Memberlist with DTLS | ||
|
|
||
| Instead of redirecting all best-effort traffic via the reliable channel as | ||
| proposed above, one could also secure the best-effort channel itself using UDP | ||
| and [DTLS](https://en.wikipedia.org/wiki/Datagram_Transport_Layer_Security) in | ||
| addition to securing the reliable traffic via TCP and TLS. DTLS is not supported | ||
| by the Go standard library. | ||
Is one per other AM really a problem?
Memberlist wants to have one connection as a reliable connection all to itself. Therefore we need at least two, one reliable TCP and one pseudo-best-effort connection, unless we want to go down the road of multiplexing a single TCP connection.
@brian-brazil what maximum cluster size would you expect in the future?
In principle someone might run two per datacenter, and tens of datacenters isn't that unusual. Say 100?
Alright. I will make sure to include that in the performance testing (in case we decide for this route).
The "full sync" TCP request happens relatively infrequently, and send reliable is only used for especially large gossip messages (which is probably also relatively infrequent; it happens <<1% of the time at SC). Practically speaking, each instance would only maintain one connection to the other instances.