doc: Propose using Memberlist Keyring to protect a cluster #2216

Closed
zecke wants to merge 1 commit into prometheus:main from zecke:zecke/enable-memberlist-keyring
@zecke
Contributor

@zecke zecke commented Mar 21, 2020

Create a document to propose an easy (implementation and operation) way
to protect the production cluster from accidental and unwanted members.

Provide a reference implementation in addition to the design document.

TODO(zecke): Figure out how to test this feature properly.

Signed-off-by: Holger Hans Peter Freyther automatic+am@freyther.de

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
@zecke zecke force-pushed the zecke/enable-memberlist-keyring branch from b2d9507 to 81ef7af March 21, 2020 16:06
@brian-brazil
Contributor

Thanks for your PR. We're looking at adding TLS generally, so don't want to add other auth systems.

@zecke
Contributor Author

zecke commented Mar 22, 2020

Thank you for your reply, and sorry for being late to the party. I have seen the design document and wanted to propose a simpler design for a narrower problem. If we focus on integrity and authentication (e.g. something provided by an HMAC) and leave out confidentiality (e.g. ignore known plaintext in gossiped messages), we end up with a solution that is orders of magnitude easier to implement and operate.

Going all in on TCP + X509 + TLS is nice but has certain consequences for operating an AM:

  • Certificates will expire "unexpectedly". I listed three major companies not able to renew certificates in time. It's bound to happen for many users/orgs. For AM the failure will be less dramatic because it fails open, but it is a failure mode nevertheless.

  • TLS is difficult to implement. Connections must be broken when certificates expire or are revoked. Time needs to be roughly synchronized (not sure what requirement on time we have today).

  • TCP for everything brings wanted and unwanted side effects. The exposure to head-of-line blocking is one of them.
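
As an illustration of the integrity-and-authentication-only idea from the comment above, here is a minimal Go sketch that tags each message with an HMAC and accepts it only if one of the locally configured keys verifies the tag. It uses only the Go standard library and is not the reference implementation from this PR, nor the memberlist Keyring API; the function names `sign` and `verify`, the key values, and the message layout (payload followed by a 32-byte tag) are assumptions made for the example.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"errors"
	"fmt"
)

// errBadTag is returned when no configured key authenticates the message.
var errBadTag = errors.New("message authentication failed")

// sign returns payload followed by its HMAC-SHA256 tag.
// The payload stays in plaintext: this gives integrity and
// authentication, not confidentiality.
func sign(key, payload []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	// Copy the payload so the caller's slice is never modified.
	return mac.Sum(append([]byte(nil), payload...))
}

// verify splits msg into payload and tag and checks the tag against
// every key in keyring. Trying several keys lets operators rotate keys
// without a flag day (hypothetical rotation scheme for this sketch).
func verify(keyring [][]byte, msg []byte) ([]byte, error) {
	if len(msg) < sha256.Size {
		return nil, errBadTag
	}
	split := len(msg) - sha256.Size
	payload, tag := msg[:split], msg[split:]
	for _, key := range keyring {
		mac := hmac.New(sha256.New, key)
		mac.Write(payload)
		// hmac.Equal compares in constant time.
		if hmac.Equal(tag, mac.Sum(nil)) {
			return payload, nil
		}
	}
	return nil, errBadTag
}

func main() {
	prod := []byte("prod-cluster-shared-key")       // example key, assumption
	staging := []byte("staging-cluster-shared-key") // example key, assumption

	msg := sign(prod, []byte("gossip: silence updated"))

	_, err := verify([][]byte{prod}, msg)
	fmt.Println("prod member accepts:", err == nil)

	_, err = verify([][]byte{staging}, msg)
	fmt.Println("staging member accepts:", err == nil)
}
```

A member that holds only the staging key rejects messages signed with the production key, which is the "accidental and unwanted members" case this PR targets. Anyone on the network can still read the payload, which matches the stated trade-off of leaving out confidentiality.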

@mxinden
Member

mxinden commented Mar 22, 2020

Thanks for putting work into this and writing a design document.

Just for documentation purposes, I am linking the initial issue #1322, the design doc for Membership over TLS, and the corresponding work-in-progress pull request #1819 here.

> TCP for everything brings wanted and unwanted side-effects. The exposure to head of line blocking is one of them.

TCP head-of-line blocking happens per connection. Given the low bandwidth usage of the gossip protocol, I doubt this would be an issue. Please correct me if I am missing something.

Having a simple solution for the problem of distinct clusters merging would be great. On the other hand, I do see the overhead of eventually maintaining two solutions.

@stale stale bot added the stale label May 21, 2020
@TheMeier
Contributor

TheMeier commented Nov 3, 2025

TLS-based securing of the cluster has been implemented in #1322 in the meantime (still marked EXPERIMENTAL though).

Regarding certificate handling: a lot has changed here. Automatic issuance (think ACME, Vault, cert-manager) is much more common today.

@zecke do you still want to pursue this?

@SoloJacobs SoloJacobs self-requested a review November 14, 2025 19:55
Contributor

@SoloJacobs SoloJacobs left a comment

-Hi @zecke,
-
-It seems that the TLS approach has since been implemented, which makes this approach obsolete. Do you feel something is missing, i.e. would updating this PR provide any value?

+Seems like I missed @TheMeier's comment, sorry!

Kind regards, Solomon Jacobs

@zecke zecke closed this Dec 16, 2025