1- = ADR028: Discovery revision
1+ = ADR028: Service Discovery (Revision)
22Sebastian Bernauer <sebastian .bernauer@stackable.tech >
33v0.1, 2023-03-30
44:status: draft
@@ -68,9 +68,15 @@ We have some common use-cases that we need to express via the discovery mechanis
6868
6969== Considered Options
7070
71- === [1] Discovery Object: Use ConfigMap
71+ === Discovery Object
7272
73- Use a ConfigMap:
73+ This Discovery Object is written by the Product Operator. It describes how the product can be accessed by other
74+ connecting entities (clients). These clients can be inside the same Kubernetes cluster, in an external Kubernetes
75+ cluster or deployed using a different approach like running inside a VM or on bare metal hardware. The discovery object
76+ is used both internally by the SDP and by other actors (for example users). Each Discovery Object is created in the same
77+ namespace the product runs in. This means cross-namespace discovery needs special attention.
78+
79+ ==== Discovery Object: Option 1 - ConfigMap
7480
7581[source,yaml]
7682----
@@ -94,22 +100,20 @@ spec:
94100 authenticationSecretClass: client-kerberos
95101----
96102
97- ==== Pros
103+ ===== Pros
98104
99105* Easier to use for consuming applications outside of Stackable, as they can simply mount the CM or download it to a
100106 file. On the other hand when we start to put in yaml objects that need to be parsed we would loose the benefit.
101107* Single API call to retrieve all running products
102108
103- ==== Cons
109+ ===== Cons
104110
105111* No complex structure such as enums (we can mitigate this by sticking in custom yaml into the CM). Users don't have any
106112 form of validation when creating their own discovery CM e.g. pointing to their existing HDFS.
107113* Cannot have two products with the same name, as the discovery CM name clashes. One solution could be to prefix the
108114 product name (e.g. trino-simple), This can impose other problems such as too long CM names.
109115
110- === [1] Discovery Object: Use dedicated CRD object for every product
111-
112- Or use a dedicated HdfsClusterDiscovery crd:
116+ ==== Discovery Object: Option 2 - Dedicated CRD Object for every Product
113117
114118[source,yaml]
115119----
@@ -133,30 +137,28 @@ spec:
133137 secretClass: client-tls # Use this SecretClass to obtain a keytab
134138----
135139
136- ==== Pros
140+ ===== Pros
137141
138142* Validation by using e.g. complex enums
139143* Commons structure can be shared between all operators, such as `Listener` endpoints or tls server certificate
140144 information
141145
142- ==== Cons
146+ ===== Cons
143147
144- * Operator A needs to compile against operator b to have access to it's discovery struct. An alternative would be to put
148+ * Operator A needs to compile against operator B to have access to it's discovery struct. An alternative would be to put
145149 the Discovery CRDs in operator-rs.
146150* Operator versioning hell. On the other hand we have the same problem with ConfigMaps, as e.g. a newly introduced key
147151 is missing because of an older hdfs operator version.
148152* Dependant Pods (such as hbase on hdfs) can not simply mount a CM containing the hdfs-site and core-site. Instead the
149153 hbase-operator needs to read the HdfsClusterDiscovery, copy the hdfs-site and core-site into a CM and than mount that
150154 into the hbase Pods. This can be solved by the HdfsClusterDiscovery to point to a CM that contains hdfs-site and
151- core-site xmls .
155+ core-site XML files .
152156* Multiple API calls need to retrieve all running Stackable service (in stackablectl or cockpit). This would be a single
153157 API call in case of discovery CM or a shared CRD for all product discoveries.
154158* Side-Note: `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are
155159 external systems, where we don't know anything about.
156160
157- === [1] Discovery Object: Use dedicated CRD object for every product - in combination with ConfigMap
158-
159- Or use a dedicated HdfsClusterDiscovery crd:
161+ ==== Discovery Object: Option 3 - Dedicated CRD Object + ConfigMap for every Product
160162
161163[source,yaml]
162164----
@@ -210,15 +212,13 @@ spec:
210212# No CM needed
211213----
212214
213- ==== Pros
215+ ===== Pros
214216
215217* Fixes mount problem from `Discovery Object: Use dedicated CRD object for every product`
216218
217- ==== Cons
218-
219- === [1] Discovery Object: Use dedicated CRD object shared between all products
219+ ===== Cons
220220
221- Or use a dedicated ClusterDiscovery crd:
221+ ==== Discovery Object: Option 4 - Dedicated CRD Object shared between all Products
222222
223223[source,yaml]
224224----
@@ -245,13 +245,13 @@ spec:
245245 # ...
246246----
247247
248- ==== Pros
248+ ===== Pros
249249
250250* Only one struct in operator-rs => No cross-operator dependencies.
251251* Single API call to retrieve all stackable products. Question is if this really helps a lot, as callers probably also
252252 are interested in the status of the product, which needs further API calls (irrelevant - see Cons).
253253
254- ==== Cons
254+ ===== Cons
255255
256256* All product discoveries are versioned together. E.g. a new mandatory field for hdfs requires all operators to bump the
257257 Discovery CRD to `v2`. We hope that this does not happen too often.
@@ -260,25 +260,30 @@ spec:
260260 systems, where we don't know anything about. So in case we want to introduce a `Stacklet` object listing anyway, so
261261 the `Pro` regarding the API calls is irrelevant.
262262
263- === [1] Discovery Object: Write the discovery to Product CR status
263+ ==== Discovery Object: Option 5 - Write Discovery to Product CR Status
264264
265265Instead of writing discovery information to dedicated objects - such as CM or custom CR - we "simply" write the
266266discovery information to the status of the Cluster CR.
267267
268- ==== Pros
268+ ===== Pros
269269
270- ==== Cons
270+ * None currently
271+
272+ ===== Cons
271273
272274* It does not enable users to bring their own product and talk to it from Stackable, e.g. a user-provided HDFS.
273275* It does not allow things such as a ZNode for Zookeeper as we either use the Zookeeper CR for discovery or we use a
274276 ZNode but than can't use a Zookeeper CR. Currently we have the freedom of either connection to a Zookeeper root dir or
275277 a ZNode transparently.
276278
277- === [2] TLS: Discovery config contains SecretClass
279+ '''
278280
279- The discovery includes the SecretClass used to obtain the ca.crt used to validate the *server* certificate
281+ === TLS
282+
283+ ==== TLS: Option 1 - Discovery Config contains SecretClass
284+
285+ The discovery includes the SecretClass used to obtain the ca.crt used to validate the *server* certificate.
280286
281- Trino discovery:
282287[source,yaml]
283288----
284289apiVersion: trino.stackable.tech/v1alpha1
@@ -305,18 +310,20 @@ backends: # Don't look at the Superset CRD structure, we are only interested in
305310 discovery: my-trino
306311----
307312
308- ==== Pros
313+ ===== Pros
309314
310- ==== Cons
315+ * Currently none
311316
312- === [2] TLS: Client needs to specify SecretClass
313- ---
317+ ===== Cons
318+
319+ * Currently none
320+
321+ ==== TLS: Option 2 - Client specifies SecretClass
314322
315323The discovery does *not* include the SecretClass used to obtain the *server* certificate. Instead the client must
316324specify which SecretClass should be used to verify the *server* certificate. For usability reasons it can be omitted and
317325defaults to the SecretClass the client uses for itself.
318326
319- Trino discovery:
320327[source,yaml]
321328----
322329apiVersion: trino.stackable.tech/v1alpha1
@@ -343,22 +350,21 @@ backends: # Don't look at the Superset CRD structure, we are only interested in
343350 tlsSecretClass: my-second-pki
344351----
345352
346- ==== Pros
353+ ===== Pros
347354
348355* Operator does not need to read/look at the DiscoveryConfig (as we can statically set up the secret-op tls secretClass
349356 volumes rather than retrieving them from the DiscoveryConfig).
350357* Some clients only support a single pki, in that case we could not give the ability to overwrite the secretClass coming
351358 from the product itself.
352359
353- ==== Cons
360+ ===== Cons
354361
355362* The client has to know what pki/secretClass the server is using.
356363* Superset TrinoConnection could not only say "Connect this Superset and this Trino", but need to say "using this ca.crt
357364 to validate the Trino server"
358365
359- === [2] TLS: Include caCert in Discovery config
366+ ==== TLS: Option 3 - Include caCert in Discovery Config
360367
361- Trino discovery:
362368[source,yaml]
363369----
364370metadata:
@@ -376,23 +382,21 @@ endpoint:
376382 === END CERTIFICATE ===
377383----
378384
379- ==== Pros
385+ ===== Pros
380386
381387* Easier for external clients to use as they don't need to know the concept of SecretClasses and don't even need to run
382388 withing k8s.
383389* The client has to *not* know what pki/secretClass the server is using.
384390
385- ==== Cons
391+ ===== Cons
386392
387- * BIG QUESTION: How should the product operator get the ca cert from the SecretClass it uses to get the *server* cert
393+ * BIG QUESTION: How should the product operator get the CA cert from the SecretClass it uses to get the *server* cert
388394 from?
389395** The secret-op could e.g. offer an HTTP api to fetch the ca.crt of a given SecretClass or e.g. write the ca.crt into
390396 the status of a SecretClass
391397
392398
393- === [2] TLS: Include SecretClass in discovery, user can override it
394-
395- Trino discovery:
399+ ==== TLS: Option 4 - Include SecretClass in Discovery (User can override it)
396400
397401[source,yaml]
398402----
@@ -422,17 +426,20 @@ backends: # Don't look at the Superset CRD structure, we are only interested in
422426 tlsSecretClass: my-second-pki
423427----
424428
425- ==== Pros
429+ ===== Pros
426430
427431* Compromise with all usability and flexibility
428432
429- ==== Cons
433+ ===== Cons
430434
431435* Less secure by default
432436
433- === [3] Authentication: Add AuthenticationClass to Discovery Config
437+ '''
438+
439+ === Authentication
440+
441+ ==== Authentication: Option 1 - Add AuthenticationClass to Discovery Config
434442
435- Trino discovery:
436443[source,yaml]
437444----
438445metadata:
@@ -441,23 +448,23 @@ authentication:
441448 authenticationClass: my-class
442449----
443450
444- ==== Pros
451+ ===== Pros
452+
445453* IMPORTANT: This is the only thing the server can know (how he is verifying client identities). He can not recommend an
446454 SecretClass used to obtain the client credentials. E.g. he uses an LDAP AuthenticationClass, there is no way it can
447455 now what SecretClass provides credentials accepted by LDAP. (Most cases it will be a user logging into a WebUI and the
448456 LDAP credentials of the user are not even stored anywhere but just remembered by the user)
449457
450- ==== Cons
458+ ===== Cons
451459
452460* Operator has to read the AuthenticationClass to determine its type (pw/tls/keytab) and set up the needed volumes and
453461 commands.
454462// * The AuthenticationClass is meant to describe "how should a server verify connecting clients" and re-purpose it to
455463// mean "how a client should authenticate itself". Image a user creates a Secret `trino-users` with *only* a ca.crt
456464// and a SecretClass `trino-users` on top. The connecting client than has no way of knowing how to get a client cert.
457465
458- === [3] Authentication: Add SecretClass to Discovery Config
466+ ==== Authentication: Option 2 - Add SecretClass to Discovery Config
459467
460- Trino discovery:
461468[source,yaml]
462469----
463470metadata:
@@ -466,13 +473,17 @@ authentication:
466473 secretClass: client-tls # Use this SecretClass to obtain your credentials (regardless of type of SecretClass)
467474----
468475
469- ==== Cons
476+ ===== Pros
477+
478+ * Currently none
479+
480+ ===== Cons
470481
471482* Operator has to read the SecretClass to determine its type (pw/tls/keytab) and set up the needed volumes and commands.
472483* Image then SecretClass is of type `k8sSearch`. The connection client (e.g. controlled via superset-operator) than has
473484 no idea if he should expect a tls.crd or a keytab when mounting the SecretClass.
474485
475- === [3] Authentication: Add needed details
486+ ==== Authentication: Option 3 - Add needed Details
476487
477488Trino discovery:
478489[source,yaml]
@@ -488,27 +499,33 @@ authentication:
488499 secretClass: client-kerberos # Use this SecretClass to obtain a keytab
489500----
490501
491- ==== Pros
502+ ===== Pros
492503
493504* Operator has *not* to read the SecretClass to determine its type (pw/tls/keytab), as the type is already encoded in
494505 the Discovery config.
495506
496- ==== Cons
507+ ===== Cons
497508
498- === [3] Authentication: Don't add information how to authenticate
509+ * Currently none
510+
511+ ==== Authentication: Option 4 - Provide no Authentication Information
499512
500513Trino discovery does not provide any information on how to authenticate
501514
502- ==== Cons
515+ ===== Pros
516+
517+ * Currently none
518+
519+ ===== Cons
503520
504521* Not viable, as users need to know how to connect, and are not expected to try 50 different auth methods. We need to
505522 give them a AuthenticationClass, that says them e.g. what LDAP or PKI is used.
506523
507524== Decision Outcome
508525
509- [1] Discovery Object: `Discovery Object: Use dedicated CRD object for every product - in combination with ConfigMap`
510- [2] Server tls cert: TODO
511- [3] Authentication: `Authentication: Add AuthenticationClass to Discovery Config`
526+ * * Discovery Object:* Option 3
527+ * * Server TLS:* TBD, but leaning towards Option 2
528+ * * Authentication:* Option 1
512529
513530=== Appendix A
514531
0 commit comments