api: enhance v3 CommonTlsContext for agentless support#11061
Conversation
htuch
left a comment
There was a problem hiding this comment.
Thanks, some comments to get started.
/wait
There was a problem hiding this comment.
When https://github.com/envoyproxy/envoy/pull/10908/files#r420136445 lands, we should switch to this here. This should happen before this PR merges.
There was a problem hiding this comment.
Okay, will use TypedExtensionConfig instead of Any once that one is in.
|
Can you please elaborate on the description of this change? It sounds like there was a previous conversation related to this, but for people (such as myself) that weren't part of the conversation, it isn't clear what the purpose of this change is. |
Sure. gRPC recently added xDS support for load balancing and other routing features. This enables users to create proxyless service meshes. However for transport security (TLS/mTLS etc) the xDS protocol assumes the presence of SDS servers (for example Istio Agent or Node agent). To make the solution agentless as well as proxyless we are considering an SDS-server-less approach where we bypass the SDS server and have a CA-client in gRPC directly to talk to a service mesh CA to acquire certificates. This PR is related to that. Hope that answers your question. |
|
Thank you, that's very helpful context. So is this logically similar to talking to an SDS server, but with a different, previously existing/implemented protocol? |
In case of SDS, the SDS server creates the CSR and returns cert + private key to the client (Envoy) . With the CA client approach, the client (Envoy) creates the CSR so private key is not shared with anybody. |
There was a problem hiding this comment.
Should this be inside oneof validation_context_type and be mutually exclusive with SdsSecretConfig validation_context_sds_secret_config = 7;?
There was a problem hiding this comment.
Good catch. Yes it should be moved under oneof validation_context_type. Once moved there it will be mutually exclusive with all the other members in there not just validation_context_sds_secret_config. I assume there is no backward compatibility issue with adding a new member to a "oneof"
There was a problem hiding this comment.
Cool. And maybe the one above (CAClientPluginConfig tls_certificate_ca_client_plugin = 9) also needs to be mutually exclusive with repeated SdsSecretConfig tls_certificate_sds_secret_configs = 6. I'm not very sure though since tls_certificate_sds_secret_configs is a repeated field. Maybe Envoy meant to support fetching tls certificate from multiple sources; in that case the current way is probably ok as it just adds one more source to the list.
There was a problem hiding this comment.
The main problem here is backward compatibility: moving an existing field (tls_certificate_sds_secret_configs) to oneof creates backward compatibility issues so we decided against it. However this might be a good candidate for the field_migrate annotations. @htuch do you agree?
There was a problem hiding this comment.
Just use the oneof promotion annotation for that, document in the interim.
There was a problem hiding this comment.
Does it make sense to make the different fields for getting the tls_certificates also a oneof? If so, could we mark them with appropriate annotation to promote to oneof as well.
There was a problem hiding this comment.
Yes, I think I will go ahead and just do that for all oneof candidates.
There was a problem hiding this comment.
Thinking more about it, I see that they already had oneof validation_context_type to select only one among the 3 existing options (before I added the 4th one). But they didn't do the same for tls_certificates and tls_certificate_sds_secret_configs which could have been put in oneof.
@htuch could you throw some light on this? Should my field_migrate.oneof_promotion annotation include all 3 (tls_certificates, tls_certificate_sds_secret_configs and tls_certificate_ca_client_plugin) or just the last 2 or you think oneof is not warranted here? And the reason for that?
|
I think this PR is premature. I don't think we have enough consensus yet on exactly how this should work. I think we need to finalize our design before we add anything to the xDS API. |
There was a problem hiding this comment.
One of the problems with SDS was the verbose config - if you have 5000 services you'll need to duplicate that message for each cluster and listener. It would be ideal to not repeat it :-)
There are discussions about better naming and referencing of resources - could this be expressed as just a reference (the name of the provider) ? For most servers there will be a single 'plugin config' - maybe few, but not one per cluster and listener.
There was a problem hiding this comment.
That's a good point and I have thought about it too but it's not easy to do it in a stateless way. If you have any ideas that do not impose any architectural requirements, let me know.
There was a problem hiding this comment.
What do others think about creating a new resource type just for this? That way the config is available as a resource and its name can be referenced here. I would prefer not to do it in this PR.
There was a problem hiding this comment.
Yeah, let's punt this to another PR. Ideally we motivate any additional API decomposition with concrete use cases.
|
Agreed with @markdroth; can we have a public design document describing how this will work? Thanks. |
Sure, I'll create a design doc and attach to this PR. Hopefully that will help us reach broad consensus. |
|
FWIW I'm definitely in favor of doing this change. I have been thinking about it for a while. It will make it much easier to natively implement ACME (#96) as an extension. We need to think about async reload support for these extensions also, so there is some plumbing overlap with SDS. |
@mattklein123 thanks. Can you explain what you mean by "async reload" support and "plumbing overlap with SDS" in this context? We have used "async reload" at the transport/handshake level where a cert is acquired dynamically and asynchronously for a new (m)TLS connection. But this is implied in xDS via the *TlsContext fields for clusters/listeners and that semantic is not changed by this proposal. |
Sorry I just meant that we need to make sure that the Envoy side implementation supports async fetch from the extension. This would allow certificates to be fetched over the network, refreshed, etc., much like how SDS works. Optimally, SDS would be implemented in a future version as an extension, but I think that is out of scope for this set of changes. |
|
Is there a preferred format for a design doc? Is Google doc okay? |
|
The requested design doc is here https://docs.google.com/document/d/1A1_QVCrfwgkFY2YxNqbVe_eKt54kkJ6-iQq5EA_QaNc/edit?usp=sharing |
|
Please make the doc world-readable. |
Just did. I had to make a copy to do that. I edited my comment to include the new link. |
|
@sanjaypujare sorry can you please make it world commentable? |
Done. |
|
/wait |
I notice this added the "waiting" label - curious what it signifies. |
It allows filtering PRs by label. PRs with the Pushing a commit automatically clears the label. In this case, since the current work is happening in the linked design doc, I added the waiting label here. |
|
@htuch and others: I have addressed your comments in the design doc as well as in the PR. |
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
27dc9dd to
71b3508
Compare
@mattklein123 addressed the pending comments and rebased to master. Hopefully all okay now. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, LGTM with small PGV comments. @htuch @markdroth any further comments?
/wait
| message CertificateProvider { | ||
| // opaque name used to specify certificate instances or types. For example, "ROOTCA" to specify | ||
| // a root-certificate (validation context) or "TLS" to specify a new tls-certificate. | ||
| string name = 1; |
There was a problem hiding this comment.
Should this be required by validation?
There was a problem hiding this comment.
yes it should be required. Adding that.
| // to maintain a single certificate-provider instance. The sharing can happen, for | ||
| // example, among multiple clusters or between the tls_certificate and validation_context | ||
| // certificate providers of a cluster. | ||
| oneof config { |
There was a problem hiding this comment.
Can you add a required validation to the oneof?
|
I'm still a little nervous that we haven't fleshed out the details for ACME, but if Matt is confident enough that this API will work for that when the time comes, then I'm happy with it as well. |
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
|
Can you check CI? /wait |
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
htuch
left a comment
There was a problem hiding this comment.
LGTM modulo two tiny docs comments.
| SdsSecretConfig validation_context_sds_secret_config = 2 | ||
| [(validate.rules).message = {required: true}]; | ||
| // Config for fetching validation context via SDS API. Note SDS API allows certificates to be | ||
| // fetched/refreshed over the network "asynchronously" with respect to the TLS handshake. |
There was a problem hiding this comment.
Nit: why quotes around "asynchronously"? This seems to imply it's not really async?
There was a problem hiding this comment.
removed quotes
| (udpa.annotations.field_migrate).oneof_promotion = "dynamic_validation_context" | ||
| ]; | ||
|
|
||
| // Certificate provider for fetching validation context |
There was a problem hiding this comment.
For documentation purposes, best to spell out the mutex relationship between the SDS and certificate provider. The oneof_promotion annotations are removed during doc generation.
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
fixed the issue. Hope it can be merged now |
This is a follow up from envoyproxy#11061 which added support for sending arbitrary certificateProvider plugin configuration. Signed-off-by: Easwar Swaminathan <easwars@google.com>
Signed-off-by: Sanjay Pujare sanjaypujare@users.noreply.github.com
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Commit Message: api: enhance v3 CommonTlsContext for agentless support
Additional Description: add protos/fields needed to support agentless (without SDS) secret access
Risk Level: low
Testing: pre-submit and do_ci fix format checks
Docs Changes: none
Release Notes: -NA-
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
@htuch @costinm @markdroth PR for agentless secrets as discussed.