From b38f166cc47934d6c1f57ce13950614245f92947 Mon Sep 17 00:00:00 2001 From: Eric Anderson Date: Wed, 11 Aug 2021 16:06:22 -0700 Subject: [PATCH 1/5] A36: Document expected behavior when observing RouteConfiguration The text permitted certain carve-outs if RouteConfiguration was not being observed, but it didn't describe the behavior if RouteConfiguration _was_ observed. --- A36-xds-for-servers.md | 44 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 41 insertions(+), 3 deletions(-) diff --git a/A36-xds-for-servers.md b/A36-xds-for-servers.md index 3ceb885a3..020a34480 100644 --- a/A36-xds-for-servers.md +++ b/A36-xds-for-servers.md @@ -5,7 +5,7 @@ A36: xDS-Enabled Servers * Approver: markdroth * Status: Ready for Implementation * Implemented in: -* Last updated: 2021-07-21 +* Last updated: 2021-08-11 * Discussion at: https://groups.google.com/g/grpc-io/c/CDjGypQi1J0 ## Abstract @@ -180,8 +180,6 @@ Envoy's behavior. HttpConnectionManager support is required. HttpConnectionManager must have valid `http_filters`, as defined by [A39: xDS HTTP Filter Support][A39]. -RouteConfiguration (via `route_config` or indirect fields like `rds`) is not -used nor validated at this time, but may be in the future. If the `envoy.extensions.filters.http.router.v3.Router` is not present in `http_filters`, A39 calls for inserting a special filter that fails all RPCs. If @@ -192,8 +190,48 @@ required. This is to allow implementations that only support L4 xDS features to avoid L7 plumbing and implementation. This has no impact on the resource validation and NACKing behavior called for in A39. +If an XdsServer implementation uses RouteConfiguration or supports any HTTP +filters other than the hard-coded Router, then +`HttpConnectionManager.route_config` and `HttpConnectionManager.rds` must be +supported and RouteConfigurations must be validated. RouteConfiguration +validation logic inherits all previous validations made for client-side usage +as RDS does not distinguish between client-side and server-side. That is +predomenently defined in [gRFC A28][A28-validation], although note that +configuration for all VirtualHosts have been validated on client-side since +sharing the XdsClient was introduced, yet was not documented in a gRFC. The +validation must be updated to allow [Route.non_forwarding_action][] as a valid +`action`. The VirtualHost is selected on a per-RPC basis using the RPC's +requested `:authority`. Routes are matched the same as on client-side. +`Route.non_forwarding_action` is expected for all Routes used on server-side and +`Route.route` continues to be expected for all Routes used on client-side; a +Route with an inappropriate `action` causes RPCs matching that route to fail +with UNAVAILABLE. If `HttpConnectionManager.rds` references a NACKed resource +without a previous good version, an unavailable resource because of +communication failures with control plane or a triggered loading timeout, or +a non-existent resource, then all RPCs processed by that HttpConnectionManager +will fail with UNAVAILABLE. + +[Like in Envoy][envoy lds], updates to a Listener cause all older connections on +that Listener to be gracefully shut down (i.e., "drained") with a default grace +period of 10 minutes for long-lived RPCs, such that clients will reconnect and +have the updated configuration apply. This applies equally to an update in a +RouteConfiguration provided inline via the `route_config` field as it is part of +the Listener, but it does not apply to an updated RouteConfiguration provided by +reference via `rds` field. Infrastructure for "not serving" mode may be used for +the graceful shutdown, but only if it can be used without causing spurious RPC +or connection failures. Applying updates to a Listener should be delayed until +dependent resources have been attempted to be loaded (e.g., via RDS). The +existing resource loading timeout in XdsClient prevents the update from being +delayed indefinitely and the duplicate resource update detection in XdsClient +prevents replacing the Listener when nothing changes. The grace period should be +adjustable when building the XdsServer and should be described as the "drain +grace time." + [Filter.typed_config]: https://github.com/envoyproxy/envoy/blob/928a62b7a12c4d87ce215a7c4ebd376f69c2e080/api/envoy/config/listener/v3/listener_components.proto#L40 [TypedStruct.type_url]: https://github.com/cncf/udpa/blob/cc1b757b3eddccaaaf0743cbb107742bb7e3ee4f/udpa/type/v1/typed_struct.proto#L38 +[A28-validation]: A28-xds-traffic-splitting-and-routing.md#response-validation +[Route.non_forwarding_action]: https://github.com/envoyproxy/envoy/blob/5963beae8842982803af1bef04fb5a2a0893c613/api/envoy/config/route/v3/route_components.proto#L242 +[envoy lds]: https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/lds ### FilterChainMatch From 599313a4a58c8b5f556be7c0811eb3a78597394a Mon Sep 17 00:00:00 2001 From: Eric Anderson Date: Thu, 26 Aug 2021 16:59:45 -0700 Subject: [PATCH 2/5] Drop mention of "not serving" mode from draining --- A36-xds-for-servers.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/A36-xds-for-servers.md b/A36-xds-for-servers.md index 020a34480..aaee656a9 100644 --- a/A36-xds-for-servers.md +++ b/A36-xds-for-servers.md @@ -217,9 +217,9 @@ period of 10 minutes for long-lived RPCs, such that clients will reconnect and have the updated configuration apply. This applies equally to an update in a RouteConfiguration provided inline via the `route_config` field as it is part of the Listener, but it does not apply to an updated RouteConfiguration provided by -reference via `rds` field. Infrastructure for "not serving" mode may be used for -the graceful shutdown, but only if it can be used without causing spurious RPC -or connection failures. Applying updates to a Listener should be delayed until +reference via `rds` field. Draining must not cause the server to spuriously fail +RPCs or connections, so the listening port must not be closed as part of the +process. Applying updates to a Listener should be delayed until dependent resources have been attempted to be loaded (e.g., via RDS). The existing resource loading timeout in XdsClient prevents the update from being delayed indefinitely and the duplicate resource update detection in XdsClient From e7423ca1224497f60b2667e4fc36fe5b5eb074fa Mon Sep 17 00:00:00 2001 From: Eric Anderson Date: Thu, 26 Aug 2021 17:00:06 -0700 Subject: [PATCH 3/5] Cross-link A36 into A28, since it changes some behavior --- A28-xds-traffic-splitting-and-routing.md | 1 + 1 file changed, 1 insertion(+) diff --git a/A28-xds-traffic-splitting-and-routing.md b/A28-xds-traffic-splitting-and-routing.md index 14a580acb..01d33fa88 100644 --- a/A28-xds-traffic-splitting-and-routing.md +++ b/A28-xds-traffic-splitting-and-routing.md @@ -6,6 +6,7 @@ * Implemented in: C-core, Java, and Go * Last updated: 2020-05-20 * Discussion at: https://groups.google.com/g/grpc-io/c/2Ot9HA3Rjeg +* Updated by: [A36: xDS-Enabled Servers](A36-xds-for-servers.md) From ed79542cdc5935d16b3273b00bf501a200e7df1e Mon Sep 17 00:00:00 2001 From: Eric Anderson Date: Thu, 2 Sep 2021 17:41:19 -0700 Subject: [PATCH 4/5] Unsupported actions still require filter chain processing --- A36-xds-for-servers.md | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/A36-xds-for-servers.md b/A36-xds-for-servers.md index aaee656a9..6050749dd 100644 --- a/A36-xds-for-servers.md +++ b/A36-xds-for-servers.md @@ -204,12 +204,12 @@ validation must be updated to allow [Route.non_forwarding_action][] as a valid requested `:authority`. Routes are matched the same as on client-side. `Route.non_forwarding_action` is expected for all Routes used on server-side and `Route.route` continues to be expected for all Routes used on client-side; a -Route with an inappropriate `action` causes RPCs matching that route to fail -with UNAVAILABLE. If `HttpConnectionManager.rds` references a NACKed resource -without a previous good version, an unavailable resource because of -communication failures with control plane or a triggered loading timeout, or -a non-existent resource, then all RPCs processed by that HttpConnectionManager -will fail with UNAVAILABLE. +Route with an inappropriate `action` causes RPCs matching that route and +reaching the end of the filter chain to fail with UNAVAILABLE. If +`HttpConnectionManager.rds` references a NACKed resource without a previous good +version, an unavailable resource because of communication failures with control +plane or a triggered loading timeout, or a non-existent resource, then all RPCs +processed by that HttpConnectionManager will fail with UNAVAILABLE. [Like in Envoy][envoy lds], updates to a Listener cause all older connections on that Listener to be gracefully shut down (i.e., "drained") with a default grace @@ -578,6 +578,13 @@ waiting on the C core implementation. would work for C++/wrapped languages but there is currently no associated implementation work. + * This design changes the client-side behavior for an inappropriate `action` + and requires the RPC be processed by the filters before being failed. C will + initially fail the client-side RPCs without filter processing. Implementing + the full behavior will be follow-up work because the behavior difference + isn't important for the currently-supported filters and the change is more + invasive in C than other languages. + * Java. Implementation work primarily by @sanjaypujare, along with gRFC A29. Added classes are in the `grpc-xds` artifact and `io.grpc.xds` package. From 49feb6ff32001543668af777c13a42f69472c445 Mon Sep 17 00:00:00 2001 From: Eric Anderson Date: Thu, 2 Sep 2021 17:42:43 -0700 Subject: [PATCH 5/5] Define configuration error logging. This is being done instead of a new variation on the notification API. --- A36-xds-for-servers.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/A36-xds-for-servers.md b/A36-xds-for-servers.md index 6050749dd..7ece53a0c 100644 --- a/A36-xds-for-servers.md +++ b/A36-xds-for-servers.md @@ -211,6 +211,16 @@ version, an unavailable resource because of communication failures with control plane or a triggered loading timeout, or a non-existent resource, then all RPCs processed by that HttpConnectionManager will fail with UNAVAILABLE. +There are situations when an XdsServer can clearly tell the configuration will +cause errors, yet it still applies the configuration. In these situations the +XdsServer should log a warning each time it receives updates for configuration +in this state. This is known as "configuration error logging." If an XdsServer +logs such a warning, then it should also log a single warning once there are no +longer any such errors. Configuration error logging is currently limited to +broken RDS resources and an unsupported Route `action` (i.e., is not +`non_forwarding_action`), both of which cause RPCs to fail with UNAVAILABLE as +described above. + [Like in Envoy][envoy lds], updates to a Listener cause all older connections on that Listener to be gracefully shut down (i.e., "drained") with a default grace period of 10 minutes for long-lived RPCs, such that clients will reconnect and