A27: xDS-Based Global Load Balancing#170
Conversation
ejona86
left a comment
There was a problem hiding this comment.
LGTM (assuming FIXME is resolved, and that's not even that big of a deal.)
| to query for the `Cluster` resource with the cluster name from the LB | ||
| policy configuration. | ||
|
|
||
| Initially, the CDS policy will support only hard-coded cluster names, so |
There was a problem hiding this comment.
Do you mean "policy names" instead of "cluster names"?
There was a problem hiding this comment.
No, it was correct as written, but the wording was a bit confusing. I've attempted to clarify.
| client will make a CDS request asking for that specific cluster name. | ||
|
|
||
| Because we are requesting one specific resource by name, the CDS response | ||
| should include exactly one `Listener`. However, many existing xDS servers |
dfawley
left a comment
There was a problem hiding this comment.
Just one nit (in 3 places) otherwise LGTM pending the FIXME.
| should be tolerant of servers that may ignore the requested resource name; | ||
| if the server returns multiple resources, the client should look for the one | ||
| with the name it asked for, and it should ignore the rest of the entries. |
There was a problem hiding this comment.
Use "must" instead of "should"? Or "will" may better match the wording in the rest of the document.
| should be tolerant of servers that may ignore the resource name in the | ||
| request; if the server returns multiple resources, the client should look | ||
| for the one with the name it asked for, and it should ignore the rest of |
| should be tolerant of servers that may ignore the requested resource name; | ||
| if the server returns multiple resources, the client should look for the one | ||
| with the name it asked for, and it should ignore the rest of the entries. |
| When the channel attempts to connect, the xds resolver will | ||
| instantiate an XdsClient object and use it to send an xDS request for |
There was a problem hiding this comment.
Are XdsClients of different gRPC channels going to share underlying TCP connections to XDS backend and/or XDS ADS stream?
For instance, what is a behavior in case if single process has 100 different gRPC channels powered by xds resolver/balancer?
There was a problem hiding this comment.
Each channel that uses an "xds:" URI will have its own channel to the xds server. For Java and Go, this means that there will actually be a separate TCP connection to the xds server. For C-core, however, because we share subchannels between channels, if there are multiple channels open to the same xds server, we will share the underlying TCP connection. However, even when the TCP connection is shared, there will be a separate ADS stream for each channel.
We have discussed in the past the possibility of sharing the XdsClient between channels. We may be able to implement that in the future if the duplicate TCP connections or duplicate ADS streams become a problem.
| does not specify a port. | ||
|
|
||
| Because we are requesting one specific resource by name, the LDS response | ||
| should include exactly one `Listener`. However, many existing xDS servers |
There was a problem hiding this comment.
There is some confusion for legacy management servers: I am assuming "do not support requesting one specific resource" means "do not honor resource names in the request". Currently we require the management server to always send client newly requested resources, this implies that server needs to honor what the client requests for. Therefore, I doubt if this kind of legacy management servers would still work for us.
There was a problem hiding this comment.
I am assuming "do not support requesting one specific resource" means "do not honor resource names in the request".
Correct.
Currently we require the management server to always send client newly requested resources, this implies that server needs to honor what the client requests for. Therefore, I doubt if this kind of legacy management servers would still work for us.
For LDS and CDS, legacy management servers will always send all resources in every response. Therefore, even if the server ignores the resource_names field, sending a new request will cause the server to send a new response containing all resources, which must by definition include the new one being requested.
We should not have a problem supporting legacy management servers.
| subscribing to a different type of configuration resource. In the xDS | ||
| API flow, the client uses the following main APIs, in this order: | ||
|
|
||
| - __Listener Discovery Service (LDS)__: Returns `Listener` resources. |
There was a problem hiding this comment.
Do you want to link back to some reference or proto for each of these resources?
|
|
||
| The `node` field will be populated with the [xDS `Node` | ||
| proto](https://github.com/envoyproxy/data-plane-api/blob/1adb5d54abb0e28ca409254d26fad1cf5535239b/envoy/api/v2/core/base.proto#L85). | ||
| Note that the `build_version` field should not be specified in the |
There was a problem hiding this comment.
Do you want to also mention the user agent name/version here, since build_version is deprecated?
There was a problem hiding this comment.
For now, we're going to populate build_version, since that's what existing management servers will be looking at. We can add support for the new fields as management servers start to look at them.
There was a problem hiding this comment.
We're actually going to populate build_version and user_agent_name and user_agent_version, for forward and backward compatibility. I've noted all three fields here.
| When the channel attempts to connect, the xds resolver will | ||
| instantiate an XdsClient object and use it to send an xDS request for | ||
| a `Listener` resource with the name of the server that the client | ||
| wishes to connect to (in the example above, "example.com:123"). If the |
There was a problem hiding this comment.
There is a design choice, including port alongside hostname. This is somewhat related to how we ended up doing naming of resources for VHDS. Some resource types naturally lend themselves to a certain form. It might be worth elaborating on why you included port here, and also updating the xDS protocols docs to indicate that on-demand listeners should be named this way.
There was a problem hiding this comment.
The reasoning for this is covered in the section about LDS below. It's basically being done to allow the xDS server to define the default port. That way, we don't need to hard-code a default in the gRPC client itself.
I'm not sure that this is really relevant to the current xDS protocol docs. It's really more a data model issue than a transport protocol issue. Do we have a doc for the data model somewhere? This could probably be documented as part of describing HTTP API listeners.
There was a problem hiding this comment.
Existing API docs conflate transport/data model somewhat (which is something I wanted to move away from in UDPA). I am curious whether this design might change with future namespace/attribute support in xDS.
There was a problem hiding this comment.
I wouldn't think so, but I'm certainly open to discussing it, especially as part of UDPA. Let's talk as we move forward.
| config](https://github.com/grpc/grpc/blob/master/doc/service_config.md), | ||
| which the resolver will return to the channel. | ||
|
|
||
| In particular, the resulting service config will select use of the CDS |
There was a problem hiding this comment.
Not sure how the "In particular" relates to the above (this is more a grammar nit).
|
|
||
| #### RDS | ||
|
|
||
| If the `Listener` does not include the `RouteConfiguration` in-line, |
| `RouteConfiguration` name specified in the `Listener`. | ||
|
|
||
| Because we are requesting one specific resource by name, the RDS response | ||
| should include exactly one RouteConfiguration. However, the gRPC client |
There was a problem hiding this comment.
Changed to "no more than", since that more accurately conveys the real point here.
| client will make a CDS request asking for that specific cluster name. | ||
|
|
||
| Because we are requesting one specific resource by name, the CDS response | ||
| should include exactly one `Cluster`. However, many existing xDS servers |
There was a problem hiding this comment.
Another place to consider weakening form exactly to at least one. I don't think it's a good to idea to start mandating that management servers behave a certain way when they can handle the more general case and specialize for gRPC.
| - If the `health_status` field has any value other than HEALTHY or | ||
| UNKNOWN, the entry will be ignored. | ||
| - The `endpoint` field must be set. Inside of it: | ||
| - The `address` field must be set. |
There was a problem hiding this comment.
Which parts of address? What about Unix Domain Sockets etc?
There was a problem hiding this comment.
Good point -- added details on this.
We don't currently support UDS.
|
|
||
| FIXME: Is this true? Need to resolve internal discussion and decide | ||
| whether to remove this before merging this gRFC. | ||
| - The `policy.disable_overprovisioning` field must be set to true. |
There was a problem hiding this comment.
I feel this is a good candidate for client feature capabilities.
There was a problem hiding this comment.
Haven't we already added the disable_overprovisioning field to xDS? Wouldn't the capability basically replace that field?
I'm open to doing this via a capability, as long as we can figure out the right transition plan. The reason for the FIXME comment here is that we've been trying to reach consensus on this with @vishalpowar and his team, so we should discuss this further.
There was a problem hiding this comment.
We have decided to do this via a client capability. I added the capability to the xDS API docs in envoyproxy/envoy#10136, and I've updated this gRFC to reflect that choice.
| subscribing to a different type of configuration resource. In the xDS | ||
| API flow, the client uses the following main APIs, in this order: | ||
|
|
||
| - __Listener Discovery Service (LDS)__: Returns `Listener` resources. |
|
|
||
| The `node` field will be populated with the [xDS `Node` | ||
| proto](https://github.com/envoyproxy/data-plane-api/blob/1adb5d54abb0e28ca409254d26fad1cf5535239b/envoy/api/v2/core/base.proto#L85). | ||
| Note that the `build_version` field should not be specified in the |
There was a problem hiding this comment.
For now, we're going to populate build_version, since that's what existing management servers will be looking at. We can add support for the new fields as management servers start to look at them.
| config](https://github.com/grpc/grpc/blob/master/doc/service_config.md), | ||
| which the resolver will return to the channel. | ||
|
|
||
| In particular, the resulting service config will select use of the CDS |
|
|
||
| In particular, the resulting service config will select use of the CDS | ||
| LB policy, described below. The configuration of the CDS policy will | ||
| indicate which `Cluster` resource will be used. |
There was a problem hiding this comment.
I've attempted to clarify the wording here.
| `RouteConfiguration` name specified in the `Listener`. | ||
|
|
||
| Because we are requesting one specific resource by name, the RDS response | ||
| should include exactly one RouteConfiguration. However, the gRPC client |
There was a problem hiding this comment.
Changed to "no more than", since that more accurately conveys the real point here.
| client will make a CDS request asking for that specific cluster name. | ||
|
|
||
| Because we are requesting one specific resource by name, the CDS response | ||
| should include exactly one `Cluster`. However, many existing xDS servers |
| - If the `health_status` field has any value other than HEALTHY or | ||
| UNKNOWN, the entry will be ignored. | ||
| - The `endpoint` field must be set. Inside of it: | ||
| - The `address` field must be set. |
There was a problem hiding this comment.
Good point -- added details on this.
We don't currently support UDS.
|
|
||
| FIXME: Is this true? Need to resolve internal discussion and decide | ||
| whether to remove this before merging this gRFC. | ||
| - The `policy.disable_overprovisioning` field must be set to true. |
There was a problem hiding this comment.
Haven't we already added the disable_overprovisioning field to xDS? Wouldn't the capability basically replace that field?
I'm open to doing this via a capability, as long as we can figure out the right transition plan. The reason for the FIXME comment here is that we've been trying to reach consensus on this with @vishalpowar and his team, so we should discuss this further.
| The `RouteConfiguration` includes a list of zero or more __VirtualHost__ | ||
| resources. The gRPC client will look through this list to find an element | ||
| whose domains field matches the server name specified in the "xds:" URI | ||
| (with port, if any, stripped off). If no matching VirtualHost is found in |
There was a problem hiding this comment.
We are now keeping the port.
|
|
||
| The `RouteConfiguration` includes a list of zero or more __VirtualHost__ | ||
| resources. The gRPC client will look through this list to find an element | ||
| whose domains field matches the server name specified in the "xds:" URI |
There was a problem hiding this comment.
Include how domains are matched?
There was a problem hiding this comment.
I don't think it's necessary to spell that out here, since we basically just do the same thing as Envoy.
a11r
left a comment
There was a problem hiding this comment.
Thanks for the proposal and an engaging discussion . Please update the last-updated date before merging.
htuch
left a comment
There was a problem hiding this comment.
LGTM, I've left some minor optional comments and resolved some threads. Fine to merge as is, would be interested in hearing your take on those comments.
|
|
||
| In the future, we may add support for the incremental ADS variant of | ||
| xDS. However, we have no plans to support any non-aggregated variants | ||
| of xDS, nor do we plan to support REST. |
| However, the different phases of the API flow are still typically | ||
| referred to using the separate service names described above. | ||
|
|
||
| In the future, we may add support for the incremental ADS variant of |
There was a problem hiding this comment.
Are you planning on making any comments wrt v2 vs. v3 xDS now that this is a GA thing?
There was a problem hiding this comment.
Good point. I've added a paragraph saying that we currently support v2 but will upgrade to v3 or beyond at some point in the future, with appropriate transition period.
| proto](https://github.com/envoyproxy/data-plane-api/blob/1adb5d54abb0e28ca409254d26fad1cf5535239b/envoy/api/v2/core/base.proto#L85). | ||
| Note that the `build_version`, `user_agent_name`, `user_agent_version`, and | ||
| `client_features` fields should not be specified in the bootstrap file, since | ||
| they will be populated automatically by the gRPC xDS client. |
There was a problem hiding this comment.
Will gRPC announce the enhanced Node structure with user name/versioning information as per https://github.com/envoyproxy/envoy/pull/9301/files#diff-ebbcbb34815ee818f64401154d5b1de6R161?
There was a problem hiding this comment.
Yes. The new user_agent_name and user_agent_version fields are noted a couple of lines above this.
| When the channel attempts to connect, the xds resolver will | ||
| instantiate an XdsClient object and use it to send an xDS request for | ||
| a `Listener` resource with the name of the server that the client | ||
| wishes to connect to (in the example above, "example.com:123"). If the |
There was a problem hiding this comment.
Existing API docs conflate transport/data model somewhat (which is something I wanted to move away from in UDPA). I am curious whether this design might change with future namespace/attribute support in xDS.
| if the server returns multiple resources, the client will look for the one | ||
| with the name it asked for, and it will ignore the rest of the entries. | ||
| Note that this means that the resource returned by the xDS server must | ||
| have exactly the name specified by the client. |
There was a problem hiding this comment.
Given the above point you make about requiring that management servers be API listener aware, why not also ask them to support LDS selection hints?
There was a problem hiding this comment.
FWIW, I don't think this buys us anything, but it's a bit inconsistent on what can/can't be asked of the management server to the reader.
There was a problem hiding this comment.
The intention here was to try to minimize the set of changes that existing management servers need to make to support gRPC. Existing management servers only support wildcard queries for LDS requests, because that's what Envoy does. So while it's true that management servers will need to add support for API listeners, they don't need to support queries for specific resource names if they don't want to.
It may very well be that this is of limited value, since it's not really hard to support queries for specific resource names. But it also wasn't hard to support this in the client either.
In general, I do certainly agree that the semantics of selecting which resources the client is interested in are fairly messy in xDS. I think this is something we need to continue to discuss going forward.
markdroth
left a comment
There was a problem hiding this comment.
Thanks for the review, Harvey!
| However, the different phases of the API flow are still typically | ||
| referred to using the separate service names described above. | ||
|
|
||
| In the future, we may add support for the incremental ADS variant of |
There was a problem hiding this comment.
Good point. I've added a paragraph saying that we currently support v2 but will upgrade to v3 or beyond at some point in the future, with appropriate transition period.
|
|
||
| In the future, we may add support for the incremental ADS variant of | ||
| xDS. However, we have no plans to support any non-aggregated variants | ||
| of xDS, nor do we plan to support REST. |
| proto](https://github.com/envoyproxy/data-plane-api/blob/1adb5d54abb0e28ca409254d26fad1cf5535239b/envoy/api/v2/core/base.proto#L85). | ||
| Note that the `build_version`, `user_agent_name`, `user_agent_version`, and | ||
| `client_features` fields should not be specified in the bootstrap file, since | ||
| they will be populated automatically by the gRPC xDS client. |
There was a problem hiding this comment.
Yes. The new user_agent_name and user_agent_version fields are noted a couple of lines above this.
| When the channel attempts to connect, the xds resolver will | ||
| instantiate an XdsClient object and use it to send an xDS request for | ||
| a `Listener` resource with the name of the server that the client | ||
| wishes to connect to (in the example above, "example.com:123"). If the |
There was a problem hiding this comment.
I wouldn't think so, but I'm certainly open to discussing it, especially as part of UDPA. Let's talk as we move forward.
| if the server returns multiple resources, the client will look for the one | ||
| with the name it asked for, and it will ignore the rest of the entries. | ||
| Note that this means that the resource returned by the xDS server must | ||
| have exactly the name specified by the client. |
There was a problem hiding this comment.
The intention here was to try to minimize the set of changes that existing management servers need to make to support gRPC. Existing management servers only support wildcard queries for LDS requests, because that's what Envoy does. So while it's true that management servers will need to add support for API listeners, they don't need to support queries for specific resource names if they don't want to.
It may very well be that this is of limited value, since it's not really hard to support queries for specific resource names. But it also wasn't hard to support this in the client either.
In general, I do certainly agree that the semantics of selecting which resources the client is interested in are fairly messy in xDS. I think this is something we need to continue to discuss going forward.
No description provided.