design: subset load balancer design doc #1774
Conversation
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
htuch
left a comment
This is great doco. The main feedback I have is that I still don't fully grok the trie data structure in use, and I think a simple diagram or worked example showing how it is traversed would be useful. Thanks for your patience here; I can see you already have a complete explanation in the doc, but I like to get intuition via simple examples, and I'd expect other developers would also find that useful.
| subsets of hosts. The selectors exist to limit the combinations of endpoint metadata used for | ||
| creating subsets. We precompute the subsets outside the load balancing path to avoid locking. | ||
| Currently the only mechanism for specifying a selector is to provide a list of metadata keys: |
This got me thinking; do we really need to worry about recursive match of values in the implementation? Or should we only do a flat match on metadata values?
We only do a flat match. If an endpoint's metadata has a mapping from "k" to a ProtobufWkt::Struct, we treat the struct as the value, and the route would have to pass an identical struct to match.
Yeah, I was actually suggesting we just don't bother comparing struct values, and only consider simple string -> {bool, numeric, string} mappings. It's fine to do struct values, though, since we don't pay the cost unless someone uses it.
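To make the flat-match semantics discussed above concrete, here is a minimal sketch. `MetadataValue`, `Metadata`, and `flatMatch` are hypothetical stand-ins for illustration, not Envoy's actual types:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <variant>

// Hypothetical stand-in for a metadata value. The LB only needs equality
// (and hashing), so only the simple value types are modeled here; a nested
// struct would just be one more opaque, equality-compared blob.
using MetadataValue = std::variant<bool, double, std::string>;
using Metadata = std::unordered_map<std::string, MetadataValue>;

// Flat match: every key/value pair the route requires must be present in
// the endpoint's metadata and compare equal. There is no recursion into
// the values themselves.
bool flatMatch(const Metadata& route_match, const Metadata& endpoint) {
  for (const auto& [key, value] : route_match) {
    const auto it = endpoint.find(key);
    if (it == endpoint.end() || it->second != value) {
      return false;
    }
  }
  return true;
}
```

A route requiring `{version=1.0}` would match an endpoint carrying `{version=1.0, stage=prod}`, since extra endpoint keys are ignored, but any required key that is missing or unequal fails the match.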
| {`x=3`}). The same keys may appear in multiple selector entries: it is feasible to have both an | ||
| `{a=1, b=2}` subset and an `{a=1}` subset. | ||
| On update, the SLB divides the hosts added into the appropriate subset(s) and triggers udpate |
Nit: s/udpate/update/ (maybe just run spell check).
| On update, the SLB divides the hosts added into the appropriate subset(s) and triggers udpate | ||
| events on the filtered host sets. The SLB also manages the optional "local HostSet" used for |
Each LB has a HostSet and may have a local HostSet if zone-aware routing is enabled. (If I understand correctly.)
| The CDS configuration for the subset selectors is meant to allow future extension. For example: | ||
| 1. selecting endpoint metadata keys by a prefix or other string matching algorithm |
Nit: prefer capital letters at start of sentences.
| 1. selecting endpoint metadata keys by a prefix or other string matching algorithm | ||
| 2. using a list-typed metadata value to allow a single endpoint to have multiple values for a |
This means we don't want to do recursive matching when comparing metadata values then?
Currently we don't.
This is a way to allow a single endpoint to be part of multiple subsets based on a single key. So an endpoint A with k=1 and endpoint B with k=1,2 would both be part of a subset for k=1 (and B would be in a second subset as well). An argument could be made that this is just how it should always work, but I was thinking that the metadata values would be treated more or less opaquely -- we need to be able to hash them and compare them for equality but otherwise it's just a blob.
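The single-key, multiple-values idea described above could be sketched like this. The list expansion shown is the proposed future extension, not current behavior, and `HostMetadata` and `buildSubsets` are hypothetical names:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: each endpoint carries a list of values per metadata key.
using HostMetadata = std::unordered_map<std::string, std::vector<std::string>>;

// Build a "key=value" -> set-of-hosts index, expanding list-typed values so
// a host with k={1,2} joins both the k=1 and the k=2 subset.
std::unordered_map<std::string, std::set<std::string>>
buildSubsets(const std::unordered_map<std::string, HostMetadata>& hosts) {
  std::unordered_map<std::string, std::set<std::string>> subsets;
  for (const auto& [host, metadata] : hosts) {
    for (const auto& [key, values] : metadata) {
      for (const auto& value : values) {
        subsets[key + "=" + value].insert(host);
      }
    }
  }
  return subsets;
}
```

With endpoint A carrying k={1} and endpoint B carrying k={1,2}, both land in the k=1 subset, and B alone lands in the k=2 subset, matching the behavior described in the comment.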
| metadata key | ||
| Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An | ||
| `LbSubsetMap` is an `unordered_map` of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an | ||
| `unordered_map` of (wrapped, see below) `ProtobufWkt::Value` to `LbSubsetEntry`. The | ||
| `LbSubsetEntry` may contain an `LbSubsetMap` of additional keys or a `Subset`. `Subset` encapsulates
Can you add a diagram of this data structure continuing the above example?
| If not found, exit the loop. | ||
| 3. Assign the `LbSubsetEntry`'s `LbSubsetMap` to `subsets`. (It may be empty.) | ||
| 4. If this is the last key-value pair, assign the `LbSubsetEntry` to `entry`. | ||
| 3. If `entry` has been set and has a `Subset` value, we found a matching subset; delegate balancing to
| Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An | ||
| `LbSubsetMap` is an `unordered_map` of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
What do these string keys represent? Keys of the subset selectors presumably? Is there then one LbSubsetMap per subset selector? (These would be good to address in the doc rather than comments).
An LbSubsetMap is effectively a std::string -> wrapped(ProtobufWkt::Value) -> LbSubsetEntry mapping. Each LbSubsetEntry may have a nested LbSubsetMap. The string keys are the metadata key names listed in the subset selector. I'll try to clean this up a bit more -- the picture you suggested will probably help a lot.
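Continuing that description, here is a minimal sketch of the nesting and the lookup walk. Plain strings stand in for the wrapped `ProtobufWkt::Value`, `std::map` stands in for the real `unordered_map` to keep the sketch short, and none of this is Envoy's actual code:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct LbSubsetEntry;

// ValueSubsetMap: metadata value -> entry.
using ValueSubsetMap = std::map<std::string, std::shared_ptr<LbSubsetEntry>>;
// LbSubsetMap: selector key name -> ValueSubsetMap.
using LbSubsetMap = std::map<std::string, ValueSubsetMap>;

struct LbSubsetEntry {
  LbSubsetMap children;             // nested selector keys (lexically sorted)
  std::vector<std::string> subset;  // non-empty when a selector terminates here
};

// Insert a host under a lexically sorted list of key/value pairs.
void addHost(LbSubsetMap& root,
             const std::vector<std::pair<std::string, std::string>>& kvs,
             const std::string& host) {
  LbSubsetMap* subsets = &root;
  std::shared_ptr<LbSubsetEntry> entry;
  for (const auto& [key, value] : kvs) {
    auto& slot = (*subsets)[key][value];
    if (!slot) slot = std::make_shared<LbSubsetEntry>();
    entry = slot;
    subsets = &slot->children;
  }
  if (entry) entry->subset.push_back(host);
}

// Walk the trie with the route's sorted key/value pairs, following the
// lookup steps quoted earlier; returns nullptr if no matching subset exists.
const LbSubsetEntry* findSubset(
    const LbSubsetMap& root,
    const std::vector<std::pair<std::string, std::string>>& kvs) {
  const LbSubsetMap* subsets = &root;
  const LbSubsetEntry* entry = nullptr;
  for (const auto& [key, value] : kvs) {
    auto key_it = subsets->find(key);
    if (key_it == subsets->end()) return nullptr;  // key not found: exit
    auto value_it = key_it->second.find(value);
    if (value_it == key_it->second.end()) return nullptr;
    entry = value_it->second.get();
    subsets = &entry->children;
  }
  return (entry && !entry->subset.empty()) ? entry : nullptr;
}
```

With selectors `{a}` and `{a, b}`, a host with metadata `a=1, b=2` is reachable both via the one-level walk for `a=1` and via the two-level walk for `a=1, b=2`, which is how the same keys can appear in multiple selector entries.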
| Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An | ||
| `LbSubsetMap` is an `unordered_map` of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
@zuercher I think I understand the data structure (though maybe I don't, and I agree more description would be useful). If I do understand it, I do wonder whether it's a premature optimization vs. just a linear scan. I wonder how many subsets people are actually going to be dealing with. Dunno. Possibly the trie could be a follow-up?
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
As far as switching to a linear search goes, I'd rather stick with what I wrote. We know people are already creating large numbers of clusters, so I don't think it's unreasonable to expect large numbers of subsets. The code specific to creating the structure and looking up values in it is confined to two functions that total about 80 lines of code. The rest of the construction code is related to extracting metadata from hosts and keeping host sets synchronized, and that won't change if we switch to another data structure.
OK, that's fine. Will review the new text/diagrams to make sure I actually understand what you are proposing. Thanks for the extra detail.
| ``` json | ||
| { | ||
| "name": "c1", | ||
| "lb_policy": "ROUND_ROBIN", |
Do we maintain LB stats for each possible subset? Especially for things like round robin. Same goes for things such as outlier detection (which is needed on a per-LB-pool basis).
I think that makes sense, but I haven't done anything with the stats yet.
| The following headers may then be used to select subsets: | ||
| `x-custom-version: 1.2-pre` causes requests to be routed to e7. This is an example of routing requests
You might want to add that these headers have nothing to do with the lb.metadata names you defined above. Or, for clarity, you could change the header values from 1.2-pre to something else: on first read, it looks like you can specify the metadata selectors in HTTP headers [while I would love that, it's not the focus of this doc :) ]
I'll make it clear they don't have to match.
rshriram
left a comment
This is pretty good! Just two clarification questions.
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Thanks, the diagram is roughly how I thought it would work and makes sense to me.
A fair amount of thought is going to have to be put into stats. Also, the way this is designed, outlier detection would be across all subsets as a group, not individual subsets. If we want outlier detection to be subset-aware, that is, I think, its own work item.
This diagram is great; thanks for adding it. It conveys the intuition I was after on how this works.
Small correction to diagram; should |
Yeah, I messed it up. Will fix it.
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
This is intended to encapsulate the design from #1735 along with decisions taken in the comments on that PR and the PR for the CDS changes.
Signed-off-by: Stephan Zuercher stephan@turbinelabs.io