design: subset load balancer design doc #1774
Conversation
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
htuch
left a comment
This is great doco. The main feedback I have is that I still don't fully grok the trie data structure in use, and I think a simple diagram or worked example showing how it is traversed would be useful. Thanks for your patience here; I can see you already have a complete explanation in the doc, but I like to get intuition via simple examples, and I'd expect other developers would also find that useful.
| subsets of hosts. The selectors exist to limit the combinations of endpoint metadata used for | ||
| creating subsets. We precompute the subsets outside the load balancing path to avoid locking. | ||
| Currently the only mechanism for specifying a selector is to provide a list of metadata keys: |
This got me thinking; do we really need to worry about recursive match of values in the implementation? Or should we only do a flat match on metadata values?
We only do a flat match. If an endpoint's metadata has a mapping from "k" to a ProtobufWkt::Struct, we treat the struct as the value, and the route would have to pass an identical struct to match.
Yeah, I was actually suggesting we just don't bother comparing struct values, and only consider simple string -> {bool, numeric, string} mappings. It's fine to do struct values, though, since we don't pay the cost unless someone uses it.
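To make the flat-match semantics discussed above concrete, here is a minimal sketch. `MetadataValue`, `Metadata`, and `flatMatch` are hypothetical stand-ins for illustration, not Envoy's actual types:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <variant>

// Hypothetical stand-in for a metadata value. The LB only needs equality
// (and hashing), so only the simple value types are modeled here; a nested
// struct would just be one more opaque, equality-compared blob.
using MetadataValue = std::variant<bool, double, std::string>;
using Metadata = std::unordered_map<std::string, MetadataValue>;

// Flat match: every key/value pair the route requires must be present in
// the endpoint's metadata and compare equal. There is no recursion into
// the values themselves.
bool flatMatch(const Metadata& route_match, const Metadata& endpoint) {
  for (const auto& [key, value] : route_match) {
    const auto it = endpoint.find(key);
    if (it == endpoint.end() || it->second != value) {
      return false;
    }
  }
  return true;
}
```

A route requiring `{version=1.0}` would match an endpoint carrying `{version=1.0, stage=prod}`, since extra endpoint keys are ignored, but any required key that is missing or unequal fails the match.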
| {`x=3`}). The same keys may appear in multiple selector entries: it is feasible to have both an | ||
| `{a=1, b=2}` subset and an `{a=1}` subset. | ||
| On update, the SLB divides the hosts added into the appropriate subset(s) and triggers udpate |
Nit: s/udpate/update/ (maybe just run spell check).
| On update, the SLB divides the hosts added into the appropriate subset(s) and triggers udpate | ||
| events on the filtered host sets. The SLB also manages the optional "local HostSet" used for |
Each LB has a HostSet and may have a local HostSet if zone-aware routing is enabled. (If I understand correctly.)
| The CDS configuration for the subset selectors is meant to allow future extension. For example: | ||
| 1. selecting endpoint metadata keys by a prefix or other string matching algorithm |
Nit: prefer capital letters at start of sentences.
| 1. selecting endpoint metadata keys by a prefix or other string matching algorithm | ||
| 2. using a list-typed metadata value to allow a single endpoint to have multiple values for a |
This means we don't want to do recursive matching when comparing metadata values then?
Currently we don't.
This is a way to allow a single endpoint to be part of multiple subsets based on a single key. So an endpoint A with k=1 and endpoint B with k=1,2 would both be part of a subset for k=1 (and B would be in a second subset as well). An argument could be made that this is just how it should always work, but I was thinking that the metadata values would be treated more or less opaquely -- we need to be able to hash them and compare them for equality but otherwise it's just a blob.
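The single-key, multiple-values idea described above could be sketched like this. The list expansion shown is the proposed future extension, not current behavior, and `HostMetadata` and `buildSubsets` are hypothetical names:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: each endpoint carries a list of values per metadata key.
using HostMetadata = std::unordered_map<std::string, std::vector<std::string>>;

// Build a "key=value" -> set-of-hosts index, expanding list-typed values so
// a host with k={1,2} joins both the k=1 and the k=2 subset.
std::unordered_map<std::string, std::set<std::string>>
buildSubsets(const std::unordered_map<std::string, HostMetadata>& hosts) {
  std::unordered_map<std::string, std::set<std::string>> subsets;
  for (const auto& [host, metadata] : hosts) {
    for (const auto& [key, values] : metadata) {
      for (const auto& value : values) {
        subsets[key + "=" + value].insert(host);
      }
    }
  }
  return subsets;
}
```

With endpoint A carrying k={1} and endpoint B carrying k={1,2}, both land in the k=1 subset, and B alone lands in the k=2 subset, matching the behavior described in the comment.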
| metadata key | ||
| Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An | ||
| `LbSubsetMap` is an `unordered_map` of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an | ||
| `unordered_map` of (wrapped, see below) `ProtobufWkt::Value` to `LbSubsetEntry`. The | ||
| `LbSubsetEntry` may contain an `LbSubsetMap` of additional keys or a `Subset`. `Subset` encapsulates
Can you add a diagram of this data structure continuing the above example?
| If not found, exit the loop. | ||
| 3. Assign the `LbSubsetEntry`'s `LbSubsetMap` to `subsets`. (It may be empty.) | ||
| 4. If this is the last key-value pair, assign the `LbSubsetEntry` to `entry`. | ||
| 3. If `entry` has been set and has a `Subset` value, we found a matching subset; delegate balancing to
| Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An | ||
| `LbSubsetMap` is an `unordered_map` of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
What do these string keys represent? Keys of the subset selectors presumably? Is there then one LbSubsetMap per subset selector? (These would be good to address in the doc rather than comments).
An LbSubsetMap is effectively a std::string -> wrapped(ProtobufWkt::Value) -> LbSubsetEntry mapping. Each LbSubsetEntry may have a nested LbSubsetMap. The string keys are the metadata key names listed in the subset selector. I'll try to clean this up a bit more -- the picture you suggested will probably help a lot.
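Continuing that description, here is a minimal sketch of the nesting and the lookup walk. Plain strings stand in for the wrapped `ProtobufWkt::Value`, `std::map` stands in for the real `unordered_map` to keep the sketch short, and none of this is Envoy's actual code:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct LbSubsetEntry;

// ValueSubsetMap: metadata value -> entry.
using ValueSubsetMap = std::map<std::string, std::shared_ptr<LbSubsetEntry>>;
// LbSubsetMap: selector key name -> ValueSubsetMap.
using LbSubsetMap = std::map<std::string, ValueSubsetMap>;

struct LbSubsetEntry {
  LbSubsetMap children;             // nested selector keys (lexically sorted)
  std::vector<std::string> subset;  // non-empty when a selector terminates here
};

// Insert a host under a lexically sorted list of key/value pairs.
void addHost(LbSubsetMap& root,
             const std::vector<std::pair<std::string, std::string>>& kvs,
             const std::string& host) {
  LbSubsetMap* subsets = &root;
  std::shared_ptr<LbSubsetEntry> entry;
  for (const auto& [key, value] : kvs) {
    auto& slot = (*subsets)[key][value];
    if (!slot) slot = std::make_shared<LbSubsetEntry>();
    entry = slot;
    subsets = &slot->children;
  }
  if (entry) entry->subset.push_back(host);
}

// Walk the trie with the route's sorted key/value pairs, following the
// lookup steps quoted earlier; returns nullptr if no matching subset exists.
const LbSubsetEntry* findSubset(
    const LbSubsetMap& root,
    const std::vector<std::pair<std::string, std::string>>& kvs) {
  const LbSubsetMap* subsets = &root;
  const LbSubsetEntry* entry = nullptr;
  for (const auto& [key, value] : kvs) {
    auto key_it = subsets->find(key);
    if (key_it == subsets->end()) return nullptr;  // key not found: exit
    auto value_it = key_it->second.find(value);
    if (value_it == key_it->second.end()) return nullptr;
    entry = value_it->second.get();
    subsets = &entry->children;
  }
  return (entry && !entry->subset.empty()) ? entry : nullptr;
}
```

With selectors `{a}` and `{a, b}`, a host with metadata `a=1, b=2` is reachable both via the one-level walk for `a=1` and via the two-level walk for `a=1, b=2`, which is how the same keys can appear in multiple selector entries.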
| Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An | ||
| `LbSubsetMap` is an `unordered_map` of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
@zuercher I think I understand the data structure (though maybe I don't, and I agree more description would be useful). If I do understand it, I do wonder whether it's a premature optimization vs. just a linear scan. I wonder how many subsets people are actually going to be dealing with. Dunno. Possibly the trie could be a follow-up?
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
As far as switching to a linear search goes, I'd rather stick with what I wrote. We know people are already creating large numbers of clusters, so I don't think it's unreasonable to expect large numbers of subsets. The code specific to creating the structure and looking up values in it is confined to two functions that total about 80 lines of code. The rest of the construction code is related to extracting metadata from hosts and keeping host sets synchronized, and that won't change if we switch to another data structure.
OK, that's fine. Will review the new text/diagrams to make sure I actually understand what you are proposing. Thanks for the extra detail.
| ``` json | ||
| { | ||
| "name": "c1", | ||
| "lb_policy": "ROUND_ROBIN", |
Do we maintain LB stats for each possible subset? Especially for things like round robin. Same goes for things such as outlier detection (which is needed on a per-LB-pool basis).
I think that makes sense, but I haven't done anything with the stats yet.
| The following headers may then be used to select subsets: | ||
| `x-custom-version: 1.2-pre` causes requests to be routed to e7. This is an example of routing requests
You might want to add that these headers have nothing to do with the lb.metadata names you defined above. Or, for clarity, you could change the header values from 1.2-pre to something else: on first read, it looks like you can specify the metadata selectors in HTTP headers [while I would love that, it's not the focus of this doc :) ]
I'll make it clear they don't have to match.
rshriram
left a comment
This is pretty good! Just two clarification questions.
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Thanks, the diagram is roughly how I thought it would work and makes sense to me.
A fair amount of thought is going to have to be put into stats. Also, the way this is designed, outlier detection would be across all subsets as a group, not individual subsets. If we want outlier detection to be subset-aware, that is, I think, its own work item.
This diagram is great; thanks for adding it. It conveys the intuition I was after on how this works.
Small correction to diagram; should |
Yeah, I messed it up. Will fix it.
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
This is intended to encapsulate the design from #1735 along with decisions taken in the comments on that PR and the PR for the CDS changes.
Signed-off-by: Stephan Zuercher stephan@turbinelabs.io