design: subset load balancer design doc #1774

Merged
mattklein123 merged 7 commits into envoyproxy:master from turbinelabs:subset-lb-docs
Oct 3, 2017

@zuercher
Member

This is intended to encapsulate the design from #1735 along with decisions taken in the comments on that PR and the PR for the CDS changes.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Member

@htuch left a comment

This is great doco. The main feedback I have is that I still don't fully grok the trie data structure in use, and I think a simple diagram or worked example showing how it is traversed would be useful. Thanks for your patience here; I can see you have a complete explanation already in the doc but I like to get intuition via simple examples, I'd expect other developers would also find that useful.

subsets of hosts. The selectors exist to limit the combinations of endpoint metadata used for
creating subsets. We precompute the subsets outside the load balancing path to avoid locking.

Currently the only mechanism for specifying a selector is to provide a list of metadata keys:
Member

This got me thinking; do we really need to worry about recursive match of values in the implementation? Or should we only do a flat match on metadata values?

Member Author

We only do a flat match. If an endpoint's metadata has a mapping from "k" to a ProtobufWkt::Struct, we treat the struct as the value, and the route would have to pass an identical struct to match.

Member

@htuch, Oct 3, 2017

Yeah, I was actually suggesting we just don't bother comparing struct values and only consider simple string -> {bool, numeric, string} mappings. It's fine to do struct values though, since we don't pay the cost unless someone uses it.
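
To make the flat-match behavior described in this thread concrete, here is a minimal C++ sketch. `MetadataValue`, `flatMatch`, `scalarValue`, and `structValue` are hypothetical names, not Envoy APIs. The type is an assumed, simplified stand-in for `ProtobufWkt::Value`: scalars are stored as text and a struct is limited to one level of fields.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for ProtobufWkt::Value (an assumption of
// this sketch, not Envoy's type).
struct MetadataValue {
  enum class Kind { Scalar, Struct };
  Kind kind;
  std::string scalar;                         // used when kind == Scalar
  std::map<std::string, std::string> fields;  // used when kind == Struct
};

MetadataValue scalarValue(const std::string& text) {
  return MetadataValue{MetadataValue::Kind::Scalar, text, {}};
}

MetadataValue structValue(const std::map<std::string, std::string>& fields) {
  return MetadataValue{MetadataValue::Kind::Struct, "", fields};
}

// Flat match: the two values must be equal as a whole. A struct is compared
// in its entirety; there is no partial or field-by-field matching.
bool flatMatch(const MetadataValue& endpoint, const MetadataValue& route) {
  return endpoint.kind == route.kind && endpoint.scalar == route.scalar &&
         endpoint.fields == route.fields;
}

// Usage example.
void flatMatchExample() {
  // Scalars match only when equal.
  assert(flatMatch(scalarValue("1.0"), scalarValue("1.0")));
  assert(!flatMatch(scalarValue("1.0"), scalarValue("1.1")));
  // A route that passes only part of the endpoint's struct does not match.
  assert(!flatMatch(structValue({{"a", "1"}, {"b", "2"}}), structValue({{"a", "1"}})));
}
```

Under this reading, a struct-valued key costs nothing unless a cluster actually uses one, which is the point made above.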

Comment thread source/docs/subset_load_balancer.md Outdated
{`x=3`}). The same keys may appear in multiple selector entries: it is feasible to have both an
`{a=1, b=2}` subset and an `{a=1}` subset.

On update, the SLB divides the hosts added into the appropriate subset(s) and triggers udpate
Member

Nit: s/udpate/update/ (maybe just run spell check).

`{a=1, b=2}` subset and an `{a=1}` subset.

On update, the SLB divides the hosts added into the appropriate subset(s) and triggers udpate
events on the filtered host sets. The SLB also manages the optional "local HostSet" used for
Member

Code/doc link for local HostSet?

Member Author

https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/load_balancer_impl.cc#L20

Each LB has a HostSet and may have a local HostSet if zone-aware routing is enabled. (If I understand correctly.)

Comment thread source/docs/subset_load_balancer.md Outdated

The CDS configuration for the subset selectors is meant to allow future extension. For example:

1. selecting endpoint metadata keys by a prefix or other string matching algorithm
Member

Nit: prefer capital letters at start of sentences.

Comment thread source/docs/subset_load_balancer.md Outdated
The CDS configuration for the subset selectors is meant to allow future extension. For example:

1. selecting endpoint metadata keys by a prefix or other string matching algorithm
2. using a list-typed metadata value to allow a single endpoint to have multiple values for a
Member

This means we don't want to do recursive matching when comparing metadata values then?

Member Author

Currently we don't.

This is a way to allow a single endpoint to be part of multiple subsets based on a single key. So an endpoint A with k=1 and endpoint B with k=1,2 would both be part of a subset for k=1 (and B would be in a second subset as well). An argument could be made that this is just how it should always work, but I was thinking that the metadata values would be treated more or less opaquely -- we need to be able to hash them and compare them for equality but otherwise it's just a blob.
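
The list-valued extension discussed here is a possible future direction, not current behavior. As a rough sketch of the idea, the following C++ groups endpoints into one subset per list element. `Endpoint` and `subsetsByListValue` are hypothetical names introduced for illustration, and values are modeled as plain strings rather than `ProtobufWkt::Value`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical endpoint model: a name plus the list of values it carries for
// a single metadata key "k".
struct Endpoint {
  std::string name;
  std::vector<std::string> values;
};

// Builds one subset per distinct list element. An endpoint with several
// values for "k" lands in several subsets.
std::map<std::string, std::set<std::string>>
subsetsByListValue(const std::vector<Endpoint>& endpoints) {
  std::map<std::string, std::set<std::string>> subsets;
  for (const Endpoint& endpoint : endpoints) {
    for (const std::string& value : endpoint.values) {
      subsets[value].insert(endpoint.name);
    }
  }
  return subsets;
}

// Usage example mirroring the comment: A has k=1, B has k=1,2.
void listValueExample() {
  std::vector<Endpoint> endpoints;
  endpoints.push_back(Endpoint{"A", {"1"}});
  endpoints.push_back(Endpoint{"B", {"1", "2"}});
  std::map<std::string, std::set<std::string>> subsets = subsetsByListValue(endpoints);
  assert(subsets["1"].size() == 2);  // A and B
  assert(subsets["2"].size() == 1);  // B only
}
```

The alternative described in the comment, treating the value as an opaque blob, would instead put A in a `k=1` subset and B in a separate subset keyed by the whole list.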

Comment thread source/docs/subset_load_balancer.md Outdated
metadata key

Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An
`LbSubsetMap` is an `unordered_map` is of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
Member

Nit: s/is of/of/

Comment thread source/docs/subset_load_balancer.md Outdated
Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An
`LbSubsetMap` is an `unordered_map` is of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
`unordered_map` of (wrapped, see below) `ProtobufWkt::Value` to `LbSubsetEntry`. The
`LbSubsetEntry` may contain an `LbSubetMap` of additional keys or a `Subset`. `Subset` encapsulates
Member

Nit: LbSubsetMap

Comment thread source/docs/subset_load_balancer.md Outdated
Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An
`LbSubsetMap` is an `unordered_map` is of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
`unordered_map` of (wrapped, see below) `ProtobufWkt::Value` to `LbSubsetEntry`. The
`LbSubsetEntry` may contain an `LbSubetMap` of additional keys or a `Subset`. `Subset` encapsulates
Member

Can you add a diagram of this data structure continuing the above example?

Comment thread source/docs/subset_load_balancer.md Outdated
If not found, exit the loop.
3. Assign the `LbSubsetEntry`'s `LbSubsetMap` to `subsets`. (It may be empty.)
4. If this is the last key-value pair, assign the `LbSubsetEntry` to `entry`.
3. If `entry` has been set has a `Subset` value, we found a matching subset, delegate balancing to
Member

Nit: s/has been set//

Comment thread source/docs/subset_load_balancer.md Outdated
metadata key

Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An
`LbSubsetMap` is an `unordered_map` is of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
Member

What do these string keys represent? Keys of the subset selectors presumably? Is there then one LbSubsetMap per subset selector? (These would be good to address in the doc rather than comments).

Member Author

An LbSubsetMap is effectively a std::string -> wrapped(ProtobufWkt::Value) -> LbSubsetEntry. Each LbSubsetEntry may have a nested LbSubsetMap. The strings are values from the subset selector. I'll try to clean this up a bit more -- the picture you suggested will probably help a lot.
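
Pending the diagram, the nesting described in this thread can be sketched in C++ as follows. This is an illustration under stated assumptions, not the PR's implementation: metadata values are plain `std::string` instead of a wrapped `ProtobufWkt::Value`, a `Subset` is reduced to a list of host names, and `KeyValues`, `addSubset`, `findSubset`, `has_subset`, and `hosts` are hypothetical names. The lookup follows the steps quoted from the doc: walk the sorted key-value pairs, descend one level per pair, and succeed only if the final entry holds a subset.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct LbSubsetEntry;
using LbSubsetEntryPtr = std::shared_ptr<LbSubsetEntry>;
// Metadata value -> entry (values simplified to strings in this sketch).
using ValueSubsetMap = std::unordered_map<std::string, LbSubsetEntryPtr>;
// Metadata key -> ValueSubsetMap.
using LbSubsetMap = std::unordered_map<std::string, ValueSubsetMap>;

struct LbSubsetEntry {
  LbSubsetMap children;            // subsets that extend this key/value path
  bool has_subset = false;         // true if a subset ends exactly here
  std::vector<std::string> hosts;  // stand-in for Subset
};

// std::map iterates keys in lexical order, matching the sorted selector keys.
using KeyValues = std::map<std::string, std::string>;

void addSubset(LbSubsetMap& root, const KeyValues& kvs, std::vector<std::string> hosts) {
  LbSubsetMap* subsets = &root;
  LbSubsetEntryPtr entry;
  for (const auto& kv : kvs) {
    LbSubsetEntryPtr& slot = (*subsets)[kv.first][kv.second];
    if (!slot) {
      slot = std::make_shared<LbSubsetEntry>();
    }
    entry = slot;
    subsets = &entry->children;
  }
  if (entry) {
    entry->has_subset = true;
    entry->hosts = std::move(hosts);
  }
}

const LbSubsetEntry* findSubset(const LbSubsetMap& root, const KeyValues& kvs) {
  const LbSubsetMap* subsets = &root;
  const LbSubsetEntry* entry = nullptr;
  for (const auto& kv : kvs) {
    entry = nullptr;  // only the entry reached by the last pair counts
    auto key_it = subsets->find(kv.first);
    if (key_it == subsets->end()) break;
    auto value_it = key_it->second.find(kv.second);
    if (value_it == key_it->second.end()) break;
    entry = value_it->second.get();
    subsets = &entry->children;
  }
  return (entry != nullptr && entry->has_subset) ? entry : nullptr;
}

// Usage example; {version=1.0, xlarge=true} -> {e1} follows the review thread.
void subsetTrieExample() {
  LbSubsetMap root;
  addSubset(root, KeyValues{{"version", "1.0"}, {"xlarge", "true"}}, {"e1"});
  const LbSubsetEntry* hit = findSubset(root, KeyValues{{"version", "1.0"}, {"xlarge", "true"}});
  assert(hit != nullptr && hit->hosts.size() == 1 && hit->hosts[0] == "e1");
  // {version=1.0} is only a path prefix here, not a configured subset.
  assert(findSubset(root, KeyValues{{"version", "1.0"}}) == nullptr);
}
```

In this sketch a lookup touches one map pair per key in the match criteria, regardless of how many subsets exist.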

Member

@mattklein123 left a comment

@zuercher thanks this is great. IMO a bunch of this will need to end up in the main RST docs, with just implementation details left here, but I think that can be done as part of your other changes once the feature is done. @htuch @rshriram?

Comment thread source/docs/subset_load_balancer.md Outdated
metadata key

Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An
`LbSubsetMap` is an `unordered_map` is of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an
Member

typo "is of"

@mattklein123
Member

> The main feedback I have is that I still don't fully grok the trie data structure in use, and I think a simple diagram or worked example showing how it is traversed would be useful.

@zuercher I think I understand the data structure (though maybe I don't, and I agree more description would be useful). If I do understand the data structure, I do wonder if it's a premature optimization vs. just linear scan. I wonder how many subsets people are actually going to be dealing with. Dunno. Possibly the trie could be a follow up?

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@zuercher
Member Author

zuercher commented Oct 2, 2017

As far as switching to a linear search goes, I'd rather stick with what I wrote. We know people are already creating large numbers of clusters, so I don't think it's unreasonable to expect large numbers of subsets. The code specific to creating the structure and looking up values in it is confined to two functions that total about 80 lines of code. The rest of the construction code is related to extracting metadata from hosts and keeping host sets synchronized, and that won't change if we switch to another data structure.

@mattklein123
Member

> As far as switching to a linear search goes, I'd rather stick with what I wrote. We know people are already creating large numbers of clusters, so I don't think it's unreasonable to expect large numbers of subsets. The code specific to creating the structure and looking up values in it is confined to two functions that total about 80 lines of code. The rest of the construction code is related to extracting metadata from hosts and keeping host sets synchronized, and that won't change if we switch to another data structure.

OK that's fine. Will review the new text/diagrams to make sure I actually understand what you are proposing. Thanks for the extra detail.
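
For comparison, the linear-scan alternative raised in this exchange might look like the sketch below. It is an assumption-laden illustration, not code from the PR: `SubsetRecord`, `KeyValues`, and `findSubsetLinear` are hypothetical names, and metadata values are plain strings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using KeyValues = std::map<std::string, std::string>;

// Hypothetical flat record: the exact key-value pairs that define a subset,
// plus the hosts in it.
struct SubsetRecord {
  KeyValues match;
  std::vector<std::string> hosts;
};

// Linear scan: compare the requested pairs against every subset in turn.
const SubsetRecord* findSubsetLinear(const std::vector<SubsetRecord>& subsets,
                                     const KeyValues& wanted) {
  for (const SubsetRecord& subset : subsets) {
    if (subset.match == wanted) {
      return &subset;
    }
  }
  return nullptr;
}

// Usage example.
void linearScanExample() {
  std::vector<SubsetRecord> subsets;
  subsets.push_back(SubsetRecord{KeyValues{{"version", "1.0"}, {"xlarge", "true"}}, {"e1"}});
  const SubsetRecord* hit =
      findSubsetLinear(subsets, KeyValues{{"version", "1.0"}, {"xlarge", "true"}});
  assert(hit != nullptr && hit->hosts.size() == 1);
  assert(findSubsetLinear(subsets, KeyValues{{"version", "1.0"}}) == nullptr);
}
```

The scan is shorter to write, but its cost grows with the number of subsets, which is the concern raised above about clusters with many subsets.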

``` json
{
"name": "c1",
"lb_policy": "ROUND_ROBIN",
Member

Do we maintain LB stats for each possible subset? Especially for things like round robin. Same goes for things such as outlier detection (which is needed on per LB pool basis)

Member Author

I think that makes sense, but I haven't done anything with the stats yet.

Comment thread source/docs/subset_load_balancer.md Outdated

The following headers may then be used to select subsets:

`x-custom-version: 1.2-pre` causes requests to be routed e7. This is an example of routing requests
Member

You might want to add that these headers have nothing to do with the lb.metadata names you defined above. Or, for clarity, you could change the header values from 1.2-pre to something else, as on first read it looks like you can specify the metadata selectors in HTTP headers [while I would love that, it's not the focus of this doc :) ]

Member Author

I'll make it clear they don't have to match.

Member

@rshriram left a comment

This is pretty good! Just two clarification questions.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@mattklein123
Member

Thanks, the diagram is how I (roughly) thought it would work, and it makes sense to me.

> Do we maintain LB stats for each possible subset? Especially for things like round robin. Same goes for things such as outlier detection (which is needed on per LB pool basis)

A fair amount of thought is going to have to be put into stats. Also, the way this is designed, outlier detection would be across all subsets as a group, not individual subsets. If we want outlier detection to be subset-aware, that is, I think, its own work item.

@htuch
Member

htuch commented Oct 3, 2017

This diagram is great, thanks for adding this, it conveys the intuition I was after on how this works.

@htuch
Member

htuch commented Oct 3, 2017

Small correction to the diagram: should {version=1.0, xlarge=true} be just {e1} instead of {e1, e3}?

@zuercher
Member Author

zuercher commented Oct 3, 2017

Yeah, I messed it up. Will fix it.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@mattklein123 mattklein123 merged commit 208c099 into envoyproxy:master Oct 3, 2017
@zuercher zuercher deleted the subset-lb-docs branch October 3, 2017 18:38