From ef397d5e7f52ccf6cf66ee47a5d52afa73833f6f Mon Sep 17 00:00:00 2001 From: Stephan Zuercher Date: Thu, 28 Sep 2017 14:06:47 -0700 Subject: [PATCH 1/7] design: subset load balancer design doc Signed-off-by: Stephan Zuercher --- source/docs/subset_load_balancer.md | 263 ++++++++++++++++++++++++++++ 1 file changed, 263 insertions(+) create mode 100644 source/docs/subset_load_balancer.md diff --git a/source/docs/subset_load_balancer.md b/source/docs/subset_load_balancer.md new file mode 100644 index 0000000000000..12dd28e61a753 --- /dev/null +++ b/source/docs/subset_load_balancer.md @@ -0,0 +1,263 @@ +### Overview + +The subset load balancer (SLB) divides the upstream hosts in a cluster into one or more subsets. At +request time the SLB uses information from the `LoadBalancerContext` to choose one of its +subsets. Choosing a host is then delegated to the subset's load balancer. If no subset matches the +context, the SLB falls back (depending on configuration) to balancing over a default subset, +balancing over any upstream host in the cluster, or returning no host. + +Load balancing within a subset is accomplished by constructing one of the existing load balancer +types with `Upstream::HostSet` that presents a filtered copy of the upstream hosts. All load +balancer types except the Original DST load balancer may be used for subset load balancing. + +### Fallback + +The SLB can be configured with one of three fallback policies. If no subset matching the +`LoadBalancerContext` is found: + +1. `NO_ENDPOINT` specifies that `chooseHost` returns `nullptr` and load balancing fails. +2. `ANY_ENDPOINT` specifies that load balancing occurs over the entire set of upstream hosts. +3. `DEFAULT_SUBSET` specifies that load balancing occurs over a specific subset of upstream + hosts. If the default subset is empty, `chooseHost` returns `nullptr` and load balancing fails. 
+ +During construction, if the fallback policy is `ANY_ENDPOINT`, a default subset is constructed +using the original `Upstream::HostSet`. If the fallback policy is `DEFAULT_SUBSET`, but the +configuration does not specify any metadata (i.e. all hosts match), the SLB changes the fallback +policy to `ANY_ENDPOINT`. + +### Selecting Subsets + +The initial implementation supports selecting subsets by endpoint metadata provided via EDS. + +The configuration specifies a list of subset selectors. Each selector is used, in turn, to create +subsets of hosts. The selectors exist to limit the combinations of endpoint metadata used for +creating subsets. We precompute the subsets outside the load balancing path to avoid locking. + +Currently the only mechanism for specifying a selector is to provide a list of metadata keys: + +``` json +{ + "subset_selectors": [ + { "keys": [ "a", "b" ] }, + { "keys": [ "x" ] } + ] +} +``` + +For each selector, the SLB iterates over the hosts and inspects the host's metadata for the +`"envoy.lb"` filter. If a host's metadata provides values for each key, a subset is created for the +metadata. For example, given the selectors above, if a host's metadata contains `{a=1, b=2}`, a +subset is created for `{a=1, b=2}`. Other hosts with `{a=1, b=2}` are also included in the subset. +A host with metadata like `{a=1, b=2, x=3}` is included in two subsets (`{a=1, b=2}` and +{`x=3`}). The same keys may appear in multiple selector entries: it is feasible to have both an +`{a=1, b=2}` subset and an `{a=1}` subset. + +On update, the SLB divides the hosts added into the appropriate subset(s) and triggers udpate +events on the filtered host sets. The SLB also manages the optional "local HostSet" used for +zone-aware routing. + +The CDS configuration for the subset selectors is meant to allow future extension. For example: + +1. selecting endpoint metadata keys by a prefix or other string matching algorithm +2. 
using a list-typed metadata value to allow a single endpoint to have multiple values for a + metadata key + +Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An +`LbSubsetMap` is an `unordered_map` is of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an +`unordered_map` of (wrapped, see below) `ProtobufWkt::Value` to `LbSubsetEntry`. The +`LbSubsetEntry` may contain an `LbSubetMap` of additional keys or a `Subset`. `Subset` encapsulates +the filtered `Upstream::HostSet` and `Upstream::LoadBalancer` for a subset. + +`ProtobufWkt::Value` is wrapped to provide a cached hash value for the value. Currently, +`ProtobufWkt::Value` is hashed by first encoding the value as a string and then hashing the +string. By wrapping it, we can compute the hash value outside the request path for both the +metadata values provided in `LoadBalancerContext` and those used internally by the SLB. + +### Subset Lookup + +Currently we require the metadata provided in `LoadBalancerContext` to match a subset exactly in +order to select the subset for load balancing. Changing this behavior has implications for the +performance of the subset selection algorithm. The current algorithm, described below, runs in +`O(N)` time with respect to the number of metadata key-value pairs in the `LoadBalancerContext`. + +The metadata key-value pairs from `LoadBalancerContext` must be sorted by key for the algorithm to +work. Currently we expect lexical order, but the sort order doesn't matter as long as both the +context and load balancer use the same ordering. Sorting of the `LoadBalancerContext` keys is +currently handled by `Router::RouteEntryImplBase`. + +Given a sequence of N metadata keys and values (previously sorted lexically by key) from +`LoadBalancerContext`, we can look up the appropriate subset in `O(N)` time as follows: + +1. Initialize `subsets` to refer to the root `LbSubsetMap` and `entry` to point at a null + `LbSubsetEntryPtr`. +2. 
For each key-value in the metadata: + 1. Lookup the key in `subsets` to find a `ValueSubsetMap`. (Average constant time.) If not + found, exit the loop. + 2. Lookup the value in the `ValueSubsetMap` to find an `LbSubsetEntry`. (Average constant time.) + If not found, exit the loop. + 3. Assign the `LbSubsetEntry`'s `LbSubsetMap` to `subsets`. (It may be empty.) + 4. If this is the last key-value pair, assign the `LbSubsetEntry` to `entry`. +3. If `entry` has been set has a `Subset` value, we found a matching subset, delegate balancing to + the subset's load balancer. +4. Otherwise, execute the fallback policy. + +N.B. `O(N)` complexity presumes that the delegate load balancer executes in constant time. + +### Example + +Assume a set of hosts from EDS with the following metadata, assigned to a single cluster. + +Endpoint | stage | version | type +---------|-------|---------|------- +e1 | prod | 1.0 | std +e2 | prod | 1.0 | std +e3 | prod | 1.1 | std +e4 | prod | 1.1 | std +e5 | prod | 1.0 | bigmem +e6 | prod | 1.1 | bigmem +e7 | dev | 1.2-pre | std + +Given this CDS `envoy::api::v2::Cluster`: + +``` json +{ + "name": "c1", + "lb_policy": "ROUND_ROBIN", + "lb_subset_config": { + "fallback_policy": "DEFAULT_SUBSET", + "default_subset": { + "stage": "prod", + "version": "1.0", + "type": "std" + }, + "subset_selectors": [ + { "keys": [ "stage", "type" ] }, + { "keys": [ "stage", "version" ] }, + { "keys": [ "version" ] } + ] + } +} +``` + +The following subsets are created: + +`stage=prod, type=std` (e1, e2, e3, e4) +`stage=prod, type=bigmem` (e5, e6) +`stage=dev, type=std` (e7) +`stage=prod, version=1.0` (e1, e2, e5) +`stage=prod, version=1.1` (e3, e4, e6) +`stage=dev, version=1.2-pre` (e7) +`version=1.0` (e1, e2, e5) +`version=1.1` (e3, e4, e6) +`version=1.2-pre` (e7) + +In addition, a default subset is created: + +`stage=prod, type=std, version=1.0` (e1, e2) + +Given these `envoy::api::v2::Route` entries: + +``` json +"routes": [ + { + "match": { + "prefix": "/", + 
"headers": [ + { + "name": "x-custom-version", + "value": "1.2-pre" + } + ] + }, + "route": { + "cluster": "c1", + "metadata_match": { + "filter_metadata": { + "envoy.lb": { + "version": "1.2-pre", + "stage": "dev" + } + } + } + } + }, + { + "match": { + "prefix": "/", + "headers": [ + { + "name": "x-hardware-test", + "value": "bigmem" + } + ] + }, + "route": { + "cluster": "c1", + "metadata_match": { + "filter_metadata": { + "envoy.lb": { + "type": "bigmem", + "stage": "prod" + } + } + } + } + }, + { + "match": { + "prefix": "/" + }, + "route": { + "weighted_clusters": { + clusters: [ + { + "name": "c1", + "weight": 90, + "metadata_match": { + "filter_metadata": { + "envoy.lb": { + "version": "1.0" + } + } + } + }, + { + "name": "c1", + "weight": 10, + "metadata_match": { + "filter_metadata": { + "envoy.lb": { + "version": "1.1" + } + } + } + } + ] + }, + "metadata_match": { + "filter_metadata": { + "envoy.lb": { + "stage": "prod", + } + } + } + } + } +] +``` + +The following headers may then be used to select subsets: + +`x-custom-version: 1.2-pre` causes requests to be routed e7. This is an example of routing requests +to a developer launched instance for pre-release testing. If the e7 upstream leaves the cluster, +the subset is removed and further requests with this header are routed to the default subset +(containing e1 and e2). + +`x-hardware-test: bigmem` causes requests to be load balanced over the e5 and e6 endpoints. This is +an example of routing requests to upstreams running on a particular class of hardware, perhaps for +load testing. If the bigmem hosts are removed from service, further requests with this header are +routed to the default subset. + +Otherwise, requests without those headers are split between two subsets. 90% of the requests are +routed to `stage=prod, version=1.0` (e1, e2, e5). 10% of the requests are routed to `stage=prod, +version=1.1` (e3, e4, e6). This is an example of gradually shifting traffic to a new version. 
From b288278acd3961722de89104a513efab520c13e4 Mon Sep 17 00:00:00 2001 From: Stephan Zuercher Date: Fri, 29 Sep 2017 15:24:28 -0700 Subject: [PATCH 2/7] fix typos Signed-off-by: Stephan Zuercher --- source/docs/subset_load_balancer.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/source/docs/subset_load_balancer.md b/source/docs/subset_load_balancer.md index 12dd28e61a753..3e1ec7ec6869b 100644 --- a/source/docs/subset_load_balancer.md +++ b/source/docs/subset_load_balancer.md @@ -52,21 +52,21 @@ A host with metadata like `{a=1, b=2, x=3}` is included in two subsets (`{a=1, b {`x=3`}). The same keys may appear in multiple selector entries: it is feasible to have both an `{a=1, b=2}` subset and an `{a=1}` subset. -On update, the SLB divides the hosts added into the appropriate subset(s) and triggers udpate +On update, the SLB divides the hosts added into the appropriate subset(s) and triggers update events on the filtered host sets. The SLB also manages the optional "local HostSet" used for zone-aware routing. The CDS configuration for the subset selectors is meant to allow future extension. For example: -1. selecting endpoint metadata keys by a prefix or other string matching algorithm -2. using a list-typed metadata value to allow a single endpoint to have multiple values for a - metadata key +1. Selecting endpoint metadata keys by a prefix or other string matching algorithm, or +2. Using a list-typed metadata value to allow a single endpoint to have multiple values for a + metadata key. Subsets are stored in a trie-like fashion. Keys in the selectors are lexically sorted. An -`LbSubsetMap` is an `unordered_map` is of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an +`LbSubsetMap` is an `unordered_map` of string keys to `ValueSubsetMap`. `ValueSubsetMap` is an `unordered_map` of (wrapped, see below) `ProtobufWkt::Value` to `LbSubsetEntry`. 
The -`LbSubsetEntry` may contain an `LbSubetMap` of additional keys or a `Subset`. `Subset` encapsulates -the filtered `Upstream::HostSet` and `Upstream::LoadBalancer` for a subset. +`LbSubsetEntry` may contain an `LbSubsetMap` of additional keys or a `Subset`. `Subset` +encapsulates the filtered `Upstream::HostSet` and `Upstream::LoadBalancer` for a subset. `ProtobufWkt::Value` is wrapped to provide a cached hash value for the value. Currently, `ProtobufWkt::Value` is hashed by first encoding the value as a string and then hashing the @@ -97,8 +97,8 @@ Given a sequence of N metadata keys and values (previously sorted lexically by k If not found, exit the loop. 3. Assign the `LbSubsetEntry`'s `LbSubsetMap` to `subsets`. (It may be empty.) 4. If this is the last key-value pair, assign the `LbSubsetEntry` to `entry`. -3. If `entry` has been set has a `Subset` value, we found a matching subset, delegate balancing to - the subset's load balancer. +3. If `entry` has been set and has a `Subset` value, we found a matching subset, delegate balancing + to the subset's load balancer. 4. Otherwise, execute the fallback policy. N.B. `O(N)` complexity presumes that the delegate load balancer executes in constant time. From e045257dfa57835c8497c07b46f40e2116205192 Mon Sep 17 00:00:00 2001 From: Stephan Zuercher Date: Mon, 2 Oct 2017 11:17:31 -0700 Subject: [PATCH 3/7] add LbSubsetMap diagram Signed-off-by: Stephan Zuercher --- source/docs/subset_load_balancer.md | 32 +++++++++++++------- source/docs/subset_load_balancer_diagram.svg | 4 +++ 2 files changed, 25 insertions(+), 11 deletions(-) create mode 100644 source/docs/subset_load_balancer_diagram.svg diff --git a/source/docs/subset_load_balancer.md b/source/docs/subset_load_balancer.md index 3e1ec7ec6869b..055bab263e48a 100644 --- a/source/docs/subset_load_balancer.md +++ b/source/docs/subset_load_balancer.md @@ -86,7 +86,8 @@ context and load balancer use the same ordering. 
Sorting of the `LoadBalancerCon currently handled by `Router::RouteEntryImplBase`. Given a sequence of N metadata keys and values (previously sorted lexically by key) from -`LoadBalancerContext`, we can look up the appropriate subset in `O(N)` time as follows: +`LoadBalancerContext`, we can look up the appropriate subset in `O(N)` time as follows. It may be +helpful to look at the [diagram](#diagram) provided in the example. 1. Initialize `subsets` to refer to the root `LbSubsetMap` and `entry` to point at a null `LbSubsetEntryPtr`. @@ -107,15 +108,17 @@ N.B. `O(N)` complexity presumes that the delegate load balancer executes in cons Assume a set of hosts from EDS with the following metadata, assigned to a single cluster. -Endpoint | stage | version | type ----------|-------|---------|------- -e1 | prod | 1.0 | std -e2 | prod | 1.0 | std -e3 | prod | 1.1 | std -e4 | prod | 1.1 | std -e5 | prod | 1.0 | bigmem -e6 | prod | 1.1 | bigmem -e7 | dev | 1.2-pre | std +Endpoint | stage | version | type | xlarge +---------|-------|---------|--------|------- +e1 | prod | 1.0 | std | true +e2 | prod | 1.0 | std | +e3 | prod | 1.1 | std | true +e4 | prod | 1.1 | std | +e5 | prod | 1.0 | bigmem | +e6 | prod | 1.1 | bigmem | +e7 | dev | 1.2-pre | std | + +Note: Only e1 and e3 have the "xlarge" metadata key. 
Given this CDS `envoy::api::v2::Cluster`: @@ -133,7 +136,8 @@ Given this CDS `envoy::api::v2::Cluster`: "subset_selectors": [ { "keys": [ "stage", "type" ] }, { "keys": [ "stage", "version" ] }, - { "keys": [ "version" ] } + { "keys": [ "version" ] }, + { "keys": [ "xlarge", "version" ] } ] } } @@ -150,11 +154,17 @@ The following subsets are created: `version=1.0` (e1, e2, e5) `version=1.1` (e3, e4, e6) `version=1.2-pre` (e7) +`version=1.0, xlarge=true` (e1, e3) In addition, a default subset is created: `stage=prod, type=std, version=1.0` (e1, e2) +After loading this configuration, the SLB's `LbSubsetMap` looks like this: + + +![LbSubsetMap Diagram](subset_load_balancer_diagram.svg) + Given these `envoy::api::v2::Route` entries: ``` json diff --git a/source/docs/subset_load_balancer_diagram.svg b/source/docs/subset_load_balancer_diagram.svg new file mode 100644 index 0000000000000..cafccaae696aa --- /dev/null +++ b/source/docs/subset_load_balancer_diagram.svg @@ -0,0 +1,4 @@ + + + + From 44945079569189975e7ab9c63966dafc69e7f96d Mon Sep 17 00:00:00 2001 From: Stephan Zuercher Date: Mon, 2 Oct 2017 11:23:47 -0700 Subject: [PATCH 4/7] fix rendering Signed-off-by: Stephan Zuercher --- source/docs/subset_load_balancer_diagram.svg | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/docs/subset_load_balancer_diagram.svg b/source/docs/subset_load_balancer_diagram.svg index cafccaae696aa..6217f824f632d 100644 --- a/source/docs/subset_load_balancer_diagram.svg +++ b/source/docs/subset_load_balancer_diagram.svg @@ -1,4 +1,4 @@ - + From c9c9934ec898ad2afcc56435b32abcaa277f833b Mon Sep 17 00:00:00 2001 From: Stephan Zuercher Date: Mon, 2 Oct 2017 11:25:07 -0700 Subject: [PATCH 5/7] fix json format Signed-off-by: Stephan Zuercher --- source/docs/subset_load_balancer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/docs/subset_load_balancer.md b/source/docs/subset_load_balancer.md index 055bab263e48a..c72b784007361 100644 --- 
a/source/docs/subset_load_balancer.md +++ b/source/docs/subset_load_balancer.md @@ -219,7 +219,7 @@ Given these `envoy::api::v2::Route` entries: }, "route": { "weighted_clusters": { - clusters: [ + "clusters": [ { "name": "c1", "weight": 90, From 3d217b197cdcb7150dd23e331ba490f18be89230 Mon Sep 17 00:00:00 2001 From: Stephan Zuercher Date: Mon, 2 Oct 2017 13:52:10 -0700 Subject: [PATCH 6/7] clarify example Signed-off-by: Stephan Zuercher --- source/docs/subset_load_balancer.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/source/docs/subset_load_balancer.md b/source/docs/subset_load_balancer.md index c72b784007361..8ecf326a95ac9 100644 --- a/source/docs/subset_load_balancer.md +++ b/source/docs/subset_load_balancer.md @@ -175,7 +175,7 @@ Given these `envoy::api::v2::Route` entries: "headers": [ { "name": "x-custom-version", - "value": "1.2-pre" + "value": "pre-release" } ] }, @@ -197,7 +197,7 @@ Given these `envoy::api::v2::Route` entries: "headers": [ { "name": "x-hardware-test", - "value": "bigmem" + "value": "memory" } ] }, @@ -258,12 +258,12 @@ Given these `envoy::api::v2::Route` entries: The following headers may then be used to select subsets: -`x-custom-version: 1.2-pre` causes requests to be routed e7. This is an example of routing requests -to a developer launched instance for pre-release testing. If the e7 upstream leaves the cluster, -the subset is removed and further requests with this header are routed to the default subset -(containing e1 and e2). +`x-custom-version: pre-release` causes requests to be routed to e7. This is an example of routing +requests to a developer-launched instance for pre-release testing. If the e7 upstream leaves the +cluster, the subset is removed and further requests with this header are routed to the default +subset (containing e1 and e2). -`x-hardware-test: bigmem` causes requests to be load balanced over the e5 and e6 endpoints. 
This is +`x-hardware-test: memory` causes requests to be load balanced over the e5 and e6 endpoints. This is an example of routing requests to upstreams running on a particular class of hardware, perhaps for load testing. If the bigmem hosts are removed from service, further requests with this header are routed to the default subset. From 775797dd066363b3ace6b8af9792c944919b4e71 Mon Sep 17 00:00:00 2001 From: Stephan Zuercher Date: Tue, 3 Oct 2017 10:39:09 -0700 Subject: [PATCH 7/7] fix mistake in diagram & table Signed-off-by: Stephan Zuercher --- source/docs/subset_load_balancer.md | 6 +++--- source/docs/subset_load_balancer_diagram.svg | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/source/docs/subset_load_balancer.md b/source/docs/subset_load_balancer.md index 8ecf326a95ac9..8041eab8238b5 100644 --- a/source/docs/subset_load_balancer.md +++ b/source/docs/subset_load_balancer.md @@ -112,13 +112,13 @@ Endpoint | stage | version | type | xlarge ---------|-------|---------|--------|------- e1 | prod | 1.0 | std | true e2 | prod | 1.0 | std | -e3 | prod | 1.1 | std | true +e3 | prod | 1.1 | std | e4 | prod | 1.1 | std | e5 | prod | 1.0 | bigmem | e6 | prod | 1.1 | bigmem | e7 | dev | 1.2-pre | std | -Note: Only e1 and e3 have the "xlarge" metadata key. +Note: Only e1 has the "xlarge" metadata key. Given this CDS `envoy::api::v2::Cluster`: @@ -154,7 +154,7 @@ The following subsets are created: `version=1.0` (e1, e2, e5) `version=1.1` (e3, e4, e6) `version=1.2-pre` (e7) -`version=1.0, xlarge=true` (e1, e3) +`version=1.0, xlarge=true` (e1) In addition, a default subset is created: diff --git a/source/docs/subset_load_balancer_diagram.svg b/source/docs/subset_load_balancer_diagram.svg index 6217f824f632d..414c7174367b8 100644 --- a/source/docs/subset_load_balancer_diagram.svg +++ b/source/docs/subset_load_balancer_diagram.svg @@ -1,4 +1,4 @@ - +
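The subset construction and lookup described in the design doc can be checked against its example with a small sketch. This is not Envoy's C++ implementation: it is a Python illustration under stated assumptions. Hosts are represented only by their names, metadata values are plain strings rather than wrapped `ProtobufWkt::Value` objects, and the names `build_subset_map`, `find_subset`, `HOSTS`, and `SELECTORS` are hypothetical. The host table is the corrected one from the final patch, where only e1 has the `xlarge` key. Fallback handling is left out; a `None` result stands for "execute the fallback policy".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LbSubsetEntry:
    # Next trie level: metadata key -> value -> entry.
    children: Dict[str, Dict[str, "LbSubsetEntry"]] = field(default_factory=dict)
    # Host names in the subset that ends here; None if no selector ends here.
    subset: Optional[List[str]] = None


LbSubsetMap = Dict[str, Dict[str, LbSubsetEntry]]

# Endpoint metadata from the example table (illustrative stand-in for EDS data).
HOSTS: Dict[str, Dict[str, str]] = {
    "e1": {"stage": "prod", "version": "1.0", "type": "std", "xlarge": "true"},
    "e2": {"stage": "prod", "version": "1.0", "type": "std"},
    "e3": {"stage": "prod", "version": "1.1", "type": "std"},
    "e4": {"stage": "prod", "version": "1.1", "type": "std"},
    "e5": {"stage": "prod", "version": "1.0", "type": "bigmem"},
    "e6": {"stage": "prod", "version": "1.1", "type": "bigmem"},
    "e7": {"stage": "dev", "version": "1.2-pre", "type": "std"},
}
SELECTORS: List[List[str]] = [
    ["stage", "type"], ["stage", "version"], ["version"], ["xlarge", "version"],
]


def build_subset_map(hosts: Dict[str, Dict[str, str]],
                     selectors: List[List[str]]) -> LbSubsetMap:
    """Precompute the trie: one path per distinct combination of selector values."""
    root: LbSubsetMap = {}
    for keys in selectors:
        sorted_keys = sorted(keys)  # selector keys are stored in lexical order
        if not sorted_keys:
            continue
        for name, metadata in hosts.items():
            if not all(key in metadata for key in sorted_keys):
                continue  # host lacks a value for some key in this selector
            subsets, entry = root, None
            for key in sorted_keys:
                entry = subsets.setdefault(key, {}).setdefault(metadata[key], LbSubsetEntry())
                subsets = entry.children
            if entry.subset is None:
                entry.subset = []
            entry.subset.append(name)
    return root


def find_subset(root: LbSubsetMap,
                metadata: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Exact-match lookup; `metadata` must already be sorted by key."""
    subsets, entry = root, None
    for i, (key, value) in enumerate(metadata):
        values = subsets.get(key)
        if values is None:
            break
        found = values.get(value)
        if found is None:
            break
        subsets = found.children
        if i == len(metadata) - 1:
            entry = found
    if entry is not None and entry.subset is not None:
        return entry.subset
    return None  # caller would execute the fallback policy


root = build_subset_map(HOSTS, SELECTORS)
print(find_subset(root, [("stage", "dev"), ("version", "1.2-pre")]))
print(find_subset(root, [("stage", "prod"), ("type", "bigmem")]))
print(find_subset(root, [("stage", "prod")]))
```

The last lookup finds an `LbSubsetEntry` for `stage=prod` but no `Subset` attached to it, because no selector consists of `stage` alone. That is the case in which the design falls back to the default subset.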