104 changes: 104 additions & 0 deletions docs/design/SYSTEM_DESIGN.md
@@ -0,0 +1,104 @@
## System Design

### Goals
* Define the system components needed to satisfy the requirements of Envoy Gateway.

### Non-Goals
* Create a detailed design and interface specification for each system component.

### Architecture
danehans marked this conversation as resolved.
![Architecture](../images/architecture.png)
arkodg marked this conversation as resolved.

Contributor Author:

if we all agree on this arch, will work with getting a better illustration out

Contributor:

@LukeShu @skriss @youngnick please comment here so @arkodg can nail down the diagram.

Contributor @youngnick, Apr 28, 2022:

Sorry to do this @arkodg, but I looked over this a bit, and I think we should consider if we want the xds server and provisioner to be running inside the same container or be separate processes.

This diagram implies they're running inside the same container, but I think that the separate-process model has some advantages, in that it unlocks the ability to easily have one provisioner instance manage many xds control planes (since they are 1:1 with an Envoy fleet). I think it might be possible to have a single xds server process serve different config to different Envoy fleets, but that seems very risky to me - getting that wrong could very easily lead to big security problems.

I think a better representation would have two boxes talking to the apiserver: one is the provisioner, which has a "creates" relationship with one (or more) xDS server boxes that contain the architecture outlined above.

This probably isn't a blocker for this PR merging, but I think that since we want Envoy Gateway as a whole to be able to handle multiple GatewayClasses, we'll end up with something like what I describe here anyway.


### Configuration

#### Bootstrap Config
arkodg marked this conversation as resolved.
This is the configuration provided by the Infrastructure Administrator that allows them to bootstrap and configure various internal aspects of Envoy Gateway.
It can be defined using a CLI argument similar to what [Envoy Proxy has](https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-c).
For example, users wanting to run Envoy Gateway in Kubernetes and use a custom [Envoy Proxy bootstrap config](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#envoy-v3-api-msg-config-bootstrap-v3-bootstrap) could define their Bootstrap Config as:
```yaml
platform: kubernetes
envoyProxy:
  bootstrap:
    ......
```

#### User Config
This configuration is based on the [Gateway API](https://gateway-api.sigs.k8s.io) and will provide:
* Infrastructure Management capabilities to provision the infrastructure required to run the data plane, Envoy Proxy.
This is expressed using [GatewayClass](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass) and [Gateway](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gateway) resources.
* Ingress and API Gateway capabilities for the application developer to define networking and security intent for their incoming traffic.
This is expressed using [HTTPRoute](https://gateway-api.sigs.k8s.io/concepts/api-overview/#httproute) and [TLSRoute](https://gateway-api.sigs.k8s.io/concepts/api-overview/#tlsroute).
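As a sketch only (the resource names and controller name here are illustrative, not defined by this document), the infrastructure-facing resources could look like:

```yaml
# Hypothetical names for illustration; the apiVersion/kind/fields are Gateway API.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: GatewayClass
metadata:
  name: eg-class
spec:
  controllerName: example.io/envoy-gateway-controller   # illustrative controller name
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: eg
spec:
  gatewayClassName: eg-class
  listeners:
    - name: http
      protocol: HTTP
      port: 80
```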

#### Workflow
1. The Infrastructure Administrator spawns an Envoy Gateway process using a [Bootstrap Config](#bootstrap-config) to manage a fleet of Envoy Proxies.
2. They will configure a [GatewayClass resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass) that represents a class of Envoy Proxies.
Envoy Gateway consumes this configuration and provisions a unique fleet of Envoy Proxies. The [GatewayClass parameters](https://gateway-api.sigs.k8s.io/v1alpha2/api-types/gatewayclass/#gatewayclass-parameters) section allows the Infrastructure Administrator to further modify attributes of the data plane.
3. They will configure a [Gateway resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gateway), linking it to a specific GatewayClass,
with information such as hostnames, protocols, and ports stating which traffic flows are of interest.
4. Application developers can now expose their APIs by configuring [HTTPRoute resources](https://gateway-api.sigs.k8s.io/concepts/api-overview/#httproute).
These routes are attached to a specific [Gateway resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gateway) allowing application traffic to reach
the upstream application.
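Step 4 above could be sketched as a hypothetical HTTPRoute attached to a Gateway (all names, hostnames, and ports are illustrative):

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute
metadata:
  name: backend-route          # illustrative name
spec:
  parentRefs:
    - name: eg                 # the Gateway configured in step 3
  hostnames:
    - "www.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: backend        # an application Service
          port: 3000
```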

### Components

#### Config Sources
Contributor:

I have questions about how we'll handle merging config across different sources - seems like it can be a thorny issue if not everything is in etcd. OK with deferring the details for now, but we'll have to figure it out sooner rather than later if we do want to support multiple sources.

Contributor Author:

these are all opt-in, based on the bootstrap spec, so imho the platform admin knows what they are signing up for if they include more config sources.
I added a line to Message Service - *It can also aggregate resources from multiple publishers, allowing configuration from individual config sources to be aggregated before being processed by the translation layers.* But I'm thinking of removing it since it adds details not needed in this generic document. We can delay this decision until we build out the config source.

Contributor:

xref #30 to capture the details of config merging.

Member:

This is also related to multi-cluster. I realize MC is P1, but how does that fit into all of this? Should we at least mention where we see MC fitting into this? Will there be a meta controller? Will each bootstrap have to manually point at multiple config sources with distributed policy? Etc.

Contributor Author:

we haven't brainstormed on MC yet 🙈, mainly because we have not discussed the requirements yet - is this the multi-cluster Ingress use case (purely north-south), with external clients sending traffic to Tier1/Edge/Front proxies which route to Tier2/BU proxies, or east-west, with internal services routing to other internal services in another cluster using the Tier2/BU in each cluster, or both 😄. Once we know the WHAT, it will require us to answer the following for the HOW

  • Architecture - is it a peer-to-peer arch or a hub-spoke architecture
  • Config Source - What is the source of truth for intent
  • Service Resolver - Which service registry does this resolver use to resolve endpoints
  • Service Registry - How is this service registry computed

Contributor:

I suggest we transition the MC discussion to #9 where I define

... Gateway should support multi-cluster service networking methods defined by sig-multicluster..."

@mattklein123 @arkodg I have provided additional information to #9, so PTAL.

This component is responsible for consuming the user configuration from various platforms. Data persistence is tied to the specific config source's capabilities. For example, in Kubernetes the resources persist in `etcd`; when using the `path-watcher`, the resources persist in a file.

##### Kubernetes
It watches the Kubernetes API Server for resources, fetches them, and publishes them to the translators for further processing.

##### Path Watcher
It watches for file changes in a path, allowing the user to configure Envoy Gateway using resource configurations saved in a file or directory.

##### Config Server
This is an HTTP/gRPC server allowing Envoy Gateway to be configured from a remote endpoint.

#### Intermediate Representation (IR)
This is an internal data model that user-facing APIs are translated into, allowing internal services and components to be decoupled.

#### Config Manager
This component consumes the [Bootstrap Config](#bootstrap-config) and spawns the appropriate internal services in Envoy Gateway based on the config specification.
For example, if the platform field in the Bootstrap Config is set to `kubernetes`, the Config Manager will instantiate Kubernetes controller services that implement the
[Config Source](#config-sources), [Service Resolver](#service-resolver) and [Envoy Provisioner](#provisioner) interfaces.

#### Message Service
This component allows internal services to publish messages as well as subscribe to them. The message service's interface is used by the [Config Manager](#config-manager) to
allow communication between the services it instantiates.
A message bus architecture allows components to be loosely coupled, work asynchronously, and scale out into multiple processes if needed.
For example, the [Config Source](#config-sources) and the [Provisioner](#provisioner) could run as separate processes in different environments, decoupling user configuration consumption
from the environment where the Envoy Proxy infrastructure is being provisioned.

#### Service Resolver
This optional component preprocesses the IR resources and resolves services into endpoints, enabling precise load-balancing and resilience policies.
For example, in Kubernetes a controller service could watch EndpointSlice resources, converting Services into Endpoints and allowing Envoy Proxy to skip kube-proxy's
load-balancing layer. This component is tied to the platform where it is running. When disabled, services are resolved by the underlying DNS resolver or
by explicitly specifying IPs.

arkodg marked this conversation as resolved.
#### Gateway API Translator
This is a platform agnostic translator that translates Gateway API resources to an Intermediate Representation.

#### xDS Translator
This component translates the IR into Envoy Proxy xDS Resources.

#### xDS Server
This component is an xDS gRPC server, based on the [Envoy Go Control Plane](https://github.com/envoyproxy/go-control-plane) project, that implements the xDS server protocol
and is responsible for configuring xDS resources in Envoy Proxy.

#### Provisioner
The provisioner provisions any infrastructure needed based on the IR.

* Envoy - This is a platform specific component that provisions all the infrastructure required to run the managed Envoy Proxy fleet.
For example, a Terraform or Ansible provisioner could be added in the future to provision the Envoy infrastructure in a non-Kubernetes environment.

* Auxiliary Control Planes - These optional components are services needed to implement API Gateway features that require external integrations with Envoy Proxy. A good example is [Global Rate Limiting](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_features/global_rate_limiting), which would require instantiating and
configuring the [Envoy Rate Limit Service](https://github.com/envoyproxy/ratelimit) as well as the [Rate Limit filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ratelimit/v3/rate_limit.proto#envoy-v3-api-msg-extensions-filters-http-ratelimit-v3-ratelimit) using the IR passed to this component. Such features would
be exposed to the user using [Custom Route Filters](https://gateway-api.sigs.k8s.io/v1alpha2/api-types/httproute/#filters-optional) defined in the Gateway API.
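For illustration, a minimal configuration for the Envoy Rate Limit Service (the domain, descriptor key/value, and limits are example values only, not prescribed by this design) could look like:

```yaml
# Illustrative Envoy Rate Limit Service configuration; all values are examples.
domain: eg-ratelimit
descriptors:
  - key: generic_key
    value: default
    rate_limit:
      unit: second
      requests_per_unit: 10
```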

### Design Decisions
* Each Envoy Gateway will consume one or more [GatewayClass resources](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass) to manage a fleet of Envoy Proxies
with different configurations, i.e. each [GatewayClass resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass) will map to a unique set of Envoy Proxies
created by the Provisioner.
* The goal is to make the Provisioner & Translator layers extensible, but for the near future, extensibility can be achieved using xDS support that Envoy Gateway
will provide.

The draft of this document is [here](https://docs.google.com/document/d/1riyTPPYuvNzIhBdrAX8dpfxTmcobWZDSYTTB5NeybuY/edit).
Binary file added docs/images/architecture.png