# System Design Doc #16
## System Design

### Goals
* Define the system components needed to satisfy the requirements of Envoy Gateway.

### Non-Goals
* Create a detailed design and interface specification for each system component.

### Architecture

> **Review comment (arkodg, author, resolved):** If we all agree on this arch, will work on getting a better illustration out.
>
> **Review comment:** @LukeShu @skriss @youngnick please comment here so @arkodg can nail down the diagram.
>
> **Review comment:** Sorry to do this @arkodg, but I looked over this a bit, and I think we should consider whether we want the xDS server and provisioner to run inside the same container or as separate processes. This diagram implies they're running inside the same container, but I think the separate-process model has some advantages, in that it unlocks the ability to easily have one provisioner instance manage many xDS control planes (since they are 1:1 with an Envoy fleet). It might be possible to have a single xDS server process serve different config to different Envoy fleets, but that seems very risky to me; getting that wrong could very easily lead to big security problems. I think a better representation will have two boxes talking to the apiserver: one that is the provisioner, which has a "creates" relationship with the one (or more) xDS server boxes, which contain the architecture outlined above. This probably isn't a blocker for this PR merging, but since we want Envoy Gateway as a whole to be able to handle multiple GatewayClasses, we'll end up with something like what I describe here anyway.
### Configuration

#### Bootstrap Config
This is the configuration provided by the Infrastructure Administrator that allows them to bootstrap and configure various internal aspects of Envoy Gateway.
It can be defined using a CLI argument similar to what [Envoy Proxy has](https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-c).
For example, users wanting to run Envoy Gateway in Kubernetes and use a custom [Envoy Proxy bootstrap config](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#envoy-v3-api-msg-config-bootstrap-v3-bootstrap) could define their Bootstrap Config as:
```yaml
platform: kubernetes
envoyProxy:
  bootstrap:
    ......
```
#### User Config
This configuration is based on the [Gateway API](https://gateway-api.sigs.k8s.io) and will provide:
* Infrastructure management capabilities to provision the infrastructure required to run the data plane, Envoy Proxy.
This is expressed using the [GatewayClass](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass) and [Gateway](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gateway) resources.
* Ingress and API gateway capabilities for application developers to define networking and security intent for their incoming traffic.
This is expressed using [HTTPRoute](https://gateway-api.sigs.k8s.io/concepts/api-overview/#httproute) and [TLSRoute](https://gateway-api.sigs.k8s.io/concepts/api-overview/#tlsroute) resources.
#### Workflow
1. The Infrastructure Administrator spawns an Envoy Gateway process using a [Bootstrap Config](#bootstrap-config) to manage a fleet of Envoy Proxies.
2. They configure a [GatewayClass resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass) that represents a class of Envoy Proxies.
Envoy Gateway consumes this configuration and provisions a unique fleet of Envoy Proxies. The [GatewayClass parameters](https://gateway-api.sigs.k8s.io/v1alpha2/api-types/gatewayclass/#gatewayclass-parameters) section allows the Infrastructure Administrator to further modify attributes of the data plane.
3. They configure a [Gateway resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gateway), linking it to a specific GatewayClass, with information such as hostnames, protocols, and ports stating which traffic flows are of interest.
4. Application developers can now expose their APIs by configuring [HTTPRoute resources](https://gateway-api.sigs.k8s.io/concepts/api-overview/#httproute). These routes are attached to a specific [Gateway resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gateway), allowing application traffic to reach the upstream application.
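The workflow above can be sketched with minimal Gateway API resources. All names below, including the `controllerName` value, the hostname, and the `backend` Service, are illustrative assumptions rather than values defined by this design:

```yaml
# A GatewayClass managed by Envoy Gateway; the controllerName is an assumed value.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
# A Gateway linked to the class above, declaring which traffic flows are of interest.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: eg
spec:
  gatewayClassName: eg
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
# An HTTPRoute attached to the Gateway, exposing a hypothetical backend Service.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute
metadata:
  name: backend
spec:
  parentRefs:
  - name: eg
  hostnames:
  - "www.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: backend
      port: 3000
```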
### Components

#### Config Sources
> **Review comment:** I have questions about how we'll handle merging config across different sources; it seems like it can be a thorny issue if not everything is in etcd. OK with deferring the details for now, but we'll have to figure it out sooner rather than later if we do want to support multiple sources.
>
> **Review comment (arkodg, author):** These are all opt-in, based on the bootstrap spec, so IMHO the platform admin knows what they are signing up for if they include more config sources.
>
> **Review comment:** xref #30 to capture the details of config merging.
>
> **Review comment:** This is also related to multi-cluster. I realize MC is P1, but how does that fit into all of this? Should we at least mention where we see MC fitting into this? Will there be a meta controller? Will each bootstrap have to manually point at multiple config sources with distributed policy? Etc.
>
> **Review comment (arkodg, author):** We haven't brainstormed on MC yet, mainly because we have not discussed the requirements yet: is this a MultiCluster Ingress use case (purely north-south), with external clients sending traffic to Tier1/Edge/Front proxies which route to Tier2/BU proxies, or east-west, with internal services routing to other internal services in another cluster using the Tier2/BU in each cluster, or both? Once we know the WHAT, it will require us to answer the following for the HOW
>
> **Review comment:** I suggest we transition the MC discussion to #9 where I define
>
> **Review comment:** @mattklein123 @arkodg I have provided additional information to #9, so PTAL.
This component is responsible for consuming the user configuration from various platforms. Data persistence should be tied to the specific config source's capabilities. For example, in Kubernetes the resources will persist in `etcd`; if using the `path-watcher`, the resources will persist in a file.
##### Kubernetes
It watches the Kubernetes API server for resources, fetches them, and publishes them to the translators for further processing.
##### Path Watcher
It watches for file changes in a path, allowing the user to configure Envoy Gateway using resource configurations saved in a file or directory.
##### Config Server
This is an HTTP/gRPC server allowing Envoy Gateway to be configured from a remote endpoint.
#### Intermediate Representation (IR)
This is an internal data model that user-facing APIs are translated into, allowing internal services and components to be decoupled.
#### Config Manager
This component consumes the [Bootstrap Config](#bootstrap-config) and spawns the appropriate internal services in Envoy Gateway based on the config specification.
For example, if the platform field in the Bootstrap Config is set to `kubernetes`, the Config Manager will instantiate Kubernetes controller services that implement the [Config Source](#config-sources), [Service Resolver](#service-resolver), and [Envoy Provisioner](#provisioner) interfaces.
#### Message Service
This component allows internal services to publish messages as well as subscribe to them. The message service's interface is used by the [Config Manager](#config-manager) to allow communication between the services instantiated by it.
A message bus architecture allows components to be loosely coupled, work asynchronously, and scale out into multiple processes if needed.
For example, the [Config Source](#config-sources) and the [Provisioner](#provisioner) could run as separate processes in different environments, decoupling user configuration consumption from the environment where the Envoy Proxy infrastructure is being provisioned.
#### Service Resolver
This optional component preprocesses the IR resources and resolves services into endpoints, enabling precise load balancing and resilience policies.
For example, in Kubernetes a controller service could watch for EndpointSlice resources, converting Services to Endpoints and allowing Envoy Proxy to skip kube-proxy's load balancing layer. This component is tied to the platform where it is running. When disabled, services will be resolved by the underlying DNS resolver or by explicitly specifying IPs.
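As a sketch of the input such a Kubernetes resolver would consume, an EndpointSlice carries the ready pod addresses behind a Service. The Service name, port, and IP below are illustrative:

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: backend-abc1
  labels:
    # Ties the slice back to its Service; the resolver can watch on this label.
    kubernetes.io/service-name: backend
addressType: IPv4
ports:
- name: http
  protocol: TCP
  port: 3000
endpoints:
- addresses:
  - "10.1.2.3"
  conditions:
    ready: true
```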
#### Gateway API Translator
This is a platform-agnostic translator that translates Gateway API resources into the Intermediate Representation.
#### xDS Translator
This component translates the IR into Envoy Proxy xDS resources.
#### xDS Server
This component is an xDS gRPC server based on the [Envoy Go Control Plane](https://github.com/envoyproxy/go-control-plane) project. It implements the xDS server protocol and is responsible for configuring xDS resources in Envoy Proxy.
#### Provisioner
The provisioner configures any infrastructure needed based on the IR.
* Envoy - This is a platform-specific component that provisions all the infrastructure required to run the managed Envoy Proxy fleet. For example, a Terraform or Ansible provisioner could be added in the future to provision the Envoy infrastructure in a non-Kubernetes environment.
* Auxiliary Control Planes - These optional components are services needed to implement API gateway features that require external integrations with Envoy Proxy. A good example is [Global Ratelimiting](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_features/global_rate_limiting), which would require instantiating and configuring the [Envoy Rate Limit Service](https://github.com/envoyproxy/ratelimit) as well as the [Rate Limit filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ratelimit/v3/rate_limit.proto#envoy-v3-api-msg-extensions-filters-http-ratelimit-v3-ratelimit) using the IR passed to this component. Such features would be exposed to the user using [Custom Route Filters](https://gateway-api.sigs.k8s.io/v1alpha2/api-types/httproute/#filters-optional) defined in the Gateway API.
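A custom filter attachment of this kind might look like the following HTTPRoute fragment, using the Gateway API's `ExtensionRef` filter type. The `gateway.envoyproxy.io` group and `RateLimitFilter` kind are hypothetical, since this design does not define them:

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute
metadata:
  name: ratelimited-route
spec:
  rules:
  - filters:
    # ExtensionRef is the Gateway API escape hatch for implementation-specific filters.
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io   # hypothetical group
        kind: RateLimitFilter          # hypothetical kind
        name: ratelimit-example
```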
### Design Decisions
* Each Envoy Gateway will consume one or more [GatewayClass resources](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass) to manage fleets of Envoy Proxies with different configurations; i.e. each [GatewayClass resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass) will map to a unique set of Envoy Proxies created by the Provisioner.
* The goal is to make the Provisioner and Translator layers extensible, but for the near future, extensibility can be achieved using the xDS support that Envoy Gateway will provide.
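One way a GatewayClass could carry its per-fleet configuration is the Gateway API's `parametersRef`. The `controllerName` value, the `gateway.envoyproxy.io` group, and the `EnvoyProxy` parameters kind shown here are assumptions, not part of this design:

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: GatewayClass
metadata:
  name: eg-internal
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller  # assumed value
  # parametersRef points at an implementation-specific resource the
  # Infrastructure Administrator uses to tune this fleet's data plane.
  parametersRef:
    group: gateway.envoyproxy.io  # hypothetical group
    kind: EnvoyProxy              # hypothetical parameters kind
    name: internal-proxy-config
```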
The draft for this document is [here](https://docs.google.com/document/d/1riyTPPYuvNzIhBdrAX8dpfxTmcobWZDSYTTB5NeybuY/edit).