HTTP/2 support in Knative
PR #2539 introduced the basic ability to use a Knative Service with HTTP/2. There have been numerous discussions on how to "properly" support HTTP/2 (and other stream-based protocols) in Knative. This document focuses on the different aspects of HTTP/2 only and how we could implement it to the benefit of our users. Some of this might also be applicable to other protocols such as WebSockets or gRPC.
Why HTTP/2?
The official spec outlines the key differences to HTTP/1.x as follows: HTTP/2
- is binary, instead of textual
- is fully multiplexed, instead of ordered and blocking
- can therefore use one connection for parallelism
- uses header compression to reduce overhead
- allows servers to “push” responses proactively into client caches
The spec further explains why single-connection parallelism is superior:
In the past, browsers have used multiple TCP connections to issue parallel requests. However, there are limits to this; if too many connections are used, it’s both counter-productive (TCP congestion control is effectively negated, leading to congestion events that hurt performance and the network), and it’s fundamentally unfair (because browsers are taking more than their share of network resources).
At the same time, the large number of requests means a lot of duplicated data “on the wire”.
Both of these factors means that HTTP/1.1 requests have a lot of overhead associated with them; if too many requests are made, it hurts performance.
For the purpose of this document, I'll divide these properties into two buckets:
- Wire-protocol properties: These include the binary nature of the protocol itself (1) and the compression of the headers (4).
- Connection properties: These include the multiplexing (2/3) and server "push" (5) properties.
What do we need to do to take advantage of these properties?
Taking advantage of the wire-protocol properties is relatively simple. Given that our routing layers and applications correctly support HTTP/2 end-to-end, we get these for "free". This should already have been done and tested with #2539.
Supporting the connection properties, though, is a different beast. Single-connection parallelism in particular could be a deal-breaker for autoscaling and routing in Knative. I therefore propose different "modes" of HTTP/2 support, which let the user provide some additional information so Knative can decide how to properly handle incoming HTTP/2 connections.
HTTP/2 end-to-end
To support server "push" and fully take advantage of HTTP/2's parallelism properties, we need to support HTTP/2 end-to-end. That means we need to route a connection to the user application as-is. Since we want to take advantage of the per-connection multiplexing, we need to allow a parallelism of greater than one per connection. This then means that we have no opportunity to reroute any of these requests once a pod becomes overloaded. Once a connection is routed to a pod, it sticks, and all requests sent over it go to that pod, no matter what.
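For context, end-to-end support implies the user container itself terminates HTTP/2, and since in-cluster traffic is typically cleartext, that means h2c. A minimal Go sketch of such a server (the port is illustrative, not Knative's actual wiring):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// r.Proto reports "HTTP/2.0" for requests that arrived as h2c streams.
		fmt.Fprintf(w, "served over %s\n", r.Proto)
	})

	// h2c.NewHandler accepts cleartext HTTP/2 ("h2c") connections, which is
	// how HTTP/2 usually travels between pods (no TLS inside the cluster).
	log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(handler, &http2.Server{})))
}
```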
If containerConcurrency is set to 0 (allowing infinite parallelism), this is not really an issue, as we have no defined limit on how many concurrent requests we can handle and thus don't need to enforce one. Vertical scalability could become crucial on this path, as one connection could potentially overload a pod and we cannot reroute individual requests on the connection to relieve that pod.
If containerConcurrency is set to > 0 (allowing only a set amount of parallelism), things get a little more tricky. The HTTP/2 spec defines a SETTINGS_MAX_CONCURRENT_STREAMS setting to control the maximum number of active concurrent streams on one connection. As long as we allow only one HTTP/2 connection per pod, this should work well to indicate the maximum allowed concurrency per client connection. If we stick to one connection per pod, autoscaling would naturally scale to one pod per connection and leave all request/stream-based concurrency to the pod itself. The SETTINGS_MAX_CONCURRENT_STREAMS would be sent by the queue-proxy.
If a single connection from a single client does not saturate a pod though, we are left with unused capacity. Once we allow multiple HTTP/2 connections per pod, we'll have to deal with sizing each of them properly relative to each other. Autoscaling in this case also needs to account for total active streams across pods.
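Going back to the single-connection case: Go's http2.Server exposes this setting directly via MaxConcurrentStreams, so the queue-proxy could advertise containerConcurrency on the connections it accepts. A rough sketch; the port and the hardcoded concurrency value are placeholders, not Knative's actual wiring:

```go
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

// Placeholder: in Knative this would come from the Revision's
// containerConcurrency field rather than a constant.
const containerConcurrency uint32 = 10

func main() {
	forward := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... proxy the stream on to the user container ...
		w.WriteHeader(http.StatusOK)
	})

	// MaxConcurrentStreams is advertised to the client as
	// SETTINGS_MAX_CONCURRENT_STREAMS, capping parallelism per connection.
	h2s := &http2.Server{MaxConcurrentStreams: containerConcurrency}
	log.Fatal(http.ListenAndServe(":8012", h2c.NewHandler(forward, h2s)))
}
```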
HTTP/2 to gateway
An alternative approach is to support HTTP/2 only until we reach the gateway of a service. The gateway then demultiplexes the connections and sends HTTP/1.1 requests to the application pods themselves. Scaling granularity is not an issue in this case, and we need no changes to the user pods at all (which includes the queue-proxy). This approach takes advantage of HTTP/2's reduced overhead until the user's requests reach the gateway. The "last mile" then has the usual overhead of HTTP/1.1, which is hopefully less crucial on an in-cluster network than for user requests coming from somewhere on the planet (although multi-region HA services could potentially see the same overhead).
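To make the conversion concrete, here is a rough Go sketch of what the gateway-side demultiplexing amounts to: accept multiplexed h2c streams and let a standard reverse proxy re-issue each one as an HTTP/1.1 request. The target address is made up for illustration:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	// Made-up in-cluster address of an application pod.
	target, err := url.Parse("http://app-pod.default.svc:8080")
	if err != nil {
		log.Fatal(err)
	}

	// The default transport speaks HTTP/1.1 to a plain http:// target, so
	// every incoming HTTP/2 stream leaves as a separate HTTP/1.1 request.
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Accept multiplexed cleartext HTTP/2 from the upstream routing layers.
	log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(proxy, &http2.Server{})))
}
```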
Proposal
Given the different cases laid out above, I feel it's hard to infer which kind of HTTP/2 support a user wants for her application. If anything, we can try to infer decent defaults based on the containerConcurrency setting. We should always allow the user to override this default, though.
Based on the above, I see at least 3 modes:
- Manual: We route HTTP/2 through to the application and it needs to handle everything accordingly itself (status quo).
- Convert: Converts to HTTP/1.1 at the gateway. Routing and loadbalancing logic stays intact.
- Single: Allows only a single HTTP/2 connection per pod, which is properly sized in concurrency for the allowed containerConcurrency. Trying to resize connections etc. seems error-prone and "hard to guess right" to me. We could maybe implement a Multiple mode later?
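To illustrate how defaulting and overriding could interact, here is a hypothetical sketch. The annotation key, the mode names, and the defaulting rule are all inventions for this example, not existing Knative API:

```go
package modes

// Hypothetical annotation key; not part of the Knative API today.
const http2ModeAnnotation = "features.knative.dev/http2-mode"

type HTTP2Mode string

const (
	ModeManual  HTTP2Mode = "manual"  // route HTTP/2 through untouched
	ModeConvert HTTP2Mode = "convert" // demultiplex to HTTP/1.1 at the gateway
	ModeSingle  HTTP2Mode = "single"  // one properly sized connection per pod
)

// http2ModeFor infers a default mode from containerConcurrency but lets an
// explicit annotation win. The inference rule below is a strawman.
func http2ModeFor(containerConcurrency int64, annotations map[string]string) HTTP2Mode {
	if m, ok := annotations[http2ModeAnnotation]; ok {
		return HTTP2Mode(m)
	}
	if containerConcurrency == 0 {
		// Unlimited concurrency: passing connections through is unproblematic.
		return ModeManual
	}
	// Bounded concurrency: converting keeps routing and autoscaling intact.
	return ModeConvert
}
```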