QP request batcher

/area API

## Describe the feature

### Requirements

* Users should be able to specify criteria such as latency and batch size per service, so that QP can decide when to submit a batch to the user container. This would require annotating the service accordingly.
* Each request in a batch is handled independently and transparently. To achieve this, a protocol should be defined between QP and the user container; the user container must comply with it in order to receive data in batch mode.
* This does not cover the case where requests need to be sent as a batch across the whole Knative data plane.
* The feature will be an extension and will not be enabled by default.
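
To make the first requirement concrete, here is a minimal sketch of a flush policy driven by batch size and latency. It is not a proposed implementation: `BatchPolicy`, `Batcher`, `max_batch_size`, `max_latency_seconds`, and the `submit` callback are hypothetical names, and a real QP would implement this in Go with a timer instead of an explicit `tick()`.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BatchPolicy:
    # Hypothetical per-service settings; the issue does not name the annotations.
    max_batch_size: int
    max_latency_seconds: float


@dataclass
class Batcher:
    policy: BatchPolicy
    # Assumed callback that delivers one batch to the user container.
    submit: Callable[[List[bytes]], None]
    # Injectable clock so the latency rule can be exercised without sleeping.
    clock: Callable[[], float] = time.monotonic
    _pending: List[bytes] = field(default_factory=list)
    _oldest: Optional[float] = None

    def add(self, request: bytes) -> None:
        """Queue a request and flush as soon as the size limit is reached."""
        if not self._pending:
            self._oldest = self.clock()
        self._pending.append(request)
        if len(self._pending) >= self.policy.max_batch_size:
            self.flush()

    def tick(self) -> None:
        """Flush if the oldest pending request has waited long enough."""
        if self._pending and self.clock() - self._oldest >= self.policy.max_latency_seconds:
            self.flush()

    def flush(self) -> None:
        """Submit whatever is pending as one batch and reset the queue."""
        if not self._pending:
            return
        batch, self._pending, self._oldest = self._pending, [], None
        self.submit(batch)
```

In this sketch a batch is submitted when either limit is hit first, so a partially filled batch never waits longer than the configured latency.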

### Use cases

* There are scenarios where HTTP requests need to be delivered as a batch instead of one by one. A common scenario is model serving, where you get [better performance](https://medium.com/modern-nlp/101-for-serving-ml-models-10217c9f0764) if the backend collects requests into a batch and applies an operation per data vector rather than per data instance. An implementation for Knative Serving that uses an intermediate container can be found in KServe [here](https://kserve.github.io/website/modelserving/batcher/batcher/).
* As discussed [here](https://cloud-native.slack.com/archives/C04LMU0AX60/p1675769223854919), users coming from other systems such as [AWS SQS](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html) might expect a batch configuration option to consume more than one request at a time, at least on the user container/backend side.
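
As an illustration of the user container side, the sketch below handles one batch with a single vectorized call while still answering each request independently. The JSON envelope (`requests`, `responses`, `id`, `body`, `status`) and the `handle_batch` function are assumptions made for this example only; the issue leaves the actual QP protocol undefined.

```python
import json
from typing import Any, Callable, List


def handle_batch(payload: str, predict: Callable[[List[Any]], List[Any]]) -> str:
    """Process a hypothetical batch envelope and return one response per request."""
    items = json.loads(payload)["requests"]
    valid, responses = [], {}

    # A malformed request gets its own error and does not fail the whole batch.
    for item in items:
        if "body" in item:
            valid.append(item)
        else:
            responses[item["id"]] = {"id": item["id"], "status": 400, "body": None}

    # One call over all valid instances, which is where batching pays off.
    if valid:
        outputs = predict([item["body"] for item in valid])
        for item, output in zip(valid, outputs):
            responses[item["id"]] = {"id": item["id"], "status": 200, "body": output}

    # Responses keep the request order so QP could map them back to callers.
    return json.dumps({"responses": [responses[item["id"]] for item in items]})
```

Keying responses by request `id` is one possible way to let QP demultiplex the batch result transparently; a positional array would also work if ordering is guaranteed.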


QP request batcher #13691
