QP request batcher #13691

@skonto

/area API

Describe the feature

Requirements

  • Users should be able to specify criteria such as maximum latency and batch size per service, so that QP can decide when to submit a batch to the user container. This would require annotating the service accordingly.
  • Each request in the batch is handled independently and transparently. To achieve this, a protocol should be defined between QP and the user container, and the user container must comply with it in order to receive data in batch mode.
  • This does not cover the case where requests need to be sent as a batch across the whole Knative data plane.
  • The feature will be an extension and will not be enabled by default.
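To make the first requirement concrete, here is a minimal sketch of a size-or-latency flush policy. It is not the proposed QP implementation: the names `BatchPolicy`, `RequestBatcher`, `max_batch_size`, and `max_latency_s` are hypothetical, and the sketch assumes a caller that invokes `tick()` periodically instead of running a real timer.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class BatchPolicy:
    # Hypothetical per-service settings; the issue only says they would
    # come from an annotation on the service, not what they are called.
    max_batch_size: int    # flush once this many requests are buffered
    max_latency_s: float   # flush once the oldest request has waited this long


class RequestBatcher:
    """Buffers requests and submits them as one batch when the policy says so."""

    def __init__(
        self,
        policy: BatchPolicy,
        submit: Callable[[List[Any]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._policy = policy
        self._submit = submit          # delivers a batch to the user container
        self._clock = clock            # injectable so the policy can be tested
        self._pending: List[Any] = []
        self._oldest: Optional[float] = None

    def add(self, request: Any) -> None:
        # Remember when the first request of the current batch arrived.
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(request)
        if len(self._pending) >= self._policy.max_batch_size:
            self.flush()

    def tick(self) -> None:
        # Called periodically: flush if the oldest request has used up its
        # latency budget, even though the batch is not full.
        if self._pending and self._oldest is not None:
            if self._clock() - self._oldest >= self._policy.max_latency_s:
                self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._oldest = None
        self._submit(batch)
```

With a policy of three requests or 50 ms, a third `add()` flushes immediately, while a lone request is flushed by the first `tick()` that happens 50 ms or more after it arrived. A real implementation would also need locking and a timer, which are left out here.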

Use cases

  • There are scenarios where HTTP requests need to be delivered as a batch instead of one by one. A common scenario is model serving, where the backend performs better if requests are collected into a batch so that an operation can be applied once per data vector rather than once per data instance. An implementation for Knative Serving that uses an intermediate container can be found in KServe here.
  • As discussed here, users coming from other systems such as AWS SQS might expect a batch configuration option that lets them consume more than one request at a time, at least on the user container/backend side.
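As an illustration of the model-serving use case, the sketch below shows what the user container side could look like if the batch arrived as a list of items with per-request ids. The envelope shape (`id`, `features`, `status`, `score`), the `handle_batch` and `predict_many` names, and the toy linear model are all assumptions made for this example; the issue does not define the protocol. Request ids are assumed to be unique within a batch.

```python
from typing import Any, Dict, List

WEIGHTS = [0.5, -1.0, 2.0]  # hypothetical linear model, for illustration only


def predict_many(rows: List[List[float]]) -> List[float]:
    """Stand-in for a single vectorized model call over all rows."""
    return [sum(w * x for w, x in zip(WEIGHTS, row)) for row in rows]


def handle_batch(batch: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Score a batch and return one response per request, in request order."""
    responses: Dict[str, Dict[str, Any]] = {}
    valid: List[Dict[str, Any]] = []

    # Validate each item on its own so that one bad request does not
    # fail the rest of the batch.
    for item in batch:
        features = item.get("features")
        if isinstance(features, list) and len(features) == len(WEIGHTS):
            valid.append(item)
        else:
            responses[item["id"]] = {
                "id": item["id"],
                "status": 400,
                "error": f"expected {len(WEIGHTS)} features",
            }

    # One model call for all valid items instead of one call per request.
    scores = predict_many([item["features"] for item in valid])
    for item, score in zip(valid, scores):
        responses[item["id"]] = {"id": item["id"], "status": 200, "score": score}

    # Return responses in the original order so the sender can map each
    # one back to the request it belongs to.
    return [responses[item["id"]] for item in batch]
```

Keeping an id and a status on every response is one way to let each request be answered independently, as the requirements ask, even though the model is invoked once for the whole batch.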


Labels: area/API, kind/feature, triage/accepted
