[Roadmap] vLLM Production Stack roadmap for 2025 Q3 #640

@YuhanLiu11

Description

CRD and Gateway Inference Extension

  • (P0) Scale down to zero with KEDA and CRD integration (@Romero027)
  • (P0) Implement disaggregated prefill in CRD
  • (P0) Implement KV-cache-aware and prefix-aware routing logic in Gateway Inference Extension
  • (P0) Implement disaggregated prefill in Gateway Inference Extension
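As a rough illustration of what the prefix-aware routing item could look like, the sketch below routes each request to the backend whose cached prompt prefixes share the longest match with the incoming prompt, falling back to the least-loaded backend. All names here (`Backend`, `route`, the naive cache model) are hypothetical, not the stack's actual API.

```python
# Hypothetical sketch of prefix-aware routing. Not the project's real
# implementation: Backend, route, and the cache model are illustrative.
from dataclasses import dataclass, field


@dataclass
class Backend:
    url: str
    active_requests: int = 0
    cached_prefixes: set = field(default_factory=set)


def longest_prefix_match(prompt: str, prefixes: set) -> int:
    """Length of the longest cached prefix the prompt starts with."""
    return max((len(p) for p in prefixes if prompt.startswith(p)), default=0)


def route(prompt: str, backends: list) -> Backend:
    # Prefer the backend with the best potential KV-cache reuse;
    # break ties by picking the least-loaded backend.
    best = max(
        backends,
        key=lambda b: (longest_prefix_match(prompt, b.cached_prefixes),
                       -b.active_requests),
    )
    best.active_requests += 1
    best.cached_prefixes.add(prompt)  # naive cache model for illustration
    return best
```

A real router would track actual KV-cache occupancy reported by the engines rather than remembering raw prompts, but the scoring shape (cache affinity first, load second) is the same idea.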

Router “frontend”

  • (P2) Router performance enhancements
    • Nuitka compilation for the current router
    • Rust/Go/Nginx-based frontend for router

Router core logic

  • (P0) XpYd support
  • (P2) Load balancing for disaggregated prefill
  • (P2) Priority routing
  • (P2) Routing to external providers like OpenAI or Anthropic
  • (P2) Request migration when a vLLM instance fails
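For the priority-routing item above, one plausible shape is a tiered queue: each request carries a priority tier plus a monotonically increasing sequence number, so higher tiers dispatch first and ties stay FIFO. The `PriorityRouter` class below is a hypothetical sketch, not the project's actual design.

```python
# Hypothetical sketch of priority routing with FIFO tie-breaking.
# PriorityRouter is illustrative, not the stack's real API.
import heapq
import itertools


class PriorityRouter:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: preserves arrival order

    def submit(self, request_id: str, priority: int = 0) -> None:
        # Lower number = higher priority; seq keeps same-tier requests FIFO.
        heapq.heappush(self._heap, (priority, next(self._seq), request_id))

    def next_request(self) -> str:
        _, _, request_id = heapq.heappop(self._heap)
        return request_id
```

Tuple ordering in the heap does the work: `(priority, seq)` sorts by tier first, then arrival order, so interactive traffic can jump ahead of batch jobs without starving same-tier requests.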

CI/CD and misc.

  • (P0) Release bot to automatically release new versions (helm chart + k8s controller packages + docker images)
  • (P2) GitHub Actions for building router Docker images for different architectures
  • (P2) Support for multi-modality models
