[Roadmap] vLLM Production Stack roadmap for 2025 Q3 #640

@YuhanLiu11

Description

CRD and Gateway Inference Extension

  • (P0) Scale down to zero with KEDA and CRD integration (@Romero027)
  • (P0) Implement disaggregated prefill in CRD
  • (P0) Implement KV-cache-aware and prefix-aware routing logic in Gateway Inference Extension
  • (P0) Implement disaggregated prefill in Gateway Inference Extension
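As a rough illustration of what the prefix-aware routing item could look like, the sketch below routes each request to the backend whose cached prompt prefixes share the longest match with the incoming prompt, falling back to the least-loaded backend. All names here (`Backend`, `route`, the naive cache model) are hypothetical, not the stack's actual API.

```python
# Hypothetical sketch of prefix-aware routing. Not the project's real
# implementation: Backend, route, and the cache model are illustrative.
from dataclasses import dataclass, field


@dataclass
class Backend:
    url: str
    active_requests: int = 0
    cached_prefixes: set = field(default_factory=set)


def longest_prefix_match(prompt: str, prefixes: set) -> int:
    """Length of the longest cached prefix the prompt starts with."""
    return max((len(p) for p in prefixes if prompt.startswith(p)), default=0)


def route(prompt: str, backends: list) -> Backend:
    # Prefer the backend with the best potential KV-cache reuse;
    # break ties by picking the least-loaded backend.
    best = max(
        backends,
        key=lambda b: (longest_prefix_match(prompt, b.cached_prefixes),
                       -b.active_requests),
    )
    best.active_requests += 1
    best.cached_prefixes.add(prompt)  # naive cache model for illustration
    return best
```

A real router would track actual KV-cache occupancy reported by the engines rather than remembering raw prompts, but the scoring shape (cache affinity first, load second) is the same idea.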

Router “frontend”

  • (P2) Router performance enhancements
    • Nuitka compilation for the current router
    • Rust/Go/Nginx-based frontend for router

Router core logic

  • (P0) XpYd support
  • (P2) Load balancing for disaggregated prefill
  • (P2) Priority routing
  • (P2) Routing to external providers like OpenAI or Anthropic
  • (P2) Request migration when a vLLM instance fails
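For the priority-routing item above, one plausible shape is a tiered queue: each request carries a priority tier plus a monotonically increasing sequence number, so higher tiers dispatch first and ties stay FIFO. The `PriorityRouter` class below is a hypothetical sketch, not the project's actual design.

```python
# Hypothetical sketch of priority routing with FIFO tie-breaking.
# PriorityRouter is illustrative, not the stack's real API.
import heapq
import itertools


class PriorityRouter:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: preserves arrival order

    def submit(self, request_id: str, priority: int = 0) -> None:
        # Lower number = higher priority; seq keeps same-tier requests FIFO.
        heapq.heappush(self._heap, (priority, next(self._seq), request_id))

    def next_request(self) -> str:
        _, _, request_id = heapq.heappop(self._heap)
        return request_id
```

Tuple ordering in the heap does the work: `(priority, seq)` sorts by tier first, then arrival order, so interactive traffic can jump ahead of batch jobs without starving same-tier requests.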

CI/CD and misc.

  • (P0) Release bot to automatically release new versions (helm chart + k8s controller packages + docker images)
  • (P2) GitHub Actions for building router Docker images for different architectures
  • (P2) Support for multi-modality models
