CRD and Gateway Inference Extension
- (P0) Scale down to zero with KEDA and CRD integration (@Romero027)
- (P0) Implement disaggregated prefill in CRD
- (P0) Implement KV-cache-aware and prefix-aware routing logic in Gateway Inference Extension
- (P0) Implement disaggregated prefill in Gateway Inference Extension
Router “frontend”
- (P2) Router performance enhancements
  - Nuitka compilation for the current router
  - Rust/Go/Nginx-based frontend for the router
Router core logic
- (P0) XpYd support
- (P2) Load balancing for disaggregated prefill
- (P2) Priority routing
- (P2) Routing to external providers like OpenAI or Anthropic
- (P2) Request migration when a vLLM instance fails
CI/CD and misc.
- (P0) Release bot to automatically release new versions (Helm chart + k8s controller packages + Docker images)
- (P2) GitHub Actions for building router Docker images for different architectures
- (P2) Support for multimodal models