Feature Request
Currently, our API scaling is either static or based on CPU utilization. CPU is not always an accurate proxy for traffic load, especially for I/O-bound requests (waiting on DB/Network).
I want to implement Traffic-Based Autoscaling to ensure low latency during traffic spikes. The system should scale the number of API pods based on the incoming Requests Per Second (RPS).
Acceptance Criteria
- The API exposes request metrics to Prometheus (e.g. via prometheus-fastapi-instrumentator).
- The deployment scales out when sustained load exceeds 100 requests per second per pod.
Technical Implementation Details
- Metric Source: Prometheus query rate(http_requests_total[2m])
- Scaler: KEDA Prometheus Scaler.
- Min Replicas: 2 (for high availability).
- Max Replicas: 10 (to prevent runaway costs).
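The implementation details above could be expressed as a KEDA ScaledObject roughly like the following. The deployment name, Prometheus address, and namespace are placeholders; the threshold of 100 corresponds to the per-pod RPS target, since KEDA divides the aggregated query result by the replica count:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-rps-scaler
spec:
  scaleTargetRef:
    name: api-deployment        # placeholder: your API Deployment
  minReplicaCount: 2            # high availability floor
  maxReplicaCount: 10           # cost ceiling
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # placeholder
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"        # target RPS per pod
```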
Notes
This decouples our scaling logic from CPU limits, allowing the API to remain responsive even if requests are lightweight but high-volume.