What happened?
When setting minReplicas: 0 on a component (Engine or Router), the controller incorrectly triggers Serverless deployment mode even when the user has explicitly set deploymentMode: RawDeployment via annotation.
The component-level deployment mode determination ignores the user's explicit global deployment mode annotation.
What did you expect to happen?
When deploymentMode: RawDeployment is explicitly set via annotation, minReplicas: 0 should NOT trigger Serverless mode. The explicit annotation should take precedence over component-level inference from minReplicas.
Users setting minReplicas: 0 with RawDeployment mode expect a standard Kubernetes Deployment that can scale to zero (e.g., with KEDA), not a Knative Service.
How can we reproduce it (as minimally and precisely as possible)?
- Deploy OME on a cluster (with or without Knative installed)
- Create an InferenceService with explicit RawDeployment mode but minReplicas: 0:
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: test-deployment-mode
  namespace: default
  annotations:
    ome.io/deployment-mode: RawDeployment # Explicit: use RawDeployment
spec:
  predictor:
    engine:
      minReplicas: 0 # This incorrectly triggers Serverless!
      maxReplicas: 3
      container:
        image: test-image:latest
- Observe that the controller:
  - Tries to create a Knative Service instead of a Deployment
  - If Knative is not installed, fails with error:
    no kind is registered for the type v1.Service in scheme
  - If Knative is installed, creates a Knative Service when user expected a Deployment
The same issue occurs with the Router component:
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: test-router-mode
  annotations:
    ome.io/deployment-mode: RawDeployment
spec:
  predictor:
    engine:
      minReplicas: 1
      container:
        image: engine:latest
    router:
      minReplicas: 0 # Also incorrectly triggers Serverless for Router
      container:
        image: router:latest
Anything else we need to know?
Root Cause
In pkg/controller/v1beta1/inferenceservice/utils/deployment.go, the functions DetermineEngineDeploymentMode() and determineComponentDeploymentMode() check for minReplicas == 0 and return Serverless without considering the global deployment mode set via annotation.
// Current behavior - ignores global mode
if engine.MinReplicas != nil && *engine.MinReplicas == 0 {
    return constants.Serverless // Ignores explicit deploymentMode annotation!
}
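One possible shape for the expected behavior is sketched below. This is a standalone illustration, not the actual OME code: the DeploymentMode type, the resolveComponentMode function, and the fallback to RawDeployment when nothing is specified are all assumptions made for the example. Only the annotation key ome.io/deployment-mode and the mode names come from this report.

```go
package main

import "fmt"

// DeploymentMode is a hypothetical stand-in for OME's deployment mode type.
type DeploymentMode string

const (
	RawDeployment DeploymentMode = "RawDeployment"
	Serverless    DeploymentMode = "Serverless"

	// Annotation key as used in the manifests in this report.
	deploymentModeAnnotation = "ome.io/deployment-mode"
)

// resolveComponentMode is a hypothetical helper showing the precedence this
// report asks for: an explicit annotation wins, and minReplicas == 0 is only
// used to infer Serverless when no mode was set explicitly.
func resolveComponentMode(annotations map[string]string, minReplicas *int) DeploymentMode {
	// 1. An explicit annotation takes precedence over any inference.
	if mode, ok := annotations[deploymentModeAnnotation]; ok && mode != "" {
		return DeploymentMode(mode)
	}
	// 2. No explicit mode: keep inferring Serverless from minReplicas == 0.
	if minReplicas != nil && *minReplicas == 0 {
		return Serverless
	}
	// 3. Assumed default for this sketch only.
	return RawDeployment
}

func main() {
	zero := 0
	explicit := map[string]string{deploymentModeAnnotation: "RawDeployment"}

	fmt.Println(resolveComponentMode(explicit, &zero)) // explicit annotation wins
	fmt.Println(resolveComponentMode(nil, &zero))      // inferred from minReplicas
}
```

With this ordering, the Engine and Router manifests above would both resolve to RawDeployment, while an InferenceService without the annotation would still scale to zero through Serverless.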
Impact
- Users cannot use minReplicas: 0 with external autoscalers (like KEDA) when using RawDeployment mode
- Confusing error messages when Knative is not installed
- User's explicit configuration is silently ignored
Workaround
Set minReplicas: 1 or higher when using RawDeployment mode. Note that this workaround rules out scale-to-zero with external autoscalers.
Environment
- OME version: v0.1.x
- Kubernetes version: v1.28+
- Cloud provider or hardware configuration: Any
- OS: Any
- Runtime: Any (SGLang, vLLM, etc.)
- Model being served: Any
- Install method: Helm or kubectl