feat: Add support for multi-document YAML in InferenceService creation #169
Prateekbala wants to merge 1 commit into kserve:master
Conversation
Signed-off-by: Prateek Bala <prateekbala28@gmail.com>
I've also added Cypress and Jest tests for this implementation.
```ts
if (kind === 'InferenceService') {
  validationErrors.push(...this.validateInferenceService(resource));
  inferenceServices.push(resource as InferenceServiceK8s);
} else if (kind === 'TrainedModel') {
  validationErrors.push(...this.validateTrainedModel(resource));
  trainedModels.push(resource as TrainedModelK8s);
} else {
  validationErrors.push(
    `Unsupported resource kind: "${
      kind || 'unknown'
    }". Only InferenceService and TrainedModel are supported.`,
  );
}
```
The PR title is a little misleading. This is only allowing support for submitting InferenceServices and TrainedModels. I'm not sure we would want to do just that if we allow multi-document YAML
```ts
if (inferenceServices.length > 1) {
  validationErrors.push(
    'Only one InferenceService document is allowed per submission.',
  );
}
```
Why the limit to only one InferenceService?
I'm not super familiar with multiple TrainedModel CRs mapping to a single InferenceService. This is quite an old KServe feature, so I'm OK with adding the support here. I'd just like this PR's description to explain why we are doing this and the specifics of what we are supporting (e.g. one ISVC and multiple TrainedModels in one go, that only these two CRDs can be submitted this way, plans for supporting more in the future, etc.).
Thanks for the review! Let me clarify the scope and design thinking here.
- Current scope: exactly one InferenceService plus zero or more TrainedModels in a single submission. The validation is intentional: in the multi-model serving pattern, one Triton backend (one ISVC) serves multiple models (N TrainedModels). Multiple InferenceServices in one submission would represent independent serving endpoints, which should be deployed separately. Only InferenceService and TrainedModel CRDs are accepted; other resource types are rejected with a clear error message.
I kept this scoped narrowly to solve the immediate need, but I'm curious about the right approach for the future. A few questions for your input:
Looking forward to your thoughts!
I'd encourage a look at the KServe docs: https://kserve.github.io/website/docs/intro. I've never used TrainedModel, so I'm not familiar with who is using it or for what use cases, especially in the UI. For context, you'll find in the docs that KServe supports many different kinds of deployments, some of which involve multiple resources at deployment time. It might be better to approach this feature from a more general point of view that supports how flexible KServe is. This also raises what this project should do in the long term: should we have opinionated deployments? Should we support everything KServe offers? These are good questions to answer in a proposal about multi-document deployments. I'm not against this feature, but it brings tech debt and maintenance cost that we need to weigh against doing a larger feature from the start that is less specific to TrainedModel.
You're absolutely right. I agree this should take a more general approach rather than being TrainedModel-specific; the current implementation would create technical debt for maintainers. Let me come up with a better design that supports KServe's flexibility more broadly. Thanks for your time!
Description
This PR adds support for multi-document YAML when creating InferenceServices, enabling users to define an InferenceService alongside its associated TrainedModel resources in a single multi-document YAML file and deploy everything in one operation.
Problem
Previously, users could only submit a single InferenceService resource at a time. When using Multi-Model Serving with Triton, this required multiple separate deployments, making it inconvenient to manage related resources together. Users wanted the ability to define everything in a single multi-document YAML file (similar to `kubectl apply -f`) and deploy all resources in one operation.
Solution
Technical Implementation:
- `loadAll()` to handle multiple K8s resources from a single YAML file
- `forkJoin()` to deploy all resources in parallel for improved performance

Closes #147
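The parse-and-classify step described above can be sketched as a standalone function. This is an illustrative sketch, not the PR's actual code: the names `K8sDoc` and `validateDocuments` are hypothetical, and the per-kind field validation is elided.

```typescript
// Hypothetical sketch of the multi-document classification logic:
// sort each parsed YAML document by `kind`, reject unsupported kinds,
// and enforce the one-InferenceService-per-submission rule.
interface K8sDoc {
  kind?: string;
  metadata?: { name?: string };
}

interface ValidationResult {
  errors: string[];
  inferenceServices: K8sDoc[];
  trainedModels: K8sDoc[];
}

function validateDocuments(docs: K8sDoc[]): ValidationResult {
  const errors: string[] = [];
  const inferenceServices: K8sDoc[] = [];
  const trainedModels: K8sDoc[] = [];

  for (const doc of docs) {
    if (doc.kind === 'InferenceService') {
      inferenceServices.push(doc);
    } else if (doc.kind === 'TrainedModel') {
      trainedModels.push(doc);
    } else {
      errors.push(
        `Unsupported resource kind: "${doc.kind || 'unknown'}". ` +
          'Only InferenceService and TrainedModel are supported.',
      );
    }
  }

  // Multiple ISVCs would be independent endpoints; reject them here.
  if (inferenceServices.length > 1) {
    errors.push('Only one InferenceService document is allowed per submission.');
  }

  return { errors, inferenceServices, trainedModels };
}

// Example: one ISVC plus two TrainedModels is accepted; a Pod is rejected.
const result = validateDocuments([
  { kind: 'InferenceService', metadata: { name: 'triton' } },
  { kind: 'TrainedModel', metadata: { name: 'model-a' } },
  { kind: 'TrainedModel', metadata: { name: 'model-b' } },
  { kind: 'Pod' },
]);
console.log(result.errors.length); // 1 (the Pod)
console.log(result.trainedModels.length); // 2
```

In the real feature, the input array would come from parsing the submitted file with `loadAll()`, and the accepted resources would then be deployed in parallel.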
Usage Example
Users can now create a multi-document YAML:
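An illustrative example follows. The resource names, storage URIs, and spec fields are placeholders, not from the PR; exact fields depend on the KServe versions in use.

```yaml
# One InferenceService (the Triton serving backend) plus one or more
# TrainedModels that reference it by name. All placeholder values.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: triton-mms
spec:
  predictor:
    model:
      modelFormat:
        name: triton
      storageUri: gs://example-bucket/models   # placeholder
---
apiVersion: serving.kserve.io/v1alpha1
kind: TrainedModel
metadata:
  name: model-a
spec:
  inferenceService: triton-mms   # must match the ISVC above
  model:
    framework: sklearn
    storageUri: gs://example-bucket/model-a   # placeholder
    memory: 256Mi
```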