
feat: Add support for multi-document YAML in InferenceService creation#169

Open
Prateekbala wants to merge 1 commit into kserve:master from Prateekbala:feat/multi-doc-YAML

Conversation


@Prateekbala Prateekbala commented Mar 13, 2026

Description

This PR implements support for multi-document YAML when creating InferenceServices, enabling users to define an InferenceService alongside its associated TrainedModel resources in a single multi-document YAML file and deploy everything in one go.

Problem

Previously, users could only submit a single InferenceService resource at a time. When using Multi-Model Serving with Triton, this required multiple separate deployments, making it inconvenient to manage related resources together. Users wanted the ability to define everything in a single multi-document YAML file (similar to kubectl apply -f) and deploy all resources in one operation.

Solution

Technical Implementation:

  • Replaced single-document YAML parsing with multi-document parsing using loadAll() to handle multiple K8s resources from a single YAML file
  • Implemented resource type routing to validate and segregate InferenceService and TrainedModel documents during parsing
  • Added batched resource creation using RxJS forkJoin() to deploy all resources in parallel for improved performance
  • Introduced granular validation logic for each resource type with specific error messages and field requirements
  • Added TrainedModel GVK definition in backend to enable TrainedModel resource creation
  • Implemented new API endpoint to handle TrainedModel POST requests
  • Refactored notification system with reusable error/success handlers for flexible user feedback
  • Enforced business logic: exactly one InferenceService (required) with zero or more TrainedModels (optional)
  • Implemented namespace propagation to all resources at deployment time
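The parse-and-route flow described above can be sketched roughly as follows. This is a minimal, dependency-free sketch operating on already-parsed documents (in the PR itself the document array comes from js-yaml's loadAll(), and creation is then batched with RxJS forkJoin()); the type and function names here are illustrative, not the PR's exact code:

```typescript
// Minimal shapes for parsed Kubernetes documents (illustrative only).
interface K8sDoc {
  kind?: string;
  metadata?: { name?: string; namespace?: string };
}

interface RoutedResources {
  inferenceServices: K8sDoc[];
  trainedModels: K8sDoc[];
  errors: string[];
}

// Route each parsed document by kind, rejecting unsupported kinds.
// In the PR, `docs` would come from js-yaml's loadAll(yamlText).
function routeResources(docs: K8sDoc[], namespace: string): RoutedResources {
  const out: RoutedResources = {
    inferenceServices: [],
    trainedModels: [],
    errors: [],
  };
  for (const doc of docs) {
    // Namespace propagation: stamp the target namespace on every resource.
    doc.metadata = { ...doc.metadata, namespace };
    if (doc.kind === 'InferenceService') {
      out.inferenceServices.push(doc);
    } else if (doc.kind === 'TrainedModel') {
      out.trainedModels.push(doc);
    } else {
      out.errors.push(
        `Unsupported resource kind: "${doc.kind ?? 'unknown'}". ` +
          'Only InferenceService and TrainedModel are supported.',
      );
    }
  }
  return out;
}
```

Segregating by kind up front lets later steps apply per-type validation before issuing the batched POST requests.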

Closes #147

Usage Example

Users can now create a multi-document YAML:

---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model
spec:
  predictor:
    model:
      modelFormat:
        name: triton
      storageUri: gs://bucket/triton-model
---
apiVersion: serving.kserve.io/v1alpha1
kind: TrainedModel
metadata:
  name: model-variant-1
spec:
  inferenceService: my-model
  model:
    framework: triton
    storageUri: gs://bucket/model-variant-1
    memory: "1Gi"
---
apiVersion: serving.kserve.io/v1alpha1
kind: TrainedModel
metadata:
  name: model-variant-2
spec:
  inferenceService: my-model
  model:
    framework: triton
    storageUri: gs://bucket/model-variant-2
    memory: "1Gi"
    

Signed-off-by: Prateek Bala <prateekbala28@gmail.com>
@Prateekbala (Contributor, Author)

I've also added Cypress and Jest tests for this implementation.

@juliusvonkohout (Contributor)

@LogicalGuy77

Comment on lines +81 to +93
if (kind === 'InferenceService') {
  validationErrors.push(...this.validateInferenceService(resource));
  inferenceServices.push(resource as InferenceServiceK8s);
} else if (kind === 'TrainedModel') {
  validationErrors.push(...this.validateTrainedModel(resource));
  trainedModels.push(resource as TrainedModelK8s);
} else {
  validationErrors.push(
    `Unsupported resource kind: "${
      kind || 'unknown'
    }". Only InferenceService and TrainedModel are supported.`,
  );
}

The PR title is a little misleading. This is only allowing support for submitting InferenceServices and TrainedModels; I'm not sure we would want to do just that if we allow multi-document YAML.

Comment on lines +102 to +105
if (inferenceServices.length > 1) {
  validationErrors.push(
    'Only one InferenceService document is allowed per submission.',
  );

Why the limit to only one InferenceService?

@Griffin-Sullivan (Contributor)

Not super familiar with the multiple TrainedModel CRs mapping to a single InferenceService. This is quite an old feature for Kserve, so I'm ok with adding the support here. I'd just like to get more description on this PR of why we are doing this and specifics of what we are supporting (ex: 1 isvc and multiple trainedmodels in one go, detailing that only the two CRDs can be submitted like this, plans for supporting more in the future?, etc)

@Prateekbala (Contributor, Author) commented Mar 16, 2026

> Not super familiar with the multiple TrainedModel CRs mapping to a single InferenceService. This is quite an old feature for Kserve, so I'm ok with adding the support here. I'd just like to get more description on this PR of why we are doing this and specifics of what we are supporting (ex: 1 isvc and multiple trainedmodels in one go, detailing that only the two CRDs can be submitted like this, plans for supporting more in the future?, etc)

Thanks for the review! Let me clarify the scope and design thinking here.

  1. What We're Supporting

- Current scope: exactly 1 InferenceService + 0 or more TrainedModels in a single submission.

The validation is intentional: in the multi-model serving pattern, one Triton backend (1 ISVC) serves multiple models (N TrainedModels). Multiple InferenceServices in one submission would represent independent serving endpoints, which should be deployed separately.

Only InferenceService and TrainedModel CRDs are accepted; other resource types are rejected with a clear error message.
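That "exactly one ISVC, N TrainedModels" rule can be expressed as a small standalone check (a sketch; the function name and the first message are illustrative, though the "Only one InferenceService…" message mirrors the snippet quoted in review):

```typescript
// Enforce the submission shape: exactly one InferenceService is required;
// TrainedModels are optional, so their count needs no check.
function validateInferenceServiceCount(isvcCount: number): string[] {
  if (isvcCount === 0) {
    return ['Exactly one InferenceService document is required.'];
  }
  if (isvcCount > 1) {
    return ['Only one InferenceService document is allowed per submission.'];
  }
  return [];
}
```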

  2. On Future Extensibility

I kept this scoped narrowly to solve the immediate need, but I'm curious about the right approach for the future. A few questions for your input:

  1. Should we expand CRD support as use cases emerge?

  2. What's the best way to handle this without creating tight coupling? Currently, validation is in the frontend. Should we consider:

    • Backend-driven validation (backend defines what CRDs/patterns it supports)?
    • A plugin-style validator pattern?
    • Keep it explicit and add new handlers case-by-case?
  3. Are there other multi-resource deployment patterns you'd want to support that we should design for now?
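For what it's worth, the plugin-style option in point 2 could look roughly like this (purely illustrative; none of these names exist in the PR). Each supported kind registers its own validator, so supporting a new CRD means adding one map entry instead of extending an if/else chain:

```typescript
// Minimal document shape for this sketch (illustrative only).
type K8sDoc = { kind?: string; spec?: Record<string, unknown> };
type Validator = (doc: K8sDoc) => string[];

// Registry of per-kind validators; field requirements here are assumptions.
const validators = new Map<string, Validator>([
  [
    'InferenceService',
    (doc) =>
      doc.spec?.predictor ? [] : ['InferenceService requires spec.predictor.'],
  ],
  [
    'TrainedModel',
    (doc) =>
      doc.spec?.inferenceService
        ? []
        : ['TrainedModel requires spec.inferenceService.'],
  ],
]);

// Look up the validator for a document's kind, rejecting unknown kinds.
function validateDoc(doc: K8sDoc): string[] {
  const validator = validators.get(doc.kind ?? '');
  if (!validator) {
    return [`Unsupported resource kind: "${doc.kind ?? 'unknown'}".`];
  }
  return validator(doc);
}
```

Backend-driven validation would move this registry server-side, with the frontend asking the backend which kinds it accepts.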

Looking forward to your thoughts!

@Griffin-Sullivan (Contributor) commented Mar 18, 2026

I'd encourage a look at the KServe docs https://kserve.github.io/website/docs/intro. I've never used TrainedModel before so I'm not familiar with who is using it and for what use cases especially in the UI.

For context, you'll probably find in the docs that KServe supports a lot of different deployment modes, some of which involve multiple resources at deployment time. I think it might be better to approach this feature from a more general point of view to support how flexible KServe is. This also touches on what this project should do in the long term: should we have opinionated deployments? Should we support everything KServe has? These are good questions to answer in a proposal about multi-document deployments. I'm not against this feature, but it brings tech debt and maintenance burden that we need to weigh against doing a larger, less TrainedModel-specific feature from the start.

@Prateekbala (Contributor, Author)

You're absolutely right. I agree this should have a more general approach rather than being TrainedModel-specific. The current implementation will create technical debt for maintainers. Let me come up with a better design that supports KServe's flexibility more broadly.

Thanks for your time!



Development

Successfully merging this pull request may close these issues.

Support for multi-document YAML when creating an InferenceService

3 participants