Bugfix/482 helm rayspec fix #483
…ith modelSpecs. Signed-off-by: ahinsutime <ahinsutime@gmail.com>
Signed-off-by: ahinsutime <ahinsutime@gmail.com>
Signed-off-by: ahinsutime <ahinsutime@gmail.com>
|
Confirmed intended behavior (to create a single ray cluster and a typical deployment): cc. @bennorris123 |
|
I will mark this PR as ready once I have checked that it works on multiple nodes (with more than 2 GPUs total). |
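A rough way to see why more than 2 GPUs are needed for a ray cluster plus a plain deployment is to add up the GPU requests per `modelSpec` entry. The sketch below is not part of this PR and does not read the chart source. It assumes each ray cluster runs one head pod plus `replicaCount` worker pods, and each plain deployment runs `replicaCount` pods, which is what the pod listings later in this thread suggest. The helper names `gpu_request` and `total_gpu_request` are hypothetical.

```python
def gpu_request(model_spec: dict) -> int:
    """Estimate the GPUs requested by one modelSpec entry.

    Assumption (not taken from the chart source): a ray cluster runs one
    head pod plus `replicaCount` worker pods, and a plain deployment runs
    `replicaCount` pods, each requesting `requestGPU` GPUs.
    """
    worker_gpus = model_spec.get("replicaCount", 1) * model_spec.get("requestGPU", 0)
    ray_spec = model_spec.get("raySpec")
    if ray_spec is None:
        # Plain deployment: only the serving pods request GPUs.
        return worker_gpus
    head_gpus = ray_spec.get("headNode", {}).get("requestGPU", 0)
    return head_gpus + worker_gpus


def total_gpu_request(model_specs: list[dict]) -> int:
    """Sum the estimated GPU requests over all modelSpec entries."""
    return sum(gpu_request(spec) for spec in model_specs)


if __name__ == "__main__":
    # Trimmed-down version of the values used for testing in this thread.
    specs = [
        {"name": "distilgpt2-raycluster", "replicaCount": 1, "requestGPU": 1,
         "raySpec": {"headNode": {"requestGPU": 1}}},
        {"name": "opt125m-deployment", "replicaCount": 1, "requestGPU": 1},
    ]
    print(total_gpu_request(specs))
```

Under these assumptions, one ray cluster with a head and a single worker already accounts for 2 GPUs, so adding any second workload pushes the total past 2.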
|
Tested and confirmed working with multiple nodes:

kubectl get nodes
NAME                       STATUS   ROLES           AGE    VERSION
instance-20250503-060921   Ready    control-plane   19d    v1.32.4
insudevmachine             Ready    <none>          118m   v1.32.4

Installed helm chart:

helm install vllm ./helm -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml
NAME: vllm
LAST DEPLOYED: Thu Jun 5 09:16:59 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Used values:

servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "distilgpt2-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"
    replicaCount: 1
    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1
    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2
    shmSize: "20Gi"
    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1
  - name: "opt125m-deployment"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "facebook/opt-125m"
    replicaCount: 1
    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1
    vllmConfig:
      tensorParallelSize: 1
    shmSize: "20Gi"

Check multiple deployments:

kubectl get pods
NAME                                                      READY   STATUS    RESTARTS   AGE     IP                NODE                       NOMINATED NODE   READINESS GATES
kuberay-operator-f89ddb644-w5cdt                          1/1     Running   0          18d     192.168.190.22    instance-20250503-060921   <none>           <none>
vllm-deployment-router-76977dcc4d-rhr4k                   1/1     Running   0          6m11s   192.168.165.203   insudevmachine             <none>           <none>
vllm-distilgpt2-raycluster-raycluster-head-k5msz          1/1     Running   0          6m11s   192.168.190.51    instance-20250503-060921   <none>           <none>
vllm-distilgpt2-raycluster-raycluster-ray-worker-l69xc    1/1     Running   0          6m11s   192.168.165.205   insudevmachine             <none>           <none>
vllm-opt125m-deployment-deployment-vllm-f4c9c9bb4-llm72   1/1     Running   0          6m11s   192.168.165.204   insudevmachine             <none>           <none>

Tested deployments and traffic:

kubectl port-forward svc/vllm-router-service 30080:80
curl http://localhost:30080/v1/models
{
  "object": "list",
  "data": [
    {
      "id": "facebook/opt-125m",
      "object": "model",
      "created": 1749115431,
      "owned_by": "vllm",
      "root": null,
      "parent": null
    },
    {
      "id": "distilbert/distilgpt2",
      "object": "model",
      "created": 1749115431,
      "owned_by": "vllm",
      "root": null,
      "parent": null
    }
  ]
}

Checked that the ray cluster with the distilbert/distilgpt2 model is working:

curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "distilbert/distilgpt2",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'
{
  "id": "cmpl-826415dd4f7649b9a6af5130a786bc5d",
  "object": "text_completion",
  "created": 1749115498,
  "model": "distilbert/distilgpt2",
  "choices": [
    {
      "index": 0,
      "text": " our journey was certainly one of magnitude quicker than those",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 15,
    "completion_tokens": 10,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
}

Checked that the typical deployment with the facebook/opt-125m model is working:

curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'
{
  "id": "cmpl-854f5011f073410a97085a2d50b6ef22",
  "object": "text_completion",
  "created": 1749115612,
  "model": "facebook/opt-125m",
  "choices": [
    {
      "index": 0,
      "text": " gonorrhoeas also had that beneficial holy",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 16,
    "completion_tokens": 10,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
} |
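For anyone repeating these checks, the router responses can also be verified programmatically instead of by eye. The sketch below is not part of this PR. The helper names (`served_model_ids`, `missing_models`, `usage_is_consistent`) are hypothetical, and it assumes the response bodies have the same shape as the JSON shown in this comment.

```python
def served_model_ids(models_response: dict) -> set[str]:
    """Return the model IDs listed in a /v1/models response body."""
    return {entry["id"] for entry in models_response.get("data", [])}


def missing_models(models_response: dict, expected: set[str]) -> set[str]:
    """Return the expected model IDs that the router does not list."""
    return expected - served_model_ids(models_response)


def usage_is_consistent(completion_response: dict) -> bool:
    """Check that total_tokens equals prompt_tokens + completion_tokens."""
    usage = completion_response["usage"]
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]


if __name__ == "__main__":
    # Trimmed copies of the responses shown in this comment.
    models = {
        "object": "list",
        "data": [{"id": "facebook/opt-125m"}, {"id": "distilbert/distilgpt2"}],
    }
    completion = {
        "model": "distilbert/distilgpt2",
        "usage": {"prompt_tokens": 5, "total_tokens": 15, "completion_tokens": 10},
    }
    print(missing_models(models, {"facebook/opt-125m", "distilbert/distilgpt2"}))
    print(usage_is_consistent(completion))
```

An empty result from `missing_models` means the router lists every expected model, which is the condition that mattered for this bugfix: both the ray cluster model and the plain deployment model must be served at the same time.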
|
@YuhanLiu11 Thank you for checking! |
|
I will add another commit to include guidelines for multiple deployments in the tutorial document. |
…ed typos. Added more example values. Signed-off-by: ahinsutime <ahinsutime@gmail.com>
Signed-off-by: ahinsutime <ahinsutime@gmail.com>
|
Tested with two ray clusters too.

Used values:

servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "distilgpt2-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"
    replicaCount: 1
    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1
    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2
    shmSize: "20Gi"
    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1
  - name: "opt125m-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "facebook/opt-125m"
    replicaCount: 1
    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1
    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2
    shmSize: "20Gi"
    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1

Installed helm chart:

helm install vllm ./helm/ -f tutorials/assets/values-15-b-minimal-pipeline-parallel-example-multiple-modelspec.yaml
NAME: vllm
LAST DEPLOYED: Thu Jun 5 09:47:43 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

kubectl get pods -o wide
NAME                                                      READY   STATUS    RESTARTS   AGE     IP                NODE                       NOMINATED NODE   READINESS GATES
kuberay-operator-f89ddb644-w5cdt                          1/1     Running   0          18d     192.168.190.22    instance-20250503-060921   <none>           <none>
vllm-deployment-router-76977dcc4d-n59zv                   1/1     Running   0          7m7s    192.168.165.212   insudevmachine             <none>           <none>
vllm-distilgpt2-raycluster-raycluster-head-c7z75          1/1     Running   0          7m7s    192.168.190.56    instance-20250503-060921   <none>           <none>
vllm-distilgpt2-raycluster-raycluster-ray-worker-5nwx6    1/1     Running   0          7m7s    192.168.165.213   insudevmachine             <none>           <none>
vllm-opt125m-raycluster-raycluster-head-278xt             1/1     Running   0          7m7s    192.168.190.57    instance-20250503-060921   <none>           <none>
vllm-opt125m-raycluster-raycluster-ray-worker-5bnlz       1/1     Running   0          7m7s    192.168.165.214   insudevmachine             <none>           <none>

facebook/opt-125m:

curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'
{
  "id": "cmpl-447f6cd06298409c83af79e5e4206634",
  "object": "text_completion",
  "created": 1749117098,
  "model": "facebook/opt-125m",
  "choices": [
    {
      "index": 0,
      "text": " before President Woodrow Wilson arrived and ran the United",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 16,
    "completion_tokens": 10,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
}

distilbert/distilgpt2:

curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "distilbert/distilgpt2",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'
{
  "id": "cmpl-cc6a8d3f617846afa9e32c132eabd5d7",
  "object": "text_completion",
  "created": 1749117081,
  "model": "distilbert/distilgpt2",
  "choices": [
    {
      "index": 0,
      "text": " regime change worked. The Hereviks celebrated",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 15,
    "completion_tokens": 10,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
} |
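The pod listings in this thread follow a visible naming pattern, which makes it possible to check that every `modelSpec` entry produced the pods it should. The sketch below is not part of this PR. It assumes that an entry becomes a ray cluster when it carries a `raySpec` key, and the name prefixes are inferred only from the `kubectl get pods` output above rather than from the chart templates. The helper names are hypothetical.

```python
def workload_kind(model_spec: dict) -> str:
    """Classify a modelSpec entry (assumption: raySpec selects a ray cluster)."""
    return "raycluster" if "raySpec" in model_spec else "deployment"


def expected_pod_prefixes(release: str, model_spec: dict) -> list[str]:
    """Return the pod-name prefixes expected for one modelSpec entry.

    The patterns are inferred from the pod listings in this thread.
    """
    base = f"{release}-{model_spec['name']}"
    if workload_kind(model_spec) == "raycluster":
        return [f"{base}-raycluster-head-", f"{base}-raycluster-ray-worker-"]
    return [f"{base}-deployment-vllm-"]


def unmatched_prefixes(release: str, model_specs: list[dict], pod_names: list[str]) -> list[str]:
    """Return the expected prefixes that no running pod name starts with."""
    missing = []
    for spec in model_specs:
        for prefix in expected_pod_prefixes(release, spec):
            if not any(name.startswith(prefix) for name in pod_names):
                missing.append(prefix)
    return missing


if __name__ == "__main__":
    # Names and pods copied from the two-ray-cluster test above.
    specs = [
        {"name": "distilgpt2-raycluster", "raySpec": {"headNode": {"requestGPU": 1}}},
        {"name": "opt125m-raycluster", "raySpec": {"headNode": {"requestGPU": 1}}},
    ]
    pods = [
        "vllm-distilgpt2-raycluster-raycluster-head-c7z75",
        "vllm-distilgpt2-raycluster-raycluster-ray-worker-5nwx6",
        "vllm-opt125m-raycluster-raycluster-head-278xt",
        "vllm-opt125m-raycluster-raycluster-ray-worker-5bnlz",
    ]
    print(unmatched_prefixes("vllm", specs, pods))
```

An empty list means each entry has at least one pod per expected role. Before this fix, the symptom would presumably have been missing prefixes for every entry except one.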
|
@ahinsutime This looks great! Thank you. Is the aim to include this update in the 0.1.4 release? :) |
I hope so too. It would be better if this bugfix PR were merged as soon as possible. |
YuhanLiu11
left a comment
This fix looks good to me!!
* [bugfix] Bugfixed raySpec preventing multiple deployments specified with modelSpecs.
  Signed-off-by: ahinsutime <ahinsutime@gmail.com>
* [Doc] Updated tutorial document and values for raySpec.
  Signed-off-by: ahinsutime <ahinsutime@gmail.com>
* [Bugfix] Updated helm chart version to sync with changes due to bugfix.
  Signed-off-by: ahinsutime <ahinsutime@gmail.com>
* [Doc] Added guideline to deploy both ray cluster and deployments. Fixed typos. Added more example values.
  Signed-off-by: ahinsutime <ahinsutime@gmail.com>
* [Bugfix] Fixed configmap conflicts by distinguishing configmap names.
  Signed-off-by: ahinsutime <ahinsutime@gmail.com>
---------
Signed-off-by: ahinsutime <ahinsutime@gmail.com>
Co-authored-by: Yuhan Liu <32589867+YuhanLiu11@users.noreply.github.com>
FILL IN THE PR DESCRIPTION HERE
FIX #482 (link existing issues this PR will resolve)
BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE
- Sign off your commit by using -s when doing git commit.
- Classify the PR with a prefix such as [Bugfix], [Feat], and [CI].
Detailed Checklist (Click to Expand)
Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- [Bugfix] for bug fixes.
- [CI/Build] for build or continuous integration improvements.
- [Doc] for documentation fixes and improvements.
- [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
- [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
- [Misc] for PRs that do not fit the above categories. Please use this sparingly.
Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:
- Use pre-commit to format your code. See README.md for installation.
DCO and Signed-off-by
When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.
Using -s with git commit will automatically add this header.
What to Expect for the Reviews
We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng or ApostaC.