Conversation

@ahinsutime
Contributor


FIX #482 (link existing issues this PR will resolve)



  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commits by using -s when running git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist (Click to Expand)

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.
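
As a quick illustration, here is a minimal sketch of what -s does, using a throwaway repository and a hypothetical user name/email:

```shell
# Throwaway repo to demonstrate DCO sign-off (user name/email are hypothetical).
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q
git config user.name "Jane Dev"
git config user.email "jane@example.com"
echo demo > file.txt
git add file.txt
# -s appends a Signed-off-by: trailer built from user.name and user.email.
git commit -q -s -m "[Bugfix] demo change"
git log -1 --format=%B
# → message ends with: Signed-off-by: Jane Dev <jane@example.com>
```

The Signed-off-by: trailer is what the DCO check looks for; if you forgot it, git commit --amend -s adds it to the most recent commit.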

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

…ith modelSpecs.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>
@ahinsutime
Contributor Author

ahinsutime commented Jun 3, 2025

Confirmed intended behavior (creating a single Ray cluster and a typical Deployment):

# helm install vllm ./helm -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml 
servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "distilgpt2-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1
  - name: "distilgpt2-deployment"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1

    shmSize: "20Gi"
# deploy status (I have only 2 GPUs, so one of the Ray nodes could not run)
# kubectl get pods
NAME                                                          READY   STATUS    RESTARTS   AGE
kuberay-operator-f89ddb644-w5cdt                              1/1     Running   0          17d
vllm-deployment-router-76977dcc4d-6qvcn                       1/1     Running   0          6m
vllm-distilgpt2-deployment-deployment-vllm-57bcc4fcdc-4crb6   1/1     Running   0          6m
vllm-distilgpt2-raycluster-raycluster-head-4629m              0/1     Running   0          6m
vllm-distilgpt2-raycluster-raycluster-ray-worker-4kznr        0/1     Pending   0          6m

cc @bennorris123

@ahinsutime ahinsutime marked this pull request as ready for review June 3, 2025 17:24
@ahinsutime ahinsutime marked this pull request as draft June 3, 2025 17:30
@ahinsutime
Contributor Author

I will mark this PR as ready after I confirm it works on multi-node (with more than 2 GPUs in total).

@ahinsutime
Contributor Author

ahinsutime commented Jun 5, 2025

Tested and confirmed working with multiple nodes:

kubectl get nodes
NAME                       STATUS   ROLES           AGE    VERSION
instance-20250503-060921   Ready    control-plane   19d    v1.32.4
insudevmachine             Ready    <none>          118m   v1.32.4

Installed helm chart

helm install vllm ./helm -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml 
NAME: vllm
LAST DEPLOYED: Thu Jun  5 09:16:59 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Used values

servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "distilgpt2-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1
  - name: "opt125m-deployment"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "facebook/opt-125m"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1

    shmSize: "20Gi"

Checked multiple deployments

kubectl get pods
NAME                                                      READY   STATUS    RESTARTS   AGE     IP                NODE                       NOMINATED NODE   READINESS GATES
kuberay-operator-f89ddb644-w5cdt                          1/1     Running   0          18d     192.168.190.22    instance-20250503-060921   <none>           <none>
vllm-deployment-router-76977dcc4d-rhr4k                   1/1     Running   0          6m11s   192.168.165.203   insudevmachine             <none>           <none>
vllm-distilgpt2-raycluster-raycluster-head-k5msz          1/1     Running   0          6m11s   192.168.190.51    instance-20250503-060921   <none>           <none>
vllm-distilgpt2-raycluster-raycluster-ray-worker-l69xc    1/1     Running   0          6m11s   192.168.165.205   insudevmachine             <none>           <none>
vllm-opt125m-deployment-deployment-vllm-f4c9c9bb4-llm72   1/1     Running   0          6m11s   192.168.165.204   insudevmachine             <none>           <none>

Tested deployments and traffic

 kubectl port-forward svc/vllm-router-service 30080:80

curl http://localhost:30080/v1/models

{
    "object": "list",
    "data": [
        {
            "id": "facebook/opt-125m",
            "object": "model",
            "created": 1749115431,
            "owned_by": "vllm",
            "root": null,
            "parent": null
        },
        {
            "id": "distilbert/distilgpt2",
            "object": "model",
            "created": 1749115431,
            "owned_by": "vllm",
            "root": null,
            "parent": null
        }
    ]
}

Checked that the Ray cluster serving the distilbert/distilgpt2 model works

   curl -X POST http://localhost:30080/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "distilbert/distilgpt2",
      "prompt": "Once upon a time,",
      "max_tokens": 10
    }'

{
    "id": "cmpl-826415dd4f7649b9a6af5130a786bc5d",
    "object": "text_completion",
    "created": 1749115498,
    "model": "distilbert/distilgpt2",
    "choices": [
        {
            "index": 0,
            "text": " our journey was certainly one of magnitude quicker than those",
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 5,
        "total_tokens": 15,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}

Checked that the typical Deployment serving the facebook/opt-125m model works

curl -X POST http://localhost:30080/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "facebook/opt-125m",
      "prompt": "Once upon a time,",
      "max_tokens": 10
    }'

{
    "id": "cmpl-854f5011f073410a97085a2d50b6ef22",
    "object": "text_completion",
    "created": 1749115612,
    "model": "facebook/opt-125m",
    "choices": [
        {
            "index": 0,
            "text": " gonorrhoeas also had that beneficial holy",
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 6,
        "total_tokens": 16,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}

@ahinsutime ahinsutime marked this pull request as ready for review June 5, 2025 09:28
@ahinsutime
Contributor Author

@YuhanLiu11 Thank you for checking!

@ahinsutime
Contributor Author

I will add another commit to include guidelines for multiple deployments in the tutorial document.

…ed typos. Added more example values.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>
@ahinsutime
Contributor Author

Tested with two Ray clusters as well.

Used values

servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "distilgpt2-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1
  - name: "opt125m-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "facebook/opt-125m"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1

Installed helm chart

helm install vllm ./helm/ -f tutorials/assets/values-15-b-minimal-pipeline-parallel-example-multiple-modelspec.yaml 
NAME: vllm
LAST DEPLOYED: Thu Jun  5 09:47:43 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
kubectl get pods -o wide
NAME                                                     READY   STATUS    RESTARTS   AGE    IP                NODE                       NOMINATED NODE   READINESS GATES
kuberay-operator-f89ddb644-w5cdt                         1/1     Running   0          18d    192.168.190.22    instance-20250503-060921   <none>           <none>
vllm-deployment-router-76977dcc4d-n59zv                  1/1     Running   0          7m7s   192.168.165.212   insudevmachine             <none>           <none>
vllm-distilgpt2-raycluster-raycluster-head-c7z75         1/1     Running   0          7m7s   192.168.190.56    instance-20250503-060921   <none>           <none>
vllm-distilgpt2-raycluster-raycluster-ray-worker-5nwx6   1/1     Running   0          7m7s   192.168.165.213   insudevmachine             <none>           <none>
vllm-opt125m-raycluster-raycluster-head-278xt            1/1     Running   0          7m7s   192.168.190.57    instance-20250503-060921   <none>           <none>
vllm-opt125m-raycluster-raycluster-ray-worker-5bnlz      1/1     Running   0          7m7s   192.168.165.214   insudevmachine             <none>           <none>

facebook/opt-125m

 curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'

{
    "id": "cmpl-447f6cd06298409c83af79e5e4206634",
    "object": "text_completion",
    "created": 1749117098,
    "model": "facebook/opt-125m",
    "choices": [
        {
            "index": 0,
            "text": " before President Woodrow Wilson arrived and ran the United",
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 6,
        "total_tokens": 16,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}

distilbert/distilgpt2

curl -X POST http://localhost:30080/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "distilbert/distilgpt2",
      "prompt": "Once upon a time,",
      "max_tokens": 10
}'

{
    "id": "cmpl-cc6a8d3f617846afa9e32c132eabd5d7",
    "object": "text_completion",
    "created": 1749117081,
    "model": "distilbert/distilgpt2",
    "choices": [
        {
            "index": 0,
            "text": " regime change worked. The Hereviks celebrated",
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 5,
        "total_tokens": 15,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}

@bennorris123

@ahinsutime This looks great! Thank you. Is the aim to include this update in the 0.1.4 release? :)

@ahinsutime
Contributor Author

ahinsutime commented Jun 5, 2025

@ahinsutime This looks great! Thank you. Is the aim to include this update in the 0.1.4 release? :)

I hope so too. It would be better if this bugfix PR is merged as soon as possible.

Collaborator

@YuhanLiu11 YuhanLiu11 left a comment


This fix looks good to me!!

@YuhanLiu11 YuhanLiu11 merged commit 6e3c06f into vllm-project:main Jun 5, 2025
9 checks passed
JustinDuy pushed a commit to JustinDuy/production-stack-1 that referenced this pull request Jun 13, 2025
* [bugfix] Bugfixed raySpec preventing multiple deployments specified with modelSpecs.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

* [Doc] Updated tutorial document and values for raySpec.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

* [Bugfix] Updated helm chart version to sync with changes due to bugfix.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

* [Doc] Added guideline to deploy both ray cluster and deployments. Fixed typos. Added more example values.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

* [Bugfix] Fixed configmap conflicts by distinguishing configmap names.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

---------

Signed-off-by: ahinsutime <ahinsutime@gmail.com>
Co-authored-by: Yuhan Liu <32589867+YuhanLiu11@users.noreply.github.com>
davidgao7 pushed a commit to davidgao7/production-stack that referenced this pull request Jun 26, 2025
Senne-Mennes pushed a commit to Senne-Mennes/production-stack that referenced this pull request Oct 22, 2025
Development

Successfully merging this pull request may close these issues.

bug: Unable to deploy ray cluster and other deployments at the same time
