Conversation

@ahinsutime
Contributor


FIX #482 (link existing issues this PR will resolve)



  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commits by using -s when running git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist (Click to Expand)

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.
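
As a quick illustration, here is a minimal sketch of what -s does, using a throwaway repository and a hypothetical user name/email:

```shell
# Throwaway repo to demonstrate DCO sign-off (user name/email are hypothetical).
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q
git config user.name "Jane Dev"
git config user.email "jane@example.com"
echo demo > file.txt
git add file.txt
# -s appends a Signed-off-by: trailer built from user.name and user.email.
git commit -q -s -m "[Bugfix] demo change"
git log -1 --format=%B
# → message ends with: Signed-off-by: Jane Dev <jane@example.com>
```

The Signed-off-by: trailer is what the DCO check looks for; if you forgot it, git commit --amend -s adds it to the most recent commit.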

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

…ith modelSpecs.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>
@ahinsutime
Contributor Author

ahinsutime commented Jun 3, 2025

Confirmed intended behavior (creating a single Ray cluster and a typical Deployment):

# helm install vllm ./helm -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml 
servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "distilgpt2-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1
  - name: "distilgpt2-deployment"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1

    shmSize: "20Gi"
# deploy status (I have only 2 GPUs, so one of the Ray nodes could not run)
# kubectl get pods
NAME                                                          READY   STATUS    RESTARTS   AGE
kuberay-operator-f89ddb644-w5cdt                              1/1     Running   0          17d
vllm-deployment-router-76977dcc4d-6qvcn                       1/1     Running   0          6m
vllm-distilgpt2-deployment-deployment-vllm-57bcc4fcdc-4crb6   1/1     Running   0          6m
vllm-distilgpt2-raycluster-raycluster-head-4629m              0/1     Running   0          6m
vllm-distilgpt2-raycluster-raycluster-ray-worker-4kznr        0/1     Pending   0          6m

cc @bennorris123

@ahinsutime ahinsutime marked this pull request as ready for review June 3, 2025 17:24
@ahinsutime ahinsutime marked this pull request as draft June 3, 2025 17:30
@ahinsutime
Contributor Author

I will mark this PR as ready after I confirm it works on multi-node (with more than 2 GPUs in total).

@ahinsutime
Contributor Author

ahinsutime commented Jun 5, 2025

Tested and confirmed working with multiple nodes:

kubectl get nodes
NAME                       STATUS   ROLES           AGE    VERSION
instance-20250503-060921   Ready    control-plane   19d    v1.32.4
insudevmachine             Ready    <none>          118m   v1.32.4

Installed helm chart

helm install vllm ./helm -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml 
NAME: vllm
LAST DEPLOYED: Thu Jun  5 09:16:59 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Used values

servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "distilgpt2-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1
  - name: "opt125m-deployment"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "facebook/opt-125m"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1

    shmSize: "20Gi"

Checked multiple deployments

kubectl get pods
NAME                                                      READY   STATUS    RESTARTS   AGE     IP                NODE                       NOMINATED NODE   READINESS GATES
kuberay-operator-f89ddb644-w5cdt                          1/1     Running   0          18d     192.168.190.22    instance-20250503-060921   <none>           <none>
vllm-deployment-router-76977dcc4d-rhr4k                   1/1     Running   0          6m11s   192.168.165.203   insudevmachine             <none>           <none>
vllm-distilgpt2-raycluster-raycluster-head-k5msz          1/1     Running   0          6m11s   192.168.190.51    instance-20250503-060921   <none>           <none>
vllm-distilgpt2-raycluster-raycluster-ray-worker-l69xc    1/1     Running   0          6m11s   192.168.165.205   insudevmachine             <none>           <none>
vllm-opt125m-deployment-deployment-vllm-f4c9c9bb4-llm72   1/1     Running   0          6m11s   192.168.165.204   insudevmachine             <none>           <none>

Tested deployments and traffic

 kubectl port-forward svc/vllm-router-service 30080:80

curl http://localhost:30080/v1/models

{
    "object": "list",
    "data": [
        {
            "id": "facebook/opt-125m",
            "object": "model",
            "created": 1749115431,
            "owned_by": "vllm",
            "root": null,
            "parent": null
        },
        {
            "id": "distilbert/distilgpt2",
            "object": "model",
            "created": 1749115431,
            "owned_by": "vllm",
            "root": null,
            "parent": null
        }
    ]
}

Checked that the Ray cluster serving the distilbert/distilgpt2 model works

   curl -X POST http://localhost:30080/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "distilbert/distilgpt2",
      "prompt": "Once upon a time,",
      "max_tokens": 10
    }'

{
    "id": "cmpl-826415dd4f7649b9a6af5130a786bc5d",
    "object": "text_completion",
    "created": 1749115498,
    "model": "distilbert/distilgpt2",
    "choices": [
        {
            "index": 0,
            "text": " our journey was certainly one of magnitude quicker than those",
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 5,
        "total_tokens": 15,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}

Checked that the typical Deployment serving the facebook/opt-125m model works

curl -X POST http://localhost:30080/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "facebook/opt-125m",
      "prompt": "Once upon a time,",
      "max_tokens": 10
    }'

{
    "id": "cmpl-854f5011f073410a97085a2d50b6ef22",
    "object": "text_completion",
    "created": 1749115612,
    "model": "facebook/opt-125m",
    "choices": [
        {
            "index": 0,
            "text": " gonorrhoeas also had that beneficial holy",
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 6,
        "total_tokens": 16,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}

@ahinsutime ahinsutime marked this pull request as ready for review June 5, 2025 09:28
@ahinsutime
Contributor Author

@YuhanLiu11 Thank you for checking!

@ahinsutime
Contributor Author

I will add another commit to include guidelines for multiple deployments in the tutorial document.

…ed typos. Added more example values.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>
@ahinsutime
Contributor Author

Tested with two Ray clusters as well.

Used values

servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "distilgpt2-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1
  - name: "opt125m-raycluster"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "facebook/opt-125m"

    replicaCount: 1

    requestCPU: 1
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    raySpec:
      headNode:
        requestCPU: 1
        requestMemory: "20Gi"
        requestGPU: 1

Installed helm chart

helm install vllm ./helm/ -f tutorials/assets/values-15-b-minimal-pipeline-parallel-example-multiple-modelspec.yaml 
NAME: vllm
LAST DEPLOYED: Thu Jun  5 09:47:43 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
kubectl get pods -o wide
NAME                                                     READY   STATUS    RESTARTS   AGE    IP                NODE                       NOMINATED NODE   READINESS GATES
kuberay-operator-f89ddb644-w5cdt                         1/1     Running   0          18d    192.168.190.22    instance-20250503-060921   <none>           <none>
vllm-deployment-router-76977dcc4d-n59zv                  1/1     Running   0          7m7s   192.168.165.212   insudevmachine             <none>           <none>
vllm-distilgpt2-raycluster-raycluster-head-c7z75         1/1     Running   0          7m7s   192.168.190.56    instance-20250503-060921   <none>           <none>
vllm-distilgpt2-raycluster-raycluster-ray-worker-5nwx6   1/1     Running   0          7m7s   192.168.165.213   insudevmachine             <none>           <none>
vllm-opt125m-raycluster-raycluster-head-278xt            1/1     Running   0          7m7s   192.168.190.57    instance-20250503-060921   <none>           <none>
vllm-opt125m-raycluster-raycluster-ray-worker-5bnlz      1/1     Running   0          7m7s   192.168.165.214   insudevmachine             <none>           <none>

facebook/opt-125m

 curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'

{
    "id": "cmpl-447f6cd06298409c83af79e5e4206634",
    "object": "text_completion",
    "created": 1749117098,
    "model": "facebook/opt-125m",
    "choices": [
        {
            "index": 0,
            "text": " before President Woodrow Wilson arrived and ran the United",
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 6,
        "total_tokens": 16,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}

distilbert/distilgpt2

curl -X POST http://localhost:30080/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "distilbert/distilgpt2",
      "prompt": "Once upon a time,",
      "max_tokens": 10
}'

{
    "id": "cmpl-cc6a8d3f617846afa9e32c132eabd5d7",
    "object": "text_completion",
    "created": 1749117081,
    "model": "distilbert/distilgpt2",
    "choices": [
        {
            "index": 0,
            "text": " regime change worked. The Hereviks celebrated",
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 5,
        "total_tokens": 15,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    },
    "kv_transfer_params": null
}

@bennorris123

@ahinsutime This looks great! Thank you. Is the aim to include this update in the 0.1.4 release? :)

@ahinsutime
Contributor Author

ahinsutime commented Jun 5, 2025

@ahinsutime This looks great! Thank you. Is the aim to include this update in the 0.1.4 release? :)

I hope so too. It would be better if this bugfix PR is merged as soon as possible.

Collaborator

@YuhanLiu11 YuhanLiu11 left a comment


This fix looks good to me!!

@YuhanLiu11 YuhanLiu11 merged commit 6e3c06f into vllm-project:main Jun 5, 2025
9 checks passed
JustinDuy pushed a commit to JustinDuy/production-stack-1 that referenced this pull request Jun 13, 2025
* [bugfix] Bugfixed raySpec preventing multiple deployments specified with modelSpecs.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

* [Doc] Updated tutorial document and values for raySpec.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

* [Bugfix] Updated helm chart version to sync with changes due to bugfix.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

* [Doc] Added guideline to deploy both ray cluster and deployments. Fixed typos. Added more example values.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

* [Bugfix] Fixed configmap conflicts by distinguishing configmap names.

Signed-off-by: ahinsutime <ahinsutime@gmail.com>

---------

Signed-off-by: ahinsutime <ahinsutime@gmail.com>
Co-authored-by: Yuhan Liu <32589867+YuhanLiu11@users.noreply.github.com>
davidgao7 pushed a commit to davidgao7/production-stack that referenced this pull request Jun 26, 2025
Senne-Mennes pushed a commit to Senne-Mennes/production-stack that referenced this pull request Oct 22, 2025
Development

Successfully merging this pull request may close these issues.

bug: Unable to deploy ray cluster and other deployments at the same time
