
Conversation

@dumb0002
Contributor

@dumb0002 dumb0002 commented Mar 20, 2025

This PR updates the Helm chart to allow easily turning the deployment of the vLLM engine on or off. This supports use cases where engine instances are already running and only the router needs to be deployed with the right configuration.
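
For reference, a minimal sketch of how the engine Deployment template can be gated on the new flag. The template path, resource name, and surrounding fields are illustrative assumptions, not necessarily the chart's actual layout:

# templates/deployment-vllm.yaml (illustrative path)
{{- if .Values.servingEngineSpec.enableEngine }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-deployment-vllm  # hypothetical resource name
spec:
  # ... engine pod spec, rendered only when enableEngine is true ...
{{- end }}

With servingEngineSpec.enableEngine set to false, Helm skips the whole manifest and only the router resources are rendered.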

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commit by using -s with git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented to ensure future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of @YuhanLiu11, @Shaoting-Feng, or @ApostaC.

@dumb0002 dumb0002 force-pushed the enable-engine-router branch from 39c117e to 639c847 Compare March 20, 2025 22:17
@YuhanLiu11
Collaborator

Hey @dumb0002, thanks for your PR! This is great! Will take a look soon.

@YuhanLiu11 YuhanLiu11 self-requested a review March 26, 2025 05:47
@YuhanLiu11
Collaborator

Hey @dumb0002 — thanks again for the PR! 🙌
Would you mind sharing an example log showing the pod statuses? I just want to double-check that only the router pods stay up when the vLLM engine deployments are turned off.

Collaborator

@Shaoting-Feng Shaoting-Feng left a comment


It appears that when enableEngine is set to false, the intended behavior is to skip or exclude all fields related to .servingEngineSpec. However, I noticed that files like secrets.yaml and the vllmApiKey field have not been modified accordingly. Could you please clarify whether these should also be skipped, or if there's a specific reason they remain unchanged?

@Shaoting-Feng
Collaborator

Btw when engine instances are already running, does launching the router pod ensure that it detects these existing engines through the service discovery mechanism (for example, via _watch_engines() in service_discovery.py)?

@dumb0002
Contributor Author

Btw when engine instances are already running, does launching the router pod ensure that it detects these existing engines through the service discovery mechanism (for example, via _watch_engines() in service_discovery.py)?

@Shaoting-Feng, yes, the router is able to detect the existing engines when the flag --k8s-label-selector is set with the right labels. See the configuration example below:

servingEngineSpec:
  enableEngine: false

routerSpec:
  enableRouter: true
  replicaCount: 1
  containerPort: 8000
  servicePort: 80

  serviceDiscovery: "k8s"
  routingLogic: "roundrobin"
  extraArgs: ["--k8s-label-selector", "environment=test"]

  ingress:
    enabled: true
    hosts:
      - host: vllm-router.local
        paths:
          - path: /
            pathType: Prefix
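
For the selector above to match, the pre-existing engine pods need to carry the corresponding label in their metadata. A minimal sketch (the pod name, image, and container details are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: vllm-engine-0  # illustrative name
  labels:
    environment: test  # matched by --k8s-label-selector environment=test
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest  # illustrative image

The router's Kubernetes service discovery then watches for pods carrying this label and registers them as serving engines.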

@dumb0002 dumb0002 force-pushed the enable-engine-router branch from 3c0c589 to c239ccf Compare March 26, 2025 18:33
@dumb0002
Contributor Author

dumb0002 commented Mar 26, 2025

It appears that when enableEngine is set to false, the intended behavior is to skip or exclude all fields related to .servingEngineSpec. However, I noticed that files like secrets.yaml and the vllmApiKey field have not been modified accordingly. Could you please clarify whether these should also be skipped, or if there's a specific reason they remain unchanged?

@Shaoting-Feng, you're correct: the intended behavior is indeed to skip or exclude all fields related to .servingEngineSpec. I just pushed a commit to fix the issues you found with secrets.yaml and the vllmApiKey field. Thanks!

Now, if engine instances are already running and are secured with the API key vllmApiKey, then when launching the router, the name of the secret associated with that API key needs to be specified. See the example below:

servingEngineSpec:
  enableEngine: false

routerSpec:
  enableRouter: true
  replicaCount: 1
  containerPort: 8000
  servicePort: 80

  serviceDiscovery: "k8s"
  routingLogic: "roundrobin"
  extraArgs: ["--k8s-label-selector", "mvp=test"]

  vllmApiKey:
    secretName: "vllm-test-secrets"
    secretKey: "vllmApiKey"

  ingress:
    enabled: true
    hosts:
      - host: vllm-router.local
        paths:
          - path: /
            pathType: Prefix
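
If the secret was created outside this chart, a matching Secret manifest might look like the following sketch (the key value is a placeholder):

apiVersion: v1
kind: Secret
metadata:
  name: vllm-test-secrets
type: Opaque
stringData:
  vllmApiKey: <your-api-key>  # placeholder value

The secretName and secretKey fields under routerSpec.vllmApiKey must match the Secret's name and data key.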

@dumb0002
Contributor Author

Hey @dumb0002 — thanks again for the PR! 🙌 Would you mind sharing an example log showing the pod statuses? I just want to double-check that only the router pods stay up when the vLLM engine deployments are turned off.

@YuhanLiu11, is the attached example log what you're looking for? Thanks!
production-stack-PR-311-log.txt

@dumb0002
Contributor Author

FYI, this PR is failing one of the tests. I checked it and it seems unrelated to this change; please let me know if otherwise. I see the following error in the test logs:

X Exiting due to MK_USAGE: loading profile: cluster "minikube" does not exist

@Shaoting-Feng
Collaborator

Shaoting-Feng commented Mar 26, 2025

FYI, this PR is failing one of the tests. I checked it and it seems unrelated to this change.

Could you rebase (merge) the main branch into your branch? There was an update to the CI workflow (PR #325). Thanks!

@dumb0002 dumb0002 force-pushed the enable-engine-router branch from c239ccf to a689fef Compare March 26, 2025 19:53
@dumb0002
Contributor Author

FYI, this PR is failing one of the tests. I checked it and it seems unrelated to this change.

Could you rebase (merge) the main branch into your branch? There was an update to the CI workflow (PR #325). Thanks!

Done. Thanks!

helm/values.yaml Outdated
Collaborator


One final minor comment :): Can you change this "VLMM" to "vLLM"? Thanks!

Contributor Author


oh! that was a typo - fixed now. Thanks!

Signed-off-by: Braulio Dumba <brauliodumba@gmail.com>
@dumb0002 dumb0002 force-pushed the enable-engine-router branch from 11db788 to 586cc81 Compare March 26, 2025 22:09
Collaborator

@YuhanLiu11 YuhanLiu11 left a comment


LGTM!

@YuhanLiu11 YuhanLiu11 merged commit c0cc922 into vllm-project:main Mar 27, 2025
9 checks passed
lucas-tucker pushed a commit to lucas-tucker/production-stack that referenced this pull request Apr 3, 2025
Signed-off-by: Braulio Dumba <brauliodumba@gmail.com>
Signed-off-by: Luke Tux <lwtucker@uchicago.edu>
JustinDuy pushed a commit to JustinDuy/production-stack-1 that referenced this pull request Jun 13, 2025
Signed-off-by: Braulio Dumba <brauliodumba@gmail.com>
allytotheson pushed a commit to allytotheson/production-stack that referenced this pull request Jun 30, 2025
Signed-off-by: Braulio Dumba <brauliodumba@gmail.com>
Signed-off-by: allytotheson <82621261+allytotheson@users.noreply.github.com>