
Conversation

@dumb0002
Contributor

@dumb0002 dumb0002 commented Mar 20, 2025

This PR updates the Helm chart to allow easily turning the deployment of the vLLM engine on or off. This supports use cases where engine instances are already running and only the router needs to be deployed with the right configuration.
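
For reference, a minimal sketch of how the engine Deployment template can be gated on the new flag. The template path, resource name, and surrounding fields are illustrative assumptions, not necessarily the chart's actual layout:

# templates/deployment-vllm.yaml (illustrative path)
{{- if .Values.servingEngineSpec.enableEngine }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-deployment-vllm  # hypothetical resource name
spec:
  # ... engine pod spec, rendered only when enableEngine is true ...
{{- end }}

With servingEngineSpec.enableEngine set to false, Helm skips the whole manifest and only the router resources are rendered.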

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commit by using -s with git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented to ensure future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of @YuhanLiu11, @Shaoting-Feng, or @ApostaC.

@dumb0002 dumb0002 force-pushed the enable-engine-router branch from 39c117e to 639c847 Compare March 20, 2025 22:17
@YuhanLiu11
Collaborator

Hey @dumb0002, thanks for your PR! This is great! Will take a look soon.

@YuhanLiu11 YuhanLiu11 self-requested a review March 26, 2025 05:47
@YuhanLiu11
Collaborator

Hey @dumb0002 — thanks again for the PR! 🙌
Would you mind sharing an example log showing the pod statuses? I just want to double-check that only the router pods stay up when the vLLM engine deployments are turned off.

Collaborator

@Shaoting-Feng Shaoting-Feng left a comment


It appears that when enableEngine is set to false, the intended behavior is to skip or exclude all fields related to .servingEngineSpec. However, I noticed that files like secrets.yaml and the vllmApiKey field have not been modified accordingly. Could you please clarify whether these should also be skipped, or if there's a specific reason they remain unchanged?

@Shaoting-Feng
Collaborator

Btw when engine instances are already running, does launching the router pod ensure that it detects these existing engines through the service discovery mechanism (for example, via _watch_engines() in service_discovery.py)?

@dumb0002
Contributor Author

Btw when engine instances are already running, does launching the router pod ensure that it detects these existing engines through the service discovery mechanism (for example, via _watch_engines() in service_discovery.py)?

@Shaoting-Feng, yes, the router is able to detect the existing engines when the flag --k8s-label-selector is set with the right labels. See the configuration example below:

servingEngineSpec:
  enableEngine: false

routerSpec:
  enableRouter: true
  replicaCount: 1
  containerPort: 8000
  servicePort: 80

  serviceDiscovery: "k8s"
  routingLogic: "roundrobin"
  extraArgs: ["--k8s-label-selector", "environment=test"]

  ingress:
    enabled: true
    hosts:
      - host: vllm-router.local
        paths:
          - path: /
            pathType: Prefix
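
For the selector above to match, the pre-existing engine pods need to carry the corresponding label in their metadata. A minimal sketch (the pod name, image, and container details are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: vllm-engine-0  # illustrative name
  labels:
    environment: test  # matched by --k8s-label-selector environment=test
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest  # illustrative image

The router's Kubernetes service discovery then watches for pods carrying this label and registers them as serving engines.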

@dumb0002 dumb0002 force-pushed the enable-engine-router branch from 3c0c589 to c239ccf Compare March 26, 2025 18:33
@dumb0002
Contributor Author

dumb0002 commented Mar 26, 2025

It appears that when enableEngine is set to false, the intended behavior is to skip or exclude all fields related to .servingEngineSpec. However, I noticed that files like secrets.yaml and the vllmApiKey field have not been modified accordingly. Could you please clarify whether these should also be skipped, or if there's a specific reason they remain unchanged?

@Shaoting-Feng, you're correct: the intended behavior is indeed to skip or exclude all fields related to .servingEngineSpec. I just pushed a commit to fix the issues you found with secrets.yaml and the vllmApiKey field. Thanks!

Now, if engine instances are already running and are secured with the API key vllmApiKey, then when launching the router, the name of the secret associated with that API key needs to be specified. See the example below:

servingEngineSpec:
  enableEngine: false

routerSpec:
  enableRouter: true
  replicaCount: 1
  containerPort: 8000
  servicePort: 80

  serviceDiscovery: "k8s"
  routingLogic: "roundrobin"
  extraArgs: ["--k8s-label-selector", "mvp=test"]

  vllmApiKey:
    secretName: "vllm-test-secrets"
    secretKey: "vllmApiKey"

  ingress:
    enabled: true
    hosts:
      - host: vllm-router.local
        paths:
          - path: /
            pathType: Prefix
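
If the secret was created outside this chart, a matching Secret manifest might look like the following sketch (the key value is a placeholder):

apiVersion: v1
kind: Secret
metadata:
  name: vllm-test-secrets
type: Opaque
stringData:
  vllmApiKey: <your-api-key>  # placeholder value

The secretName and secretKey fields under routerSpec.vllmApiKey must match the Secret's name and data key.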

@dumb0002
Contributor Author

Hey @dumb0002 — thanks again for the PR! 🙌 Would you mind sharing an example log showing the pod statuses? I just want to double-check that only the router pods stay up when the vLLM engine deployments are turned off.

@YuhanLiu11, is the attached example log what you're looking for? Thanks!
production-stack-PR-311-log.txt

@dumb0002
Contributor Author

FYI, this PR is failing one of the tests. I checked it and it seems unrelated to this change; please let me know if otherwise. I see the following error in the test logs:

X Exiting due to MK_USAGE: loading profile: cluster "minikube" does not exist

@Shaoting-Feng
Collaborator

Shaoting-Feng commented Mar 26, 2025

FYI, this PR is failing one of the tests. I checked it and it seems unrelated to this change.

Could you rebase (merge) the main branch into your branch? There was an update to the CI workflow (PR #325). Thanks!

@dumb0002 dumb0002 force-pushed the enable-engine-router branch from c239ccf to a689fef Compare March 26, 2025 19:53
@dumb0002
Contributor Author

FYI, this PR is failing one of the tests. I checked it and it seems unrelated to this change.

Could you rebase (merge) the main branch into your branch? There was an update to the CI workflow (PR #325). Thanks!

Done. Thanks!

helm/values.yaml Outdated
Collaborator


One final minor comment :): Can you change this "VLMM" to "vLLM"? Thanks!

Contributor Author


oh! that was a typo - fixed now. Thanks!

Signed-off-by: Braulio Dumba <brauliodumba@gmail.com>
@dumb0002 dumb0002 force-pushed the enable-engine-router branch from 11db788 to 586cc81 Compare March 26, 2025 22:09
Collaborator

@YuhanLiu11 YuhanLiu11 left a comment


LGTM!

@YuhanLiu11 YuhanLiu11 merged commit c0cc922 into vllm-project:main Mar 27, 2025
9 checks passed
lucas-tucker pushed a commit to lucas-tucker/production-stack that referenced this pull request Apr 3, 2025
Signed-off-by: Braulio Dumba <brauliodumba@gmail.com>
Signed-off-by: Luke Tux <lwtucker@uchicago.edu>
JustinDuy pushed a commit to JustinDuy/production-stack-1 that referenced this pull request Jun 13, 2025
Signed-off-by: Braulio Dumba <brauliodumba@gmail.com>
allytotheson pushed a commit to allytotheson/production-stack that referenced this pull request Jun 30, 2025
Signed-off-by: Braulio Dumba <brauliodumba@gmail.com>
Signed-off-by: allytotheson <82621261+allytotheson@users.noreply.github.com>