Conversation

@NargiT NargiT commented Oct 30, 2025

Because sleep mode is checked by inspecting pods, the router still needs the pod-reader RoleBinding, even when service-name discovery is used:

def _check_engine_sleep_mode(self, service_name) -> Optional[bool]:
    try:
        service = self.k8s_api.read_namespaced_service(service_name, self.namespace)
        if not service.spec.selector:
            return False
        selector = ",".join([f"{k}={v}" for k, v in service.spec.selector.items()])
        pods = self.k8s_api.list_namespaced_pod(
            namespace=self.namespace, label_selector=selector
        )
        if not pods.items:
            logger.warning(
                f"No pods found for service {service_name} in namespace {self.namespace}"
            )
            return False
        enable_sleep_mode = False
        for container in pods.items[0].spec.containers:
            if container.name == "vllm":
                for arg in container.command:
                    if arg == "--enable-sleep-mode":
                        enable_sleep_mode = True
                        break
        return enable_sleep_mode
    except client.rest.ApiException as e:
        logger.error(
            f"Error checking if sleep-mode is enable for service {service_name}: {e}"
        )
        return False
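
The flag check at the end of this function can be isolated and exercised without a cluster. The following is a minimal sketch, not the router's actual code: `has_sleep_mode_flag` is a hypothetical helper, and it assumes each container exposes `name`, `command`, and `args` attributes, as the container objects returned by the Kubernetes Python client do. Unlike the snippet above, it tolerates a missing `command` and also looks at `args`; whether the flag can appear in `args` depends on how the chart builds the container, so treat that part as an assumption.

```python
from types import SimpleNamespace
from typing import Iterable

SLEEP_MODE_FLAG = "--enable-sleep-mode"


def has_sleep_mode_flag(containers: Iterable, container_name: str = "vllm") -> bool:
    """Return True if the named container is started with the sleep-mode flag."""
    for container in containers:
        if container.name != container_name:
            continue
        # command and args may each be None on a container spec.
        argv = list(getattr(container, "command", None) or []) + list(
            getattr(container, "args", None) or []
        )
        if SLEEP_MODE_FLAG in argv:
            return True
    return False


# Stand-ins for pod.spec.containers (hypothetical values, no cluster needed).
with_flag = SimpleNamespace(
    name="vllm", command=["vllm", "serve", "--enable-sleep-mode"], args=None
)
without_command = SimpleNamespace(name="vllm", command=None, args=None)
sidecar = SimpleNamespace(name="sidecar", command=["--enable-sleep-mode"], args=None)

print(has_sleep_mode_flag([with_flag]))
print(has_sleep_mode_flag([without_command, sidecar]))
```

Listing the pods in the first place is what requires the pod-reader permissions discussed in this PR; the helper only covers the inspection step.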


  • Make sure the code changes pass the pre-commit checks.
  • Sign-off your commit by using -s when doing git commit
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng or ApostaC.

Signed-off-by: nargit <NargiT@users.noreply.github.com>
Copilot AI review requested due to automatic review settings October 30, 2025 11:12
@gemini-code-assist

Summary of Changes

Hello @NargiT, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a bug in the Kubernetes Helm chart configuration where the pod-reader RoleBinding was not consistently applied. Previously, this binding was only created when k8sServiceDiscoveryType was pod-ip, leading to issues with features like sleep mode checks that require access to pod information, even when service-name discovery was in use. The change ensures that the pod-reader RoleBinding is always present when the router is enabled, thereby stabilizing pod-related functionalities across different service discovery configurations.

Highlights

  • Kubernetes RoleBinding Logic: The pod-reader RoleBinding is now unconditionally created when the router is enabled, ensuring necessary permissions for pod-related operations regardless of the k8sServiceDiscoveryType.
  • Service Discovery Bugfix: Addresses a bug where sleep mode checks, which rely on pod information, failed when k8s_discovery_mode was set to service-name due to missing pod-reader permissions.
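
The before/after behaviour described in this summary can be modelled as a small truth function. This is only an illustration of the described logic, not chart code: `renders_pod_reader_binding` is a hypothetical name, `pod-ip` is assumed to be the default discovery type (as the template snippet in this thread suggests), and the router-enabled guard is assumed to have applied before the change as well.

```python
from typing import Optional


def renders_pod_reader_binding(
    router_enabled: bool, discovery_type: Optional[str], fixed: bool
) -> bool:
    """Model whether the pod-reader RoleBinding is rendered."""
    if not router_enabled:
        return False
    if fixed:
        # After the change: always rendered when the router is enabled.
        return True
    # Before the change: only rendered for pod-ip discovery (the assumed default).
    return (discovery_type or "pod-ip") == "pod-ip"


for mode in (None, "pod-ip", "service-name"):
    before = renders_pod_reader_binding(True, mode, fixed=False)
    after = renders_pod_reader_binding(True, mode, fixed=True)
    print(mode, before, after)
```

The only row that differs between the two versions is `service-name`, which is the configuration where the sleep mode check lacked pod access.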


Copilot AI left a comment

Pull Request Overview

This PR modifies the conditional logic for rendering RoleBindings in the router service discovery configuration. The change removes the wrapping condition on the first RoleBinding and converts an else clause to an explicit condition check.

  • Removes the conditional wrapper around the first RoleBinding (pod-ip discovery type)
  • Changes the else clause to an explicit check for "service-name" discovery type
Comments suppressed due to low confidence (1)

helm/templates/rolebinding.yaml:19

  • The first RoleBinding (lines 2-14) is now always rendered, while the second RoleBinding (lines 16-28) is rendered when k8sServiceDiscoveryType equals 'service-name'. This creates a conflict: both RoleBindings have the same name '{{ .Release.Name }}-deployment-access-binding' but reference different roles. When k8sServiceDiscoveryType is 'pod-ip' (the default), only the first RoleBinding should be created. When it's 'service-name', only the second should be created. Add a conditional wrapper around the first RoleBinding: {{- if .Values.routerSpec.k8sServiceDiscoveryType | default "pod-ip" | eq "pod-ip" }} before line 2, or change line 15 to use else to ensure mutual exclusivity.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ .Release.Name }}-deployment-access-binding
  namespace: {{ .Release.Namespace }}
subjects:
  - kind: ServiceAccount
    name: {{ .Release.Name }}-router-service-account
    namespace: {{ .Release.Namespace }}
roleRef:
  kind: Role
  name: {{ .Release.Name }}-pod-reader
  apiGroup: rbac.authorization.k8s.io
{{- if .Values.routerSpec.k8sServiceDiscoveryType | default "pod-ip" | eq "service-name" }}
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ .Release.Name }}-deployment-access-binding


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to ensure that the pod-reader RoleBinding is always created, even when k8s_discovery_mode is service-name, because it's needed for checking sleep mode. This is a valid change.

However, the implementation in helm/templates/rolebinding.yaml has a critical issue: when k8s_discovery_mode is service-name, it generates two RoleBinding resources with the same name and without a --- separator, which will cause the Helm deployment to fail. I've provided a suggestion to fix this by giving each RoleBinding a unique name and separating them correctly.

Additionally, for this change to work, helm/templates/role.yaml also needs to be updated. Currently, it only creates the pod-reader Role when k8s_discovery_mode is pod-ip. It should be modified to always create the pod-reader Role and conditionally create the service-reader Role, similar to the logic for RoleBindings. Without this change, the pod-reader-binding will fail to bind to a non-existent Role.
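
The name collision described in this review can be caught mechanically. The sketch below assumes the rendered manifests have already been parsed into dictionaries (for example from `helm template` output by a YAML loader); `find_duplicate_resources`, `binding`, the release name `demo`, and the binding names in the second list are hypothetical and not part of the chart or the router.

```python
from collections import Counter
from typing import Dict, List, Tuple

ResourceKey = Tuple[str, str, str]  # (kind, namespace, name)


def find_duplicate_resources(manifests: List[Dict]) -> List[ResourceKey]:
    """Return every (kind, namespace, name) that occurs more than once."""
    keys = Counter(
        (
            m.get("kind", ""),
            m.get("metadata", {}).get("namespace", ""),
            m.get("metadata", {}).get("name", ""),
        )
        for m in manifests
    )
    return sorted(key for key, count in keys.items() if count > 1)


def binding(name: str, role: str) -> Dict:
    """Build a minimal RoleBinding-shaped dict for illustration."""
    return {
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": "default"},
        "roleRef": {"kind": "Role", "name": role},
    }


# Two bindings sharing one name, as in the flagged template.
colliding = [
    binding("demo-deployment-access-binding", "demo-pod-reader"),
    binding("demo-deployment-access-binding", "demo-service-reader"),
]
# Distinct names, one per role (illustrative names only).
distinct = [
    binding("demo-pod-reader-binding", "demo-pod-reader"),
    binding("demo-service-reader-binding", "demo-service-reader"),
]

print(find_duplicate_resources(colliding))
print(find_duplicate_resources(distinct))
```

A check like this does not cover the missing `---` separator, which has to be fixed in the template itself before the output can be parsed as separate documents.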

Signed-off-by: nargit <NargiT@users.noreply.github.com>
@NargiT NargiT requested a review from Copilot October 30, 2025 12:59

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.



- apiGroups: [ "" ] # "" indicates the core API group
resources: [ "services", "endpoints" ]
verbs: [ "get", "watch", "list", "patch" ]
resources: ["pods", "services", "endpoints"]

Copilot AI Oct 30, 2025

When k8sServiceDiscoveryType is not 'pod-ip', the role now grants access to 'pods' in addition to 'services' and 'endpoints'. This is a permission expansion from the original configuration which only granted 'services' and 'endpoints' access. Verify this additional 'pods' permission is intentional for non-pod-ip discovery types, as it broadens the security scope of the service account.

Suggested change
- resources: ["pods", "services", "endpoints"]
+ resources: ["services", "endpoints"]

Contributor Author

I believe values.yaml is the interface for the end user, whereas the template logic is there to hide complexity and keep maintenance easy for maintainers.

Collaborator

@zerofishnoodles zerofishnoodles left a comment

LGTM

@zerofishnoodles zerofishnoodles merged commit 4ca1966 into vllm-project:main Oct 31, 2025
16 checks passed
@NargiT NargiT deleted the fix/rolebinding branch November 2, 2025 17:20