@zhouwfang zhouwfang commented Jul 16, 2025

The current round-robin router implementation has O(n log n) time complexity because it sorts the endpoints on every request. With this PR, the best and common case is O(1) (as long as the same endpoints list is reused), and the worst case remains O(n log n).
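For illustration only, the sort-on-every-request baseline described above could look roughly like the sketch below. This is not the project's actual code: `Endpoint`, `SortEveryRequestRouter`, `route`, and `req_id` are hypothetical names, and the sort key (the endpoint URL) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for the router's endpoint type."""

    url: str


class SortEveryRequestRouter:
    """Baseline sketch: re-sorts the endpoint list on every call."""

    def __init__(self) -> None:
        self.req_id = 0  # monotonically increasing request counter

    def route(self, endpoints: List[Endpoint]) -> str:
        # Sorting gives a stable order across calls, but costs O(n log n)
        # on every single request, even if the list has not changed.
        ordered = sorted(endpoints, key=lambda e: e.url)
        chosen = ordered[self.req_id % len(ordered)]
        self.req_id += 1
        return chosen.url
```

Calling `route` repeatedly with the same list cycles through the URLs in sorted order; the repeated `sorted` call is the cost this PR aims to avoid.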

Benchmarks (https://github.com/zhouwfang/production-stack-round-robin-benchmarks/tree/main)

(Image: Screenshot 2025-07-15 at 9 08 15 PM)

  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commits by using -s with git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].

Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng or ApostaC.

Signed-off-by: Zhou Fang <fang.github@gmail.com>
@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @zhouwfang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a critical performance enhancement for the round robin router. It refactors the endpoint selection logic to introduce an intelligent caching strategy, eliminating redundant sorting operations on every request. This change drastically reduces the average time complexity of routing, leading to a more efficient and responsive router.

Highlights

  • Performance Optimization: The primary goal of this PR is to significantly improve the performance of the round robin router. The previous implementation sorted the list of endpoints on every request, resulting in an O(n log n) time complexity. With these changes, the router caches the sorted list, so it achieves O(1) complexity in the best case (when the endpoint list is reused) and falls back to O(n log n) only in the worst case (when the endpoint list changes and must be re-sorted).
  • Caching Mechanism: A new caching mechanism has been introduced within the RoundRobinRouter. This mechanism stores a pre-sorted list of endpoints (self.sorted_endpoints). The cache is intelligently invalidated and re-populated only when the endpoints object itself changes (checked using id()) or when the content (specifically, the URLs) of the endpoints changes (checked using a hash of the URLs).
  • State Management: New instance variables (self.sorted_endpoints, self.last_endpoints_id, self.last_endpoints_hash) have been added to the RoundRobinRouter's __init__ method. These variables are crucial for maintaining the state required by the caching logic, ensuring efficient and performant routing decisions.
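
The caching mechanism summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged implementation: the attribute names `sorted_endpoints`, `last_endpoints_id`, and `last_endpoints_hash` come from the summary, while `Endpoint`, `CachedRoundRobinRouter`, `_refresh_cache`, `route`, `req_id`, and `sort_count` are hypothetical, and hashing a tuple of URLs is only one possible way to fingerprint the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for the router's endpoint type."""

    url: str


class CachedRoundRobinRouter:
    """Sketch of round-robin routing with an id + hash cache check."""

    def __init__(self) -> None:
        self.req_id = 0
        self.sorted_endpoints: List[Endpoint] = []
        self.last_endpoints_id: Optional[int] = None
        self.last_endpoints_hash: Optional[int] = None
        self.sort_count = 0  # illustration only: counts re-sorts

    def _refresh_cache(self, endpoints: List[Endpoint]) -> None:
        # Fast path, O(1): the caller passed the very same list object.
        if id(endpoints) == self.last_endpoints_id:
            return
        # Slower path, O(n): a different object, so fingerprint its URLs.
        urls_hash = hash(tuple(e.url for e in endpoints))
        if urls_hash != self.last_endpoints_hash:
            # Worst case, O(n log n): the content changed, so re-sort.
            self.sorted_endpoints = sorted(endpoints, key=lambda e: e.url)
            self.last_endpoints_hash = urls_hash
            self.sort_count += 1
        self.last_endpoints_id = id(endpoints)

    def route(self, endpoints: List[Endpoint]) -> str:
        self._refresh_cache(endpoints)
        chosen = self.sorted_endpoints[self.req_id % len(self.sorted_endpoints)]
        self.req_id += 1
        return chosen.url
```

One caveat of this sketch: because the fast path compares only `id()`, it would not notice a list that is mutated in place, so it assumes callers replace the list object when the endpoints change.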

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request improves the performance of the round-robin router by caching the sorted list of endpoints. To ensure thread safety, I recommend adding a lock to the RoundRobinRouter singleton to prevent race conditions when accessing shared state.
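
As a rough sketch of the locking recommendation above, the shared round-robin state could be guarded with a `threading.Lock`. The class and method names here (`LockedRoundRobinCounter`, `pick`) are hypothetical and not taken from the PR; whether a lock is needed at all depends on how the router is actually invoked, which this thread does not specify.

```python
import threading
from typing import List


class LockedRoundRobinCounter:
    """Hypothetical sketch: guards the shared request counter with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._req_id = 0

    def pick(self, sorted_urls: List[str]) -> str:
        # The read-modify-write of the counter happens under the lock, so
        # two threads cannot observe the same counter value.
        with self._lock:
            chosen = sorted_urls[self._req_id % len(sorted_urls)]
            self._req_id += 1
        return chosen
```

With the lock held around both the index read and the increment, concurrent callers each receive a distinct position in the rotation.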

@zhouwfang zhouwfang changed the title from "[Router] Improve performance of round robin router" to "[Router] Improve performance of round-robin router" on Jul 16, 2025
@kobe0938 kobe0938 left a comment

The current dual-checking approach (id + hash) adds some cognitive overhead. However, after running https://github.com/zhouwfang/production-stack-round-robin-benchmarks/tree/main, I can see that it performs much better than the alternatives, such as a hash-only check or the current implementation. Up to you to decide, @YuhanLiu11.

One minor suggestion: the naming could be clearer if last_endpoints_id were renamed to cached_endpoints_id and last_endpoints_hash to cached_endpoints_hash.

@YuhanLiu11 YuhanLiu11 merged commit 5aa3b9b into vllm-project:main Jul 31, 2025
14 checks passed
Senne-Mennes pushed a commit to Senne-Mennes/production-stack that referenced this pull request Oct 22, 2025
* perf: improve round robin router

Signed-off-by: Zhou Fang <fang.github@gmail.com>

* revert: function docs

Signed-off-by: Zhou Fang <fang.github@gmail.com>

* style: minor

Signed-off-by: Zhou Fang <fang.github@gmail.com>

---------

Signed-off-by: Zhou Fang <fang.github@gmail.com>
Co-authored-by: Yuhan Liu <32589867+YuhanLiu11@users.noreply.github.com>
Signed-off-by: senne.mennes@capgemini.com <senne.mennes@capgemini.com>