
Conversation

@Exchioz
Contributor

@Exchioz Exchioz commented Jun 26, 2025

[Router] Expose /tokenize and /detokenize endpoints

This PR adds support for forwarding /tokenize and /detokenize requests through the vllm_router. These endpoints already exist in the vLLM backend but were not exposed by the production router.

This change enables users to access tokenization and detokenization functionality via the production-stack API, which is useful for analyzing token usage, preparing prompts, and monitoring inference behavior.

Summary of Changes

  • Added @main_router.post("/tokenize") and @main_router.post("/detokenize") routes (a minimal sketch follows this list).
  • Both use route_general_request(...) to proxy to the backend vLLM endpoints.
  • This matches the behavior of other endpoints like /completions and /embeddings.
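
For illustration, here is a minimal sketch of the two new routes. It assumes route_general_request takes the incoming FastAPI Request plus the backend path, mirroring the existing /completions route; the actual helper signature and import path in vllm_router may differ.

from fastapi import APIRouter, Request

# Existing helper in vllm_router that picks a backend and forwards the request
# (import path assumed for this sketch):
# from vllm_router.services.request_service.request import route_general_request

main_router = APIRouter()


@main_router.post("/tokenize")
async def route_tokenize_request(request: Request):
    # Forward the request body unchanged to the backend vLLM /tokenize endpoint.
    return await route_general_request(request, "/tokenize")


@main_router.post("/detokenize")
async def route_detokenize_request(request: Request):
    # Forward the request body unchanged to the backend vLLM /detokenize endpoint.
    return await route_general_request(request, "/detokenize")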

Fixes

No existing issue yet; one can be referenced here if created later.

Checklist

  • Code passes all pre-commit checks.
  • Commit is signed with -s.
  • Change is categorized under [Router].
  • Changes are backward-compatible and do not break existing routes.
  • (Optional) Add integration test if maintainers request it.

Signed-off-by: Ivan Rajwa Naufal <ivan.naufal@datacomm.co.id>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @Exchioz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to extend the vllm_router's capabilities by exposing the /tokenize and /detokenize endpoints from the vLLM backend. This makes tokenization and detokenization functionalities directly accessible through the production API, enabling users to leverage these features for various use cases like prompt preparation and token usage analysis.

Highlights

  • New API Endpoints: I've added two new POST endpoints, /tokenize and /detokenize, to the main_router.
  • Backend Proxying: These new endpoints are configured to proxy requests directly to their corresponding vLLM backend services using the existing route_general_request utility, mirroring the behavior of other established routes like /completions and /embeddings.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds /tokenize and /detokenize endpoints to the main router by reusing the route_general_request function. My review identifies a high-severity issue with this approach. The route_general_request function requires a model field in the request body for routing, which is not part of the standard vLLM API for these new endpoints. This will cause requests using the standard API to fail and creates an inconsistency between the router and the backend. I've recommended creating a dedicated routing function for these endpoints to handle this correctly.
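
To make the concern concrete, below is a hypothetical sketch of the kind of dedicated handler the review suggests: route by model when the body carries one, and otherwise fall back to a configured default backend instead of failing. The helpers get_default_backend_url and proxy_request are illustrative names, not actual vllm_router functions.

import json

from fastapi import Request


async def route_tokenizer_request(request: Request, endpoint: str):
    body = await request.body()
    payload = json.loads(body) if body else {}
    if "model" in payload:
        # Body names a model, so the normal model-based routing can be reused
        # (route_general_request is the existing helper in vllm_router).
        return await route_general_request(request, endpoint)
    # No "model" in the body: fall back to a default backend rather than
    # rejecting the request.
    backend_url = get_default_backend_url()  # illustrative helper
    return await proxy_request(f"{backend_url}{endpoint}", payload)  # illustrative helper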

@YuhanLiu11
Collaborator

This is a super helpful feature! Thanks @Exchioz. We'll take a look on our side.

@Shaoting-Feng
Collaborator

Can you provide a minimal example demonstrating how these endpoints work?

@Shaoting-Feng Shaoting-Feng self-assigned this Jul 1, 2025
@Exchioz
Contributor Author

Exchioz commented Jul 2, 2025

Hi @Shaoting-Feng, thanks for the review!

Here's an example demonstrating how the /tokenize and /detokenize endpoints work.

1. /tokenize endpoint

Request

curl -X POST 'http://localhost:8000/tokenize' \
   -H 'Content-Type: application/json' \
   -H 'Authorization: Bearer' \
   -d '{
       "model": "meta-llama/Llama-3.1-8B-Instruct",
       "prompt": "Hello World"
   }'

Expected Output

{
    "count": 3,
    "max_model_len": 8192,
    "tokens": [
        128000,
        9906,
        4435
    ]
}
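
For reference, the same /tokenize call as a Python sketch using the requests library (the router URL, model name, and empty Bearer token mirror the curl example above; substitute whatever credentials your deployment expects).

import requests

resp = requests.post(
    "http://localhost:8000/tokenize",
    headers={"Authorization": "Bearer"},  # placeholder token, as in the curl example
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": "Hello World",
    },
)
resp.raise_for_status()
print(resp.json())  # e.g. {"count": 3, "max_model_len": 8192, "tokens": [128000, 9906, 4435]}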

2. /detokenize endpoint

Request

curl -X POST 'http://localhost:8000/detokenize' \
   -H 'Content-Type: application/json' \
   -H 'Authorization: Bearer' \
   -d '{
       "model": "meta-llama/Llama-3.1-8B-Instruct",
       "token": [128000,9906,4435]
   }'

Expected Output

{
    "prompt": "<|begin_of_text|>Hello World"
}

Collaborator

@Shaoting-Feng Shaoting-Feng left a comment


LGTM. Please update the branch and we are ready to go.

@YuhanLiu11 YuhanLiu11 merged commit 5a40d76 into vllm-project:main Jul 7, 2025
14 checks passed
@ehalit

ehalit commented Sep 16, 2025

I figured out that the required format for the /detokenize endpoint should have a "tokens" field instead of a "token" field:

curl -X POST 'http://localhost:8000/detokenize' \
   -H 'Content-Type: application/json' \
   -H 'Authorization: Bearer' \
   -d '{
       "model": "meta-llama/Llama-3.1-8B-Instruct",
       "tokens": [128000,9906,4435]
   }'
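
The equivalent Python sketch with the corrected "tokens" field, under the same assumptions as the /tokenize example above:

import requests

resp = requests.post(
    "http://localhost:8000/detokenize",
    headers={"Authorization": "Bearer"},  # placeholder token, as in the curl example
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "tokens": [128000, 9906, 4435],
    },
)
resp.raise_for_status()
print(resp.json())  # e.g. {"prompt": "<|begin_of_text|>Hello World"}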

Senne-Mennes pushed a commit to Senne-Mennes/production-stack that referenced this pull request Oct 22, 2025
Signed-off-by: Ivan Rajwa Naufal <ivan.naufal@datacomm.co.id>
Co-authored-by: Ivan Rajwa Naufal <ivan.naufal@datacomm.co.id>
Co-authored-by: Yuhan Liu <32589867+YuhanLiu11@users.noreply.github.com>
Signed-off-by: senne.mennes@capgemini.com <senne.mennes@capgemini.com>