
Conversation

@Exchioz
Contributor

@Exchioz Exchioz commented Jun 26, 2025

[Router] Expose /tokenize and /detokenize endpoints

This PR adds support for forwarding /tokenize and /detokenize requests through the vllm_router. These endpoints already exist in the vLLM backend but were not exposed by the production router.

This change enables users to access tokenization and detokenization functionality via the production-stack API, which is useful for analyzing token usage, preparing prompts, and monitoring inference behavior.

Summary of Changes

  • Added @main_router.post("/tokenize") and @main_router.post("/detokenize") routes (a minimal sketch follows this list).
  • Both use route_general_request(...) to proxy to the backend vLLM endpoints.
  • This matches the behavior of other endpoints like /completions and /embeddings.
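
For illustration, here is a minimal sketch of the two new routes. It assumes route_general_request takes the incoming FastAPI Request plus the backend path, mirroring the existing /completions route; the actual helper signature and import path in vllm_router may differ.

from fastapi import APIRouter, Request

# Existing helper in vllm_router that picks a backend and forwards the request
# (import path assumed for this sketch):
# from vllm_router.services.request_service.request import route_general_request

main_router = APIRouter()


@main_router.post("/tokenize")
async def route_tokenize_request(request: Request):
    # Forward the request body unchanged to the backend vLLM /tokenize endpoint.
    return await route_general_request(request, "/tokenize")


@main_router.post("/detokenize")
async def route_detokenize_request(request: Request):
    # Forward the request body unchanged to the backend vLLM /detokenize endpoint.
    return await route_general_request(request, "/detokenize")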

Fixes

No existing issue yet; one can be referenced here if created later.

Checklist

  • Code passes all pre-commit checks.
  • Commit is signed with -s.
  • Change is categorized under [Router].
  • Changes are backward-compatible and do not break existing routes.
  • (Optional) Add integration test if maintainers request it.

Signed-off-by: Ivan Rajwa Naufal <ivan.naufal@datacomm.co.id>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @Exchioz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to extend the vllm_router's capabilities by exposing the /tokenize and /detokenize endpoints from the vLLM backend. This makes tokenization and detokenization functionalities directly accessible through the production API, enabling users to leverage these features for various use cases like prompt preparation and token usage analysis.

Highlights

  • New API Endpoints: I've added two new POST endpoints, /tokenize and /detokenize, to the main_router.
  • Backend Proxying: These new endpoints are configured to proxy requests directly to their corresponding vLLM backend services using the existing route_general_request utility, mirroring the behavior of other established routes like /completions and /embeddings.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds /tokenize and /detokenize endpoints to the main router by reusing the route_general_request function. My review identifies a high-severity issue with this approach. The route_general_request function requires a model field in the request body for routing, which is not part of the standard vLLM API for these new endpoints. This will cause requests using the standard API to fail and creates an inconsistency between the router and the backend. I've recommended creating a dedicated routing function for these endpoints to handle this correctly.
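
To make the concern concrete, below is a hypothetical sketch of the kind of dedicated handler the review suggests: route by model when the body carries one, and otherwise fall back to a configured default backend instead of failing. The helpers get_default_backend_url and proxy_request are illustrative names, not actual vllm_router functions.

import json

from fastapi import Request


async def route_tokenizer_request(request: Request, endpoint: str):
    body = await request.body()
    payload = json.loads(body) if body else {}
    if "model" in payload:
        # Body names a model, so the normal model-based routing can be reused
        # (route_general_request is the existing helper in vllm_router).
        return await route_general_request(request, endpoint)
    # No "model" in the body: fall back to a default backend rather than
    # rejecting the request.
    backend_url = get_default_backend_url()  # illustrative helper
    return await proxy_request(f"{backend_url}{endpoint}", payload)  # illustrative helper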

@YuhanLiu11
Collaborator

This is a super helpful feature! Thanks @Exchioz. We'll take a look on our side.

@Shaoting-Feng
Collaborator

Can you provide a minimal example demonstrating how these endpoints work?

@Shaoting-Feng Shaoting-Feng self-assigned this Jul 1, 2025
@Exchioz
Contributor Author

Exchioz commented Jul 2, 2025

Hi @Shaoting-Feng, thanks for the review!

Here's an example demonstrating how the /tokenize and /detokenize endpoints work.

1. /tokenize endpoint

Request

curl -X POST 'http://localhost:8000/tokenize' \
   -H 'Content-Type: application/json' \
   -H 'Authorization: Bearer' \
   -d '{
       "model": "meta-llama/Llama-3.1-8B-Instruct",
       "prompt": "Hello World"
   }'

Expected Output

{
    "count": 3,
    "max_model_len": 8192,
    "tokens": [
        128000,
        9906,
        4435
    ]
}
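
For reference, the same /tokenize call as a Python sketch using the requests library (the router URL, model name, and empty Bearer token mirror the curl example above; substitute whatever credentials your deployment expects).

import requests

resp = requests.post(
    "http://localhost:8000/tokenize",
    headers={"Authorization": "Bearer"},  # placeholder token, as in the curl example
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": "Hello World",
    },
)
resp.raise_for_status()
print(resp.json())  # e.g. {"count": 3, "max_model_len": 8192, "tokens": [128000, 9906, 4435]}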

2. /detokenize endpoint

Request

curl -X POST 'http://localhost:8000/detokenize' \
   -H 'Content-Type: application/json' \
   -H 'Authorization: Bearer' \
   -d '{
       "model": "meta-llama/Llama-3.1-8B-Instruct",
       "token": [128000,9906,4435]
   }'

Expected Output

{
    "prompt": "<|begin_of_text|>Hello World"
}

Collaborator

@Shaoting-Feng Shaoting-Feng left a comment


LGTM. Please update the branch and we are ready to go.

@YuhanLiu11 YuhanLiu11 merged commit 5a40d76 into vllm-project:main Jul 7, 2025
14 checks passed
@ehalit

ehalit commented Sep 16, 2025

I figured out that the required format for the /detokenize endpoint should have a "tokens" field instead of a "token" field:

curl -X POST 'http://localhost:8000/detokenize' \
   -H 'Content-Type: application/json' \
   -H 'Authorization: Bearer' \
   -d '{
       "model": "meta-llama/Llama-3.1-8B-Instruct",
       "tokens": [128000,9906,4435]
   }'
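
The equivalent Python sketch with the corrected "tokens" field, under the same assumptions as the /tokenize example above:

import requests

resp = requests.post(
    "http://localhost:8000/detokenize",
    headers={"Authorization": "Bearer"},  # placeholder token, as in the curl example
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "tokens": [128000, 9906, 4435],
    },
)
resp.raise_for_status()
print(resp.json())  # e.g. {"prompt": "<|begin_of_text|>Hello World"}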

Senne-Mennes pushed a commit to Senne-Mennes/production-stack that referenced this pull request Oct 22, 2025
Signed-off-by: Ivan Rajwa Naufal <ivan.naufal@datacomm.co.id>
Co-authored-by: Ivan Rajwa Naufal <ivan.naufal@datacomm.co.id>
Co-authored-by: Yuhan Liu <32589867+YuhanLiu11@users.noreply.github.com>
Signed-off-by: senne.mennes@capgemini.com <senne.mennes@capgemini.com>