
Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface #3469

Open
BjarniHaukur wants to merge 68 commits into huggingface:main from ASSERT-KTH:async-vllm-server

Conversation

@BjarniHaukur

What does this PR do?

This PR adds a new vllm_serve_async.py script to TRL. It:

  • Enables asynchronous, OpenAI-compatible inference with vLLM
  • Supports models that use tool calls (e.g., search APIs, a Python interpreter, general terminal usage)
  • Mirrors the weight syncing logic from vllm_serve.py
  • Delegates endpoint complexity to vllm.entrypoints.openai.api_server
  • Exposes a rollout_func interface that lets users define custom input/output structures and tool definitions to forward into reward functions (see the sketch below)

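Since the PR description does not spell out the exact rollout_func signature, the following is only a minimal, hypothetical sketch of what a tool-enabled rollout against the OpenAI-compatible endpoint could look like. The base URL, model name, search tool schema, run_search helper, and returned dictionary keys are all assumptions for illustration, not the interface this PR ships.

```python
# Hypothetical sketch; the actual rollout_func signature in this PR may differ.
# Assumes the async vLLM server is running locally and serving an
# OpenAI-compatible API at http://localhost:8000/v1.
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Example tool definition (hypothetical "search" tool).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search the web and return the top snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
]


def run_search(query: str) -> str:
    # Placeholder tool backend; a real setup would call an actual search API.
    return f"Results for: {query}"


def rollout_func(prompt: str, model: str = "Qwen/Qwen2.5-7B-Instruct") -> dict:
    """Run a multi-step, tool-enabled rollout and return data for the reward functions."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(4):  # cap the number of tool-use rounds
        response = client.chat.completions.create(
            model=model, messages=messages, tools=TOOLS
        )
        choice = response.choices[0].message
        messages.append(choice.model_dump(exclude_none=True))
        if not choice.tool_calls:
            break
        # Execute each requested tool call and feed the result back to the model.
        for call in choice.tool_calls:
            args = json.loads(call.function.arguments)
            result = run_search(**args)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
    # Custom output structure forwarded to the reward functions.
    return {"messages": messages, "completion": messages[-1].get("content", "")}
```

The intent described above is that the structure returned by the rollout is forwarded into the GRPO reward functions, while weight syncing between the trainer and the server reuses the mechanism from vllm_serve.py.
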
Fixes #3284

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@tmabraham

thank you for this PR!!

@lewtun
Member

lewtun commented Oct 20, 2025

Hi @BjarniHaukur thank you for the PR! We're now looking to integrate environments in TRL, so would you like to rebase your branch on main so we can test your proposal more thoroughly?


Development

Successfully merging this pull request may close these issues.

Support for realistic multi-step rollouts via async vLLM API
