[2/2] Top-k and Top-p support for dtensor worker with vLLM V0 when TP>1 #774

zhandaz wants to merge 1 commit into zhanda/top-p-k

Conversation
Pull Request Overview
This PR adds support for top-k and top-p sampling in the dtensor policy worker with vLLM V0 when tensor parallelism (TP) is greater than 1. The implementation introduces a new distributed log softmax function that handles sampling parameters and modifies existing functions to propagate these parameters through the call stack.
- Implements `_compute_distributed_log_softmax_with_sampling` to handle top-k/top-p sampling in distributed environments
- Adds sampling parameter extraction and propagation through the dtensor policy worker
- Updates function signatures across the distributed model utilities to support sampling parameters
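To make the top-k/top-p part of the change concrete, here is a minimal single-process sketch of the kind of logit filtering such a sampling-aware log softmax applies before normalizing. The function name and the pure-Python implementation are hypothetical, for illustration only; the PR's actual code operates on vocab-parallel torch tensors.

```python
import math

def top_k_top_p_filter(logits, top_k=-1, top_p=1.0):
    """Hypothetical sketch: mask logits outside the top-k / top-p set to -inf.

    top_k == -1 and top_p == 1.0 are the usual "disabled" sentinels, in which
    case the logits pass through unchanged.
    """
    filtered = list(logits)
    # Top-k: keep only the k largest logits.
    if top_k is not None and top_k > 0:
        kth = sorted(filtered, reverse=True)[min(top_k, len(filtered)) - 1]
        filtered = [x if x >= kth else float("-inf") for x in filtered]
    # Top-p (nucleus): keep the smallest set of tokens whose softmax
    # probability mass reaches top_p, starting from the largest logit.
    if top_p is not None and top_p < 1.0:
        order = sorted(range(len(filtered)), key=lambda i: filtered[i], reverse=True)
        z = sum(math.exp(x) for x in filtered if x != float("-inf"))
        cum, keep = 0.0, set()
        for i in order:
            if filtered[i] == float("-inf"):
                break
            keep.add(i)  # the top token is always kept
            cum += math.exp(filtered[i]) / z
            if cum >= top_p:
                break
        filtered = [x if i in keep else float("-inf") for i, x in enumerate(filtered)]
    return filtered
```

With defaults (`top_k=-1`, `top_p=1.0`) the function is a no-op, matching the fallback behavior discussed in the review comments below.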
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| nemo_rl/models/policy/dtensor_policy_worker.py | Adds sampling parameter extraction and passes them to logprob computation functions |
| nemo_rl/models/dtensor/parallelize.py | Updates function signature to accept and forward sampling parameters |
| nemo_rl/distributed/model_utils.py | Implements new sampling-aware distributed log softmax and updates all related functions |
```python
    Returns:
        Log softmax output with sampling applied, same shape as input
    """
    if (top_k is not None and top_k == -1) and (top_p is not None and top_p == 1.0):
```
The condition uses `and` between the two parenthesized checks, but logically this should be `or`, since either condition being true (top_k disabled OR top_p disabled) should trigger the fallback to the regular log softmax.
```diff
-    if (top_k is not None and top_k == -1) and (top_p is not None and top_p == 1.0):
+    if (top_k is not None and top_k == -1) or (top_p is not None and top_p == 1.0):
```
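The two combinators only disagree when exactly one of the parameters is active. A small illustrative check (helper names are made up for this example, not from the PR):

```python
# -1 and 1.0 are the "disabled" sentinels used in the diff above.
def fallback_and(top_k, top_p):
    return (top_k is not None and top_k == -1) and (top_p is not None and top_p == 1.0)

def fallback_or(top_k, top_p):
    return (top_k is not None and top_k == -1) or (top_p is not None and top_p == 1.0)

# top_k disabled but top_p active: the combinators disagree.
print(fallback_and(-1, 0.9))  # False -> takes the sampling-aware path
print(fallback_or(-1, 0.9))   # True  -> falls back to plain log softmax
```

With `or`, disabling either parameter alone would skip sampling entirely even though the other is still active, which is why the choice of combinator matters here.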
```python
        log_softmax_output = _compute_distributed_log_softmax(
            vocab_parallel_logits, group=group
        )
    # Use sampling-aware distributed log softmax if sampling parameters are provided
```
The condition on line 142 uses `or` logic, but the comment suggests both parameters need to be provided. The logic is correct (either parameter being active should use sampling); only the comment is misleading.
```diff
-    # Use sampling-aware distributed log softmax if sampling parameters are provided
+    # Use sampling-aware distributed log softmax if either top_k or top_p is provided
```
```diff
     Args:
-        vocab_parallel_logits (orch.Tensor): Logits distributed across tensor parallel workers,
+        vocab_parallel_logits (torch.Tensor): Logits distributed across tensor parallel workers,
```
The docstring had a typo ('orch.Tensor' instead of 'torch.Tensor'), but the diff shows it is already corrected in this PR.
What does this PR do?

tldr: Support top-k and top-p for the dtensor worker with vLLM V0. This PR supports TP>1 on top of #773. Instead of using `_compute_distributed_log_softmax`, we implement `_compute_distributed_log_softmax_with_sampling` to support top-k and top-p when TP is enabled.

Note: This change depends on #773 and should be merged after it. We should also decide whether to merge this implementation or, alternatively, add a warning to users about a potential mismatch between this inference logic and the logic used in policy training for vLLM engine V0 and dtensor with TP>1.
Tests for distributed functionalities and docs will be added after we make the decision.
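For readers unfamiliar with `_compute_distributed_log_softmax`, the idea it builds on can be simulated in a single process: each TP rank holds one shard of the vocabulary dimension, and the all-reduces over the tensor-parallel group become a plain max and sum over shards. This is a hypothetical stand-in for illustration, not the PR's torch.distributed implementation.

```python
import math

def simulated_distributed_log_softmax(vocab_shards):
    """Single-process simulation of a vocab-parallel log softmax.

    Each inner list plays the role of one TP rank's slice of the vocabulary;
    max() and sum() over shards stand in for all-reduce(MAX) and
    all-reduce(SUM) over the tensor-parallel group.
    """
    # all-reduce(MAX): global max over the full vocabulary, for stability.
    global_max = max(max(shard) for shard in vocab_shards)
    # all-reduce(SUM): shared normalizer from locally shifted logits.
    denom = sum(math.exp(x - global_max) for shard in vocab_shards for x in shard)
    log_denom = math.log(denom)
    # Each rank keeps only the log softmax of its own shard.
    return [[x - global_max - log_denom for x in shard] for shard in vocab_shards]
```

The sampling-aware variant this PR adds has to mask out tokens (top-k/top-p) before computing the shared normalizer, which is what makes the TP>1 case nontrivial: the set of kept tokens depends on logits living on other ranks.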
Issues
Related Issue: #69
Usage
```python
# Add a code snippet demonstrating how to use this
```

Before your PR is "Ready for review"
Pre checks: