feat: support DP inside vLLM for EP #1081
Conversation
terrykong left a comment:
lgtm. @parthchadha to review
Walkthrough
Configs replace the boolean enable_expert_parallel with an integer expert_parallel_size.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Orchestrator
    participant VirtualCluster
    participant WorkerGroups
    participant Worker as "vLLM Worker"
    Orchestrator->>VirtualCluster: init/create placement groups (retries/backoff, timeout)
    VirtualCluster-->>Orchestrator: placement groups ready + (ip,port) per bundle
    Orchestrator->>WorkerGroups: request workers (bundle_indices_list)
    loop per (pg_idx,bundle_idx)
        WorkerGroups->>VirtualCluster: get_available_address_and_port(pg_idx,bundle_idx)
        VirtualCluster-->>WorkerGroups: (ip, port)
    end
    WorkerGroups->>Worker: spawn with env { AVAILABLE_ADDR_LIST, AVAILABLE_PORT_LIST, ... }
    Worker->>Worker: read cfg.expert_parallel_size and tensor_parallel_size
    alt expert_parallel_size > tensor_parallel_size
        Worker->>Worker: compute DP sizing, set VLLM_DP_RANK / VLLM_DP_RANK_LOCAL
        Worker->>Worker: set VLLM_DP_MASTER_IP and VLLM_DP_MASTER_PORT from AVAILABLE_* lists
    end
    Worker-->>Orchestrator: worker ready
```
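To make the worker-side branch concrete, here is a minimal sketch of the env wiring the diagram describes. The helper name and the rank arithmetic are illustrative assumptions; the actual logic lives in nemo_rl/models/generation/vllm/vllm_worker.py and may differ.

```python
import os


def setup_vllm_dp_env(cfg: dict, global_rank: int, local_rank: int) -> None:
    # Illustrative sketch only; not the actual vllm_worker.py implementation.
    tp_size = int(cfg["tensor_parallel_size"])
    ep_size = int(cfg["expert_parallel_size"])
    if ep_size <= tp_size:
        return  # no vLLM-internal DP needed

    # vLLM's EP = DP * TP, so the internal DP size is EP / TP.
    dp_size = ep_size // tp_size
    os.environ["VLLM_DP_SIZE"] = str(dp_size)
    # Assumed rank layout: every tp_size consecutive ranks form one DP replica.
    os.environ["VLLM_DP_RANK"] = str((global_rank // tp_size) % dp_size)
    os.environ["VLLM_DP_RANK_LOCAL"] = str(local_rank // tp_size)

    # Pre-allocated (ip, port) lists come from the worker group; DP rank 0's
    # entry becomes the DP master address.
    addrs = os.environ["AVAILABLE_ADDR_LIST"].split(",")
    ports = os.environ["AVAILABLE_PORT_LIST"].split(",")
    os.environ["VLLM_DP_MASTER_IP"] = addrs[0]
    os.environ["VLLM_DP_MASTER_PORT"] = ports[0]
```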
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1)
41-49: No 'enable_expert_parallel' config key found; fix vllm config typing
- Sweep: no configs/tests contain an "enable_expert_parallel" config key — it's only a runtime variable in nemo_rl/models/generation/vllm/vllm_worker.py (set from expert_parallel_size and passed to vLLM).
- Action (critical): nemo_rl/models/generation/vllm/config.py currently declares "expert_parallel_size: bool" (line ~23) but configs/tests use integers. Change that to "expert_parallel_size: int" and run validation/tests.
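A minimal sketch of the suggested typing fix, assuming the vLLM config is expressed as a TypedDict; apart from expert_parallel_size, the field names shown are illustrative:

```python
from typing import TypedDict


class VllmConfig(TypedDict):
    tensor_parallel_size: int
    pipeline_parallel_size: int
    expert_parallel_size: int  # was annotated as bool; configs and tests pass ints (e.g. 1, 64)
```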
🧹 Nitpick comments (22)
examples/configs/recipes/llm/grpo-gspo-deepscaler-1.5b-8K.yaml (1)
105-109: Consistent migration to expert_parallel_size. EP=1 with TP=1 is a no-op; OK. Add the EP%TP note for clarity.

```diff
- expert_parallel_size: 1
+ # EP > 1 requires EP % TP == 0
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (1)
151-154: Migration LGTM; EP disabled by default. Here TP=4; keep a comment to avoid misconfig when enabling EP later.

```diff
- expert_parallel_size: 1
+ # If you enable EP, set expert_parallel_size to a multiple of TP (=4 here)
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (1)
99-101: Rename is correct; matches PR semantics (EP=1 => vLLM DP off). Optional: add the EP%TP constraint comment.

```diff
- expert_parallel_size: 1
+ # EP > 1 requires EP % TP == 0
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (1)
99-101: Good: default EP off; TP=4 here. Add a guardrail comment to prevent invalid EP later.

```diff
- expert_parallel_size: 1
+ # If enabling EP, choose a multiple of TP (=4)
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (1)
99-101: Change is consistent; EP off by default. Same optional comment suggestion.

```diff
- expert_parallel_size: 1
+ # EP > 1 requires EP % TP == 0
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml (1)
133-138: Rename aligns with code; a maintainers' note for the EP/TP constraint is recommended. TP=4 in this config; a comment will help future edits.

```diff
- expert_parallel_size: 1
+ # To enable EP, set to a multiple of TP (=4). Otherwise keep at 1.
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-deepscaler-1.5b-8K.yaml (1)
104-108: LGTM; EP=1 keeps previous behavior. Add the EP%TP reminder to reduce misconfiguration.

```diff
- expert_parallel_size: 1
+ # EP > 1 requires EP % TP == 0
+ expert_parallel_size: 1
```

examples/configs/evals/eval.yaml (1)
24-24: EP size switch ACK; default safe for single-GPU eval. Value 1 keeps EP disabled. Consider adding a comment that EP>1 on eval requires EP % TP == 0 and enough GPUs.
examples/configs/recipes/llm/grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long.yaml (1)
99-99: Migration is correct; keep EP constraints visible to users. Optional: add a YAML comment noting EP > 1 requires EP % TP == 0 and may enable vLLM-internal DP = EP/TP.
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (1)
129-129: Good rename; default preserves behavior. No functional change. If you plan to exercise the new DP-in-vLLM path in CI, consider adding a variant with EP > 1 and TP = 1 for coverage.
examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (1)
99-99: Rename LGTM; add a minimal guardrail note (optional). A short inline comment like "1 disables EP; >1 requires EP % TP == 0" could prevent misconfigurations.
examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (1)
99-99: Config update is correct. Given TP=4, if someone sets EP>1 later, remind them that EP must be a multiple of 4; DP_in_vLLM = EP/4. Optional comment.
examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1)
41-49: Rename to expert_parallel_size looks good; default preserves behavior. Setting expert_parallel_size: 1 is backward-compatible. When users tune EP > 1, please document that EP must be divisible by TP (EP % TP == 0) so vLLM DP = EP/TP is integral.
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (1)
95-102: Good default for EP. expert_parallel_size: 1 preserves previous behavior; note in docs that EP > 1 requires EP % TP == 0.
tests/unit/environments/test_code_environment.py (1)
51-62: Use boolean for enforce_eager (avoid string). The current value is a string "False"; prefer boolean False to match runtime expectations.
Apply this diff:
"gpu_memory_utilization": 0.6, - "enforce_eager": "False", + "enforce_eager": False,tests/unit/models/generation/test_vllm_generation.py (2)
53-65: EP key migration OK; also fix enforce_eager type. The key rename is correct. Please switch enforce_eager from string to boolean for consistency.
"load_format": "auto", - "enforce_eager": "False", + "enforce_eager": False,
53-65: Add a minimal EP>1 test to cover the new DP-in-vLLM path. Parametrize a case with TP=1, EP=2 to validate layout checks and env wiring.
```python
# Outside this block: add a new test
def test_vllm_ep_gt_1_layout(cluster, tokenizer):
    cfg = deepcopy(basic_vllm_test_config)
    cfg = configure_generation_config(cfg, tokenizer, is_eval=True)
    cfg["vllm_cfg"]["tensor_parallel_size"] = 1
    cfg["vllm_cfg"]["expert_parallel_size"] = 2  # triggers vLLM-internal DP
    v = VllmGeneration(cluster, cfg)
    v.finish_generation()  # init path exercises EP layout
    v.shutdown()
```

tests/unit/environments/test_retriever.py (1)
51-61: Set enforce_eager to boolean False. Avoid quoting booleans in the config dict.
"gpu_memory_utilization": 0.6, - "enforce_eager": "False", + "enforce_eager": False,nemo_rl/models/generation/vllm/vllm_generation.py (1)
154-163: Make vLLM DP env setting robust; optional NCCL resiliency
- Gate VLLM_DP_SIZE on the computed vllm_dp_size > 1 (covers future changes).
- Optional: set NCCL_ASYNC_ERROR_HANDLING=1 for safer comms.

Apply this diff:

```diff
         if not self.cfg["colocated"]["enabled"]:
             env_vars["NCCL_CUMEM_ENABLE"] = "1"
-        # We should use vLLM DP if ep_size > tp_size since EP_SIZE = DP_SIZE * TP_SIZE in vLLM.
-        # See details in https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/data_parallel.py
-        if self.ep_size > self.tp_size:
-            env_vars["VLLM_DP_SIZE"] = str(self.vllm_dp_size)
+        # Use vLLM internal DP when DP_SIZE > 1 (EP = DP * TP).
+        if self.vllm_dp_size > 1:
+            env_vars["VLLM_DP_SIZE"] = str(self.vllm_dp_size)
+        # Optional resiliency:
+        # env_vars.setdefault("NCCL_ASYNC_ERROR_HANDLING", "1")
```

nemo_rl/distributed/virtual_cluster.py (3)
354-372: Validate pg_idx and bundle_idx before indexing. Out-of-range indices will surface as IndexError. Fail fast with a clear message.

Apply this diff:

```diff
-        placement_groups = self.get_placement_groups()
-        if len(placement_groups) == 1:
-            pg = placement_groups[0]
-        else:
-            pg = placement_groups[pg_idx]
+        placement_groups = self.get_placement_groups()
+        if not placement_groups:
+            raise RuntimeError("No placement groups available")
+        if len(placement_groups) == 1:
+            pg = placement_groups[0]
+        else:
+            if not (0 <= pg_idx < len(placement_groups)):
+                raise ValueError(
+                    f"pg_idx out of range: {pg_idx} (available={len(placement_groups)})"
+                )
+            pg = placement_groups[pg_idx]
+        if not (0 <= bundle_idx < pg.bundle_count):
+            raise ValueError(
+                f"bundle_idx out of range: {bundle_idx} (bundle_count={pg.bundle_count})"
+            )
```
375-383: Confirm Ray behavior with num_cpus=0 + PG scheduling across Ray versions. Setting num_cpus=0 is intentional to avoid contention, but older Ray versions had quirks with PG enforcement on zero-resource tasks. If you target older clusters, consider reserving a fractional CPU (e.g., 0.01) on the bundle. Would you like a small compatibility matrix check script for your supported Ray versions?
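If older Ray versions do need to be supported, a hedged sketch of the fractional-CPU variant could look like the following; the helper names are hypothetical, not the repo's get_available_address_and_port:

```python
import socket

import ray
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy


@ray.remote(num_cpus=0.01)  # tiny reservation instead of 0, for older Ray PG scheduling
def _node_ip_and_free_port() -> tuple[str, int]:
    # Bind an ephemeral port on whichever node this bundle landed on.
    ip = ray.util.get_node_ip_address()
    with socket.socket() as s:
        s.bind(("", 0))
        return ip, s.getsockname()[1]


def address_for_bundle(pg, bundle_idx: int) -> tuple[str, int]:
    strategy = PlacementGroupSchedulingStrategy(
        placement_group=pg, placement_group_bundle_index=bundle_idx
    )
    return ray.get(
        _node_ip_and_free_port.options(scheduling_strategy=strategy).remote()
    )
```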
386-388: Minor: tighten the error message or use a dedicated exception. The message is fine; if you want to appease linters, shorten it or use a custom exception type.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
examples/configs/evals/eval.yaml (1 hunks)
examples/configs/grpo_math_1B.yaml (1 hunks)
examples/configs/grpo_sliding_puzzle.yaml (1 hunks)
examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1 hunks)
examples/configs/recipes/llm/grpo-deepscaler-1.5b-8K.yaml (1 hunks)
examples/configs/recipes/llm/grpo-gemma3-1b-it-1n8g-fsdp2tp1.yaml (1 hunks)
examples/configs/recipes/llm/grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long.yaml (1 hunks)
examples/configs/recipes/llm/grpo-gspo-deepscaler-1.5b-8K.yaml (1 hunks)
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8.yaml (1 hunks)
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (1 hunks)
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (1 hunks)
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (1 hunks)
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (1 hunks)
examples/configs/recipes/llm/grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml (1 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (1 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (1 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (1 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (1 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (1 hunks)
examples/configs/recipes/vlm/vlm_grpo-qwen2.5-vl-3b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml (1 hunks)
examples/configs/recipes/vlm/vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml (1 hunks)
examples/configs/vlm_grpo_3B.yaml (1 hunks)
nemo_rl/distributed/virtual_cluster.py (2 hunks)
nemo_rl/distributed/worker_groups.py (2 hunks)
nemo_rl/models/generation/vllm/config.py (1 hunks)
nemo_rl/models/generation/vllm/vllm_generation.py (4 hunks)
nemo_rl/models/generation/vllm/vllm_worker.py (2 hunks)
tests/unit/environments/test_code_environment.py (1 hunks)
tests/unit/environments/test_retriever.py (1 hunks)
tests/unit/experience/test_rollouts.py (1 hunks)
tests/unit/models/generation/test_vllm_generation.py (1 hunks)
tests/unit/models/generation/test_vllm_large_model.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_rl/distributed/worker_groups.py (1)
nemo_rl/distributed/virtual_cluster.py (1)
get_available_address_and_port(354-388)
nemo_rl/models/generation/vllm/vllm_generation.py (3)
nemo_rl/distributed/worker_groups.py (1)
dp_size (590-592)
tests/unit/models/generation/test_vllm_generation.py (1)
cluster (211-222)
nemo_rl/distributed/virtual_cluster.py (1)
world_size(348-349)
🪛 Ruff (0.12.2)
nemo_rl/distributed/worker_groups.py
463-463: Loop control variable group_idx not used within loop body
(B007)
464-464: Loop control variable local_rank not used within loop body
(B007)
nemo_rl/models/generation/vllm/vllm_worker.py
347-347: Use of possibly insecure function; consider using ast.literal_eval
(S307)
348-348: Use of possibly insecure function; consider using ast.literal_eval
(S307)
nemo_rl/distributed/virtual_cluster.py
386-388: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Lint check
🔇 Additional comments (11)
examples/configs/grpo_sliding_puzzle.yaml (1)
44-46: Rename to expert_parallel_size looks good; preserves default off (1). File: examples/configs/grpo_sliding_puzzle.yaml (lines 44–46)
Add a short inline note that when setting expert_parallel_size > 1 it must be a multiple of tensor_parallel_size.
```diff
- expert_parallel_size: 1
+ # Set >1 to enable EP. Constraint: expert_parallel_size % tensor_parallel_size == 0
+ expert_parallel_size: 1
```

Verification not completed: the ripgrep run returned "No files were searched". Re-run from the repo root to confirm there are no lingering enable_expert_parallel keys and that an alias/deprecation path exists. Suggested command:

```bash
rg -n --hidden -S --no-ignore 'enable_expert_parallel|expert_parallel_size' -g '!**/site-packages/**'
```

examples/configs/recipes/llm/grpo-gemma3-1b-it-1n8g-fsdp2tp1.yaml (1)
98-98: Rename LGTM. Defaults are sane; nothing further.
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8.yaml (1)
128-128: Migration OK; FP8 generation unaffected. Nothing blocking. If you intend to demo EP>1 here, ensure kernel/layout support matches the FP8 path.
examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (1)
99-99: expert_parallel_size default preserved — add YAML hint & confirm TP mismatch
- vllm_worker sets enable_expert_parallel = expert_parallel_size > 1 and passes it through (nemo_rl/models/generation/vllm/vllm_worker.py:138, 386) — expert_parallel_size: 1 preserves current behavior.
- Add a one-line YAML comment next to examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (around line 99) stating: "If >1, expert_parallel_size must be a multiple of generation.vllm_cfg.tensor_parallel_size; vLLM internal DP = EP / TP."
- Quick rg run did not surface tensor_parallel_size entries under examples/configs — confirm whether the dtensor_cfg.tensor_parallel_size=8 vs generation.vllm_cfg.tensor_parallel_size=4 (train vs gen) mismatch is intentional; if not, align or document.
examples/configs/recipes/vlm/vlm_grpo-qwen2.5-vl-3b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml (1)
110-117: EP size key migration OK. expert_parallel_size: 1 aligns with the new runtime logic and keeps EP disabled by default.
examples/configs/vlm_grpo_3B.yaml (1)
111-118: Consistent EP config. Good: expert_parallel_size replaces the old boolean and stays at 1.
examples/configs/recipes/vlm/vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml (1)
109-116: Looks good. EP key rename applied consistently with value 1.
examples/configs/grpo_math_1B.yaml (1)
181-182: YAML update is consistent with the new EP semantics. The comment is correct (EP must be a multiple of TP).
tests/unit/experience/test_rollouts.py (1)
201-202: Config rename to expert_parallel_size is correct here. Keeps default EP=1 for tests.
nemo_rl/models/generation/vllm/vllm_worker.py (1)
137-139: Deriving enable_expert_parallel from expert_parallel_size is correct. This aligns with the new config contract (EP enabled iff size > 1).
Confirm all config sources now pass an int for expert_parallel_size (no strings).
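One way to run that check, as a rough sketch (assumes PyYAML is available and that the example configs live under examples/configs; adjust paths as needed):

```python
import pathlib

import yaml


def find_key(node, key):
    # Yield every value stored under `key`, at any nesting depth.
    if isinstance(node, dict):
        if key in node:
            yield node[key]
        for child in node.values():
            yield from find_key(child, key)
    elif isinstance(node, list):
        for child in node:
            yield from find_key(child, key)


for path in pathlib.Path("examples/configs").rglob("*.yaml"):
    for value in find_key(yaml.safe_load(path.read_text()), "expert_parallel_size"):
        # Exclude bool explicitly, since bool is a subclass of int in Python.
        assert isinstance(value, int) and not isinstance(value, bool), (
            f"{path}: expert_parallel_size={value!r} is not an int"
        )
```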
nemo_rl/distributed/virtual_cluster.py (1)
390-397: Delegation wrapper looks good. Backwards compatibility via delegation to (pg_idx=0, bundle_idx=0) is clean.
Actionable comments posted: 0
♻️ Duplicate comments (3)
nemo_rl/models/generation/vllm/vllm_generation.py (3)
58-69: Compute sizes as ints, raise instead of assert, and defer the vLLM-DP calc. Casts avoid accidental str math; explicit exceptions survive -O; compute vllm_dp_size only after EP/TP validation.

```diff
-        self.tp_size = self.cfg["vllm_cfg"]["tensor_parallel_size"]
-        self.pp_size = self.cfg["vllm_cfg"]["pipeline_parallel_size"]
-        self.ep_size = self.cfg["vllm_cfg"]["expert_parallel_size"]
-        self.model_parallel_size = self.tp_size * self.pp_size
-
-        assert cluster.world_size() % self.model_parallel_size == 0, (
-            "World size must be a multiple of model parallel size. "
-            f"Got world size {cluster.world_size()} and model parallel size (TP * PP) {self.model_parallel_size}."
-        )
-        self.dp_size = cluster.world_size() // self.model_parallel_size
-        self.vllm_dp_size = self.ep_size // self.tp_size
+        vllm_cfg = self.cfg["vllm_cfg"]
+        self.tp_size = int(vllm_cfg["tensor_parallel_size"])
+        self.pp_size = int(vllm_cfg["pipeline_parallel_size"])
+        self.ep_size = int(vllm_cfg["expert_parallel_size"])
+        self.model_parallel_size = self.tp_size * self.pp_size
+
+        total_mp = self.model_parallel_size
+        if cluster.world_size() % total_mp != 0:
+            raise ValueError(
+                f"world_size={cluster.world_size()} is not divisible by TP*PP={self.tp_size}*{self.pp_size}={total_mp}"
+            )
+        self.dp_size = cluster.world_size() // total_mp
```
70-88: Replace assert-based config checks; validate EP vs TP before computing vLLM-DP. Asserts can be optimized out; make the checks deterministic and only then set vllm_dp_size.

```diff
-        if self.pp_size > 1:
-            assert self.cfg["vllm_cfg"]["async_engine"], (
-                "When pipeline_parallel_size > 1, async_engine must be set to True in the vLLM configuration. "
-                "You can enable it by adding `policy.generation.vllm_cfg.async_engine=true` to your command."
-            )
+        if self.pp_size > 1 and not vllm_cfg["async_engine"]:
+            raise ValueError(
+                "When pipeline_parallel_size > 1, set vllm_cfg.async_engine=true."
+            )
-        if self.ep_size > 1:
-            assert self.ep_size % self.tp_size == 0, (
-                "When EP > 1, EP must be a multiple of TP since vLLM's EP = DP * TP. "
-                "Please update your configuration to set expert_parallel_size to a multiple of tensor_parallel_size."
-            )
-            if self.ep_size != self.tp_size:
-                # vLLM's EP = DP * TP, so here we need to use DP inside vLLM.
-                assert not self.cfg["vllm_cfg"]["async_engine"], (
-                    "vLLM async_engine has some issues when using DP inside vLLM. "
-                    "Please update your configuration to set `policy.generation.vllm_cfg.async_engine=false`. "
-                    "See https://github.com/NVIDIA-NeMo/RL/issues/1101 for more details."
-                )
+        if self.ep_size > 1:
+            if self.ep_size % self.tp_size != 0:
+                raise ValueError(
+                    "When EP > 1, expert_parallel_size must be a multiple of tensor_parallel_size (EP = DP * TP in vLLM)."
+                )
+            if self.ep_size != self.tp_size and vllm_cfg["async_engine"]:
+                raise ValueError(
+                    "async_engine is not supported when using DP inside vLLM (EP != TP). Set vllm_cfg.async_engine=false. "
+                    "See https://github.com/NVIDIA-NeMo/RL/issues/1101 for more details."
+                )
+        # vLLM DP size computed after EP/TP validation
+        self.vllm_dp_size = max(1, self.ep_size // self.tp_size)
```
199-201: Don't use assert for the DP-size contract; raise a deterministic error. Prevents silent bypass under python -O.

```diff
-        assert self.dp_size == self.worker_group.dp_size, (
-            f"Data parallel size mismatch. Expected {self.dp_size}, got {self.worker_group.dp_size}"
-        )
+        if self.dp_size != self.worker_group.dp_size:
+            raise RuntimeError(
+                f"Data-parallel size mismatch: expected {self.dp_size}, got {self.worker_group.dp_size}"
+            )
```
🧹 Nitpick comments (1)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
93-114: Fix non-interpolated constants in error messages. These lines reference placeholders without f-strings; users will see the braces literally.

```diff
-            "does not return logprobs after top_k filtering. Values >= {TOP_K_THRESHOLD} are allowed "
+            f"does not return logprobs after top_k filtering. Values >= {TOP_K_THRESHOLD} are allowed "
 ...
-            "does not return logprobs after top_p filtering. Values >= {TOP_P_THRESHOLD} are allowed "
+            f"does not return logprobs after top_p filtering. Values >= {TOP_P_THRESHOLD} are allowed "
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_rl/models/generation/vllm/vllm_generation.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_rl/models/generation/vllm/vllm_generation.py (3)
tests/unit/models/generation/test_vllm_generation.py (1)
cluster (211-222)
nemo_rl/distributed/virtual_cluster.py (1)
world_size (348-349)
nemo_rl/distributed/worker_groups.py (1)
dp_size(590-592)
🔇 Additional comments (2)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
127-132: NamedSharding reshape looks good given the divisibility guard. With the explicit divisibility check above, this reshape is safe.
160-169: Env wiring for VLLM_DP_SIZE is correct and scoped. Setting VLLM_DP_SIZE only when EP > TP matches the DP-inside-vLLM contract.
This commit passed as part of this merge queue run: https://github.com/NVIDIA-NeMo/RL/actions/runs/17828559136. Will manually merge this one.
What does this PR do?
Rename enable_expert_parallel to expert_parallel_size. Configs that previously used enable_expert_parallel now set expert_parallel_size: 1 by default, so nothing is affected. Here is how DP on the generation side works after this PR:
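As a rough sketch of the sizing arithmetic the review comments describe (the concrete numbers below are hypothetical, chosen to mirror a TP8/EP64-style run):

```python
# Hypothetical example only; the real validation lives in vllm_generation.py.
world_size = 64            # total generation GPUs in the cluster (assumed)
tp_size, pp_size = 8, 1    # generation.vllm_cfg.tensor_parallel_size / pipeline_parallel_size
ep_size = 64               # generation.vllm_cfg.expert_parallel_size (the renamed key)

assert ep_size % tp_size == 0, "EP must be a multiple of TP, since vLLM's EP = DP * TP"
vllm_dp_size = ep_size // tp_size                  # DP inside vLLM -> 8
outer_dp_size = world_size // (tp_size * pp_size)  # NeMo-RL data-parallel replicas -> 8
print(f"vLLM-internal DP = {vllm_dp_size}, outer DP = {outer_dp_size}")
```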
Issues
Related #908.
Test Result
green: main branch baseline, TP32
blue: feature branch, using the same setting as baseline
red/pink: feature branch, TP8EP64
Convergence

Time cost

The EP setting is not fine-tuned, so the generation time is slower.
Summary by CodeRabbit
New Features
Reliability
Refactor
Tests