docs: GRPO documentation and Configuration cleanup #7
Merged
SahilJain314 merged 6 commits into main on Mar 21, 2025
Conversation
Force-pushed from bbc08f2 to 3cdffe3
terrykong reviewed Mar 21, 2025
parthchadha reviewed Mar 21, 2025
parthchadha previously approved these changes Mar 21, 2025
Force-pushed from 7a001c0 to b99a400
terrykong previously approved these changes Mar 21, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com> (all 6 commits)
Force-pushed from b99a400 to bb7b255
parthchadha approved these changes Mar 21, 2025
KiddoZhu pushed a commit that referenced this pull request on May 6, 2025:
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
sharonyu-115 referenced this pull request in sharonyu-115/RL on Apr 27, 2026:
Validated in jobs 11363285 (bake) + 11363286 (inference):
greedy generation against DeepSeek-V4-Flash-Base produced coherent text
("The capital of France is" -> " Paris. The capital of Germany is
Berlin..."). The original `_load_w13: size of tensor a (2048) must
match tensor b (16) at dim 0` blocker is now cleared at the vLLM half;
Automodel half still pending teammate's branch.
## What changed
### pyproject.toml + uv.lock — torch 2.10 rollback
Larkz's vllm wheel `vllm-0.19.2rc1.dev219+gc0879d948.cu129-cp313-cp313`
ships METADATA `Requires-Dist: torch==2.11.0` but its `_C.abi3.so` was
compiled in a `torch==2.10.0` venv (per larkz/nemorl-ds4/pyproject.toml
pin) and references the OLD 4-arg `MessageLogger(char const*, int, int,
bool)` constructor that torch 2.11's libc10 dropped. First bake (job
11362567) failed at venv rebuild with `undefined symbol:
_ZN3c1013MessageLoggerC1EPKciib`.
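One way to confirm this kind of ABI break before a full bake is to look for
the mangled constructor among the extension's undefined dynamic symbols. A
minimal sketch, assuming binutils' `nm` is on PATH; the `.so` path is a
placeholder, not the actual venv layout:

```python
import subprocess

# Placeholder path: point this at the wheel's extension inside the venv.
SO_PATH = "/path/to/site-packages/vllm/_C.abi3.so"

# nm -D lists dynamic symbols; undefined ("U") entries must be resolved by
# libc10/libtorch at import time, so a stale constructor shows up here.
symbols = subprocess.run(
    ["nm", "-D", SO_PATH], check=True, capture_output=True, text=True
).stdout

if "_ZN3c1013MessageLoggerC1EPKciib" in symbols:
    print("wheel references the old 4-arg MessageLogger -> needs torch 2.10")
```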
Reverted in 5 sites:
- project.dependencies torch 2.11.0 -> 2.10.0
- project.dependencies torchvision 0.26.0 -> 0.25.0
- dependency-groups.build torch 2.11.0 -> 2.10.0
- tool.uv.override-dependencies torch / torchaudio -> 2.10.0
- tool.uv.override-dependencies + explicit torchvision==0.25.0
`uv lock` confirmed: torch / torchaudio / torchvision rolled to 2.10.0+cu129
/ 2.10.0+cu129 / 0.25.0+cu129; nccl 2.28.9 -> 2.27.5 transitively.
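A quick sanity check inside the rebuilt venv; a sketch using only the
version strings pinned above:

```python
import torch
import torchvision

# Expect the rolled-back CUDA 12.9 builds after `uv lock` + venv rebuild.
assert torch.__version__.startswith("2.10.0"), torch.__version__
assert torchvision.__version__.startswith("0.25.0"), torchvision.__version__
```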
### tools/patch_vllm_dsv4_base_fp8_quick.sh — post-#40860 anchors
Anchor #7 rewritten: upstream's rewritten `load_weights` now captures
the result before calling `self.model.finalize_mega_moe_weights()`, so
"return loader.load_weights(...)" no longer matches; new anchor
"loaded_params = loader.load_weights(...)".
Added 8th anchor: forces `use_mega_moe = False` when
`VLLM_DSV4_BASE_FP8=1`. Without this, post-load
`finalize_mega_moe_weights()` would invoke `experts.finalize_weights()`
on a non-MegaMoE FusedMoE layer (Base routes through `Fp8MoEMethod`,
not `DeepseekV4MegaMoEExperts`). Self-contained env gating means
recipes don't also need to override `moe_backend`.
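A minimal sketch of the gating this anchor injects; the variable and method
names are taken from this description, not verified against upstream vLLM
source:

```python
import os

# Base FP8 checkpoints route MoE through Fp8MoEMethod, which has no
# finalize_weights(); with MegaMoE left on, the post-load
# finalize_mega_moe_weights() call would hit a plain FusedMoE layer and fail.
if os.environ.get("VLLM_DSV4_BASE_FP8") == "1":
    use_mega_moe = False
```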
Verified against the larkz wheel: 9 anchors apply cleanly,
`py_compile` clean.
### env_refresh_dsv4.sh — bake the Base patch
Now applies two patches over the freshly built worker venv before
`--container-save` (the sequence is sketched after this list):
1. `tools/vllm_deepseek_v4_config_patch.py` (defensive superset of
upstream main's narrow rope kwargs fix)
2. `tools/patch_vllm_dsv4_base_fp8_quick.sh` (env-gated Base FP8
support; idempotent, no-op when `VLLM_DSV4_BASE_FP8` unset)
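A sketch of that sequence in Python (the real env_refresh_dsv4.sh is a shell
script; argument-free invocation of both tools is an assumption):

```python
import subprocess
import sys

# Order matters: the broad config patch first, then the env-gated Base FP8
# patch. Both are idempotent, so re-running on a patched venv is a no-op.
subprocess.run([sys.executable, "tools/vllm_deepseek_v4_config_patch.py"], check=True)
subprocess.run(["bash", "tools/patch_vllm_dsv4_base_fp8_quick.sh"], check=True)
```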
### tools/test_vllm_dsv4_base_inference.py (new)
Counterpart to `test_vllm_dsv4_inference.py`, which covers Flash. Sets
`VLLM_DSV4_BASE_FP8=1` before the vllm import, re-applies the patch script
defensively (it is idempotent), and runs greedy generation through a
standalone `vllm.LLM`. Ran clean against the new sqsh.
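The core of such a test is small. A minimal sketch using the public
`vllm.LLM` API, with a placeholder checkpoint path and the defensive patch
re-application omitted:

```python
import os

# Must be set before the first vllm import, or the env-gated patch paths
# never activate.
os.environ["VLLM_DSV4_BASE_FP8"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/DeepSeek-V4-Flash-Base")  # placeholder path
# temperature=0.0 selects greedy decoding, matching the validation run.
params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)  # expect " Paris. The capital of Germany is..."
```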
### slurm_jobs/dsv4_base_{bake,inference_test}.sub (new)
Submit scripts for the chained bake + inference flow (chaining sketched
below):
- bake: 1 node, 8 GPUs, 3:55 walltime, `--container-save`
- inference: 1 node, 8 GPUs, 1:00 walltime, `--dependency=afterok`
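A sketch of that chain, assuming submission from a directory containing
slurm_jobs/ (`--parsable` is standard sbatch and prints just the job id):

```python
import subprocess

def sbatch(*args: str) -> str:
    # --parsable makes sbatch print only the job id, so it can be captured.
    result = subprocess.run(["sbatch", "--parsable", *args],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()

bake_id = sbatch("slurm_jobs/dsv4_base_bake.sub")
# afterok: the inference test starts only if the bake exits 0.
sbatch(f"--dependency=afterok:{bake_id}", "slurm_jobs/dsv4_base_inference_test.sub")
```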
### docs/model-readiness/deepseek-v4* (new + updated)
Tracks the full DSV4 bring-up. Bring-up status doc now reflects:
- vLLM half of Base blocker CLEARED
- Torch ABI rollback chronicled
- New sqsh artifact path
(.../nemo-rl-dsv4-vllm-c0879d-base-torch210-2026-04-27.sqsh ~99.8 GiB)
- Inference job 11363286 success, generated text logged
- Old broken sqsh marked for forensics
## What still blocks
The Automodel-side Base FP8-block-quant `state_dict_adapter` is pending a
teammate's new branch. Once that lands, an sqsh rebake + GRPO smoke test can
follow. The vLLM-side recipe override `load_format: auto` may now be
removable, since the new wheel includes the upstream dummy-load fix
(`finalize_weights()` appears 3 times in main's deepseek_v4.py); validation
is deferred until the first GRPO smoke attempt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
sharonyu-115 referenced this pull request in sharonyu-115/RL on Apr 27, 2026:
Re-applies the in-scope subset of reverted ef7bfb4 (the docs/scripts were kept out of git per user request).

pyproject.toml + uv.lock — torch 2.11 -> 2.10: larkz's vllm wheel (vllm-0.19.2rc1.dev219+gc0879d948) ships METADATA "Requires-Dist: torch==2.11.0" but its _C.abi3.so was compiled in a torch==2.10 venv (per larkz/nemorl-ds4/pyproject.toml pin) and references the OLD 4-arg MessageLogger(char const*, int, int, bool) constructor that torch 2.11's libc10 dropped. Bake job 11362567 failed at venv rebuild with `undefined symbol _ZN3c1013MessageLoggerC1EPKciib`. Reverted in 5 sites (project deps, build group, override-deps). uv lock confirms torch / torchaudio / torchvision rolled to 2.10.0+cu129 / 2.10.0+cu129 / 0.25.0+cu129; nccl 2.28.9 -> 2.27.5 transitively. Bake job 11363285 + inference job 11363286 verified the fix end-to-end.

tools/patch_vllm_dsv4_base_fp8_quick.sh — post-#40860 anchors:
- Anchor #7 rewritten: upstream's rewritten load_weights now captures the result before calling self.model.finalize_mega_moe_weights(), so "return loader.load_weights(...)" no longer matches; the new anchor is "loaded_params = loader.load_weights(...)".
- Added 8th anchor: forces use_mega_moe = False when VLLM_DSV4_BASE_FP8=1. Without this, post-load finalize_mega_moe_weights() would invoke experts.finalize_weights() on a non-MegaMoE FusedMoE layer (Base routes through Fp8MoEMethod, not DeepseekV4MegaMoEExperts). Self-contained env gating means recipes don't also need to override moe_backend.

Verified: 9 anchors apply cleanly against the larkz wheel, py_compile clean, end-to-end inference passed in job 11363286.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
What does this PR do?
Adds Quick-Start, Example Notebook, and Detailed Explanation documentation for GRPO