
docs: GRPO documentation and Configuration cleanup #7

Merged
SahilJain314 merged 6 commits into main from sahilj/more_docs
Mar 21, 2025

Conversation

@SahilJain314 (Contributor)

What does this PR do ?

Adds Quick-Start, Example Notebook, and Detailed Explanation documentation for GRPO

github-actions bot added the Documentation (Improvements or additions to documentation) label Mar 21, 2025
SahilJain314 changed the title from "Draft: GRPO documentation" to "Draft: GRPO documentation and Configuration cleanup" Mar 21, 2025
SahilJain314 force-pushed the sahilj/more_docs branch 2 times, most recently from bbc08f2 to 3cdffe3 on March 21, 2025 04:02
Review comment threads on docs/design_docs/design_and_philosophy.md (outdated), docs/guides/sft.md (outdated), and docs/guides/grpo.md (several, some outdated)
SahilJain314 changed the title from "Draft: GRPO documentation and Configuration cleanup" to "docs: GRPO documentation and Configuration cleanup" Mar 21, 2025
parthchadha previously approved these changes Mar 21, 2025
SahilJain314 force-pushed the sahilj/more_docs branch 2 times, most recently from 7a001c0 to b99a400 on March 21, 2025 07:29
terrykong previously approved these changes Mar 21, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
SahilJain314 merged commit 8816926 into main Mar 21, 2025
5 checks passed
SahilJain314 deleted the sahilj/more_docs branch March 21, 2025 07:39
KiddoZhu pushed a commit that referenced this pull request May 6, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
sharonyu-115 referenced this pull request in sharonyu-115/RL Apr 27, 2026
Validated in jobs 11363285 (bake) + 11363286 (inference):
greedy generation against DeepSeek-V4-Flash-Base produced coherent text
("The capital of France is" -> " Paris. The capital of Germany is
Berlin..."). The original `_load_w13: size of tensor a (2048) must
match tensor b (16) at dim 0` blocker is now cleared on the vLLM side;
the Automodel side is still pending a teammate's branch.

## What changed

### pyproject.toml + uv.lock — torch 2.10 rollback

larkz's vllm wheel `vllm-0.19.2rc1.dev219+gc0879d948.cu129-cp313-cp313`
ships METADATA `Requires-Dist: torch==2.11.0`, but its `_C.abi3.so` was
compiled in a `torch==2.10.0` venv (per the larkz/nemorl-ds4/pyproject.toml
pin) and references the old 4-arg `MessageLogger(char const*, int, int,
bool)` constructor that torch 2.11's libc10 dropped. The first bake (job
11362567) failed at venv rebuild with `undefined symbol:
_ZN3c1013MessageLoggerC1EPKciib`.
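
This class of failure can be reproduced without a full bake by dlopen'ing
the extension directly; a minimal sketch (the `.so` path is an assumption,
pointed at the unpacked wheel):

```python
# Sketch: surface an extension/libtorch ABI mismatch up front.
# ctypes.CDLL resolves symbols eagerly (RTLD_NOW), so a constructor that
# libc10 no longer exports raises OSError at load time.
import ctypes

import torch  # load libtorch/libc10 into the process first

so_path = "vllm/_C.abi3.so"  # assumed path inside the unpacked wheel
try:
    ctypes.CDLL(so_path)
    print("extension resolves cleanly against the installed torch")
except OSError as e:
    # e.g. "undefined symbol: _ZN3c1013MessageLoggerC1EPKciib"
    print(f"ABI mismatch: {e}")
```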

Reverted in 5 sites:
- project.dependencies torch 2.11.0 -> 2.10.0
- project.dependencies torchvision 0.26.0 -> 0.25.0
- dependency-groups.build torch 2.11.0 -> 2.10.0
- tool.uv.override-dependencies torch / torchaudio -> 2.10.0
- tool.uv.override-dependencies + explicit torchvision==0.25.0

uv lock confirmed: torch / torchaudio / torchvision rolled to 2.10.0+cu129
/ 2.10.0+cu129 / 0.25.0+cu129; nccl 2.28.9 -> 2.27.5 transitively.
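
As a belt-and-braces check, a sketch like this can assert the rolled-back
pins inside the rebuilt venv (version strings taken from the pins above):

```python
# Verify the rollback landed: compare installed versions against the pins.
import importlib.metadata as md

expected = {"torch": "2.10.0", "torchaudio": "2.10.0", "torchvision": "0.25.0"}
for pkg, want in expected.items():
    got = md.version(pkg)
    # local CUDA builds report e.g. "2.10.0+cu129"; compare the public part
    assert got.split("+")[0] == want, f"{pkg}: got {got}, want {want}"
    print(f"{pkg} {got} ok")
```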

### tools/patch_vllm_dsv4_base_fp8_quick.sh — post-#40860 anchors

Anchor #7 rewritten: upstream's rewritten `load_weights` now captures
the result before calling `self.model.finalize_mega_moe_weights()`, so
"return loader.load_weights(...)" no longer matches; the new anchor is
"loaded_params = loader.load_weights(...)".

Added an 8th anchor: it forces `use_mega_moe = False` when
`VLLM_DSV4_BASE_FP8=1`. Without this, post-load
`finalize_mega_moe_weights()` would invoke `experts.finalize_weights()`
on a non-MegaMoE FusedMoE layer (Base routes through `Fp8MoEMethod`,
not `DeepseekV4MegaMoEExperts`). Self-contained env gating means
recipes don't also need to override `moe_backend`.
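
The injected gate is morally equivalent to this fragment (variable names
mirror the description above; the real code lives in the patched vllm
model file):

```python
import os

use_mega_moe = True  # whatever the model config would otherwise select
if os.environ.get("VLLM_DSV4_BASE_FP8") == "1":
    # Base routes through Fp8MoEMethod, so MegaMoE finalization must not run
    use_mega_moe = False
```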

Verified against the larkz wheel: all 9 anchors apply cleanly and
`py_compile` passes.

### env_refresh_dsv4.sh — bake the Base patch

Now applies two patches over the freshly built worker venv before
`--container-save` (sequencing sketched after the list):
1. `tools/vllm_deepseek_v4_config_patch.py` (defensive superset of
   upstream main's narrow rope kwargs fix)
2. `tools/patch_vllm_dsv4_base_fp8_quick.sh` (env-gated Base FP8
   support; idempotent, no-op when `VLLM_DSV4_BASE_FP8` unset)
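
A rough Python equivalent of that sequencing (the real driver is shell;
the venv interpreter path is an assumption):

```python
# Apply both patches in order; each is idempotent, so re-baking is safe.
import subprocess

venv_py = "/opt/venv/bin/python"  # assumed worker-venv interpreter path
subprocess.run([venv_py, "tools/vllm_deepseek_v4_config_patch.py"], check=True)
subprocess.run(["bash", "tools/patch_vllm_dsv4_base_fp8_quick.sh"], check=True)
```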

### tools/test_vllm_dsv4_base_inference.py (new)

Counterpart to `test_vllm_dsv4_inference.py` (the Flash test). Sets
`VLLM_DSV4_BASE_FP8=1` before the vllm import, re-applies the patch
script defensively (it is idempotent), and runs greedy generation
through a standalone `vllm.LLM`. Ran clean against the new sqsh.
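
In the spirit of that script, a minimal standalone check looks like this
sketch (model path and parallelism are assumptions; `temperature=0.0`
gives greedy decoding):

```python
# Minimal greedy-generation smoke test against the patched wheel.
import os

os.environ["VLLM_DSV4_BASE_FP8"] = "1"  # must be set before vllm imports

from vllm import LLM, SamplingParams  # noqa: E402

llm = LLM(model="/models/DeepSeek-V4-Flash-Base",  # assumed checkpoint path
          tensor_parallel_size=8, trust_remote_code=True)
out = llm.generate(["The capital of France is"],
                   SamplingParams(temperature=0.0, max_tokens=32))
print(out[0].outputs[0].text)  # expect " Paris. ..."
```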

### slurm_jobs/dsv4_base_{bake,inference_test}.sub (new)

Submit scripts for the chained bake + inference flow (chaining sketched
below):
- bake: 1 node, 8 GPUs, 3:55 walltime, `--container-save`
- inference: 1 node, 8 GPUs, 1:00 walltime, `--dependency=afterok`
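
The chaining itself is plain SLURM; a sketch of submitting both (job-id
parsing assumes sbatch's standard "Submitted batch job <id>" output):

```python
# Submit the bake, then queue inference to run only if the bake succeeds.
import subprocess

bake = subprocess.run(["sbatch", "slurm_jobs/dsv4_base_bake.sub"],
                      capture_output=True, text=True, check=True)
job_id = bake.stdout.strip().split()[-1]  # "Submitted batch job <id>"
subprocess.run(["sbatch", f"--dependency=afterok:{job_id}",
                "slurm_jobs/dsv4_base_inference_test.sub"], check=True)
```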

### docs/model-readiness/deepseek-v4* (new + updated)

Tracks the full DSV4 bring-up. Bring-up status doc now reflects:
- vLLM half of Base blocker CLEARED
- Torch ABI rollback chronicled
- New sqsh artifact path
  (.../nemo-rl-dsv4-vllm-c0879d-base-torch210-2026-04-27.sqsh ~99.8 GiB)
- Inference job 11363286 success, generated text logged
- Old broken sqsh marked for forensics

## What still blocks

Automodel-side Base FP8-block-quant state_dict_adapter is pending the
teammate's new branch. Once that lands, sqsh rebake + GRPO smoke can
follow. The vLLM-side recipe override `load_format: auto` may now be
removable, since the new wheel includes the upstream dummy-load fix
(`finalize_weights()` appears 3 times in main's deepseek_v4.py);
validation is deferred until the first GRPO smoke attempt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
sharonyu-115 referenced this pull request in sharonyu-115/RL Apr 27, 2026
Re-applies the in-scope subset of the reverted ef7bfb4 (the docs/scripts
were kept out of git per user request).

pyproject.toml + uv.lock — torch 2.11 -> 2.10:
larkz's vllm wheel (vllm-0.19.2rc1.dev219+gc0879d948) ships METADATA
"Requires-Dist: torch==2.11.0" but its _C.abi3.so was compiled in a
torch==2.10 venv (per larkz/nemorl-ds4/pyproject.toml pin) and references
the OLD 4-arg MessageLogger(char const*, int, int, bool) constructor that
torch 2.11's libc10 dropped.  Bake job 11362567 failed at venv rebuild
with `undefined symbol _ZN3c1013MessageLoggerC1EPKciib`.  Reverted in
5 sites (project deps, build group, override-deps).  uv lock confirms
torch / torchaudio / torchvision rolled to 2.10.0+cu129 / 2.10.0+cu129
/ 0.25.0+cu129; nccl 2.28.9 -> 2.27.5 transitively.  Bake job 11363285
+ inference job 11363286 verified the fix end-to-end.

tools/patch_vllm_dsv4_base_fp8_quick.sh — post-#40860 anchors:
- Anchor #7 rewritten: upstream's rewritten load_weights now captures
  the result before calling self.model.finalize_mega_moe_weights(), so
  "return loader.load_weights(...)" no longer matches; new anchor
  "loaded_params = loader.load_weights(...)".
- Added 8th anchor: forces use_mega_moe = False when
  VLLM_DSV4_BASE_FP8=1.  Without this, post-load
  finalize_mega_moe_weights() would invoke experts.finalize_weights()
  on a non-MegaMoE FusedMoE layer (Base routes through Fp8MoEMethod,
  not DeepseekV4MegaMoEExperts).  Self-contained env gating means
  recipes don't also need to override moe_backend.
Verified: 9 anchors apply cleanly against the larkz wheel,
py_compile clean, end-to-end inference passed in job 11363286.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
