
docs: GRPO documentation and Configuration cleanup #7

Merged
SahilJain314 merged 6 commits into main from sahilj/more_docs
Mar 21, 2025

Conversation

@SahilJain314 (Contributor)

What does this PR do ?

Adds Quick-Start, Example Notebook, and Detailed Explanation documentation for GRPO

github-actions bot added the Documentation (Improvements or additions to documentation) label Mar 21, 2025
SahilJain314 changed the title from "Draft: GRPO documentation" to "Draft: GRPO documentation and Configuration cleanup" Mar 21, 2025
SahilJain314 force-pushed the sahilj/more_docs branch 2 times, most recently from bbc08f2 to 3cdffe3 on March 21, 2025 04:02
Review comment threads on docs/design_docs/design_and_philosophy.md (outdated), docs/guides/sft.md (outdated), and docs/guides/grpo.md (several, some outdated)
SahilJain314 changed the title from "Draft: GRPO documentation and Configuration cleanup" to "docs: GRPO documentation and Configuration cleanup" Mar 21, 2025
parthchadha previously approved these changes Mar 21, 2025
SahilJain314 force-pushed the sahilj/more_docs branch 2 times, most recently from 7a001c0 to b99a400 on March 21, 2025 07:29
terrykong previously approved these changes Mar 21, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
SahilJain314 merged commit 8816926 into main Mar 21, 2025
5 checks passed
SahilJain314 deleted the sahilj/more_docs branch March 21, 2025 07:39
KiddoZhu pushed a commit that referenced this pull request May 6, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
sharonyu-115 referenced this pull request in sharonyu-115/RL Apr 27, 2026
Validated in jobs 11363285 (bake) + 11363286 (inference):
greedy generation against DeepSeek-V4-Flash-Base produced coherent text
("The capital of France is" -> " Paris. The capital of Germany is
Berlin..."). The original `_load_w13: size of tensor a (2048) must
match tensor b (16) at dim 0` blocker is now cleared on the vLLM side;
the Automodel side is still pending a teammate's branch.

## What changed

### pyproject.toml + uv.lock — torch 2.10 rollback

larkz's vllm wheel `vllm-0.19.2rc1.dev219+gc0879d948.cu129-cp313-cp313`
ships METADATA `Requires-Dist: torch==2.11.0`, but its `_C.abi3.so` was
compiled in a `torch==2.10.0` venv (per the larkz/nemorl-ds4/pyproject.toml
pin) and references the old 4-arg `MessageLogger(char const*, int, int,
bool)` constructor that torch 2.11's libc10 dropped. The first bake (job
11362567) failed at venv rebuild with `undefined symbol:
_ZN3c1013MessageLoggerC1EPKciib`.
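
This class of failure can be reproduced without a full bake by dlopen'ing
the extension directly; a minimal sketch (the `.so` path is an assumption,
pointed at the unpacked wheel):

```python
# Sketch: surface an extension/libtorch ABI mismatch up front.
# ctypes.CDLL resolves symbols eagerly (RTLD_NOW), so a constructor that
# libc10 no longer exports raises OSError at load time.
import ctypes

import torch  # load libtorch/libc10 into the process first

so_path = "vllm/_C.abi3.so"  # assumed path inside the unpacked wheel
try:
    ctypes.CDLL(so_path)
    print("extension resolves cleanly against the installed torch")
except OSError as e:
    # e.g. "undefined symbol: _ZN3c1013MessageLoggerC1EPKciib"
    print(f"ABI mismatch: {e}")
```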

Reverted in 5 sites:
- project.dependencies torch 2.11.0 -> 2.10.0
- project.dependencies torchvision 0.26.0 -> 0.25.0
- dependency-groups.build torch 2.11.0 -> 2.10.0
- tool.uv.override-dependencies torch / torchaudio -> 2.10.0
- tool.uv.override-dependencies + explicit torchvision==0.25.0

uv lock confirmed: torch / torchaudio / torchvision rolled to 2.10.0+cu129
/ 2.10.0+cu129 / 0.25.0+cu129; nccl 2.28.9 -> 2.27.5 transitively.
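
As a belt-and-braces check, a sketch like this can assert the rolled-back
pins inside the rebuilt venv (version strings taken from the pins above):

```python
# Verify the rollback landed: compare installed versions against the pins.
import importlib.metadata as md

expected = {"torch": "2.10.0", "torchaudio": "2.10.0", "torchvision": "0.25.0"}
for pkg, want in expected.items():
    got = md.version(pkg)
    # local CUDA builds report e.g. "2.10.0+cu129"; compare the public part
    assert got.split("+")[0] == want, f"{pkg}: got {got}, want {want}"
    print(f"{pkg} {got} ok")
```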

### tools/patch_vllm_dsv4_base_fp8_quick.sh — post-#40860 anchors

Anchor #7 rewritten: upstream's rewritten `load_weights` now captures
the result before calling `self.model.finalize_mega_moe_weights()`, so
"return loader.load_weights(...)" no longer matches; the new anchor is
"loaded_params = loader.load_weights(...)".

Added an 8th anchor: it forces `use_mega_moe = False` when
`VLLM_DSV4_BASE_FP8=1`. Without this, post-load
`finalize_mega_moe_weights()` would invoke `experts.finalize_weights()`
on a non-MegaMoE FusedMoE layer (Base routes through `Fp8MoEMethod`,
not `DeepseekV4MegaMoEExperts`). Self-contained env gating means
recipes don't also need to override `moe_backend`.
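
The injected gate is morally equivalent to this fragment (variable names
mirror the description above; the real code lives in the patched vllm
model file):

```python
import os

use_mega_moe = True  # whatever the model config would otherwise select
if os.environ.get("VLLM_DSV4_BASE_FP8") == "1":
    # Base routes through Fp8MoEMethod, so MegaMoE finalization must not run
    use_mega_moe = False
```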

Verified against the larkz wheel: all 9 anchors apply cleanly and
`py_compile` passes.

### env_refresh_dsv4.sh — bake the Base patch

Now applies two patches over the freshly built worker venv before
`--container-save` (sequencing sketched after the list):
1. `tools/vllm_deepseek_v4_config_patch.py` (defensive superset of
   upstream main's narrow rope kwargs fix)
2. `tools/patch_vllm_dsv4_base_fp8_quick.sh` (env-gated Base FP8
   support; idempotent, no-op when `VLLM_DSV4_BASE_FP8` unset)
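
A rough Python equivalent of that sequencing (the real driver is shell;
the venv interpreter path is an assumption):

```python
# Apply both patches in order; each is idempotent, so re-baking is safe.
import subprocess

venv_py = "/opt/venv/bin/python"  # assumed worker-venv interpreter path
subprocess.run([venv_py, "tools/vllm_deepseek_v4_config_patch.py"], check=True)
subprocess.run(["bash", "tools/patch_vllm_dsv4_base_fp8_quick.sh"], check=True)
```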

### tools/test_vllm_dsv4_base_inference.py (new)

Counterpart to `test_vllm_dsv4_inference.py` (the Flash test). Sets
`VLLM_DSV4_BASE_FP8=1` before the vllm import, re-applies the patch
script defensively (it is idempotent), and runs greedy generation
through a standalone `vllm.LLM`. Ran clean against the new sqsh.
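
In the spirit of that script, a minimal standalone check looks like this
sketch (model path and parallelism are assumptions; `temperature=0.0`
gives greedy decoding):

```python
# Minimal greedy-generation smoke test against the patched wheel.
import os

os.environ["VLLM_DSV4_BASE_FP8"] = "1"  # must be set before vllm imports

from vllm import LLM, SamplingParams  # noqa: E402

llm = LLM(model="/models/DeepSeek-V4-Flash-Base",  # assumed checkpoint path
          tensor_parallel_size=8, trust_remote_code=True)
out = llm.generate(["The capital of France is"],
                   SamplingParams(temperature=0.0, max_tokens=32))
print(out[0].outputs[0].text)  # expect " Paris. ..."
```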

### slurm_jobs/dsv4_base_{bake,inference_test}.sub (new)

Submit scripts for the chained bake + inference flow (chaining sketched
below):
- bake: 1 node, 8 GPUs, 3:55 walltime, `--container-save`
- inference: 1 node, 8 GPUs, 1:00 walltime, `--dependency=afterok`
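
The chaining itself is plain SLURM; a sketch of submitting both (job-id
parsing assumes sbatch's standard "Submitted batch job <id>" output):

```python
# Submit the bake, then queue inference to run only if the bake succeeds.
import subprocess

bake = subprocess.run(["sbatch", "slurm_jobs/dsv4_base_bake.sub"],
                      capture_output=True, text=True, check=True)
job_id = bake.stdout.strip().split()[-1]  # "Submitted batch job <id>"
subprocess.run(["sbatch", f"--dependency=afterok:{job_id}",
                "slurm_jobs/dsv4_base_inference_test.sub"], check=True)
```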

### docs/model-readiness/deepseek-v4* (new + updated)

Tracks the full DSV4 bring-up. Bring-up status doc now reflects:
- vLLM half of Base blocker CLEARED
- Torch ABI rollback chronicled
- New sqsh artifact path
  (.../nemo-rl-dsv4-vllm-c0879d-base-torch210-2026-04-27.sqsh ~99.8 GiB)
- Inference job 11363286 success, generated text logged
- Old broken sqsh marked for forensics

## What still blocks

Automodel-side Base FP8-block-quant state_dict_adapter is pending the
teammate's new branch. Once that lands, sqsh rebake + GRPO smoke can
follow. The vLLM-side recipe override `load_format: auto` may now be
removable, since the new wheel includes the upstream dummy-load fix
(`finalize_weights()` appears 3 times in main's deepseek_v4.py);
validation is deferred until the first GRPO smoke attempt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
sharonyu-115 referenced this pull request in sharonyu-115/RL Apr 27, 2026
Re-applies the in-scope subset of the reverted ef7bfb4 (the docs/scripts
were kept out of git per user request).

pyproject.toml + uv.lock — torch 2.11 -> 2.10:
larkz's vllm wheel (vllm-0.19.2rc1.dev219+gc0879d948) ships METADATA
"Requires-Dist: torch==2.11.0" but its _C.abi3.so was compiled in a
torch==2.10 venv (per larkz/nemorl-ds4/pyproject.toml pin) and references
the OLD 4-arg MessageLogger(char const*, int, int, bool) constructor that
torch 2.11's libc10 dropped.  Bake job 11362567 failed at venv rebuild
with `undefined symbol _ZN3c1013MessageLoggerC1EPKciib`.  Reverted in
5 sites (project deps, build group, override-deps).  uv lock confirms
torch / torchaudio / torchvision rolled to 2.10.0+cu129 / 2.10.0+cu129
/ 0.25.0+cu129; nccl 2.28.9 -> 2.27.5 transitively.  Bake job 11363285
+ inference job 11363286 verified the fix end-to-end.

tools/patch_vllm_dsv4_base_fp8_quick.sh — post-#40860 anchors:
- Anchor #7 rewritten: upstream's rewritten load_weights now captures
  the result before calling self.model.finalize_mega_moe_weights(), so
  "return loader.load_weights(...)" no longer matches; new anchor
  "loaded_params = loader.load_weights(...)".
- Added 8th anchor: forces use_mega_moe = False when
  VLLM_DSV4_BASE_FP8=1.  Without this, post-load
  finalize_mega_moe_weights() would invoke experts.finalize_weights()
  on a non-MegaMoE FusedMoE layer (Base routes through Fp8MoEMethod,
  not DeepseekV4MegaMoEExperts).  Self-contained env gating means
  recipes don't also need to override moe_backend.
Verified: 9 anchors apply cleanly against the larkz wheel,
py_compile clean, end-to-end inference passed in job 11363286.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
