docs(optimizer): Add Muon post-training support #1848
Conversation
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
📝 Walkthrough
This pull request introduces documentation for the Muon optimizer integration with NeMo RL, including a comprehensive guide with configuration examples, and adds the emerging-optimizers dependency to pyproject.toml.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/guides/muon-optimizer.md`:
- Line 91: Update the two fenced command blocks that currently lack language
identifiers so they pass MD040 linting: add "bash" after the opening backticks
for the block containing the command "uv run examples/run_sft.py" and likewise
add "bash" for the block containing "uv run examples/run_grpo_math.py" (these
are the two examples referenced around the comment). Ensure both opening fences
read ```bash so the markdown linter recognizes them as shell/command examples.
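As a sketch of the suggested fix (the command's full argument list is elided here), each opening fence would gain a language identifier so MD040 passes:

````markdown
```bash
uv run examples/run_sft.py
```
````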
- Lines 22-23: The example enables both Megatron and DTensor, which
contradicts the "Megatron-only" statement; update the example so only Megatron
is enabled by either removing the policy.dtensor_cfg.enabled line or explicitly
setting policy.dtensor_cfg.enabled=false, and ensure
policy.megatron_cfg.enabled=true remains; reference the config keys
policy.megatron_cfg.enabled and policy.dtensor_cfg.enabled and adjust the
surrounding text to reflect that DTensor is disabled in the Megatron-only
example.
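Applied to the example under discussion, the Megatron-only overrides might look like the following sketch (other flags from the guide are omitted for brevity):

```bash
uv run examples/run_sft.py \
  policy.megatron_cfg.enabled=true \
  policy.dtensor_cfg.enabled=false
```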
- Line 92: The SFT example command block is not copy/paste runnable: add a
trailing backslash to the end of the "uv run examples/run_sft.py" line so the
shell sees the next lines as continuations, and split the merged config
arguments so "policy.tokenizer.name=Qwen/Qwen3-235B-A22B" and
"checkpointing.enabled=True" are on separate lines (they currently appear merged
on the same line), ensuring each config flag is its own continued line in the
examples/run_sft.py invocation.
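Putting the suggestion together, a copy/paste-runnable shape for the invocation might be (remaining flags omitted; treat this as a sketch rather than the exact command from the guide):

```bash
uv run examples/run_sft.py \
  policy.tokenizer.name=Qwen/Qwen3-235B-A22B \
  checkpointing.enabled=True
```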
In `@pyproject.toml`:
- Line 119: Remove or replace the invalid dependency pin
"emerging-optimizers==0.1.0" in pyproject.toml: either remove the line entirely
or change it to a valid installable spec (e.g., a released PyPI version if
available or a VCS URL like a git+https://...@<commit_or_tag> for the
Emerging-Optimizers repo). Update the dependency entry that currently reads
emerging-optimizers==0.1.0 so the package installer can resolve it during the uv
sync --extra mcore step.
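For illustration, a resolvable VCS-style pin could take the following shape in pyproject.toml; the repository URL and the `<commit_or_tag>` placeholder are assumptions for the maintainers to confirm:

```toml
[project]
dependencies = [
    # Assumed repository location; pin to a released tag or known-good commit
    # so that `uv sync --extra mcore` can resolve the package.
    "emerging-optimizers @ git+https://github.com/NVIDIA/Emerging-Optimizers.git@<commit_or_tag>",
]
```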
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- docs/assets/muon-dapo-reward.png is excluded by !**/*.png
- docs/assets/muon-dapo-val-acc.png is excluded by !**/*.png
- docs/assets/muon-sft-comparison.png is excluded by !**/*.png
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
- docs/guides/muon-optimizer.md
- docs/index.md
- pyproject.toml
Signed-off-by: Aditya Vavre <avavre@nvidia.com>
/ok to test c41ca0a
Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Signed-off-by: Aditya Vavre <avavre@nvidia.com> Co-authored-by: adityavavreNVDA <avavre@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Shih-Yang Liu <shihyangl@nvidia.com>
@qiaochuz-nv to QA
What does this PR do?
NOTE: blocked by #1787
W&B report with latest experiments (all using adam-pretrained base models): https://wandb.ai/nvidia/ashors-muon/reports/Muon-for-Post-Training--VmlldzoxNTAzMzcwMA
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit

Documentation
- Added a guide for using the Muon optimizer with NeMo RL, including configuration examples.

Chores
- Added the emerging-optimizers dependency.