cp: docs: v0.5 performance results update (1772) into r0.5.0 #1800

Merged
terrykong merged 1 commit into r0.5.0 from cherry-pick-1772-r0.5.0
Jan 21, 2026

Conversation

@chtruong814 (Contributor) commented Jan 21, 2026

beep boop [🤖]: Hi @guyueh1 👋,

we've cherry-picked #1772 into r0.5.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

Release Notes

  • Documentation
    • Updated performance documentation to v0.5
    • Added benchmark results for H100 and GB200 GPUs with BF16/FP8 precision configurations
    • Reorganized performance metrics table with expanded model coverage (GRPO, DeepSeek V3, Qwen, LLAMA variants)
    • Added references to YAML recipes for benchmark reproduction


Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@coderabbitai Bot (Contributor) commented Jan 21, 2026

📝 Walkthrough

Version label updated from v0.4 to v0.5, recipe references added, and benchmark table restructured with new columns (Algorithm, Model). New benchmark sections introduced for H100 and GB200 with BF16 and FP8 precision variants, along with corresponding performance data entries.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Performance Summary Documentation<br>`docs/about/performance-summary.md` | Version label updated to v0.5; table structure reworked with explicit Algorithm and Model columns; new fields added (Generation, Training, Tokens/sec/GPU, Total Step time); benchmark sections expanded for H100 BF16/FP8 and GB200 BF16/FP8 with system and precision details; data entries updated for GRPO, DeepSeek V3, Qwen, and LLAMA models; note section formatting adjusted. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

documentation, CI:docs, r0.5.0

Suggested reviewers

  • guyueh1
  • snowmanwwg
  • terrykong
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly indicates this is a cherry-pick of a documentation update for v0.5 performance results into the r0.5.0 branch, which aligns with the actual changeset updating performance benchmark data. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | PR contains documentation-only changes to performance summary with benchmark data updates and table restructuring, without code modifications or features affecting numerics/convergence. |


@coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/about/performance-summary.md (1)

36-36: Update the outdated recipe link to match v0.5.

This section references r0.4.0 while the page is v0.5. Please update to r0.5.0 (or remove the duplicate link) to avoid conflicting guidance.

🤖 Fix all issues with AI agents
In `@docs/about/performance-summary.md`:
- Around line 54-66: The Training tuple schema is inconsistent: the header shows
five fields "[TP,CP,EP,PP,VPP]" but rows contain varying arity (e.g.
"[1,1,1,1,1,2,n/a]" vs "[1,1,16,16,n/a]"); normalize every Training cell to the
same canonical tuple length and update the header to reflect the true parameter
list (add any missing parameter name such as "VP" or "VPP" as appropriate),
ensuring the order matches across all rows—apply this fix to the three
performance tables (the two tables around the shown diff and the GB200 table)
and make sure example rows like "[1,1,1,1,1,2,n/a]" and "[1,1,16,16,n/a]" are
rewritten to the agreed schema.
🧹 Nitpick comments (1)
docs/about/performance-summary.md (1)

19-19: Minor grammar tweak for clarity.

Consider: “NeMo RL has two training backends… this performance summary currently only shows numbers from the Megatron backend.”

Comment on lines +54 to +66
| Algorithm | Model |On/Off policy|T-Max Sequence Length|G-Average Seq len|#-GPUs|G-GBS|T-GBS|Generation [TP,PP]|Training [TP,CP,EP,PP,VPP]|Tokens / sec / GPU|Total Step time(s)|
|--------- |------- |-------- |----- |----- |------|---- |---- |---- |---- |--- |---|
| GRPO |LLAMA3.1_8B|On policy |4,096 |1,019 |16 |2,048|512 |[1,1] |[1,1,1,1,1,2,n/a] |1,581 | 92.8|
| GRPO |LLAMA3.1_8B|1-step Off |4,096 |1,123 |16 |2,048|512 |[1,1] |[1,1,1,1,1,1,n/a] |2,478 | 64.8|
| GRPO |DeepSeek V3|On policy |1,536 |744 |256 |512 |512 |[32,1] |[1,1,16,16,n/a] |12.7 | 134|
| GRPO |DeepSeek V3|1-step Off |1,536 |738 |512 |512 |512 |[32,1] |[1,1,16,16,n/a] |13.1 | 64.9|
| DAPO |DeepSeek V3|On policy |1,536 |974 |512 |512 |512 |[64,1] |[8,4,32,8,n/a] |2.45 | 974|
| GRPO |Qwen3-235B |On policy |8,192 |5,700 |128 |512 |512 |[16,1] |[2,2,16,8,n/a] |54.1 | 431|
| GRPO |Qwen3-235B |1-step Off |8,192 |5,707 |256 |512 |512 |[8,1] |[4,1,16,8,n/a] |58.7 | 203|
| GRPO |Qwen3-30B3A|On policy |4,096 |3,196 |32 |2,048|512 |[2,1] |[1,1,8,1,n/a] |1066 | 198|
| GRPO |Qwen3-30B3A|1-step Off |4,096 |3,201 |32 |2,048|512 |[2,1] |[1,1,8,2,n/a] |1391 | 154|
| GRPO |Qwen3-32B |On policy |4,096 |3,251 |32 |2,048|512 |[4,1] |[4,1,1,4,n/a] |571 | 376|
| GRPO |Qwen3-32B |1-step Off |4,096 |3,252 |64 |2,048|512 |[4,1] |[4,1,1,4,n/a] |538 | 200|

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and view the relevant lines
if [ -f "docs/about/performance-summary.md" ]; then
  echo "=== File found, viewing lines 50-100 ==="
  sed -n '50,100p' docs/about/performance-summary.md
else
  echo "File not found at that path"
  fd --type f 'performance-summary\.md'  # fd takes a regex pattern, not find-style -name
fi

Repository: NVIDIA-NeMo/RL

Length of output: 5673


Fix the Training tuple schema mismatch in the performance tables.

The header lists 5 fields [TP,CP,EP,PP,VPP], but rows vary between 5 and 7 elements. For example, [1,1,1,1,1,2,n/a] has 7 elements while [1,1,16,16,n/a] has 5, creating ambiguity about parameter meanings and order. Normalize all rows to the same arity and ensure the header accurately reflects the schema (including any missing parameters like VP).

This applies to all three performance tables: lines 54–66, 85–96, and the GB200 table.

🤖 Prompt for AI Agents
In `@docs/about/performance-summary.md` around lines 54 - 66, The Training tuple
schema is inconsistent: the header shows five fields "[TP,CP,EP,PP,VPP]" but
rows contain varying arity (e.g. "[1,1,1,1,1,2,n/a]" vs "[1,1,16,16,n/a]");
normalize every Training cell to the same canonical tuple length and update the
header to reflect the true parameter list (add any missing parameter name such
as "VP" or "VPP" as appropriate), ensuring the order matches across all
rows—apply this fix to the three performance tables (the two tables around the
shown diff and the GB200 table) and make sure example rows like
"[1,1,1,1,1,2,n/a]" and "[1,1,16,16,n/a]" are rewritten to the agreed schema.
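This kind of arity drift is easy to catch mechanically before it lands. Below is a minimal lint sketch, not part of the repo: the helper name and the simplified two-row table are illustrative, and it assumes the Training cell is the last bracketed tuple on each row, as in the tables above.

```python
import re

def check_training_arity(markdown_table: str, expected: int) -> list[str]:
    """Return the Training tuples whose element count differs from `expected`."""
    bad = []
    for line in markdown_table.splitlines():
        # Collect all bracketed tuples on the row; per the table layout,
        # the Training cell is the last one (Generation comes first).
        tuples = re.findall(r"\[([^\]]+)\]", line)
        if not tuples:
            continue  # separator or prose line
        training = tuples[-1].split(",")
        if len(training) != expected:
            bad.append("[" + tuples[-1] + "]")
    return bad

# Illustrative rows mirroring the mismatch flagged in the review.
rows = """\
| GRPO | LLAMA3.1_8B | [1,1] | [1,1,1,1,1,2,n/a] |
| GRPO | DeepSeek V3 | [32,1] | [1,1,16,16,n/a] |
"""
print(check_training_arity(rows, expected=5))  # flags the 7-element tuple
```

Run against the real `docs/about/performance-summary.md` tables, any non-empty result indicates rows that still disagree with the `[TP,CP,EP,PP,VPP]` header schema.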

@terrykong terrykong merged commit 42a4c8d into r0.5.0 Jan 21, 2026
57 of 61 checks passed
@terrykong terrykong deleted the cherry-pick-1772-r0.5.0 branch January 21, 2026 06:56
avenkateshha pushed a commit to avenkateshha/RL that referenced this pull request Apr 10, 2026
…DIA-NeMo#1800)

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>

Labels

cherry-pick · CI:docs · Run doctest · Documentation (Improvements or additions to documentation) · Run CICD

Projects

None yet


3 participants