cp: docs: v0.5 performance results update (1772) into r0.5.0#1800
docs: v0.5 performance results update (1772) into r0.5.0 #1800
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 Walkthrough

Version label updated from v0.4 to v0.5, recipe references added, and the benchmark table restructured with new columns (Algorithm, Model). New benchmark sections introduced for H100 and GB200 with BF16 and FP8 precision variants, along with corresponding performance data entries.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ Passed (4 of 4)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/about/performance-summary.md (1)
36-36: Update the outdated recipe link to match v0.5. This section references r0.4.0 while the page is v0.5. Please update to r0.5.0 (or remove the duplicate link) to avoid conflicting guidance.
🤖 Fix all issues with AI agents
In `@docs/about/performance-summary.md`:
- Around line 54-66: The Training tuple schema is inconsistent: the header shows
five fields "[TP,CP,EP,PP,VPP]" but rows contain varying arity (e.g.
"[1,1,1,1,1,2,n/a]" vs "[1,1,16,16,n/a]"); normalize every Training cell to the
same canonical tuple length and update the header to reflect the true parameter
list (add any missing parameter name such as "VP" or "VPP" as appropriate),
ensuring the order matches across all rows—apply this fix to the three
performance tables (the two tables around the shown diff and the GB200 table)
and make sure example rows like "[1,1,1,1,1,2,n/a]" and "[1,1,16,16,n/a]" are
rewritten to the agreed schema.
🧹 Nitpick comments (1)
docs/about/performance-summary.md (1)
19-19: Minor grammar tweak for clarity. Consider: “NeMo RL has two training backends… this performance summary currently only shows numbers from the Megatron backend.”
| Algorithm | Model | On/Off policy | T-Max Sequence Length | G-Average Seq Len | # GPUs | G-GBS | T-GBS | Generation [TP,PP] | Training [TP,CP,EP,PP,VPP] | Tokens / sec / GPU | Total Step Time (s) |
|-----------|-------|---------------|-----------------------|-------------------|--------|-------|-------|--------------------|----------------------------|--------------------|---------------------|
| GRPO | LLAMA3.1_8B | On policy  | 4,096 | 1,019 | 16  | 2,048 | 512 | [1,1]  | [1,1,1,1,1,2,n/a] | 1,581 | 92.8 |
| GRPO | LLAMA3.1_8B | 1-step Off | 4,096 | 1,123 | 16  | 2,048 | 512 | [1,1]  | [1,1,1,1,1,1,n/a] | 2,478 | 64.8 |
| GRPO | DeepSeek V3 | On policy  | 1,536 | 744   | 256 | 512   | 512 | [32,1] | [1,1,16,16,n/a]   | 12.7  | 134  |
| GRPO | DeepSeek V3 | 1-step Off | 1,536 | 738   | 512 | 512   | 512 | [32,1] | [1,1,16,16,n/a]   | 13.1  | 64.9 |
| DAPO | DeepSeek V3 | On policy  | 1,536 | 974   | 512 | 512   | 512 | [64,1] | [8,4,32,8,n/a]    | 2.45  | 974  |
| GRPO | Qwen3-235B  | On policy  | 8,192 | 5,700 | 128 | 512   | 512 | [16,1] | [2,2,16,8,n/a]    | 54.1  | 431  |
| GRPO | Qwen3-235B  | 1-step Off | 8,192 | 5,707 | 256 | 512   | 512 | [8,1]  | [4,1,16,8,n/a]    | 58.7  | 203  |
| GRPO | Qwen3-30B3A | On policy  | 4,096 | 3,196 | 32  | 2,048 | 512 | [2,1]  | [1,1,8,1,n/a]     | 1,066 | 198  |
| GRPO | Qwen3-30B3A | 1-step Off | 4,096 | 3,201 | 32  | 2,048 | 512 | [2,1]  | [1,1,8,2,n/a]     | 1,391 | 154  |
| GRPO | Qwen3-32B   | On policy  | 4,096 | 3,251 | 32  | 2,048 | 512 | [4,1]  | [4,1,1,4,n/a]     | 571   | 376  |
| GRPO | Qwen3-32B   | 1-step Off | 4,096 | 3,252 | 64  | 2,048 | 512 | [4,1]  | [4,1,1,4,n/a]     | 538   | 200  |
🧩 Analysis chain
🏁 Script executed:

```shell
# Check whether the file exists and view the relevant lines
if [ -f "docs/about/performance-summary.md" ]; then
  echo "=== File found, viewing lines 50-100 ==="
  sed -n '50,100p' docs/about/performance-summary.md
else
  echo "File not found at that path"
  # fd takes the pattern as a positional argument; it has no -name flag
  fd --type f performance-summary.md
fi
```

Repository: NVIDIA-NeMo/RL
Length of output: 5673
Fix the Training tuple schema mismatch in the performance tables.
The header lists 5 fields [TP,CP,EP,PP,VPP], but rows vary between 5 and 7 elements. For example, [1,1,1,1,1,2,n/a] has 7 elements while [1,1,16,16,n/a] has 5, creating ambiguity about parameter meanings and order. Normalize all rows to the same arity and ensure the header accurately reflects the schema (including any missing parameters like VP).
This applies to all three performance tables: lines 54–66, 85–96, and the GB200 table.
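This kind of arity drift is easy to catch mechanically. The sketch below is hypothetical (not part of the repo's tooling); it assumes the tuples use the bracketed comma-separated format shown in the table, and simply collects the distinct element counts so a reviewer can see at a glance whether the Training cells share one schema.

```python
import re

def tuple_arities(markdown: str) -> set[int]:
    """Return the set of element counts found in bracketed
    tuples like [32,1] or [1,1,16,16,n/a]."""
    cells = re.findall(r"\[([0-9a-zA-Z/,]+)\]", markdown)
    return {len(cell.split(",")) for cell in cells}

# Two sample rows from the table above: Generation tuples have 2
# elements, but the Training tuples disagree (7 vs 5 elements).
table = """
| [1,1]  | [1,1,1,1,1,2,n/a] |
| [32,1] | [1,1,16,16,n/a]   |
"""
print(sorted(tuple_arities(table)))  # → [2, 5, 7]
```

More than two distinct arities (one for Generation, one for Training) signals that the Training tuples do not share a single schema.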