fix: add H200 TFLOPS by clumsy · Pull Request #1543 · NVIDIA-NeMo/RL

clumsy · 2025-11-19T18:21:44Z

What does this PR do?

Adds theoretical H200 TFLOPS as per https://www.nvidia.com/en-us/data-center/h200

This addresses the following issue:

...nemo_rl/models/policy/lm_policy.py:556: UserWarning: Error getting theoretical flops: Unknown device name: NVIDIA H200 and dtype name: torch.bfloat16

Issues

List issues that this PR closes (syntax): N/A

Usage

N/A

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

https://www.nvidia.com/en-us/data-center/h200

Summary by CodeRabbit

Release Notes

New Features
- Added support for NVIDIA H200 accelerator with performance tracking capabilities across multiple data types.
Tests
- Introduced comprehensive unit tests for performance metric validation to ensure accuracy across supported accelerators.

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com>

clumsy · 2025-11-19T18:22:01Z

Please check this small fix, @terrykong @yuki-97

coderabbitai · 2025-11-19T18:23:27Z

📝 Walkthrough

Added NVIDIA H200 accelerator entries to the TFLOPS mapping in flops_tracker.py with corresponding theoretical TFLOPS values for bfloat16 and float32 data types. Introduced a new unit test file to validate the theoretical TFLOPS calculations across multiple device and data type configurations.

Changes

Cohort / File(s)	Change Summary
TFLOPS Mapping Updates `nemo_rl/utils/flops_tracker.py`	Added two entries to THEORETICAL_TFLOPS dictionary for NVIDIA H200: bfloat16 with value 1979 / 2 and float32 with conditional value based on TF32 usage (989 / 2 if enabled, else 67.0)
Unit Tests `tests/unit/utils/test_flops_tracker.py`	New test file with parameterized test `test_theoretical_tflops` validating theoretical TFLOPS calculations across multiple NVIDIA devices and data types (bfloat16, float32) with tolerance-based assertions
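The H200 entries summarized in the table can be sketched as below. This is only an illustration under stated assumptions: the real `THEORETICAL_TFLOPS` mapping in `flops_tracker.py` is keyed on torch dtypes and its exact layout is not shown in this thread, so the string dtype names, the `theoretical_tflops_table` helper, and the `tf32_enabled` parameter are hypothetical stand-ins. Only the numeric values and the error message come from the PR.

```python
def theoretical_tflops_table(tf32_enabled: bool) -> dict:
    """Hypothetical stand-in for the THEORETICAL_TFLOPS mapping (H200 rows only)."""
    return {
        # Values quoted in the change summary above.
        ("NVIDIA H200", "bfloat16"): 1979 / 2,
        # float32 uses the TF32 figure only when TF32 is enabled.
        ("NVIDIA H200", "float32"): 989 / 2 if tf32_enabled else 67.0,
    }


def get_theoretical_tflops(device_name: str, dtype_name: str, tf32_enabled: bool = False) -> float:
    """Look up peak TFLOPS; unknown (device, dtype) pairs raise ValueError.

    Assumption: the signature differs from the real function, which takes a torch dtype.
    """
    table = theoretical_tflops_table(tf32_enabled)
    key = (device_name, dtype_name)
    if key not in table:
        raise ValueError(
            f"Unknown device name: {device_name} and dtype name: {dtype_name}"
        )
    return table[key]


if __name__ == "__main__":
    print(get_theoretical_tflops("NVIDIA H200", "bfloat16"))
    print(get_theoretical_tflops("NVIDIA H200", "float32", tf32_enabled=True))
```

Before this PR, an H200 lookup would fall through to the `ValueError` branch, which is what surfaced as the `UserWarning` quoted in the PR description.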

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Straightforward constant additions to existing mapping
Simple parameterized test with consistent structure
No complex logic or control flow changes
Configuration and test data primarily

Possibly related PRs

feat: Update Theoretical TFLOPS #1236: Adds B200/B300/GB200/GB300 entries to the same THEORETICAL_TFLOPS mapping in flops_tracker.py
fix: report the correct number of workers during FLOPs calculation #1034: Modifies flops_tracker.py with imports and FLOPS-related logic changes

Suggested labels

CI:L1

Suggested reviewers

terrykong
guyueh1

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: add H200 TFLOPS' directly and clearly summarizes the main change: adding NVIDIA H200 TFLOPS values to the theoretical TFLOPS mapping.
Test Results For Major Changes	✅ Passed	Changes add NVIDIA H200 hardware support constants with unit test coverage, fixing runtime warnings without introducing new features or breaking existing functionality.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/unit/utils/test_flops_tracker.py (1)
8-26: Comprehensive test coverage for all device configurations.

The parameterized test thoroughly covers all supported devices including the newly added H200. The test cases correctly mirror the expected values from the source dictionary.

Consider adding a test case to verify that get_theoretical_tflops raises a ValueError for unknown devices:
def test_theoretical_tflops_unknown_device():
    with pytest.raises(ValueError, match="Unknown device name"):
        get_theoretical_tflops("NVIDIA Unknown", torch.bfloat16)

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55dc433 and 9c97222.

📒 Files selected for processing (2)

nemo_rl/utils/flops_tracker.py (1 hunks)
tests/unit/utils/test_flops_tracker.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/unit/utils/test_flops_tracker.py (1)

nemo_rl/utils/flops_tracker.py (2)

get_theoretical_tflops (131-138)

is_using_tf32 (105-110)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Post submodule check comment / Comment on PR
GitHub Check: Post automodel integration comment / Comment on PR

🔇 Additional comments (3)

tests/unit/utils/test_flops_tracker.py (2)

1-5: LGTM!

Imports are clean and include all necessary dependencies for the test.

27-28: LGTM!

The test correctly uses pytest.approx for floating-point comparison, which handles potential precision differences gracefully.

nemo_rl/utils/flops_tracker.py (1)

118-119: No issues found—H200 TFLOPS values are accurate.

The verification confirms the H200 entries are correct:

BFLOAT16 base value (1979) matches NVIDIA H200 SXM specification

FP32 scalar fallback (67.0) matches NVIDIA H200 SXM specification exactly

The division by 2 pattern is a deliberate, consistent convention applied uniformly across all 8 GPU entries in this tracker (A100, H100, H200, B200, B300, GB200, GB300), not an H200-specific issue

H200 correctly mirrors H100 since they share compute architecture

terrykong · 2025-11-20T00:25:46Z

@guyueh1 to review

terrykong · 2025-11-20T00:26:01Z

@clumsy can you add a copyright on the test module?

clumsy · 2025-11-20T02:27:11Z

Done, @terrykong

clumsy · 2025-12-02T15:22:54Z

Is there anything left to do for this PR, @terrykong, @guyueh1?

terrykong · 2025-12-02T23:34:27Z

@clumsy could you resolve the linting CI error by running pre-commit? See https://github.com/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md#making-changes

clumsy · 2025-12-03T13:16:28Z

Done, @terrykong

youngeunkwon0405 · 2025-12-09T09:08:58Z

Hi @clumsy, I tried to re-run the ci-tests.

Looks like the assertion margin is too tight?

=================================== FAILURES ===================================
________ test_flops_counter[meta-llama/Llama-2-7b-hf-128-4096-2.25e+16] ________

model_name = 'meta-llama/Llama-2-7b-hf', gbs = 128, seqlen = 4096
expected_flops = 2.25e+16

    @pytest.mark.parametrize(
        "model_name, gbs, seqlen, expected_flops",
        [
            ("meta-llama/Llama-2-7b-hf", 128, 4096, 2.25e16),
            ("meta-llama/Llama-2-13b-hf", 128, 4096, 4.17e16),
            ("meta-llama/Llama-2-70b-hf", 128, 4096, 2.25e17),
            ("meta-llama/Meta-Llama-3-8B", 128, 8192, 5.31e16),
            ("meta-llama/Llama-3.1-70B-Instruct", 128, 8192, 4.71e17),
            ("meta-llama/Llama-3.1-405B-Instruct", 128, 8192, 2.65e18),
            ("Qwen/Qwen3-30B-A3B", 128, 4096, 9.37e15),
            ("Qwen/Qwen3-235B-A22B", 128, 4096, 6.21e16),
            ("deepseek-ai/DeepSeek-V3", 1, 4096, 1.023e15),
            ("moonshotai/Moonlight-16B-A3B-Instruct", 1, 4096, 6.45e13),
        ],
    )
    def test_flops_counter(model_name, gbs, seqlen, expected_flops):
        model_config = get_default_hf_config(model_name)
        flops_tracker = FLOPTracker.from_config(model_name, model_config)
        flops_tracker.track(gbs, seqlen)
    
>       assert flops_tracker.total_flops == pytest.approx(expected_flops), (
            f"Expected {expected_flops} flops, got {flops_tracker.total_flops}"
        )
E       AssertionError: Expected 2.25e+16 flops, got 2.2472918160113664e+16
E       assert 2.2472918160113664e+16 == 2.25e+16 ± 2.2e+10
E         
E         comparison failed
E         Obtained: 2.2472918160113664e+16
E         Expected: 2.25e+16 ± 2.2e+10

unit/utils/test_flops_counter.py:40: AssertionError
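The failure above appears to be a tolerance issue: `pytest.approx` defaults to a relative tolerance of 1e-6, which for an expected value of 2.25e16 gives the ± 2.2e+10 window shown in the log, while the rounded constant in the test table is off by roughly 0.12%. A minimal sketch of that arithmetic follows, using a hypothetical stdlib-only helper `within_rel` in place of `pytest.approx`:

```python
obtained = 2.2472918160113664e16  # value reported in the failure log
expected = 2.25e16                # rounded constant from the parametrize table

# Relative error measured against the expected value, as pytest.approx does.
rel_error = abs(obtained - expected) / abs(expected)


def within_rel(actual: float, target: float, rel: float) -> bool:
    """Hypothetical helper: True if actual is within rel * |target| of target."""
    return abs(actual - target) <= rel * abs(target)


if __name__ == "__main__":
    print(f"relative error: {rel_error:.2e}")
    print("rel=1e-6 passes:", within_rel(obtained, expected, 1e-6))
    print("rel=1e-2 passes:", within_rel(obtained, expected, 1e-2))
```

In the test itself, the equivalent change would be something like `pytest.approx(expected_flops, rel=1e-2)`. The tolerance actually chosen in the PR is not shown in this thread.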

clumsy · 2025-12-09T19:28:40Z

And perhaps platform-specific, @youngeunkwon0405. Let me use the same approach as for the other float assertions in the tests.

clumsy · 2025-12-09T20:05:24Z

@youngeunkwon0405 @terrykong please check the new version.

terrykong · 2025-12-09T20:27:24Z

@guyueh1 @youngeunkwon0405 to approve if good

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: ZeYi Lin <944270057@qq.com>

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>

fix: add H200 TFLOPS

bc0ca9f

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com>

clumsy requested review from a team as code owners November 19, 2025 18:21

github-actions Bot added the community-request label Nov 19, 2025

coderabbitai Bot reviewed Nov 19, 2025

terrykong requested a review from guyueh1 November 20, 2025 00:24

terrykong added the CI:L0 Run doctests and unit tests label Nov 20, 2025

terrykong temporarily deployed to nemo-ci November 20, 2025 00:25 — with GitHub Actions Inactive

terrykong mentioned this pull request Nov 20, 2025

fix: add theoretical TFlops for H200 GPU #1422

Closed

4 tasks

terrykong temporarily deployed to nemo-ci November 20, 2025 00:48 — with GitHub Actions Inactive

clumsy force-pushed the fix/h200_tflops branch from f16ca0f to 958577a Compare November 20, 2025 02:26

terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Nov 20, 2025

terrykong had a problem deploying to nemo-ci December 2, 2025 20:09 — with GitHub Actions Failure

terrykong enabled auto-merge (squash) December 2, 2025 20:09

guyueh1 previously approved these changes Dec 2, 2025

auto-merge was automatically disabled December 3, 2025 13:16
Head branch was pushed to by a user without write access

clumsy dismissed guyueh1’s stale review via e971167 December 3, 2025 13:16

clumsy force-pushed the fix/h200_tflops branch from 59fd841 to e971167 Compare December 3, 2025 13:16

terrykong previously approved these changes Dec 3, 2025

terrykong enabled auto-merge (squash) December 3, 2025 18:50

terrykong removed the CI:L0 Run doctests and unit tests label Dec 3, 2025

terrykong added the CI:L0 Run doctests and unit tests label Dec 3, 2025

terrykong temporarily deployed to nemo-ci December 3, 2025 18:52 — with GitHub Actions Inactive

terrykong temporarily deployed to nemo-ci December 3, 2025 18:56 — with GitHub Actions Inactive

auto-merge was automatically disabled December 9, 2025 19:58
Head branch was pushed to by a user without write access

clumsy dismissed terrykong’s stale review via bc0ca9f December 9, 2025 19:58

clumsy force-pushed the fix/h200_tflops branch from 12bd2d5 to bc0ca9f Compare December 9, 2025 19:58

Merge branch 'main' into fix/h200_tflops

a5df772

youngeunkwon0405 added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Dec 9, 2025

terrykong enabled auto-merge (squash) December 9, 2025 20:27

youngeunkwon0405 temporarily deployed to nemo-ci December 9, 2025 20:27 — with GitHub Actions Inactive

youngeunkwon0405 temporarily deployed to nemo-ci December 9, 2025 20:34 — with GitHub Actions Inactive

youngeunkwon0405 approved these changes Dec 9, 2025

terrykong merged commit 5bc5eba into NVIDIA-NeMo:main Dec 9, 2025
40 of 41 checks passed

Zeyi-Lin pushed a commit to Zeyi-Lin/RL that referenced this pull request Dec 10, 2025

fix: add H200 TFLOPS (NVIDIA-NeMo#1543)

2c86012

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: ZeYi Lin <944270057@qq.com>

Zeyi-Lin pushed a commit to Zeyi-Lin/RL that referenced this pull request Dec 11, 2025

fix: add H200 TFLOPS (NVIDIA-NeMo#1543)

4124a45

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: ZeYi Lin <944270057@qq.com>

clumsy deleted the fix/h200_tflops branch December 11, 2025 13:38

DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026

fix: add H200 TFLOPS (NVIDIA-NeMo#1543)

39ad4dc

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>

youngeunkwon0405 mentioned this pull request Jan 23, 2026

Error getting theoretical flops: Unknown device name: NVIDIA H200 and dtype name: torch.bfloat16 #1609

Closed

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

fix: add H200 TFLOPS (#1543)

0bf290c

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

fix: add H200 TFLOPS (#1543)

51afe66

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>

seonjinn pushed a commit that referenced this pull request Mar 9, 2026

fix: add H200 TFLOPS (#1543)

1a467d2

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>

6 participants