Conversation
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com>
|
Please check this small fix, @terrykong @yuki-97 |
📝 WalkthroughWalkthroughAdded NVIDIA H200 accelerator entries to the TFLOPS mapping in Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/unit/utils/test_flops_tracker.py (1)
8-26: Comprehensive test coverage for all device configurations.The parameterized test thoroughly covers all supported devices including the newly added H200. The test cases correctly mirror the expected values from the source dictionary.
Consider adding a test case to verify that
get_theoretical_tflopsraises aValueErrorfor unknown devices:def test_theoretical_tflops_unknown_device(): with pytest.raises(ValueError, match="Unknown device name"): get_theoretical_tflops("NVIDIA Unknown", torch.bfloat16)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
nemo_rl/utils/flops_tracker.py(1 hunks)tests/unit/utils/test_flops_tracker.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/utils/test_flops_tracker.py (1)
nemo_rl/utils/flops_tracker.py (2)
get_theoretical_tflops(131-138)is_using_tf32(105-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (3)
tests/unit/utils/test_flops_tracker.py (2)
1-5: LGTM!Imports are clean and include all necessary dependencies for the test.
27-28: LGTM!The test correctly uses
pytest.approxfor floating-point comparison, which handles potential precision differences gracefully.nemo_rl/utils/flops_tracker.py (1)
118-119: No issues found—H200 TFLOPS values are accurate.The verification confirms the H200 entries are correct:
- BFLOAT16 base value (1979) matches NVIDIA H200 SXM specification
- FP32 scalar fallback (67.0) matches NVIDIA H200 SXM specification exactly
- The division by 2 pattern is a deliberate, consistent convention applied uniformly across all 8 GPU entries in this tracker (A100, H100, H200, B200, B300, GB200, GB300), not a H200-specific issue
- H200 correctly mirrors H100 since they share compute architecture
|
@guyueh1 to review |
|
@clumsy can you add a copyright on the test module? |
f16ca0f to
958577a
Compare
|
Done, @terrykong |
|
Is there anything left to do for this PR, @terrykong , @guyueh1 ? |
|
@clumsy could you resolve the linting ci error by running pre-commit? see https://github.com/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md#making-changes |
Head branch was pushed to by a user without write access
59fd841 to
e971167
Compare
|
Done, @terrykong |
|
Hi @clumsy, I tried to re-run the ci-tests. Looks like the assertion margin is too tight? |
|
And perhaps platform-specific, @youngeunkwon0405 Let me use the same approach as for other float assertions in tests |
Head branch was pushed to by a user without write access
12bd2d5 to
bc0ca9f
Compare
|
@youngeunkwon0405 @terrykong please check the new version. |
|
@guyueh1 @youngeunkwon0405 to approve if good |
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: ZeYi Lin <944270057@qq.com>
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: ZeYi Lin <944270057@qq.com>
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com>
What does this PR do ?
Adds theoretical H200 TFLOPS as per https://www.nvidia.com/en-us/data-center/h200
This addresses the following issue:
Issues
List issues that this PR closes (syntax): N/A
Usage
N/A
Before your PR is "Ready for review"
Pre checks:
Additional Information
https://www.nvidia.com/en-us/data-center/h200
Summary by CodeRabbit
Release Notes
New Features
Tests