cp: `feat: Add deepseek flops tracker (1250)` into `r0.4.0` by chtruong814 · Pull Request #1309 · NVIDIA-NeMo/RL

chtruong814 · 2025-10-08T11:04:51Z

beep boop [🤖]: Hi @guyueh1 👋,

we've cherry picked #1250 into  for you! 🚀

Please review and approve this cherry pick by your convenience!

Summary by CodeRabbit

New Features
- Added FLOPS estimation support for the DeepSeek-V3 model family, enabling accurate calculations for advanced architectures (including MoE and LoRA variations).
- Expanded configuration handling to accommodate a wider range of model characteristics for more precise FLOPS reporting.
Tests
- Added unit test coverage for DeepSeek-V3 with long sequence lengths to validate FLOPS estimates and ensure reliability across supported models.

Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

coderabbitai · 2025-10-08T11:10:38Z

📝 Walkthrough

Walkthrough

Adds DeepSeek-V3 support to FLOPs tracking by extending config-to-formula mapping and importing its formula. Updates unit tests to include a DeepSeek-V3 case with expected FLOPs. No public API changes.

Changes

Cohort / File(s)	Summary
FLOPs tracker update `nemo_rl/utils/flops_tracker.py`	Adds a `deepseek_v3` branch in config conversion, constructing an extended FLOPSConfig (including MoE and LoRA-related fields) and dispatching to `deepseekv3` formula; updates imports to include `deepseekv3`.
Unit tests update `tests/unit/utils/test_flops_counter.py`	Extends parameterized test cases with DeepSeek-V3 entry (gbs=1, seqlen=4096, expected_flops=1.023e15).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Caller
  participant FT as FLOPsTracker
  participant CFG as convert_config_to_flops_config
  participant F as flops_formulas

  U->>FT: count_flops(model_type="deepseek_v3", config)
  FT->>CFG: convert_config_to_flops_config(config)
  CFG->>CFG: Build FLOPSConfig (MoE/LoRA/QK/V dims)
  CFG-->>FT: FLOPSConfig + model_type="deepseek_v3"
  FT->>F: deepseekv3(FLOPSConfig)
  F-->>FT: computed_flops
  FT-->>U: computed_flops
  note over FT,F: New dispatch branch for deepseek_v3

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

fix: report the correct number of workers during FLOPs calculation #1034 — Also changes flops_tracker imports and convert_config_to_flops_config logic; closely related to adding a new model branch and formula dispatch.

Suggested labels

r0.4.0

Suggested reviewers

terrykong
wangshangsam

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Title Check	⚠️ Warning	The current title focuses on the cherry-pick operation and includes metadata like PR number and target branch instead of clearly summarizing the primary change of adding DeepSeek-v3 FLOPS tracking support. It doesn’t provide a concise description of the new feature for someone scanning the commit history.	Please update the pull request title to succinctly describe the main change, for example “Add DeepSeek-v3 FLOPS tracker support,” and remove the cherry-pick and branch metadata.
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Test Results For Major Changes	⚠️ Warning	The PR introduces new DeepSeek-V3 handling in the FLOPS tracker, which is a feature addition and therefore a major change under this check’s criteria. I reviewed the PR description and found only the automated cherry-pick note with no mention of executed tests, numerical validation, or performance comparisons. Because major changes must document testing evidence and this PR description does not, the check fails.	Please update the PR description to summarize the tests or validations run for the new DeepSeek-V3 FLOPS support, including any relevant numerical or performance checks, so the review can proceed.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch cherry-pick-1250-r0.4.0

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1860a52 and 33bc779.

📒 Files selected for processing (2)

nemo_rl/utils/flops_tracker.py (2 hunks)
tests/unit/utils/test_flops_counter.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

nemo_rl/utils/flops_tracker.py
tests/unit/utils/test_flops_counter.py

nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

nemo_rl/utils/flops_tracker.py

🧬 Code graph analysis (1)

nemo_rl/utils/flops_tracker.py (1)

nemo_rl/utils/flops_formulas.py (2)

FLOPSConfig (21-59)

deepseekv3 (386-458)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Lint check
GitHub Check: Lint check
GitHub Check: Lint check
GitHub Check: Post automodel integration comment / Comment on PR
GitHub Check: Post submodule check comment / Comment on PR

🔇 Additional comments (3)

nemo_rl/utils/flops_tracker.py (2)

28-28: LGTM!

The import of deepseekv3 is correctly added to support the new DeepSeek-V3 formula.

80-100: Verify DeepSeek-V3 configuration correctness.

Ensure consistency in type checking: either confirm there’s no dedicated DeepSeekV3Config and config.__class__.model_type == "deepseek_v3" is required, or switch to an isinstance() guard.

Confirm hardcoded architecture parameters (moe_layer_freq=1, mtp_num_layers=0, causal_self_attn=True) match all DeepSeek-V3 variants or replace with config attributes if available.

Validate that both moe_shared_expert_intermediate_size and moe_ffn_hidden_size should use config.moe_intermediate_size, or adjust if separate config fields exist.

tests/unit/utils/test_flops_counter.py (1)

31-31: Verify the batch size choice for consistency.

The test case uses gbs=1 while all other test cases use gbs=128. Please verify:

Is the smaller batch size intentional (e.g., to test edge cases or reduce computational requirements)?

Should it use gbs=128 for consistency with other model tests?

For reference, using gbs=128 with the same expected relative FLOPs would result in approximately 1.023e15 * 128 ≈ 1.31e17 FLOPs.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

feat: Add deepseek flops tracker (original #1250) (#1305)

33bc779

Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

chtruong814 requested review from a team as code owners October 8, 2025 11:04

chtruong814 requested review from guyueh1 and removed request for a team October 8, 2025 11:04

chtruong814 added cherry-pick Run CICD labels Oct 8, 2025

terrykong enabled auto-merge (squash) October 8, 2025 16:06

terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 8, 2025

terrykong temporarily deployed to nemo-ci October 8, 2025 16:06 — with GitHub Actions Inactive

terrykong temporarily deployed to nemo-ci October 8, 2025 16:10 — with GitHub Actions Inactive

guyueh1 approved these changes Oct 8, 2025

View reviewed changes

terrykong temporarily deployed to nemo-ci October 8, 2025 17:35 — with GitHub Actions Inactive

chtruong814 disabled auto-merge October 9, 2025 22:57

chtruong814 merged commit 1d66c86 into r0.4.0 Oct 9, 2025
69 of 72 checks passed

chtruong814 deleted the cherry-pick-1250-r0.4.0 branch October 9, 2025 22:58

coderabbitai Bot mentioned this pull request Dec 29, 2025

Add FP8 support to nano-v3 branch #1704

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cp: `feat: Add deepseek flops tracker (1250)` into `r0.4.0`#1309

cp: `feat: Add deepseek flops tracker (1250)` into `r0.4.0`#1309
chtruong814 merged 1 commit intor0.4.0from
cherry-pick-1250-r0.4.0

chtruong814 commented Oct 8, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Oct 8, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

chtruong814 commented Oct 8, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Oct 8, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

chtruong814 commented Oct 8, 2025 •

edited by coderabbitai Bot

Loading