
fix: Fix process_weights_after_loading for fp8 dense#1432

Merged
terrykong merged 10 commits into NVIDIA-NeMo:main from guyueh1:fix_fp8_rollout_dense
Nov 10, 2025

Conversation

@guyueh1
Contributor

@guyueh1 guyueh1 commented Oct 27, 2025

What does this PR do ?

Fixes process_weights_after_loading for fp8 dense after bumping vllm to 0.11.0.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Refactor

    • Improved FP8 quantized weight processing and loading mechanisms for enhanced model handling.
  • Chores

    • Added additional training metrics logging for improved observability during training.

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 requested review from a team as code owners October 27, 2025 20:10
@guyueh1 guyueh1 added the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Oct 27, 2025
Collaborator

@terrykong terrykong left a comment


small comment. could you also confirm the fp8 rollout test runs after this fix?

Comment thread nemo_rl/algorithms/grpo.py Outdated
@coderabbitai
Contributor

coderabbitai Bot commented Oct 31, 2025

📝 Walkthrough

Walkthrough

This PR makes two changes: it logs the token_mult_prob_error metric in GRPO training output, and it modifies FP8 weight loading to replace the "_scale_inv" suffix with "_scale" and to introduce post-processing strategy functions.

Changes

  • GRPO Training Logging (nemo_rl/algorithms/grpo.py): Added runtime log output to display the token_mult_prob_error metric during training. No functional behavior changes.
  • FP8 Weight Processing (nemo_rl/models/generation/fp8.py): Updated FP8 weight loading: changed the scale key suffix from "_scale_inv" to "_scale"; imported and integrated the FP8 utility functions maybe_post_process_fp8_weight_block and process_fp8_weight_block_strategy; modified weight extraction to support both scale variants; applied the processing strategy to weights and scales; added a post-processing step after loading.

Sequence Diagram(s)

sequenceDiagram
    participant Code as Model Code
    participant Load as load_weights()
    participant Extract as _create_param_from_subclass_attributes()
    participant Strategy as process_fp8_weight_block_strategy()
    participant PostProc as maybe_post_process_fp8_weight_block()
    participant Layer as Layer Attributes

    Code->>Load: load weights with "_scale" key
    Load->>Extract: extract layer attributes
    Extract->>Layer: retrieve weight_scale_inv or weight_scale
    Extract->>Strategy: process (layer.weight, weight_scale)
    Strategy-->>Extract: return (updated_weight, updated_scale)
    Extract->>Extract: create ModelWeightParameter with updated_weight.data
    Extract->>PostProc: post-process layer
    PostProc->>Layer: update layer state
    Extract-->>Code: return processed parameter
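The flow in the diagram above can be sketched in a few lines. The two vLLM 0.11.0 helpers are replaced here by identity stubs so the example is self-contained and runnable; the class and function names other than the two helpers are illustrative, not NeMo-RL's actual code:

```python
def process_fp8_weight_block_strategy(weight, weight_scale):
    # Stub for the vLLM utility: returns (updated_weight, updated_scale).
    return weight, weight_scale

def maybe_post_process_fp8_weight_block(layer):
    # Stub for the vLLM utility that finalizes the layer's FP8 state.
    layer.post_processed = True

class FakeLayer:
    def __init__(self, weight, weight_scale):
        self.weight = weight
        self.weight_scale = weight_scale
        self.post_processed = False

def extract_and_process(layer):
    # Support both scale variants, as the PR does: prefer the legacy
    # weight_scale_inv attribute if present, else fall back to weight_scale.
    scale = getattr(layer, "weight_scale_inv", None)
    if scale is None:
        scale = layer.weight_scale
    # Apply the processing strategy to the weight and its scale.
    weight, scale = process_fp8_weight_block_strategy(layer.weight, scale)
    layer.weight, layer.weight_scale = weight, scale
    # Post-processing step added after loading.
    maybe_post_process_fp8_weight_block(layer)
    return layer

demo = extract_and_process(FakeLayer(weight=[1.0], weight_scale=0.5))
```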

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • The first change (GRPO logging) is trivial and adds a single log line
  • The FP8 changes are larger but localized: a consistent pattern is applied across the affected weight-loading and processing methods, with no complex branching or state management

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning
    Explanation: Docstring coverage is 20.00%, which is below the required threshold of 80.00%.
    Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
  • Test Results For Major Changes ⚠️ Warning
    Explanation: The PR contains significant changes to FP8 weight loading and processing that directly affect numerical operations and model behavior, but the PR description lacks documentation of test results, regression testing, or performance validation. The pre-checks show tests remain unchecked, and there are incomplete sections in the PR description. Additionally, the existing review comment identifies an unaddressed bug in weight scale handling, further indicating insufficient testing verification before merge.
    Resolution: The PR should include documented test results demonstrating that FP8 weight loading and inference produce numerically correct results with no accuracy regression compared to the previous version. Performance benchmarks, or at least confirmation that the fix resolves the vllm 0.11.0 compatibility issue without negative impact, should be provided. The unresolved review comment regarding inconsistent weight_scale_inv vs weight_scale handling should also be addressed before merge to ensure robustness.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed
    Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed
    The PR title "fix: Fix process_weights_after_loading for fp8 dense" directly aligns with the primary changes in the changeset. The bulk of the modifications are concentrated in fp8.py, where the process_weights_after_loading function and related FP8 weight handling have been updated to support a new processing strategy and adjust how scales are stored and loaded. This matches the PR objective of fixing process_weights_after_loading for FP8 dense after bumping vllm to 0.11.0. The secondary change in grpo.py (adding a log line) is minor and does not detract from the core focus. The title is concise, specific, and would allow a teammate to quickly understand the primary intent of the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd2e645 and c1d7fd2.

📒 Files selected for processing (2)
  • nemo_rl/algorithms/grpo.py (1 hunks)
  • nemo_rl/models/generation/fp8.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/algorithms/grpo.py
  • nemo_rl/models/generation/fp8.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/algorithms/grpo.py
  • nemo_rl/models/generation/fp8.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (3)
nemo_rl/algorithms/grpo.py (1)

1279-1279: Clarify relevance to PR objective.

This logging addition appears unrelated to the PR's stated purpose of fixing FP8 weight loading after the vllm 0.11.0 upgrade. Additionally, there's a past review comment suggesting gen_kl_error should be logged instead.

Consider either:

  1. Moving this change to a separate PR focused on GRPO logging improvements
  2. Updating the PR description to explain why this logging change is included
  3. Addressing the past review comment about using gen_kl_error
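A hypothetical illustration of the kind of log line discussed above: surfacing a per-step metric such as token_mult_prob_error next to other GRPO training metrics. The metrics dict and its values are made up for the example; the actual change adds a log call inside nemo_rl/algorithms/grpo.py:

```python
# Illustrative per-step metrics (values are made up for the example).
metrics = {"loss": 0.42, "token_mult_prob_error": 1.0007}

# Format the metric for the training log output.
log_line = f"token_mult_prob_error: {metrics['token_mult_prob_error']:.4f}"
print(log_line)
```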
nemo_rl/models/generation/fp8.py (2)

304-304: LGTM: Key naming updated for vllm 0.11.0 compatibility.

The change from _scale_inv to _scale suffix correctly aligns with vllm 0.11.0's FP8 parameter naming convention.


394-397: LGTM: FP8 utility imports added.

The imported functions (maybe_post_process_fp8_weight_block, process_fp8_weight_block_strategy) are vllm 0.11.0 utilities that enable proper post-processing of FP8 weight blocks.

Comment thread nemo_rl/models/generation/fp8.py Outdated
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 linked an issue Nov 3, 2025 that may be closed by this pull request
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 6, 2025
@guyueh1 guyueh1 requested a review from terrykong November 6, 2025 23:37
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 6, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 requested a review from a team as a code owner November 7, 2025 21:22
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 7, 2025
@terrykong terrykong merged commit 6a035bc into NVIDIA-NeMo:main Nov 10, 2025
40 of 42 checks passed
zpqiu pushed a commit to sharonyu-115/RL that referenced this pull request Nov 17, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@coderabbitai coderabbitai Bot mentioned this pull request Dec 19, 2025
4 tasks
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Labels

CI:L2 Run doctests, unit tests, functional tests, and convergence tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FP8 fix after vllm 0.11 upgrade

2 participants