
fix: Fix process_weights_after_loading for fp8 dense#1432

Merged
terrykong merged 10 commits into NVIDIA-NeMo:main from guyueh1:fix_fp8_rollout_dense
Nov 10, 2025

Conversation

@guyueh1
Contributor

@guyueh1 guyueh1 commented Oct 27, 2025

What does this PR do ?

Fixes process_weights_after_loading for fp8 dense after bumping vllm to 0.11.0.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Refactor

    • Improved FP8 quantized weight processing and loading mechanisms for enhanced model handling.
  • Chores

    • Added additional training metrics logging for improved observability during training.

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 requested review from a team as code owners October 27, 2025 20:10
@guyueh1 guyueh1 added the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Oct 27, 2025
Collaborator

@terrykong terrykong left a comment


small comment. could you also confirm the fp8 rollout test runs after this fix?

Comment thread nemo_rl/algorithms/grpo.py Outdated
@coderabbitai
Contributor

coderabbitai Bot commented Oct 31, 2025

📝 Walkthrough

Walkthrough

This PR makes two changes: it logs the token_mult_prob_error metric in GRPO training output, and it modifies FP8 weight loading to replace the "_scale_inv" suffix with "_scale" and to introduce post-processing strategy functions.

Changes

  • GRPO Training Logging (nemo_rl/algorithms/grpo.py): Added runtime log output to display the token_mult_prob_error metric during training. No functional behavior changes.
  • FP8 Weight Processing (nemo_rl/models/generation/fp8.py): Updated FP8 weight loading: changed the scale key suffix from "_scale_inv" to "_scale"; imported and integrated the FP8 utility functions maybe_post_process_fp8_weight_block and process_fp8_weight_block_strategy; modified weight extraction to support both scale variants; applied the processing strategy to weights and scales; added a post-processing step after loading.

Sequence Diagram(s)

sequenceDiagram
    participant Code as Model Code
    participant Load as load_weights()
    participant Extract as _create_param_from_subclass_attributes()
    participant Strategy as process_fp8_weight_block_strategy()
    participant PostProc as maybe_post_process_fp8_weight_block()
    participant Layer as Layer Attributes

    Code->>Load: load weights with "_scale" key
    Load->>Extract: extract layer attributes
    Extract->>Layer: retrieve weight_scale_inv or weight_scale
    Extract->>Strategy: process (layer.weight, weight_scale)
    Strategy-->>Extract: return (updated_weight, updated_scale)
    Extract->>Extract: create ModelWeightParameter with updated_weight.data
    Extract->>PostProc: post-process layer
    PostProc->>Layer: update layer state
    Extract-->>Code: return processed parameter
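The flow in the diagram above can be sketched in a few lines. The two vLLM 0.11.0 helpers are replaced here by identity stubs so the example is self-contained and runnable; the class and function names other than the two helpers are illustrative, not NeMo-RL's actual code:

```python
def process_fp8_weight_block_strategy(weight, weight_scale):
    # Stub for the vLLM utility: returns (updated_weight, updated_scale).
    return weight, weight_scale

def maybe_post_process_fp8_weight_block(layer):
    # Stub for the vLLM utility that finalizes the layer's FP8 state.
    layer.post_processed = True

class FakeLayer:
    def __init__(self, weight, weight_scale):
        self.weight = weight
        self.weight_scale = weight_scale
        self.post_processed = False

def extract_and_process(layer):
    # Support both scale variants, as the PR does: prefer the legacy
    # weight_scale_inv attribute if present, else fall back to weight_scale.
    scale = getattr(layer, "weight_scale_inv", None)
    if scale is None:
        scale = layer.weight_scale
    # Apply the processing strategy to the weight and its scale.
    weight, scale = process_fp8_weight_block_strategy(layer.weight, scale)
    layer.weight, layer.weight_scale = weight, scale
    # Post-processing step added after loading.
    maybe_post_process_fp8_weight_block(layer)
    return layer

demo = extract_and_process(FakeLayer(weight=[1.0], weight_scale=0.5))
```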

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • The first change (GRPO logging) is trivial and adds a single log line
  • The FP8 changes are larger but localized: a consistent pattern is applied across the affected weight-loading and processing methods, with no complex branching or state management

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning
    Explanation: Docstring coverage is 20.00%, which is below the required threshold of 80.00%.
    Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
  • Test Results For Major Changes ⚠️ Warning
    Explanation: The PR contains significant changes to FP8 weight loading and processing that directly affect numerical operations and model behavior, but the PR description lacks documentation of test results, regression testing, or performance validation. The pre-checks show tests remain unchecked, and there are incomplete sections in the PR description. Additionally, the existing review comment identifies an unaddressed bug in weight scale handling, further indicating insufficient testing verification before merge.
    Resolution: The PR should include documented test results demonstrating that FP8 weight loading and inference produce numerically correct results with no accuracy regression compared to the previous version. Performance benchmarks, or at least confirmation that the fix resolves the vllm 0.11.0 compatibility issue without negative impact, should be provided. The unresolved review comment regarding inconsistent weight_scale_inv vs weight_scale handling should also be addressed before merge to ensure robustness.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed
    Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed
    The PR title "fix: Fix process_weights_after_loading for fp8 dense" directly aligns with the primary changes in the changeset. The bulk of the modifications are concentrated in fp8.py, where the process_weights_after_loading function and related FP8 weight handling have been updated to support a new processing strategy and adjust how scales are stored and loaded. This matches the PR objective of fixing process_weights_after_loading for FP8 dense after bumping vllm to 0.11.0. The secondary change in grpo.py (adding a log line) is minor and does not detract from the core focus. The title is concise, specific, and would allow a teammate to quickly understand the primary intent of the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd2e645 and c1d7fd2.

📒 Files selected for processing (2)
  • nemo_rl/algorithms/grpo.py (1 hunks)
  • nemo_rl/models/generation/fp8.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/algorithms/grpo.py
  • nemo_rl/models/generation/fp8.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/algorithms/grpo.py
  • nemo_rl/models/generation/fp8.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (3)
nemo_rl/algorithms/grpo.py (1)

1279-1279: Clarify relevance to PR objective.

This logging addition appears unrelated to the PR's stated purpose of fixing FP8 weight loading after the vllm 0.11.0 upgrade. Additionally, there's a past review comment suggesting gen_kl_error should be logged instead.

Consider either:

  1. Moving this change to a separate PR focused on GRPO logging improvements
  2. Updating the PR description to explain why this logging change is included
  3. Addressing the past review comment about using gen_kl_error
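A hypothetical illustration of the kind of log line discussed above: surfacing a per-step metric such as token_mult_prob_error next to other GRPO training metrics. The metrics dict and its values are made up for the example; the actual change adds a log call inside nemo_rl/algorithms/grpo.py:

```python
# Illustrative per-step metrics (values are made up for the example).
metrics = {"loss": 0.42, "token_mult_prob_error": 1.0007}

# Format the metric for the training log output.
log_line = f"token_mult_prob_error: {metrics['token_mult_prob_error']:.4f}"
print(log_line)
```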
nemo_rl/models/generation/fp8.py (2)

304-304: LGTM: Key naming updated for vllm 0.11.0 compatibility.

The change from _scale_inv to _scale suffix correctly aligns with vllm 0.11.0's FP8 parameter naming convention.


394-397: LGTM: FP8 utility imports added.

The imported functions (maybe_post_process_fp8_weight_block, process_fp8_weight_block_strategy) are vllm 0.11.0 utilities that enable proper post-processing of FP8 weight blocks.

Comment thread nemo_rl/models/generation/fp8.py Outdated
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 linked an issue Nov 3, 2025 that may be closed by this pull request
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 6, 2025
@guyueh1 guyueh1 requested a review from terrykong November 6, 2025 23:37
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 6, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 requested a review from a team as a code owner November 7, 2025 21:22
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 7, 2025
@terrykong terrykong merged commit 6a035bc into NVIDIA-NeMo:main Nov 10, 2025
40 of 42 checks passed
zpqiu pushed a commit to sharonyu-115/RL that referenced this pull request Nov 17, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@coderabbitai coderabbitai Bot mentioned this pull request Dec 19, 2025
4 tasks
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Labels

CI:L2 Run doctests, unit tests, functional tests, and convergence tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FP8 fix after vllm 0.11 upgrade

2 participants