
fix: Make the optimizer offloading optional#1404

Merged
terrykong merged 8 commits into NVIDIA-NeMo:main from youngeunkwon0405:remove-logprob-loading
Nov 7, 2025

Conversation

@youngeunkwon0405
Contributor

@youngeunkwon0405 youngeunkwon0405 commented Oct 22, 2025

What does this PR do?

This PR removes optimizer state offloading in policy.prepare_for_lp_inference() and corresponding on-loading in policy.prepare_for_training().
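For context, the offload/on-load being removed is essentially a device transfer of every optimizer state tensor. A minimal sketch of that pattern (illustrative helper names, not the actual worker code from nemo_rl):

```python
import torch

def offload_optimizer_state(optimizer: torch.optim.Optimizer) -> None:
    """Move all optimizer state tensors to CPU to free GPU memory."""
    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to("cpu", non_blocking=True)

def onload_optimizer_state(optimizer: torch.optim.Optimizer, device: str = "cuda") -> None:
    """Move optimizer state tensors back to the training device."""
    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to(device, non_blocking=True)
```

For a large or MoE model, the state being shuttled here (e.g. Adam moments) is roughly twice the parameter memory, which is where the multi-second per-step cost comes from.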

Originally, optimizer state offloading could be re-enabled by setting NRL_OFFLOAD_OPTIMIZER_STATE_FOR_LOGPROB=True (default: False).
Following a comment from @guyueh1, this was changed to a YAML config option: set policy.offload_optimizer_states_for_logprob=true to turn it on.

This saves about 3 to 15 s per step, depending on the model size.

I am not sure why the offloading was required in both the colocated and non-colocated cases. It might help increase the logprob batch size and thus MFU, but I'm not sure that is worth the cost.

@yuki-97, do you think this change will cause additional memory overhead?

@parthchadha, this is the topic I mentioned today.
@guyueh1, do you think it should be removed in the colocated path as well?

Performance (logprob_inference_prep + training_prep)

  • QWEN3 235B: 10 s → 1 s
  • QWEN3 30B: 13 s → ~1 s

Nsys report (QWEN3 30B, GBS 64)

Current ToT

[Nsys timeline screenshot]

This PR

[Nsys timeline screenshot]

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
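A hedged example of enabling the option added in this PR; the surrounding YAML structure is an assumption based on the option path policy.offload_optimizer_states_for_logprob:

```yaml
policy:
  # Re-enable the pre-PR behavior of offloading optimizer state
  # before logprob inference (default: false).
  offload_optimizer_states_for_logprob: true
```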

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Bug Fixes
    • Refined optimizer state device management. Optimizer state transfers to GPU and CPU offloading are now conditioned on generation configuration, ensuring consistent behavior across distributed training setups.

@youngeunkwon0405 youngeunkwon0405 requested review from a team as code owners October 22, 2025 00:44
@youngeunkwon0405 youngeunkwon0405 self-assigned this Oct 22, 2025
@youngeunkwon0405 youngeunkwon0405 marked this pull request as draft October 22, 2025 00:45
@github-actions

ℹ️ File Consistency Check

Check based on commit: 5416b9a (PR #1404 from remove-logprob-loading)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@coderabbitai
Contributor

coderabbitai Bot commented Oct 22, 2025

📝 Walkthrough

Three policy worker implementations now gate optimizer state movement operations (both device transfers and CPU offloading) to only occur when generation is colocated, by adding self.is_generation_colocated checks to existing conditions in prepare_for_training and offload_before_refit methods.

Changes

Cohort: Policy Worker Optimizer State Gating
Files: nemo_rl/models/policy/dtensor_policy_worker.py, nemo_rl/models/policy/dtensor_policy_worker_v2.py, nemo_rl/models/policy/megatron_policy_worker.py
Summary: Added a self.is_generation_colocated guard condition to optimizer state transfers in the prepare_for_training and offload_before_refit methods, restricting state movement to CUDA and CPU offloading to colocated-generation scenarios.
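The guard amounts to simple boolean gating. A toy sketch of the post-PR behavior (the class is a simplified stand-in; only the attribute and method names follow the summary above):

```python
class WorkerSketch:
    """Toy stand-in for the three policy workers touched by this PR."""

    def __init__(self, is_generation_colocated: bool):
        self.is_generation_colocated = is_generation_colocated
        # Track where the optimizer state currently lives ("cuda" or "cpu").
        self.optimizer_device = "cuda"

    def offload_before_refit(self) -> None:
        # Post-PR: CPU offload happens only when generation is colocated,
        # since only then does freeing GPU memory benefit the generation engine.
        if self.is_generation_colocated:
            self.optimizer_device = "cpu"

    def prepare_for_training(self) -> None:
        # Post-PR: the matching on-load back to CUDA is gated the same way.
        if self.is_generation_colocated:
            self.optimizer_device = "cuda"
```

In the non-colocated case the state simply never leaves the GPU, which is what removes the per-step transfer cost.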

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Homogeneous guard condition additions repeated consistently across three closely related files with minimal logic density and straightforward boolean gating.

Possibly related PRs

Suggested labels

r0.4.0

Suggested reviewers

  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Test Results For Major Changes ⚠️ Warning: The PR makes significant changes to optimizer state offloading behavior by adding is_generation_colocated guards to critical code paths in prepare_for_training() and offload_before_refit() across three policy worker implementations. The author claims the changes provide 3-15 seconds of per-step performance improvement, which qualifies this as a performance-impacting change that should include test results and performance documentation. However, the contributor checklist items for guidelines, tests, and documentation are listed but not completed, and no concrete before-and-after performance metrics, test execution results, or convergence validation are documented in the provided context. Before merging, the PR should include: (1) documented test results showing that existing test suites pass with the new conditional optimizer offloading logic, (2) before-and-after performance benchmarks with specific model configurations (model size, batch size, hardware, number of GPUs) that validate the claimed improvement, (3) verification that convergence is not regressed when is_generation_colocated is true versus false, and (4) explicit completion of the contributor testing and documentation checklist items.
  • Title Check ❓ Inconclusive: The title "fix: Make the optimizer offloading optional" accurately refers to gating the optimizer offload behavior, but it is generic and does not specify the key condition under which offloading now occurs, which could obscure the intent for someone scanning the history. Consider a more specific title, for example: "fix: Only perform optimizer offloading when generation is colocated."

✅ Passed checks (1 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.

@youngeunkwon0405 youngeunkwon0405 changed the title from "Do the optimizer offloading only in colocated case" to "fix: Do the optimizer offloading only in colocated case" Oct 22, 2025
@youngeunkwon0405 youngeunkwon0405 marked this pull request as ready for review October 22, 2025 05:42
@yuki-97
Contributor

yuki-97 commented Oct 22, 2025

I feel it's unrelated to whether it's colocated, because the colocated case frees vLLM's memory before prepare_for_lp_inference, so the two should be in the same situation. (Maybe there is some uncleaned memory in the colocated case, but overall they should be the same.)

I think the reason we offload / on-load is to increase the logprob batch size, as you said. I feel it's better to expose "do not offload the optimizer" as an option. wdyt? @parthchadha @terrykong

@youngeunkwon0405
Contributor Author

I feel it's unrelated to if it's colocated, b/c colocated will free vLLM's memory before prepare_for_lp_inference, so they should be the same situation. (maybe there are some uncleaned memory in colocated case, but overall they should be the same)

I think why we offload / on-load is for increasing logprob batch size as you said. I feel it's better to apply "not offload optimizer" as an option. wdyt? @parthchadha @terrykong

I see. If that is the case, I think logprob inference might already be able to use a large enough micro-batch size to saturate GPU MFU, because skipping the backward pass saves memory. Of course, I can make this tunable via NRL_OFFLOAD_OPTIMIZER_STATE_FOR_LOGPROB. I suggest false as the default, since offloading has large overheads for big or MoE models.
I am open to others' opinions.

@github-actions: File Consistency Check based on commit ffd544d (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

@github-actions: File Consistency Check based on commit 6c80e8a (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

@youngeunkwon0405 youngeunkwon0405 added the r0.4.0 and Performance (Related to improving performance) labels Oct 22, 2025
@youngeunkwon0405 youngeunkwon0405 requested a review from a team as a code owner October 22, 2025 21:48
@github-actions: File Consistency Check based on commit ee5e3a6 (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

@youngeunkwon0405 youngeunkwon0405 changed the title from "fix: Do the optimizer offloading only in colocated case" to "fix: Make the optimizer offloading optional" Oct 22, 2025
@youngeunkwon0405 youngeunkwon0405 added the CI:L1 Run doctests, unit tests, and functional tests label Oct 22, 2025
@youngeunkwon0405 youngeunkwon0405 requested review from a team as code owners October 22, 2025 23:55
@github-actions: File Consistency Check based on commit 525252d (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 22, 2025
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
@github-actions Bot commented Nov 5, 2025: File Consistency Check based on commit 5fcd404 (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 5, 2025
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
@github-actions Bot commented Nov 5, 2025: File Consistency Check based on commit 09a44c3 (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 5, 2025
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
@github-actions Bot commented Nov 5, 2025: File Consistency Check based on commit 1e5fa20 (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 5, 2025
@github-actions Bot commented Nov 5, 2025: File Consistency Check based on commit 8660c14 (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

guyueh1 previously approved these changes Nov 6, 2025
Contributor

@guyueh1 guyueh1 left a comment


LGTM

@guyueh1
Contributor

guyueh1 commented Nov 6, 2025

@youngeunkwon0405 please fix CI tests

@youngeunkwon0405
Contributor Author

Hi @terrykong, this PR fails in the test_vllm_http_server test. Do you think the CI is broken? I don't see how this failure could be caused by this change.

=================================== FAILURES ===================================
____________________________ test_vllm_http_server _____________________________

cluster = <nemo_rl.distributed.virtual_cluster.RayVirtualCluster object at 0x7f3c12fb3650>
tokenizer = Qwen2TokenizerFast(name_or_path='Qwen/Qwen3-0.6B', vocab_size=151643, model_max_length=131072, is_fast=True, padding_s...),
	151668: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)

    def test_vllm_http_server(cluster, tokenizer):
        """Test that vLLM http server works."""
    
        generation_config = configure_http_server_config(tokenizer)
    
        # Ensure we can get same output
        assert generation_config["model_name"] == "Qwen/Qwen3-0.6B", (
            "Model name should be Qwen/Qwen3-0.6B to get expected output"
        )
        assert generation_config["vllm_cfg"]["tensor_parallel_size"] == 1, (
            "Tensor parallel size should be 1 to get expected output"
        )
    
        # Set to greedy for test reproducibility.
        generation_config["temperature"] = 0.0
    
        # Create vLLM generation
        vllm_generation = VllmGeneration(cluster, generation_config)
    
        # We expect one server per vLLM DP rank.
        base_urls = vllm_generation.dp_openai_server_base_urls
        assert len(base_urls) == cluster.num_gpus_per_node
    
        body = dict(
            messages=[
                {"role": "user", "content": "count to 5"},
            ],
            temperature=generation_config["temperature"],
            top_p=generation_config["top_p"],
            # We want to test the actual train flow and how this is used. So we need to get logprobs here.
            logprobs=True,
            return_tokens_as_token_ids=True,
            max_tokens=1,
        )
    
        _wait_for_vllm_http_server_spinup(base_urls[0])
    
        # Generate and check result
        response = requests.post(url=f"{base_urls[0]}/chat/completions", json=body)
        actual_result = response.json()
    
        # This result assumes this exact model. The expected result here is what the full result looks like before we standardize.
        expected_result = {
            "id": "chatcmpl-7b8c0cdeeab34fd58ad260cf44b1a408",
            "object": "chat.completion",
            "created": 1756421711,
            "model": "Qwen/Qwen3-0.6B",
            "choices": [
                {
                    "index": 0,
                    "message": {
                        "role": "assistant",
                        "content": "<think>",
                        "refusal": None,
                        "annotations": None,
                        "audio": None,
                        "function_call": None,
                        "tool_calls": [],
                        "reasoning_content": None,
                    },
                    "logprobs": {
                        "content": [
                            {
                                "token": "token_id:151667",
                                "logprob": -0.00023779425828251988,
                                "bytes": [60, 116, 104, 105, 110, 107, 62],
                                "top_logprobs": [],
                            }
                        ]
                    },
                    "finish_reason": "length",
                    "stop_reason": None,
                    "token_ids": None,
                }
            ],
            "service_tier": None,
            "system_fingerprint": None,
            "usage": {
                "prompt_tokens": 12,
                "total_tokens": 13,
                "completion_tokens": 1,
                "prompt_tokens_details": None,
            },
            "prompt_logprobs": None,
            "prompt_token_ids": None,
            "kv_transfer_params": None,
        }
    
        def _standardize(d: dict) -> dict:
            d = deepcopy(d)
            d.pop("id")
            d.pop("created")
            # We don't want to implicate log prob accuracy in this test.
            d["choices"][0]["logprobs"]["content"][0].pop("logprob")
    
            return d
    
>       assert _standardize(expected_result) == _standardize(actual_result)
E       AssertionError: assert {'choices': [...pletion', ...} == {'choices': [...pletion', ...}
E         
E         Omitting 8 identical items, use -vv to show
E         Differing items:
E         {'model': 'Qwen/Qwen3-0.6B'} != {'model': '/home/TestData/nemo-rl/hf_home/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca'}
E         Use -v to get more diff

unit/models/generation/test_vllm_generation.py:1189: AssertionError

@terrykong
Collaborator

@youngeunkwon0405 Could you try updating with main and testing just to ensure it's not a transient error?

@youngeunkwon0405
Contributor Author

@youngeunkwon0405 Could you try updating with main and testing just to ensure it's not a transient error?

I already did this once. It was happening repeatedly.
Looks like this PR #1422 is facing the same issue.

@github-actions Bot commented Nov 6, 2025: File Consistency Check based on commit ab089fa (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
@github-actions Bot commented Nov 6, 2025: File Consistency Check based on commit d80a914 (PR #1404 from remove-logprob-loading); same DTensor policy worker synchronization reminder as above.

@youngeunkwon0405
Contributor Author

Hi @terrykong, now it passes the CI. Could I ask for your final review and possibly a merge as well?


Labels

CI:L1 (Run doctests, unit tests, and functional tests), Performance (Related to improving performance)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants