Merged
2 changes: 1 addition & 1 deletion 3rdparty/NeMo-workspace/NeMo
Submodule NeMo updated from bab664 to 4b7ded
4 changes: 2 additions & 2 deletions docs/design-docs/checkpointing.md
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ NeMo RL provides two checkpoint formats for Hugging Face models: Torch distribut
A checkpoint converter is provided to convert a Torch distributed checkpoint to Hugging Face format after training:

```sh
-uv run examples/convert_dcp_to_hf.py --config=<YAML CONFIG USED DURING TRAINING> <ANY CONFIG OVERRIDES USED DURING TRAINING> --dcp-ckpt-path=<PATH TO DIST CHECKPOINT TO CONVERT> --hf-ckpt-path=<WHERE TO SAVE HF CHECKPOINT>
+uv run examples/converters/convert_dcp_to_hf.py --config=<YAML CONFIG USED DURING TRAINING> <ANY CONFIG OVERRIDES USED DURING TRAINING> --dcp-ckpt-path=<PATH TO DIST CHECKPOINT TO CONVERT> --hf-ckpt-path=<WHERE TO SAVE HF CHECKPOINT>
```

Hugging Face checkpoints usually keep the weights and tokenizer together (which we also recommend for provenance), so copy the tokenizer over after conversion. Here's an end-to-end example:
@@ -14,6 +14,6 @@ Usually Hugging Face checkpoints keep the weights and tokenizer together (which
# Change to your appropriate checkpoint directory
CKPT_DIR=results/sft/step_10

-uv run examples/convert_dcp_to_hf.py --config=$CKPT_DIR/config.yaml --dcp-ckpt-path=$CKPT_DIR/policy/weights --hf-ckpt-path=${CKPT_DIR}-hf
+uv run examples/converters/convert_dcp_to_hf.py --config=$CKPT_DIR/config.yaml --dcp-ckpt-path=$CKPT_DIR/policy/weights --hf-ckpt-path=${CKPT_DIR}-hf
rsync -ahP $CKPT_DIR/policy/tokenizer ${CKPT_DIR}-hf/
```
4 changes: 2 additions & 2 deletions docs/guides/eval.md
@@ -9,11 +9,11 @@ To prepare for evaluation, first ensure your model is in the correct format, whi
### Convert DCP to HF (Optional)
If you have trained a model and saved the checkpoint in the PyTorch DCP format, you first need to convert it to the Hugging Face format before running evaluation.

-Use the `examples/convert_dcp_to_hf.py` script. You'll need the path to the training configuration file (`config.yaml`), the DCP checkpoint directory, and specify an output path for the HF format model.
+Use the `examples/converters/convert_dcp_to_hf.py` script. You'll need the path to the training configuration file (`config.yaml`) and the DCP checkpoint directory, and you'll need to specify an output path for the HF-format model.

```sh
# Example for a GRPO checkpoint at step 170
-uv run python examples/convert_dcp_to_hf.py \
+uv run python examples/converters/convert_dcp_to_hf.py \
--config results/grpo/step_170/config.yaml \
--dcp-ckpt-path results/grpo/step_170/policy/weights/ \
--hf-ckpt-path results/grpo/hf
2 changes: 1 addition & 1 deletion docs/guides/grpo-deepscaler.md
@@ -16,7 +16,7 @@ uv run examples/run_grpo_math.py --config=examples/configs/grpo-deepscaler-1.5b-
At the end of each stage, you need to specify the Hugging Face checkpoint to continue training with. To get this checkpoint, we convert a model checkpoint to a Hugging Face checkpoint with the following command:

```sh
-uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
+uv run examples/converters/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
```

When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node or on a single 8XA100 80GB node.
2 changes: 1 addition & 1 deletion docs/guides/sft-openmathinstruct2.md
@@ -26,7 +26,7 @@ The default config uses 8 GPUs (`cluster.gpus_per_node`) on 1 node (`cluster.num
Throughout training, the checkpoints of the model will be saved to the `results/sft_openmathinstruct2` folder (specified by `checkpointing.checkpoint_dir`). To evaluate the model, we first need to convert the PyTorch distributed checkpoint to Hugging Face format:

```
-uv run examples/convert_dcp_to_hf.py \
+uv run examples/converters/convert_dcp_to_hf.py \
--config=results/sft_openmathinstruct2/step_1855/config.yaml \
--dcp-ckpt-path=results/sft_openmathinstruct2/step_1855/policy/weights \
--hf-ckpt-path=results/sft_openmathinstruct2/step_1855/hf
67 changes: 67 additions & 0 deletions examples/converters/convert_megatron_to_hf.py
@@ -0,0 +1,67 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse

import yaml

from nemo_rl.models.megatron.community_import import export_model_from_megatron


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Convert Megatron checkpoint to HF checkpoint"
    )
    parser.add_argument(
        "--config",
        type=str,
        default=None,
        help="Path to config.yaml file in the checkpoint directory",
    )
    parser.add_argument(
        "--megatron-ckpt-path",
        type=str,
        default=None,
        help="Path to Megatron checkpoint",
    )
    parser.add_argument(
        "--hf-ckpt-path", type=str, default=None, help="Path to save HF checkpoint"
    )
    # Parse CLI args for the script
    args = parser.parse_args()

    return args


def main():
    """Main entry point."""
    args = parse_args()

    with open(args.config, "r") as f:
        config = yaml.safe_load(f)

    model_name = config["policy"]["model_name"]
    tokenizer_name = config["policy"]["tokenizer"]["name"]

    export_model_from_megatron(
        hf_model_name=model_name,
        input_path=args.megatron_ckpt_path,
        output_path=args.hf_ckpt_path,
        hf_tokenizer_path=tokenizer_name,
    )


if __name__ == "__main__":
    main()
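For reference, the only two fields the new converter reads from the training config are `policy.model_name` and `policy.tokenizer.name`. A minimal, self-contained sketch of that lookup (the YAML fragment below is hypothetical; a real `config.yaml` from training contains many more keys):

```python
import yaml

# Hypothetical config.yaml fragment; the converter only reads these two fields.
config_text = """
policy:
  model_name: Qwen/Qwen2.5-1.5B-Instruct
  tokenizer:
    name: Qwen/Qwen2.5-1.5B-Instruct
"""

config = yaml.safe_load(config_text)
model_name = config["policy"]["model_name"]
tokenizer_name = config["policy"]["tokenizer"]["name"]
print(model_name)  # Qwen/Qwen2.5-1.5B-Instruct
```

If either key is missing from the config, the script fails with a `KeyError` before any conversion work starts, so checking the config first is a cheap sanity test.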
45 changes: 44 additions & 1 deletion nemo_rl/models/megatron/community_import.py
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import os


def import_model_from_hf_name(hf_model_name: str, output_path: str):
    if "llama" in hf_model_name.lower():
@@ -31,9 +33,50 @@ def import_model_from_hf_name(hf_model_name: str, output_path: str):
            output_path=output_path,
        )
    else:
-        raise ValueError(f"Unknown model: {hf_model_name}")
+        raise ValueError(
+            f"Unknown model: {hf_model_name}. Currently, only Qwen2 and Llama are supported. "
+            "If you'd like to run with a different model, please raise an issue or consider adding your own converter."
+        )
    importer.apply()
    # resetting mcore state
    import megatron.core.rerun_state_machine

    megatron.core.rerun_state_machine.destroy_rerun_state_machine()


def export_model_from_megatron(
    hf_model_name: str,
    input_path: str,
    output_path: str,
    hf_tokenizer_path: str,
    overwrite: bool = False,
):
    if os.path.exists(output_path) and not overwrite:
        raise FileExistsError(
            f"HF checkpoint already exists at {output_path}. Delete it to run or set overwrite=True."
        )

    if "llama" in hf_model_name.lower():
        from nemo.tron.converter.llama import HFLlamaExporter

        exporter_cls = HFLlamaExporter
    elif "qwen" in hf_model_name.lower():
        from nemo.tron.converter.qwen import HFQwen2Exporter

        exporter_cls = HFQwen2Exporter
    else:
        raise ValueError(
            f"Unknown model: {hf_model_name}. Currently, only Qwen2 and Llama are supported. "
            "If you'd like to run with a different model, please raise an issue or consider adding your own converter."
        )
    print(f"Exporting model {hf_model_name} to {output_path}...")
    exporter = exporter_cls(
        input_path=input_path,
        output_path=output_path,
        hf_tokenizer_path=hf_tokenizer_path,
    )
    exporter.apply()
    # resetting mcore state
    import megatron.core.rerun_state_machine

    megatron.core.rerun_state_machine.destroy_rerun_state_machine()
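The substring-based dispatch in `export_model_from_megatron` can be illustrated with a self-contained sketch. Here `select_exporter` is a hypothetical stand-in that returns the chosen class name; the real function imports `HFLlamaExporter` or `HFQwen2Exporter` from `nemo.tron.converter` and instantiates it:

```python
def select_exporter(hf_model_name: str) -> str:
    """Return the exporter class name chosen by the substring dispatch.

    Illustrative only: the real code imports the matching NeMo exporter
    class and runs it against the Megatron checkpoint.
    """
    name = hf_model_name.lower()
    if "llama" in name:
        return "HFLlamaExporter"
    if "qwen" in name:
        return "HFQwen2Exporter"
    raise ValueError(
        f"Unknown model: {hf_model_name}. Currently, only Qwen2 and Llama are supported."
    )


print(select_exporter("meta-llama/Llama-3.1-8B-Instruct"))  # HFLlamaExporter
print(select_exporter("Qwen/Qwen2.5-7B"))  # HFQwen2Exporter
```

Because the match is a case-insensitive substring test on the Hugging Face model name, any model ID containing `llama` or `qwen` routes to the corresponding exporter; anything else raises `ValueError` before touching the checkpoint.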