Merged
2 changes: 1 addition & 1 deletion 3rdparty/NeMo-workspace/NeMo
Submodule NeMo updated from bab664 to 4b7ded
4 changes: 2 additions & 2 deletions docs/design-docs/checkpointing.md
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ NeMo RL provides two checkpoint formats for Hugging Face models: Torch distribut
A checkpoint converter is provided to convert a Torch distributed checkpoint to Hugging Face format after training:

```sh
-uv run examples/convert_dcp_to_hf.py --config=<YAML CONFIG USED DURING TRAINING> <ANY CONFIG OVERRIDES USED DURING TRAINING> --dcp-ckpt-path=<PATH TO DIST CHECKPOINT TO CONVERT> --hf-ckpt-path=<WHERE TO SAVE HF CHECKPOINT>
+uv run examples/converters/convert_dcp_to_hf.py --config=<YAML CONFIG USED DURING TRAINING> <ANY CONFIG OVERRIDES USED DURING TRAINING> --dcp-ckpt-path=<PATH TO DIST CHECKPOINT TO CONVERT> --hf-ckpt-path=<WHERE TO SAVE HF CHECKPOINT>
```

Hugging Face checkpoints usually keep the weights and tokenizer together (which we also recommend for provenance), so copy the tokenizer over after conversion. Here's an end-to-end example:
@@ -14,6 +14,6 @@ Usually Hugging Face checkpoints keep the weights and tokenizer together (which
# Change to your appropriate checkpoint directory
CKPT_DIR=results/sft/step_10

-uv run examples/convert_dcp_to_hf.py --config=$CKPT_DIR/config.yaml --dcp-ckpt-path=$CKPT_DIR/policy/weights --hf-ckpt-path=${CKPT_DIR}-hf
+uv run examples/converters/convert_dcp_to_hf.py --config=$CKPT_DIR/config.yaml --dcp-ckpt-path=$CKPT_DIR/policy/weights --hf-ckpt-path=${CKPT_DIR}-hf
rsync -ahP $CKPT_DIR/policy/tokenizer ${CKPT_DIR}-hf/
```
4 changes: 2 additions & 2 deletions docs/guides/eval.md
@@ -9,11 +9,11 @@ To prepare for evaluation, first ensure your model is in the correct format, whi
### Convert DCP to HF (Optional)
If you have trained a model and saved the checkpoint in the PyTorch DCP format, you first need to convert it to the Hugging Face format before running evaluation.

-Use the `examples/convert_dcp_to_hf.py` script. You'll need the path to the training configuration file (`config.yaml`), the DCP checkpoint directory, and specify an output path for the HF format model.
+Use the `examples/converters/convert_dcp_to_hf.py` script. You'll need the path to the training configuration file (`config.yaml`) and the DCP checkpoint directory, and you'll need to specify an output path for the HF-format model.

```sh
# Example for a GRPO checkpoint at step 170
-uv run python examples/convert_dcp_to_hf.py \
+uv run python examples/converters/convert_dcp_to_hf.py \
--config results/grpo/step_170/config.yaml \
--dcp-ckpt-path results/grpo/step_170/policy/weights/ \
--hf-ckpt-path results/grpo/hf
2 changes: 1 addition & 1 deletion docs/guides/grpo-deepscaler.md
@@ -16,7 +16,7 @@ uv run examples/run_grpo_math.py --config=examples/configs/grpo-deepscaler-1.5b-
At the end of each stage, you need to specify the Hugging Face checkpoint to continue training with. To get this checkpoint, we convert a model checkpoint to a Hugging Face checkpoint with the following command:

```sh
-uv run examples/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
+uv run examples/converters/convert_dcp_to_hf.py --config=results/grpo-deepscaler-1.5b-8K/step_240/config.yaml --dcp-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/policy/weights --hf-ckpt-path=results/grpo-deepscaler-1.5b-8K/step_240/hf
```

When running the next command, we use the Hugging Face checkpoint as the initial checkpoint. We train with an 8K context window for 240 steps, a 16K context window for 290 steps, and a 24K context window for 50 steps. We run all experiments on a single 8XH100 80GB node or on a single 8XA100 80GB node.
2 changes: 1 addition & 1 deletion docs/guides/sft-openmathinstruct2.md
@@ -26,7 +26,7 @@ The default config uses 8 GPUs (`cluster.gpus_per_node`) on 1 node (`cluster.num
Throughout training, the checkpoints of the model will be saved to the `results/sft_openmathinstruct2` folder (specified by `checkpointing.checkpoint_dir`). To evaluate the model, we first need to convert the PyTorch distributed checkpoint to Hugging Face format:

```
-uv run examples/convert_dcp_to_hf.py \
+uv run examples/converters/convert_dcp_to_hf.py \
--config=results/sft_openmathinstruct2/step_1855/config.yaml \
--dcp-ckpt-path=results/sft_openmathinstruct2/step_1855/policy/weights \
--hf-ckpt-path=results/sft_openmathinstruct2/step_1855/hf
67 changes: 67 additions & 0 deletions examples/converters/convert_megatron_to_hf.py
@@ -0,0 +1,67 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse

import yaml

from nemo_rl.models.megatron.community_import import export_model_from_megatron


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Convert Megatron checkpoint to HF checkpoint"
    )
    parser.add_argument(
        "--config",
        type=str,
        default=None,
        help="Path to config.yaml file in the checkpoint directory",
    )
    parser.add_argument(
        "--megatron-ckpt-path",
        type=str,
        default=None,
        help="Path to Megatron checkpoint",
    )
    parser.add_argument(
        "--hf-ckpt-path", type=str, default=None, help="Path to save HF checkpoint"
    )
    # Parse CLI args for the script
    args = parser.parse_args()

    return args


def main():
    """Main entry point."""
    args = parse_args()

    with open(args.config, "r") as f:
        config = yaml.safe_load(f)

    model_name = config["policy"]["model_name"]
    tokenizer_name = config["policy"]["tokenizer"]["name"]

    export_model_from_megatron(
        hf_model_name=model_name,
        input_path=args.megatron_ckpt_path,
        output_path=args.hf_ckpt_path,
        hf_tokenizer_path=tokenizer_name,
    )


if __name__ == "__main__":
    main()
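For reference, the only two fields the new converter reads from the training config are `policy.model_name` and `policy.tokenizer.name`. A minimal, self-contained sketch of that lookup (the YAML fragment below is hypothetical; a real `config.yaml` from training contains many more keys):

```python
import yaml

# Hypothetical config.yaml fragment; the converter only reads these two fields.
config_text = """
policy:
  model_name: Qwen/Qwen2.5-1.5B-Instruct
  tokenizer:
    name: Qwen/Qwen2.5-1.5B-Instruct
"""

config = yaml.safe_load(config_text)
model_name = config["policy"]["model_name"]
tokenizer_name = config["policy"]["tokenizer"]["name"]
print(model_name)  # Qwen/Qwen2.5-1.5B-Instruct
```

If either key is missing from the config, the script fails with a `KeyError` before any conversion work starts, so checking the config first is a cheap sanity test.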
45 changes: 44 additions & 1 deletion nemo_rl/models/megatron/community_import.py
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import os


def import_model_from_hf_name(hf_model_name: str, output_path: str):
    if "llama" in hf_model_name.lower():
@@ -31,9 +33,50 @@ def import_model_from_hf_name(hf_model_name: str, output_path: str):
            output_path=output_path,
        )
    else:
-        raise ValueError(f"Unknown model: {hf_model_name}")
+        raise ValueError(
+            f"Unknown model: {hf_model_name}. Currently, only Qwen2 and Llama are supported. "
+            "If you'd like to run with a different model, please raise an issue or consider adding your own converter."
+        )
    importer.apply()
    # resetting mcore state
    import megatron.core.rerun_state_machine

    megatron.core.rerun_state_machine.destroy_rerun_state_machine()


def export_model_from_megatron(
    hf_model_name: str,
    input_path: str,
    output_path: str,
    hf_tokenizer_path: str,
    overwrite: bool = False,
):
    if os.path.exists(output_path) and not overwrite:
        raise FileExistsError(
            f"HF checkpoint already exists at {output_path}. Delete it to run or set overwrite=True."
        )

    if "llama" in hf_model_name.lower():
        from nemo.tron.converter.llama import HFLlamaExporter

        exporter_cls = HFLlamaExporter
    elif "qwen" in hf_model_name.lower():
        from nemo.tron.converter.qwen import HFQwen2Exporter

        exporter_cls = HFQwen2Exporter
    else:
        raise ValueError(
            f"Unknown model: {hf_model_name}. Currently, only Qwen2 and Llama are supported. "
            "If you'd like to run with a different model, please raise an issue or consider adding your own converter."
        )
    print(f"Exporting model {hf_model_name} to {output_path}...")
    exporter = exporter_cls(
        input_path=input_path,
        output_path=output_path,
        hf_tokenizer_path=hf_tokenizer_path,
    )
    exporter.apply()
    # resetting mcore state
    import megatron.core.rerun_state_machine

    megatron.core.rerun_state_machine.destroy_rerun_state_machine()
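The substring-based dispatch in `export_model_from_megatron` can be illustrated with a self-contained sketch. Here `select_exporter` is a hypothetical stand-in that returns the chosen class name; the real function imports `HFLlamaExporter` or `HFQwen2Exporter` from `nemo.tron.converter` and instantiates it:

```python
def select_exporter(hf_model_name: str) -> str:
    """Return the exporter class name chosen by the substring dispatch.

    Illustrative only: the real code imports the matching NeMo exporter
    class and runs it against the Megatron checkpoint.
    """
    name = hf_model_name.lower()
    if "llama" in name:
        return "HFLlamaExporter"
    if "qwen" in name:
        return "HFQwen2Exporter"
    raise ValueError(
        f"Unknown model: {hf_model_name}. Currently, only Qwen2 and Llama are supported."
    )


print(select_exporter("meta-llama/Llama-3.1-8B-Instruct"))  # HFLlamaExporter
print(select_exporter("Qwen/Qwen2.5-7B"))  # HFQwen2Exporter
```

Because the match is a case-insensitive substring test on the Hugging Face model name, any model ID containing `llama` or `qwen` routes to the corresponding exporter; anything else raises `ValueError` before touching the checkpoint.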