Merged
31 changes: 30 additions & 1 deletion docs/guides/eval.md
@@ -3,17 +3,46 @@
## Start Evaluation

### Start Script

**Evaluating Standard Models:**

To evaluate a model pulled directly from the Hugging Face Hub, or a local checkpoint already in HF format, use the `run_eval.py` script.

```sh
# To run the evaluation with default config (examples/configs/eval.yaml)
uv run python examples/run_eval.py

# Specify a custom config file
uv run python examples/run_eval.py --config path/to/custom_config.yaml

# Override specific config values via command line (e.g., model name)
uv run python examples/run_eval.py generation.model_name="Qwen/Qwen2.5-Math-7B-Instruct"
```
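The dotted-key syntax above merges an override into the nested evaluation config before the run starts. As an illustration only (the repo's actual config parsing lives in its own utilities and may differ), a minimal sketch of how a `generation.model_name=...` override could be applied to a config dict:

```python
def apply_override(config: dict, override: str) -> dict:
    """Merge one dotted key=value override into a nested config dict.

    Illustrative helper, not the repo's real parser: values are kept
    as strings and intermediate sections are created on demand.
    """
    key, _, value = override.partition("=")
    parts = key.split(".")
    node = config
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value
    return config


config = {"generation": {"model_name": "default-model", "temperature": 0.0}}
apply_override(config, "generation.model_name=Qwen/Qwen2.5-Math-7B-Instruct")
print(config["generation"]["model_name"])  # Qwen/Qwen2.5-Math-7B-Instruct
```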

**Evaluating Models Trained with DCP Checkpoints (GRPO/SFT):**

If you have trained a model with GRPO or SFT and saved the checkpoint in the PyTorch DCP format, you must first convert it to the Hugging Face format before running evaluation.

1. **Convert DCP to HF:**
Use the `examples/convert_dcp_to_hf.py` script. Provide the path to the training configuration file (`config.json`), the DCP checkpoint directory, and an output path for the HF-format model.

```sh
# Example for a GRPO checkpoint at step 170
uv run python examples/convert_dcp_to_hf.py \
--config results/grpo/step_170/config.json \
--dcp-ckpt-path results/grpo/step_170/policy/weights/ \
--hf-ckpt-path results/grpo/hf
```
*Note: Adjust the paths according to your training output directory structure.*

2. **Run Evaluation on Converted Model:**
Once the conversion is complete, run the evaluation script, overriding `generation.model_name` to point to the directory containing the converted HF model.

```sh
# Example using the converted HF model from the previous step
uv run python examples/run_eval.py generation.model_name=$PWD/results/grpo/hf
```
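Before launching the evaluation, it can help to confirm the converted directory actually resembles a Hugging Face checkpoint. A small sketch of such a sanity check (the exact file set varies by model; `config.json` plus at least one weights file is a reasonable heuristic, and the path below is just the example from the conversion step):

```python
from pathlib import Path


def looks_like_hf_checkpoint(path: str) -> bool:
    """Heuristic check that a directory resembles a converted HF model:
    it must contain config.json and at least one weights file."""
    p = Path(path)
    if not (p / "config.json").is_file():
        return False
    weight_patterns = ("*.safetensors", "*.bin")
    return any(f for pat in weight_patterns for f in p.glob(pat))


print(looks_like_hf_checkpoint("results/grpo/hf"))
```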

### Example Output

```
...
```
6 changes: 3 additions & 3 deletions nemo_reinforcer/evals/eval.py
@@ -165,10 +165,10 @@ def run_env_eval(vllm_generation, dataloader, env, master_config):
get_keys_from_message_log(batch["message_log"][i], ["role", "content"])
for i in range(len(batch["message_log"]))
]
-        _, _, rewards, _ = ray.get(env.step.remote(to_env, batch["extra_env_info"]))
+        env_return = ray.get(env.step.remote(to_env, batch["extra_env_info"]))

-        score += rewards.sum().item()
-        count += len(rewards)
+        score += env_return.rewards.sum().item()
+        count += len(env_return.rewards)

# Cleanup before printing results
ray.get(env.shutdown.remote())
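The diff above replaces positional tuple unpacking of `env.step`'s result with a structured return value, so the scoring loop reads only the `rewards` field and won't silently break if the environment adds or reorders outputs. A minimal sketch of the pattern, using a NamedTuple and plain Python lists in place of the repo's Ray actors and tensors (all names here are illustrative, not the repo's actual types):

```python
from typing import List, NamedTuple


class EnvReturn(NamedTuple):
    """Structured result of an environment step; illustrative fields only."""
    observations: List[str]
    metadata: List[dict]
    rewards: List[float]
    done: List[bool]


def step(prompts: List[str]) -> EnvReturn:
    # Toy environment: reward 1.0 when the prompt contains "correct".
    rewards = [1.0 if "correct" in p else 0.0 for p in prompts]
    return EnvReturn(prompts, [{} for _ in prompts], rewards, [True] * len(prompts))


score, count = 0.0, 0
for batch in [["correct answer", "wrong answer"], ["correct again"]]:
    env_return = step(batch)          # named field access instead of _, _, rewards, _
    score += sum(env_return.rewards)  # .sum().item() on a tensor in the real code
    count += len(env_return.rewards)

print(f"accuracy: {score / count:.3f}")  # accuracy: 0.667
```

Accessing `env_return.rewards` by name keeps the call site readable and makes the contract explicit, which is the point of the change in this hunk.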