From cf2c168e607e027d9650b1f987c4106c9dd7ad9e Mon Sep 17 00:00:00 2001
From: Shun Kiyono
Date: Wed, 11 Jun 2025 16:07:10 +0900
Subject: [PATCH 1/2] add missing arguments

Signed-off-by: Shun Kiyono
---
 docs/guides/grpo-deepscaler.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index 5beddf1689..d93aa4649d 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -33,7 +33,9 @@ Throughout training, the checkpoints of the model will be saved to the `results`
 
 ```sh
 uv run examples/run_eval.py \
-    generation.model_name=results/grpo-deepscaler-1.5b-8K/step_240/hf
+    generation.model_name=results/grpo-deepscaler-1.5b-8K/step_240/hf \
+    data.prompt_file=examples/prompts/cot.txt \
+    generation.vllm_cfg.max_model_len=8192
 ```
 
 Use `generation.model_name` to specify the path to the Hugging Face checkpoint. In addition, we use AIME24 as the validation dataset and calculate pass@1 on it throughout training.

From d1564af9170ce8bead504c6ff667070137278260 Mon Sep 17 00:00:00 2001
From: Shun Kiyono
Date: Mon, 30 Jun 2025 10:34:32 +0900
Subject: [PATCH 2/2] 8k --> 32k

Signed-off-by: Shun Kiyono
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index d93aa4649d..203934324f 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -35,7 +35,7 @@ Throughout training, the checkpoints of the model will be saved to the `results`
 uv run examples/run_eval.py \
     generation.model_name=results/grpo-deepscaler-1.5b-8K/step_240/hf \
     data.prompt_file=examples/prompts/cot.txt \
-    generation.vllm_cfg.max_model_len=8192
+    generation.vllm_cfg.max_model_len=32768
 ```
 
 Use `generation.model_name` to specify the path to the Hugging Face checkpoint. In addition, we use AIME24 as the validation dataset and calculate pass@1 on it throughout training.