From cf2c168e607e027d9650b1f987c4106c9dd7ad9e Mon Sep 17 00:00:00 2001
From: Shun Kiyono <shun.kiyono@sbintuitions.co.jp>
Date: Wed, 11 Jun 2025 16:07:10 +0900
Subject: [PATCH 1/2] add missing arguments to grpo-deepscaler eval command

Signed-off-by: Shun Kiyono <shun.kiyono@sbintuitions.co.jp>
---
 docs/guides/grpo-deepscaler.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index 5beddf1689..d93aa4649d 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -33,7 +33,9 @@ Throughout training, the checkpoints of the model will be saved to the `results`
 
 ```sh
 uv run examples/run_eval.py \
-    generation.model_name=results/grpo-deepscaler-1.5b-8K/step_240/hf
+    generation.model_name=results/grpo-deepscaler-1.5b-8K/step_240/hf \
+    data.prompt_file=examples/prompts/cot.txt \
+    generation.vllm_cfg.max_model_len=8192
 ```
 
 Use `generation.model_name` to specify the path to the Hugging Face checkpoint. In addition, we use AIME24 as the validation dataset and calculate pass@1 on it throughout training.

From d1564af9170ce8bead504c6ff667070137278260 Mon Sep 17 00:00:00 2001
From: Shun Kiyono <shun.kiyono@sbintuitions.co.jp>
Date: Mon, 30 Jun 2025 10:34:32 +0900
Subject: [PATCH 2/2] increase eval max_model_len from 8k to 32k

Signed-off-by: Shun Kiyono <shun.kiyono@sbintuitions.co.jp>
---
 docs/guides/grpo-deepscaler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index d93aa4649d..203934324f 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -35,7 +35,7 @@ Throughout training, the checkpoints of the model will be saved to the `results`
 uv run examples/run_eval.py \
     generation.model_name=results/grpo-deepscaler-1.5b-8K/step_240/hf \
     data.prompt_file=examples/prompts/cot.txt \
-    generation.vllm_cfg.max_model_len=8192
+    generation.vllm_cfg.max_model_len=32768
 ```
 
 Use `generation.model_name` to specify the path to the Hugging Face checkpoint. In addition, we use AIME24 as the validation dataset and calculate pass@1 on it throughout training.
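
For context on the pass@1 metric mentioned in the guide text above, here is a minimal sketch of how pass@k is commonly estimated from sampled answers. It assumes the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. It is not taken from `examples/run_eval.py`; the function names and the sample counts are hypothetical and used only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains
    # no correct answer; math.comb returns 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(correct_counts: list[int], n: int) -> float:
    """Average pass@1 over problems; for k=1 each term reduces to c / n."""
    return sum(pass_at_k(n, c, 1) for c in correct_counts) / len(correct_counts)


# Hypothetical results: 3 problems, 4 sampled answers each.
print(mean_pass_at_1([4, 2, 0], n=4))  # prints 0.5
```

With a single sample per problem (n = 1), this is simply the fraction of problems answered correctly.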