From 05228bc8c443c8b0d14994da9110d76cf353ab65 Mon Sep 17 00:00:00 2001
From: arcticfly
Date: Thu, 28 Aug 2025 11:51:54 -0700
Subject: [PATCH 1/3] Add step: to steps

---
 docs/tutorials/open-deep-research.mdx | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/tutorials/open-deep-research.mdx b/docs/tutorials/open-deep-research.mdx
index c4b57c2d7..54ec373ab 100644
--- a/docs/tutorials/open-deep-research.mdx
+++ b/docs/tutorials/open-deep-research.mdx
@@ -27,7 +27,7 @@
 Total cost: ~$100
 
 
-## Step 1: Clone the starter repo and install dependencies
+### Step 1: Clone the starter repo and install dependencies
 
 To get started, clone [Open Deep Research Training](https://github.com/OpenPipe/open_deep_research_training), which contains the following pieces of our RL pipeline:
 
@@ -40,7 +40,7 @@ Once the repository is cloned, install dependencies. If you haven't already, ins
 
 Then install the project dependencies by running `uv sync`.
 
-### 2. Install SkyPilot/RunPod
+### Step 2: Install SkyPilot/RunPod
 
 We'll be using `LocalBackend` to manage the GPU that your model will be trained on. In order to provision a GPU for your training run, you'll need to have SkyPilot installed on your machine and provide it with the credentials to spin up machines on at least one infra provider.
 
@@ -48,11 +48,11 @@ We recommend using RunPod because of their ease of use, but any infra provider t
 
 Follow RunPod's **Getting Started** guide [here](https://docs.runpod.io/integrations/skypilot/). You'll have to provide a credit card to use RunPod, but you'll only pay for the time your GPUs are running.
 
-### 3. Set up optional environment variables found in `.env.example`
+### Step 3: Set up optional environment variables found in `.env.example`
 
 Copy `.env.example` to `.env` at the root of the repository, and fill in the values for the environment variables. If you're unsure about any of the values, refer to [ENV_INSTRUCTIONS.md](https://github.com/OpenPipe/open_deep_research_training/blob/main/ENV_INSTRUCTIONS.md).
 
-### 4. Run the training scripts
+### Step 4: Run the training scripts
 
 You'll want to run these scripts in this order:
 
@@ -87,7 +87,7 @@ The following steps execute when a training run on a new cluster begins:
 - **Upload the final model checkpoint.**
   - This usually takes a few minutes.
 
-### 5. Generate the benchmarks
+### Step 5: Generate the benchmarks
 
 Run the benchmark script to evaluate your trained models:
 
@@ -103,7 +103,7 @@ This script will:
 
 Then run the `display_benchmarks.ipynb` notebook to visualize the results and generate comparison charts.
 
-### 6. Shutting down the cluster
+### Step 6: Shutting down the cluster
 
 When you're done training and running benchmarks, you can shut down the cluster by running:

From 5aee4825e14ada90114b5edec8e291c049b1ba4d Mon Sep 17 00:00:00 2001
From: arcticfly
Date: Thu, 28 Aug 2025 11:52:31 -0700
Subject: [PATCH 2/3] Increase estimated cost

---
 docs/tutorials/open-deep-research.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/open-deep-research.mdx b/docs/tutorials/open-deep-research.mdx
index 54ec373ab..30a3c3b87 100644
--- a/docs/tutorials/open-deep-research.mdx
+++ b/docs/tutorials/open-deep-research.mdx
@@ -23,7 +23,7 @@
 Reading time: 45 min
 
 Training time: 30hr
 
-Total cost: ~$100
+Total cost: ~$350
 
 

From 657b5b916a350ddb99985e87f4abf238d9dd9d90 Mon Sep 17 00:00:00 2001
From: arcticfly
Date: Thu, 28 Aug 2025 11:55:47 -0700
Subject: [PATCH 3/3] Update deep research tutorial

---
 docs/tutorials/open-deep-research.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/tutorials/open-deep-research.mdx b/docs/tutorials/open-deep-research.mdx
index 30a3c3b87..79c08ccd9 100644
--- a/docs/tutorials/open-deep-research.mdx
+++ b/docs/tutorials/open-deep-research.mdx
@@ -5,7 +5,7 @@
 description: "Train a deep research agent to exceed SOTA performance using GRPO"
 icon: "magnifying-glass"
 ---
 
-This tutorial demonstrates how to train an LLM using GRPO to exceed SOTA performance at deep research. Specifically, you will be using the [ART](https://github.com/OpenPipe/ART) library to specialize an agent for [Langchain's open deep research](https://github.com/langchain-ai/open_deep_research) framework, and will evaluate your agent's performance using [DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents](https://github.com/Ayanami0730/deep_research_bench).
+This tutorial demonstrates how to train your own deep research agent using GRPO to exceed Sonnet 4's performance. Specifically, you will be using the [ART](https://github.com/OpenPipe/ART) library to specialize Qwen 2.5 14B for [Langchain's open deep research](https://github.com/langchain-ai/open_deep_research) framework, and will evaluate your agent's performance using [DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents](https://github.com/Ayanami0730/deep_research_bench).
 
 In addition to the GRPO training step, you will also run an initial SFT training run to improve the model's baseline performance.
@@ -21,7 +21,7 @@
 
 
 Reading time: 45 min
 
-Training time: 30hr
+Training time: 30 hr
 
 Total cost: ~$350
@@ -144,7 +144,7 @@ To learn more about ART, check out another tutorial or look through our notebook