diff --git a/docs/tutorials/open-deep-research.mdx b/docs/tutorials/open-deep-research.mdx
index c4b57c2d7..79c08ccd9 100644
--- a/docs/tutorials/open-deep-research.mdx
+++ b/docs/tutorials/open-deep-research.mdx
@@ -5,7 +5,7 @@ description: "Train a deep research agent to exceed SOTA performance using GRPO
 icon: "magnifying-glass"
 ---
 
-This tutorial demonstrates how to train an LLM using GRPO to exceed SOTA performance at deep research. Specifically, you will be using the [ART](https://github.com/OpenPipe/ART) library to specialize an agent for [Langchain's open deep research](https://github.com/langchain-ai/open_deep_research) framework, and will evaluate your agent's performance using [DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents](https://github.com/Ayanami0730/deep_research_bench).
+This tutorial demonstrates how to train your own deep research agent using GRPO to exceed Sonnet 4's performance. Specifically, you will be using the [ART](https://github.com/OpenPipe/ART) library to specialize Qwen 2.5 14B for [Langchain's open deep research](https://github.com/langchain-ai/open_deep_research) framework, and will evaluate your agent's performance using [DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents](https://github.com/Ayanami0730/deep_research_bench).
 
 In addition to the GRPO training step, you will also run an initial SFT training run to improve the model's baseline performance.
@@ -21,13 +21,13 @@ In addition to the GRPO training step, you will also run an initial SFT training
 
 Reading time: 45 min
 
-Training time: 30hr
+Training time: 30 hr
 
-Total cost: ~$100
+Total cost: ~$350
 
-## Step 1: Clone the starter repo and install dependencies
+### Step 1: Clone the starter repo and install dependencies
 
 To get started, clone [Open Deep Research Training](https://github.com/OpenPipe/open_deep_research_training), which contains the following pieces of our RL pipeline:
@@ -40,7 +40,7 @@ Once the repository is cloned, install dependencies. If you haven't already, ins
 
 Then install the project dependencies by running `uv sync`.
 
-### 2. Install SkyPilot/RunPod
+### Step 2: Install SkyPilot/RunPod
 
 We'll be using `LocalBackend` to manage the GPU that your model will be trained on. In order to provision a GPU for your training run, you'll need to have SkyPilot installed on your machine and provide it with the credentials to spin up machines on at least one infra provider.
@@ -48,11 +48,11 @@ We recommend using RunPod because of their ease of use, but any infra provider t
 
 Follow RunPod's **Getting Started** guide [here](https://docs.runpod.io/integrations/skypilot/). You'll have to provide a credit card to use RunPod, but you'll only pay for the time your GPUs are running.
 
-### 3. Set up optional environment variables found in `.env.example`
+### Step 3: Set up optional environment variables found in `.env.example`
 
 Copy `.env.example` to `.env` at the root of the repository, and fill in the values for the environment variables. If you're unsure about any of the values, refer to [ENV_INSTRUCTIONS.md](https://github.com/OpenPipe/open_deep_research_training/blob/main/ENV_INSTRUCTIONS.md).
 
-### 4. Run the training scripts
+### Step 4: Run the training scripts
 
 You'll want to run these scripts in this order:
@@ -87,7 +87,7 @@ The following steps execute when a training run on a new cluster begins:
 - **Upload the final model checkpoint.**
   - This usually takes a few minutes.
 
-### 5. Generate the benchmarks
+### Step 5: Generate the benchmarks
 
 Run the benchmark script to evaluate your trained models:
@@ -103,7 +103,7 @@ This script will:
 
 Then run the `display_benchmarks.ipynb` notebook to visualize the results and generate comparison charts.
 
-### 6. Shutting down the cluster
+### Step 6: Shut down the cluster
 
 When you're done training and running benchmarks, you can shut down the cluster by running:
@@ -144,7 +144,7 @@ To learn more about ART, check out another tutorial or look through our notebook