docs/tutorials/open-deep-research.mdx — 20 changes: 10 additions & 10 deletions
@@ -5,7 +5,7 @@ description: "Train a deep research agent to exceed SOTA performance using GRPO
icon: "magnifying-glass"
---

- This tutorial demonstrates how to train an LLM using GRPO to exceed SOTA performance at deep research. Specifically, you will be using the [ART](https://github.com/OpenPipe/ART) library to specialize an agent for [Langchain's open deep research](https://github.com/langchain-ai/open_deep_research) framework, and will evaluate your agent's performance using [DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents](https://github.com/Ayanami0730/deep_research_bench).
+ This tutorial demonstrates how to train your own deep research agent using GRPO to exceed Sonnet 4's performance. Specifically, you will be using the [ART](https://github.com/OpenPipe/ART) library to specialize Qwen 2.5 14B for [Langchain's open deep research](https://github.com/langchain-ai/open_deep_research) framework, and will evaluate your agent's performance using [DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents](https://github.com/Ayanami0730/deep_research_bench).

In addition to the GRPO training step, you will also run an initial SFT training run to improve the model's baseline performance.

@@ -21,13 +21,13 @@ In addition to the GRPO training step, you will also run an initial SFT training

Reading time: <b>45 min</b>

- Training time: <b>30hr</b>
+ Training time: <b>30 hr</b>

- Total cost: <b>~$100</b>
+ Total cost: <b>~$350</b>

</Info>

- ## Step 1: Clone the starter repo and install dependencies
+ ### Step 1: Clone the starter repo and install dependencies

To get started, clone [Open Deep Research Training](https://github.com/OpenPipe/open_deep_research_training), which contains the following pieces of our RL pipeline:

@@ -40,19 +40,19 @@ Once the repository is cloned, install dependencies. If you haven't already, ins

Then install the project dependencies by running `uv sync`.
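
Condensed into shell commands, the whole step looks roughly like this (a sketch; the `curl` line is uv's standard installer from astral.sh and is only needed if uv isn't already on your machine):

```bash
# Clone the starter repo
git clone https://github.com/OpenPipe/open_deep_research_training.git
cd open_deep_research_training

# Install uv if you don't already have it (standard installer from astral.sh)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install the project dependencies
uv sync
```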

- ### 2. Install SkyPilot/RunPod
+ ### Step 2: Install SkyPilot/RunPod

We'll be using `LocalBackend` to manage the GPU your model will be trained on. To provision a GPU for your training run, you'll need SkyPilot installed on your machine, along with credentials for at least one infra provider it can use to spin up machines.

We recommend RunPod for its ease of use, but any infra provider that SkyPilot [supports](https://docs.skypilot.co/en/latest/overview.html#bringing-your-infra) will work.

Follow RunPod's **Getting Started** guide [here](https://docs.runpod.io/integrations/skypilot/). You'll have to provide a credit card to use RunPod, but you'll only pay for the time your GPUs are running.
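
In rough outline, that guide boils down to the following (a sketch based on SkyPilot's and RunPod's docs; treat the exact package extra and commands as assumptions and defer to the linked guide if they differ):

```bash
# Install SkyPilot with the RunPod integration
uv pip install "skypilot[runpod]"

# Store your RunPod API key so SkyPilot can provision machines on your account
runpod config

# Confirm SkyPilot sees at least one enabled infra provider
sky check
```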

- ### 3. Set up optional environment variables found in `.env.example`
+ ### Step 3: Set up optional environment variables found in `.env.example`

Copy `.env.example` to `.env` at the root of the repository, and fill in the values for the environment variables. If you're unsure about any of the values, refer to [ENV_INSTRUCTIONS.md](https://github.com/OpenPipe/open_deep_research_training/blob/main/ENV_INSTRUCTIONS.md).
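
A minimal sketch of this step, run from the repository root:

```bash
# Copy the template, then fill in your values
cp .env.example .env

# Open it in whichever editor you prefer
nano .env
```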

- ### 4. Run the training scripts
+ ### Step 4: Run the training scripts

You'll want to run these scripts in this order:

@@ -87,7 +87,7 @@ The following steps execute when a training run on a new cluster begins:
- **Upload the final model checkpoint.**
  - This usually takes a few minutes.

- ### 5. Generate the benchmarks
+ ### Step 5: Generate the benchmarks

Run the benchmark script to evaluate your trained models:

@@ -103,7 +103,7 @@ This script will:

Then run the `display_benchmarks.ipynb` notebook to visualize the results and generate comparison charts.
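
If Jupyter is available in the project environment (an assumption — you may need to add it as a dev dependency first), you can launch the notebook through uv:

```bash
# Open the results notebook from the repository root
uv run jupyter lab display_benchmarks.ipynb
```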

- ### 6. Shutting down the cluster
+ ### Step 6: Shutting down the cluster

When you're done training and running benchmarks, you can shut down the cluster by running:
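
The command block itself is elided in this view, but with SkyPilot, teardown is normally a single `sky down` call (the cluster name below is a placeholder):

```bash
# Stop and delete the cluster so you stop paying for the GPU
sky down <your-cluster-name>

# Or tear down every cluster SkyPilot is managing
sky down --all
```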

@@ -144,7 +144,7 @@ To learn more about ART, check out another tutorial or look through our notebook
</div>
<div className="card-wrapper">
<Card
title="All Notebooks"
title="ART Notebooks"
icon="book"
href="/getting-started/notebooks"
horizontal={true}