26 changes: 5 additions & 21 deletions CONTRIBUTING.md
@@ -67,31 +67,19 @@ To create a new release:
- Publish the curated release notes
- Build and publish the package to PyPI

Then follow the SkyPilot or Local Training instructions below.
Then follow the GPU training instructions below.

### SkyPilot
### GPU Training (Local or Cloud VM)

Copy the `.env.example` file to `.env` and set the environment variables:

```bash
cp .env.example .env
```

Ensure you have a valid SkyPilot cloud available:
Make sure you're on a machine with at least one H100 or A100-80GB GPU. Machines equipped with lower-end GPUs may work, but training will be slower.

```bash
uv run sky check
```

Launch a cluster:

```bash
./scripts/launch-cluster.sh # you can pass any sky launch arguments here
```

Make sure you are on a machine with at least one H100 or A100-80GB GPU. Machines equipped with lower-end GPUs may work, but training will be slower.

You can now SSH into the `art` cluster, using either VSCode or the command line.
If you're using a cloud VM, you can SSH into the machine using either VSCode or the command line.

### Connecting via Command Line

@@ -145,8 +133,4 @@ If you run into any issues, the training output is set to maximum verbosity. Cop

### Cleaning Up

When you're done, you can tear down the cluster with:

```bash
uv run sky down art
```
When you're done, shut down your GPU instance (if using a cloud VM) or stop the local training process.
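
For the GPU requirement mentioned above, here is a minimal sketch for checking the hardware on a machine before starting training. It assumes PyTorch is already installed in the environment (it ships with the training backend's dependencies); the check itself is illustrative and not part of the repo's scripts:

```python
# Minimal sketch: verify a CUDA GPU is present and report its VRAM.
# Assumes PyTorch is installed; this is an illustrative check, not a repo script.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected; training requires one.")

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU {i}: {props.name} ({vram_gb:.0f} GB)")
```

An H100 or A100-80GB should report roughly 80 GB; lower numbers suggest training will be slower, as noted above.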
1 change: 0 additions & 1 deletion README.md
@@ -157,7 +157,6 @@ ART stands on the shoulders of giants. While we owe many of the ideas and early
- [vLLM](https://github.com/vllm-project/vllm)
- [trl](https://github.com/huggingface/trl)
- [torchtune](https://github.com/pytorch/torchtune)
- [SkyPilot](https://github.com/skypilot-org/skypilot)

Finally, thank you to our partners who've helped us test ART in the wild! We're excited to see what you all build with it.

24 changes: 0 additions & 24 deletions dev/test_skypilot/launch.py

This file was deleted.

27 changes: 0 additions & 27 deletions dev/test_skypilot/launch_tail.py

This file was deleted.

49 changes: 0 additions & 49 deletions dev/test_skypilot/register_model.py

This file was deleted.

2 changes: 1 addition & 1 deletion docs/features/checkpoint-deletion.mdx
@@ -11,7 +11,7 @@ To delete all but the most recent and best-performing checkpoints of a model, ca

```python
import art
# also works with LocalBackend and SkyPilotBackend
# also works with LocalBackend
from art.serverless.backend import ServerlessBackend

model = art.TrainableModel(
81 changes: 5 additions & 76 deletions docs/fundamentals/art-backend.mdx
@@ -20,15 +20,6 @@ While the backend's training and inference settings are highly configurable, the
arrow={true}
></Card>
</div>
<div className="card-wrapper">
<Card
title="SkyPilotBackend"
icon="cloud"
href="/fundamentals/art-backend#skypilotbackend"
horizontal={true}
arrow={true}
></Card>
</div>
<div className="card-wrapper">
<Card
title="LocalBackend"
@@ -40,17 +31,14 @@ While the backend's training and inference settings are highly configurable, the
</div>
</div>

## Managed, remote, or local training
## Managed or local training

ART provides three backend classes:
ART provides two backend classes:

* `ServerlessBackend` - train remotely on autoscaling GPUs
* `SkyPilotBackend` - train remotely on self-managed infra
* `LocalBackend` - run your agent and training code on the same machine

If your agent is already set up on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `SkyPilotBackend` or `ServerlessBackend` instead. `ServerlessBackend` optimizes speed and cost by autoscaling across managed clusters. `SkyPilotBackend` lets you use your own infra.

All three backend types implement the `art.Backend` class and the client interacts with all three in the exact same way. Under the hood, `SkyPilotBackend` configures a remote machine equipped with a GPU to run `LocalBackend`, and forwards requests from the client to the remote instance. `ServerlessBackend` runs within W&B Training clusters and autoscales GPUs to meet training and inference demand.
If your agent is already set up on a machine equipped with an advanced GPU and you want to run training on the same machine, use `LocalBackend`. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use `ServerlessBackend` instead. `ServerlessBackend` optimizes speed and cost by autoscaling across managed clusters.

### ServerlessBackend

@@ -67,59 +55,6 @@ backend = ServerlessBackend(

As your training job progresses, `ServerlessBackend` automatically saves your LoRA checkpoints as W&B Artifacts and deploys them for production inference on W&B Inference.

### SkyPilotBackend

To use SkyPilotBackend, you'll need to install the optional dependency:

```bash
pip install openpipe-art[skypilot]
```

When a `SkyPilotBackend` instance is initialized, it does a few things:

- Provisions a remote machine with an advanced GPU (by default on RunPod)
- Installs `openpipe-art` and its dependencies
- Initializes a `LocalBackend` instance with vLLM and a training server (unsloth or torchtune)
- Registers the `LocalBackend` instance to forward requests to it over http

To initialize a `SkyPilotBackend` instance, follow the code sample below:

```python
from art.skypilot import SkyPilotBackend

backend = await SkyPilotBackend.initialize_cluster(
# name of the cluster in SkyPilot's registry
cluster_name="my-cluster",
# version of openpipe-art that should be installed on the remote cluster
# default to version installed on the client
art_version="0.3.12",
# path to environment variables (e.g. WANDB_API_KEY) to set on the remote cluster
env_path=".env",
# the GPU the cluster is equipped with
gpu="H100"
# alternatively, more complicated requirements can be specified in
# the `resources` argument
)
```

When a training job is finished, you can shut down a cluster either through code or the CLI.

**Code:**

```python
backend = await SkyPilotBackend.initialize_cluster(...)

# ...training code...

backend.down()
```

**CLI:**

```bash
uv run sky down my-cluster
```

### LocalBackend

The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU.
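
As a rough sketch of the pattern this paragraph describes — initialize `LocalBackend` on the GPU machine and register a model against it. The model name, project, and base model below are illustrative placeholders; the calls themselves follow the usage shown elsewhere in these docs:

```python
import art
from art.local import LocalBackend

# Run vLLM and the training server on this machine's GPU.
backend = LocalBackend()

# Placeholder names; substitute your own project and base model.
model = art.TrainableModel(
    name="agent-001",
    project="my-agentic-task",
    base_model="Qwen/Qwen2.5-7B-Instruct",
)

# Inside an async context.
await model.register(backend)
```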
@@ -148,12 +83,6 @@ BACKEND_TYPE = "serverless"
if BACKEND_TYPE == "serverless":
from art.serverless.backend import ServerlessBackend
backend = await ServerlessBackend()
else if BACKEND_TYPE="remote":
from art.skypilot import SkyPilotBackend
backend = await SkyPilotBackend.initialize_cluster(
cluster_name="my-cluster",
gpu="H100"
)
else:
from art.local import LocalBackend
backend = LocalBackend()
@@ -182,12 +111,12 @@ To see `LocalBackend` and `ServerlessBackend` in action, try the examples below.
<div className="card-wrapper">
<Card
title="Summarizer"
icon="cloud"
icon="laptop-code"
href="/tutorials/summarizer"
horizontal={true}
arrow={true}
>
Use SkyPilotBackend to train a SOTA summarizing agent.
Use LocalBackend to train a SOTA summarizing agent.
</Card>
</div>
</div>
14 changes: 0 additions & 14 deletions docs/fundamentals/art-client.mdx
@@ -18,15 +18,6 @@ If you're curious about how ART allows you to run training and inference either
>
Run training and inference on autoscaling GPUs.
</Card>
<Card
title="SkyPilotBackend"
icon="cloud"
href="/fundamentals/art-backend#skypilotbackend"
horizontal={true}
arrow={true}
>
Run training and inference on a separate ephemeral machine.
</Card>
<Card
title="LocalBackend"
icon="laptop-code"
Expand Down Expand Up @@ -63,11 +54,6 @@ Once you've initialized your [backend](/fundamentals/art-backend), you can regis
# managed training
backend = ServerlessBackend()

# remote training
backend = SkyPilotBackend.initialize_cluster(
cluster_name="art", gpu="H100"
)

# local training
backend = LocalBackend()

30 changes: 0 additions & 30 deletions docs/getting-started/installation-setup.mdx
@@ -65,36 +65,6 @@ await model.register(backend)
... the rest of your code ...
```

### Running the server on remote dedicated GPUs

The ART client can also be run locally and connected to a remote server, which ART will automatically provision for you. To use SkyPilot, you'll need to install the optional dependency:

```bash
pip install openpipe-art[skypilot]
```

Then you can use SkyPilotBackend in your code:

```python
from art import TrainableModel, gather_trajectory_groups
from art.skypilot.backend import SkyPilotBackend

backend = await SkyPilotBackend.initialize_cluster(
cluster_name="my-cluster",
gpu="H100"
)

model = TrainableModel(
name="agent-001",
project="my-agentic-task",
base_model="OpenPipe/Qwen3-14B-Instruct",
)

await model.register(backend)

... the rest of your code ...
```

To learn more about the ART client and server, see the docs below.

<div className="cards-container">
4 changes: 2 additions & 2 deletions docs/integrations/langgraph-integration.mdx
@@ -15,7 +15,7 @@ To use ART with LangGraph, install ART with the required extras:
uv pip install -U openpipe-art[backend,langgraph]>=0.4.9
```

The `langgraph` extra includes the LangGraph integration dependencies, while `backend` provides the training backend components. If running using the [SkyPilotBackend](/fundamentals/art-backend#skypilotbackend), substitute `skypilot` for `backend` in the extras array.
The `langgraph` extra includes the LangGraph integration dependencies, while `backend` provides the training backend components.

## Why Use ART with LangGraph?

@@ -264,7 +264,7 @@ from art.utils import iterate_dataset

# Initialize model and backend
model = art.Model(name="Qwen/Qwen2.5-7B-Instruct")
backend = art.backends.SkyPilotBackend()
backend = art.LocalBackend()

# Data models
class EmailResult(BaseModel):