Merged
53 changes (21 additions, 32 deletions) in `content/docs/start/experiments.md`
@@ -21,39 +21,34 @@ the [`example-dvc-experiments`][ede] project.

<details>

## ⚙️ Initializing a project with DVC experiments

If you already have a DVC project, that's great. You can start using `dvc exp`
commands right away to run experiments in your project. (See the [User Guide]
for detailed information.) Here, we briefly discuss how to structure an ML
project for DVC experiments using `dvc exp init`.

[user guide]: /doc/user-guide/experiment-management/experiments-overview

A typical machine learning project has data, a set of scripts that train a
model, hyperparameters that tune the training and the models, and output metrics
and plots to evaluate the models. `dvc exp init` uses sensible default names for
these elements when it initializes a project:

```dvc
$ dvc exp init python src/train.py
```

Here, `python src/train.py` is the command that runs each experiment. It could
be any other command.
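
To make the default layout concrete, here is a hypothetical `src/train.py` that the command above could run. It is a minimal sketch built on several assumptions. `params.yaml` is a flat `key: value` file with optional `lr` and `epochs` keys, and it is parsed by hand here instead of with a YAML library. The dataset is a tiny in-memory list standing in for files under `data/`. The output names `models/model.json` and `plots/loss.json` are made up for illustration.

```python
# Hypothetical src/train.py (not the example repository's script): reads
# params.yaml and writes outputs to the default locations models/, plots/,
# and metrics.json.
import json
from pathlib import Path


def parse_value(text):
    """Turn '50' into 50 and '0.05' into 0.05; leave other strings alone."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def load_flat_params(path):
    """Read flat `key: value` lines; assumes no nesting, lists, or quoting."""
    path = Path(path)
    if not path.exists():
        return {}  # fall back to the defaults used in main()
    params = {}
    for line in path.read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" in line:
            key, value = line.split(":", 1)
            params[key.strip()] = parse_value(value.strip())
    return params


def fit(xs, ys, lr, epochs):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w, history, n = 0.0, [], len(xs)
    for epoch in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
        mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        history.append({"epoch": epoch, "mse": mse})
    return w, history


def main(root="."):
    root = Path(root)
    params = load_flat_params(root / "params.yaml")
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]  # toy data, y = 2x
    w, history = fit(xs, ys, params.get("lr", 0.01), params.get("epochs", 10))

    for folder in ("models", "plots"):
        (root / folder).mkdir(exist_ok=True)
    (root / "models" / "model.json").write_text(json.dumps({"w": w}))
    (root / "plots" / "loss.json").write_text(json.dumps(history))
    final_mse = history[-1]["mse"] if history else None
    (root / "metrics.json").write_text(json.dumps({"mse": final_mse}))


if __name__ == "__main__":
    main()
```

With a `params.yaml` containing `lr: 0.05` and `epochs: 50`, this sketch writes the fitted weight to `models/model.json`, the per-epoch loss to `plots/loss.json`, and the final error to `metrics.json`. These are the kinds of outputs the default layout expects.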

If your project uses different names, you can pass options to `dvc exp init` to
set the directories for source code (default: `src/`), data (`data/`), models
(`models/`), and plots (`plots/`), and the files for hyperparameters
(`params.yaml`) and metrics (`metrics.json`).
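
Before running `dvc exp init`, it can help to check which of these default names your project already matches. The sketch below mirrors only the defaults listed in the paragraph above and does not call DVC. `DEFAULT_LAYOUT`, `resolve_layout`, and `missing_paths` are hypothetical names chosen for illustration.

```python
# Sketch: compare a project's layout with the defaults described for
# `dvc exp init`. These helpers are illustrative and not part of DVC.
from pathlib import Path

DEFAULT_LAYOUT = {
    "code": "src",
    "data": "data",
    "models": "models",
    "plots": "plots",
    "params": "params.yaml",
    "metrics": "metrics.json",
}


def resolve_layout(**overrides):
    """Return the defaults with any project-specific names swapped in."""
    unknown = set(overrides) - set(DEFAULT_LAYOUT)
    if unknown:
        raise ValueError(f"unknown layout keys: {sorted(unknown)}")
    return {**DEFAULT_LAYOUT, **overrides}


def missing_paths(root, layout):
    """Return the layout keys whose paths do not exist under `root`."""
    root = Path(root)
    return sorted(key for key, rel in layout.items() if not (root / rel).exists())
```

For example, a project that keeps its training code in `training/` would use `resolve_layout(code="training")`. The entries that differ from the defaults are the ones to set through `dvc exp init` options. Output locations such as `models/`, `plots/`, and `metrics.json` may not exist until the training command first runs, so seeing them in `missing_paths` beforehand is expected.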

You can also set these options interactively with `dvc exp init --interactive`.

</details>

@@ -68,19 +63,13 @@ Experiment results have been applied to your workspace.
...
```

It runs the command we specified (`python src/train.py`) and creates the models,
plots, and metrics in their respective locations.

DVC then associates these metrics with the values in the parameters file
(`params.yaml`) and with the experiment's other dependencies (`data/images/`).
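
Conceptually, this association works like a record that ties parameter values and a fingerprint of each dependency to the metrics they produced. The sketch below illustrates only that idea. It is not DVC's internal format or hashing scheme. It assumes the parameters arrive as an already-parsed dict and uses an MD5 digest over file names and contents as the fingerprint. `hash_directory` and `record_experiment` are hypothetical helpers.

```python
# Illustrative only, not DVC's internal format: pair parameter values and a
# content fingerprint of a data directory with the resulting metrics.
import hashlib
import json
from pathlib import Path


def hash_directory(path):
    """MD5 over every file's relative name and contents, in sorted order."""
    root = Path(path)
    digest = hashlib.md5()
    for file in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(file.relative_to(root).as_posix().encode())
        digest.update(b"\0")  # separate the file name from its contents
        digest.update(file.read_bytes())
    return digest.hexdigest()


def record_experiment(params, data_dir, metrics_file):
    """Tie parameter values and a dependency fingerprint to the metrics."""
    return {
        "params": dict(params),
        "deps": {str(data_dir): hash_directory(data_dir)},
        "metrics": json.loads(Path(metrics_file).read_text()),
    }
```

Changing any file under `data/images/` changes its fingerprint. The same parameter values paired with different data therefore produce a different record.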

The `dvc exp` family of commands lets you run, capture, and compare machine
learning experiments as you iterate on your project. DVC tracks the artifacts
that each experiment produces, such as models and metrics, and the associated
parameters and metrics can be committed to Git as text files.
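
Because metrics are stored as plain text, two versions of `metrics.json` can be compared with ordinary tools. For example, you can read an earlier version with `git show HEAD~1:metrics.json`. DVC provides `dvc metrics diff` for this purpose. The sketch below only illustrates why text files make such comparisons simple, and it assumes a flat `{name: number}` JSON layout. `diff_metrics` is a hypothetical name.

```python
# Sketch: compare two flat {name: number} metrics files given as JSON text.
import json


def diff_metrics(old_text, new_text):
    """Return {metric: (old, new, change)} for every metric in either file."""
    old, new = json.loads(old_text), json.loads(new_text)
    result = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        # A change is only defined when the metric exists in both versions.
        change = after - before if before is not None and after is not None else None
        result[name] = (before, after, change)
    return result
```

A metric that appears in only one version is reported with `None` on the missing side and no change value.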

You can review the experiment results with `dvc exp show`, which displays these
metrics and parameters in a formatted table:

Expand Down