Merged
5 changes: 3 additions & 2 deletions content/docs/command-reference/fetch.md
@@ -70,8 +70,9 @@ specific one is given with `--remote`.
[remote storage](/doc/command-reference/remote) to fetch from (see
`dvc remote list`).

- `--run-cache` - downloads all available history of stage runs from the remote
repository.
- `--run-cache` - downloads all available history of
[stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from
the remote repository. See the same option in `dvc push`.

- `-d`, `--with-deps` - determines files to download by tracking dependencies to
the `targets`. If none are provided, this option is ignored. By traversing all
7 changes: 4 additions & 3 deletions content/docs/command-reference/pull.md
@@ -110,9 +110,10 @@ used to see what files `dvc pull` would download.
[remote storage](/doc/command-reference/remote) to pull from (see
`dvc remote list`).

- `--run-cache` - downloads all available history of stage runs from the remote
repository (to the cache only, like `dvc fetch --run-cache`). Note that
`dvc repro <stage_name>` is necessary to checkout these files (into the
- `--run-cache` - downloads all available history of
[stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from
the remote repository (to the cache only, like `dvc fetch --run-cache`). Note
that `dvc repro <stage_name>` is necessary to check out these files (into the
workspace) and update `dvc.lock`.

- `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data
5 changes: 3 additions & 2 deletions content/docs/command-reference/push.md
@@ -88,8 +88,9 @@ in the cache (compared to the default remote.) It can be used to see what files
[remote storage](/doc/command-reference/remote) to push to (see
`dvc remote list`).

- `--run-cache` - uploads all available history of stage runs to the remote
repository.
- `--run-cache` - uploads all available history of
[stage runs](/doc/user-guide/project-structure/internal-files#run-cache) to
the remote repository.

- `-j <number>`, `--jobs <number>` - parallelism level for DVC to upload data to
remote storage. The default value is `4 * cpu_count()`. For SSH remotes, the
13 changes: 7 additions & 6 deletions content/docs/command-reference/repro.md
@@ -153,8 +153,11 @@ up-to-date and only execute the final stage.
present in the DVC project. Specifying `targets` has no effects with this
option, as all possible targets are already included.

- `--no-run-cache` - execute stage commands even if they have already been run
with the same dependencies/outputs/etc. before.
- `--no-run-cache` - execute stage commands even if they have already been run
with the same dependencies and outputs (see the
[details](/doc/user-guide/project-structure/internal-files#run-cache)). Useful,
for example, if the stage commands are non-deterministic
([not recommended](/doc/command-reference/run#avoiding-unexpected-behavior)).

- `--force-downstream` - in cases like `... -> A (changed) -> B -> C` it will
reproduce `A` first and then `B`, even if `B` was previously executed with the
@@ -178,10 +181,8 @@ up-to-date and only execute the final stage.

- `--pull` - [pulls](/doc/command-reference/pull) dependencies and outputs
involved in the stages being reproduced, if they are found in the
[default](/doc/command-reference/remote/default) remote storage. Note that it
checks the local run-cache too (available history of stage runs).

> Has no effect if combined with `--no-run-cache`.
[default remote storage](/doc/command-reference/remote/default). Note that it
tries the local run-cache first (unless `--no-run-cache` is also used).

- `-h`, `--help` - prints the usage/help message, and exits.

7 changes: 4 additions & 3 deletions content/docs/command-reference/run.md
@@ -240,9 +240,10 @@ $ dvc run -n second_stage './another_script.sh $MYENVVAR'
- `-f`, `--force` - overwrite an existing stage in `dvc.yaml` file without
asking for confirmation.

- `--no-run-cache` - execute the stage `command` even if it has already been run
with the same dependencies/outputs/etc. before. Useful for example if the
command's code is non-deterministic
- `--no-run-cache` - execute the stage commands even if they have already been
run with the same dependencies and outputs (see the
[details](/doc/user-guide/project-structure/internal-files#run-cache)). Useful,
for example, if the stage commands are non-deterministic
([not recommended](#avoiding-unexpected-behavior)).

- `--no-commit` - do not store the outputs of this execution in the cache
11 changes: 11 additions & 0 deletions content/docs/user-guide/basic-concepts/run-cache.md
@@ -0,0 +1,11 @@
---
name: 'Run-cache'
match: ['run-cache']
---

The DVC run-cache is a log of stages that have been run in the project. It
consists of `dvc.lock` file backups, identified as combinations of
dependencies, commands, and outputs that correspond to each other. `dvc repro`
and `dvc run` populate and reuse the run-cache. See
[Run-cache](/doc/user-guide/project-structure/internal-files#run-cache) for more
details.
41 changes: 36 additions & 5 deletions content/docs/user-guide/project-structure/internal-files.md
@@ -15,18 +15,22 @@ operation.
(credentials, private locations, etc). The local config file can be edited by
hand or with the command `dvc config --local`.

- `.dvc/cache`: The <abbr>cache</abbr> directory will store your data in a
special [structure](#structure-of-the-cache-directory). The data files and
directories in the <abbr>workspace</abbr> will only contain links to the data
files in the cache. (Refer to
- `.dvc/cache`: Default location of the <abbr>cache</abbr> directory. The cache
stores the project data in a special
[structure](#structure-of-the-cache-directory). The data files and directories
in the <abbr>workspace</abbr> will only contain links to the data files in the
cache (refer to
[Large Dataset Optimization](/doc/user-guide/large-dataset-optimization)). See
`dvc config cache` for related configuration options, including changing its
location.

> Note that DVC includes the cache directory in `.gitignore` during
> initialization. No data tracked by DVC should ever be pushed to the Git
> repository, only the <abbr>DVC files</abbr> that are needed to download or
> reproduce that data.

- `.dvc/cache/runs`: Default location of the [run-cache](#run-cache).

- `.dvc/plots`: Directory for
[plot templates](/doc/command-reference/plots#plot-templates)

@@ -120,3 +124,30 @@ $ cat .dvc/cache/19/6a322c107c2572335158503c64bfba.dir
```

That's how DVC knows that the other two cached files belong in the directory.

### Run-cache

By default, `dvc repro` and `dvc run` populate and reuse a log of stages that
have been run in the project. It is found in the `runs/` directory inside the
cache (or [remote storage](/doc/command-reference/remote)).
Comment on lines +128 to +132
Contributor Author

Other than in the DVC Internals guide should this be elsewhere?

If anyone can think of scenarios/examples/how-tos we should consider to mention the run-cache please lmk.

Contributor Author

E.g. do we still want a section in run (or repro) like this: https://github.com/iterative/dvc.org/pull/1464/files ?


Runs are identified as combinations of <abbr>dependencies</abbr>, commands, and
<abbr>outputs</abbr> that correspond to each other. These combinations are
hashed into special values that make up the file paths inside the run-cache
directory.
jorgeorpinel marked this conversation as resolved.

```dvc
$ tree .dvc/cache/runs
.dvc/cache/runs
└── 86
    └── 8632e1555283d6e23ec808c9ee1fadc30630c888d5c08695333609ef341508bf
        └── e98a34c44fa6b564ef211e76fb3b265bc67f19e5de2e255217d3900d8f...
```
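
To make the path layout more concrete, here is a minimal Python sketch of how a combination of command, dependencies, and outputs could be hashed into a nested path of that shape. The hashing scheme, the function names (`_digest`, `run_cache_path`), and the sample file hashes are all assumptions for illustration; this is not DVC's actual algorithm.

```python
import hashlib
import json
from pathlib import PurePosixPath


def _digest(obj):
    """Return the SHA-256 hex digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_cache_path(cmd, deps, outs, root=".dvc/cache/runs"):
    """Build a hypothetical run-cache path for one stage run.

    `deps` and `outs` map file paths to content hashes (made-up values here).
    """
    # Assumed scheme: the directory name depends on the command and dependencies...
    key = _digest({"cmd": cmd, "deps": deps})
    # ...and the file name also depends on the outputs of that particular run.
    value = _digest({"cmd": cmd, "deps": deps, "outs": outs})
    # A two-character prefix directory keeps any single directory small.
    return PurePosixPath(root) / key[:2] / key / value


path = run_cache_path(
    cmd="python train.py",
    deps={"train.py": "a1b2", "data.csv": "c3d4"},
    outs={"model.pkl": "e5f6"},
)
print(path)
```

Under this assumed scheme, two runs with the same command and dependencies but different outputs land in the same directory as separate files, which is one way a non-deterministic command could leave several entries behind.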

The files themselves are backups of the `dvc.lock` file that resulted from that
run.

> Note that the run's <abbr>outputs</abbr> are stored in and retrieved from the
> regular cache.

💡 With `--run-cache`, `dvc push` can upload the run-cache to remote storage,
and `dvc pull` (or `dvc fetch`) can download it, for sharing and/or as a backup.
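
The populate-and-reuse behavior described in this section, and the effect of skipping the lookup as `--no-run-cache` does, can be sketched with a toy in-memory model. `ToyRunCache`, its `run` method, and the `use_run_cache` parameter are hypothetical names, not part of DVC; the sketch also assumes a forced run still records its result, which the text here does not state.

```python
class ToyRunCache:
    """In-memory stand-in for a run-cache (hypothetical, not DVC's API)."""

    def __init__(self):
        self._runs = {}

    def run(self, cmd, deps, execute, use_run_cache=True):
        """Return (outputs, executed) for a stage run.

        `deps` maps dependency paths to content hashes; `execute` is a
        callable that actually runs the stage and returns its outputs.
        """
        key = (cmd, tuple(sorted(deps.items())))
        if use_run_cache and key in self._runs:
            # Same command and dependencies seen before: reuse the logged run.
            return self._runs[key], False
        outs = execute()
        # Assumption: the result is logged even when the lookup was skipped.
        self._runs[key] = outs
        return outs, True


calls = []


def train():
    calls.append("train")
    return {"model.pkl": "e5f6"}


cache = ToyRunCache()
deps = {"data.csv": "c3d4"}
print(cache.run("python train.py", deps, train))  # first run executes
print(cache.run("python train.py", deps, train))  # second run is reused
print(cache.run("python train.py", deps, train, use_run_cache=False))  # forced
```

The second element of each returned tuple reports whether the stage command was actually executed, so the three calls show a fresh run, a reuse, and a forced re-execution.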