Closed
62 commits
28f25c6
Initial commit with outline of new doc and navigation
jeremydesroches Nov 14, 2020
6434169
Attempt to rename basic-concepts -> glossary, test tooltip link
jeremydesroches Nov 17, 2020
fa56570
Add tooltip field that overrides basic concept tooltip content
rogermparent Nov 18, 2020
1136b3b
Add example tooltip that shows off tooltip overrides
rogermparent Nov 18, 2020
0f99868
Merge pull request #1954 from iterative/basic-concepts-with-glossary-…
jeremydesroches Nov 19, 2020
852da9c
break out concepts, add example links in tooltips
jeremydesroches Nov 19, 2020
491b367
Update content/docs/user-guide/concepts/pipelines.md
jorgeorpinel Nov 20, 2020
98bbed1
Update content/docs/user-guide/concepts/remote.md
jorgeorpinel Nov 20, 2020
1a1a1ba
Enable glossary page doc creation
rogermparent Nov 20, 2020
9573870
Add DVC Project example for glossary page
rogermparent Nov 20, 2020
7ed4ddf
Merge concepts and glossary, add frontmatter, remove duplicate content
jeremydesroches Nov 20, 2020
53c99b7
Files -> metafiles. Comments. Reorder concepts. Nav Glossary -> Concepts
jeremydesroches Nov 20, 2020
5e9ed4f
guide: full Basic Concepts name on nav
jorgeorpinel Nov 20, 2020
05b28ca
Revert DVC Project glossary item.
jeremydesroches Nov 20, 2020
c4d45b2
Update basic concepts nav titles
jeremydesroches Nov 20, 2020
02aa8ea
Update content/docs/sidebar.json
jeremydesroches Nov 20, 2020
155c9bf
Update dvc-cache and workspace tooltips.
jeremydesroches Nov 20, 2020
5195366
Add link placeholders for basic concept tooltips.
jeremydesroches Nov 25, 2020
badc157
Outline workspace, add notes
jeremydesroches Nov 25, 2020
e3deb51
Initial extract of cache content into basic concepts
jeremydesroches Nov 25, 2020
77d82bc
Fix broken link
jeremydesroches Nov 25, 2020
b7aed74
Add remotes note
jeremydesroches Nov 25, 2020
d966b69
Extract data pipeline concept from dag -> basic concepts, add tooltip
jeremydesroches Nov 25, 2020
2b6b9cb
Extract remote storage concept from dvc remote -> basic concepts
jeremydesroches Nov 25, 2020
3d952dd
Initial extract metrics, plots -> basic concepts. Add tooltip.
jeremydesroches Nov 27, 2020
9ec88c1
Move supported file formats to Plots index description.
jeremydesroches Nov 27, 2020
487f1fb
Remove abbr from metrics and plots tooltip.
jeremydesroches Nov 27, 2020
4bb4ec6
Merge branch 'master' into ug-basic-concepts
jorgeorpinel Nov 28, 2020
e448395
Update content/docs/command-reference/dag.md
jorgeorpinel Nov 29, 2020
184339e
cmd: remove the cache index since it's a basic concept now
jorgeorpinel Nov 29, 2020
784f6bb
cmd: simplify metrics refs
jorgeorpinel Nov 29, 2020
918ab49
cmd: consistent metrics and plots index refs
jorgeorpinel Nov 29, 2020
02ac3c6
cmd: revert some changes in plots and remote refs
jorgeorpinel Nov 29, 2020
467819a
guide: add some TODOs...
jorgeorpinel Nov 29, 2020
e7df675
/glossary -> /concepts for files, nav, and js engine
jeremydesroches Nov 30, 2020
cb49542
Update new links to /concepts
jeremydesroches Nov 30, 2020
06a9cf7
Add parameters concept and initial content sample.
jeremydesroches Nov 30, 2020
db5de08
Update dvc cache link in config cmd ref
jeremydesroches Nov 30, 2020
0a423fa
Update pipeline glossary tooltip, add meta description
jeremydesroches Nov 30, 2020
231ee60
Remove abbr from data pipeline glossary tooltip
jeremydesroches Nov 30, 2020
80a5dff
Revert plots/index examples
jeremydesroches Nov 30, 2020
6b43953
Revert plots/show examples
jeremydesroches Nov 30, 2020
ffcbd0d
Revert plots/show example indent
jeremydesroches Nov 30, 2020
ac7ab00
Move metrics and plots supported file formats sections to concepts page
jeremydesroches Nov 30, 2020
2ed367f
Extracted params -> concept, removed index, added description, tooltip
jeremydesroches Dec 1, 2020
7d01b7e
Fix links: file-and-directories -> dvc cache concept
jeremydesroches Dec 1, 2020
7899e14
Fix dvc cache link -> concept
jeremydesroches Dec 1, 2020
2556cf8
Fix links to dvc params -> concept
jeremydesroches Dec 1, 2020
53901b2
Add meta descriptions for concepts
jeremydesroches Dec 1, 2020
b168db4
Fix broken link in add -> stop-tracking-data
jeremydesroches Dec 1, 2020
8a61682
Merge branch 'master' into ug-basic-concepts
jeremydesroches Dec 1, 2020
2adcf77
Add keyword notes for basic concepts
jeremydesroches Dec 2, 2020
4bbd393
Update content/docs/command-reference/config.md
jorgeorpinel Dec 2, 2020
05d380e
Fix formatting in config
jeremydesroches Dec 3, 2020
35640e6
Fix #structure-of-the-cache-directory links
jeremydesroches Dec 3, 2020
fe86a57
Update content/docs/user-guide/concepts/data-pipelines.md
jorgeorpinel Dec 5, 2020
29170fe
Update content/docs/user-guide/large-dataset-optimization.md
jorgeorpinel Dec 5, 2020
7e870d7
Add anchor to cache link
jeremydesroches Dec 10, 2020
1c8fb6b
Change links to abbr in params diff
jeremydesroches Dec 16, 2020
de28e19
Add cache directory to match, replace link with abbr in push cmd ref
jeremydesroches Dec 16, 2020
023edb9
Replace parameters links with tooltips in run
jeremydesroches Dec 16, 2020
4834c1f
Update data pipelines concept terms and description
jeremydesroches Dec 16, 2020
2 changes: 1 addition & 1 deletion content/docs/api-reference/get_url.md
@@ -36,7 +36,7 @@ URL returned depends on the
`remote` used (see the [Parameters](#parameters) section).

If the target is a directory, the returned URL will end in `.dir`. Refer to
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
[Structure of the cache directory](/doc/user-guide/concepts/dvc-cache#structure-of-the-cache-directory)
and `dvc add` to learn more about how DVC handles data directories.

⚠️ This function does not check for the actual existence of the file or
10 changes: 5 additions & 5 deletions content/docs/command-reference/add.md
@@ -38,7 +38,7 @@ other DVC commands), a few actions are taken under the hood:
1. Calculate the file hash.
2. Move the file contents to the cache (by default in `.dvc/cache`), using the
file hash to form the cached file path. (See
[Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
[Structure of the cache directory](/doc/user-guide/concepts/dvc-cache#structure-of-the-cache-directory)
for more details.)
3. Attempt to replace the file with a link to the cached data (more details on
file linking further down).
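
Steps 1 and 2 above can be sketched in Python as follows. This is only an illustration, not DVC's implementation: it assumes an MD5 content hash and a layout in which the first two hex characters of the hash name a subdirectory of `.dvc/cache`. `cache_path_for` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def cache_path_for(data: bytes, cache_dir: str = ".dvc/cache") -> Path:
    """Return the path a cached copy of `data` would get (hypothetical helper)."""
    # Step 1: calculate the file hash (MD5 is assumed here).
    file_hash = hashlib.md5(data).hexdigest()
    # Step 2: use the hash to form the cached file path. The two-character
    # subdirectory split is an assumption about the cache layout.
    return Path(cache_dir) / file_hash[:2] / file_hash[2:]


print(cache_path_for(b"hello"))
```

Because the path is derived from the contents alone, two files with identical contents would map to the same cache entry under these assumptions.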
@@ -60,8 +60,8 @@ files that can be easily tracked with Git.
It's possible to prevent files or directories from being added by DVC by adding
the corresponding patterns in a [`.dvcignore`](/doc/user-guide/dvcignore) file.

You can also [undo `dvc add`](/doc/user-guide/how-to/stop-tracking-data) to stop
tracking files or directories.
You can also [undo `dvc add`](/docs/user-guide/how-to/stop-tracking-data) to
stop tracking files or directories.

By default, DVC tries to use reflinks (see
[File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
@@ -82,8 +82,8 @@ entire tree. Instead, the single `.dvc` file references a special JSON file in
the cache (with `.dir` extension), that in turn points to the added files.

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info. on `.dir` cache entries.
> [Structure of the cache directory](/doc/user-guide/concepts/dvc-cache#structure-of-the-cache-directory)
> for more info on `.dir` cache entries.

Note that DVC commands that use tracked data support granular targeting of files
and directories, even when contained in a parent directory added as a whole.
2 changes: 1 addition & 1 deletion content/docs/command-reference/cache/dir.md
@@ -18,7 +18,7 @@ positional arguments:
## Description

Helper to set the `cache.dir` configuration option. (See
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).)
[cache directory](/doc/user-guide/concepts/dvc-cache#structure-of-the-cache-directory).)
Unlike doing so with `dvc config cache`, `dvc cache dir` transform paths
(`value`) that are provided relative to the current working directory into paths
**relative to the config file location**. However, if the `value` provided is an
37 changes: 0 additions & 37 deletions content/docs/command-reference/cache/index.md

This file was deleted.

5 changes: 2 additions & 3 deletions content/docs/command-reference/config.md
@@ -126,9 +126,8 @@ remote. See `dvc remote` for more information.

A DVC project <abbr>cache</abbr> is the hidden storage (by default located in
the `.dvc/cache` directory) for files that are tracked by DVC, and their
different versions. (See `dvc cache` and
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
for more details.) This section contains the following options:
different versions. (See [DVC cache](/doc/user-guide/concepts/dvc-cache) for
more details.) This section contains the following options:

- `cache.dir` - set/unset cache directory location. A correct value is either an
absolute path, or a path **relative to the config file location**. The default
19 changes: 1 addition & 18 deletions content/docs/command-reference/dag.md
@@ -15,24 +15,7 @@ positional arguments:

## Description

A data pipeline, in general, is a series of data processing
[stages](/doc/command-reference/run) (for example, console commands that take an
input and produce an <abbr>output</abbr>). A pipeline may produce intermediate
data, and has a final result.

Data science and machine learning pipelines typically start with large raw
datasets, include intermediate featurization and training stages, and produce a
final model, as well as accuracy [metrics](/doc/command-reference/metrics).

In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified in `dvc.yaml`, which can be
written manually or built using the helper command `dvc run`. This allows DVC to
restore one or more pipelines later (see `dvc repro`).

> DVC builds a dependency graph
> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this.

`dvc dag` command displays the stages of a pipeline up to the target stage. If
Displays the stages of a <abbr>data pipeline</abbr> up to the target stage. If
`target` is omitted, it will show the full project DAG.
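
The idea of showing the stages "up to the target stage" can be sketched as a walk over the dependency graph. The stage names and the `STAGE_DEPS` mapping below are made up for illustration, and `stages_up_to` is a hypothetical helper, not how `dvc dag` is implemented.

```python
from typing import Dict, List

# Hypothetical pipeline: each stage maps to the stages it depends on.
STAGE_DEPS: Dict[str, List[str]] = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["train", "featurize"],
}


def stages_up_to(target: str, deps: Dict[str, List[str]]) -> List[str]:
    """List `target` and everything it depends on, dependencies first."""
    ordered: List[str] = []
    seen = set()

    def visit(stage: str) -> None:
        if stage in seen:
            return
        seen.add(stage)
        for upstream in deps[stage]:
            visit(upstream)
        # A stage is appended only after all of its dependencies.
        ordered.append(stage)

    visit(target)
    return ordered


print(stages_up_to("train", STAGE_DEPS))
```

Stages that nothing on the path to the target depends on (here `evaluate`, when the target is `train`) are left out of the result.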

## Options
2 changes: 1 addition & 1 deletion content/docs/command-reference/fetch.md
@@ -176,7 +176,7 @@ $ tree .dvc/cache
Note that the `.dvc/cache` directory was created and populated.

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> [Structure of the cache directory](/doc/user-guide/concepts/dvc-cache#structure-of-the-cache-directory)
> for more info.

Used without arguments (as above), `dvc fetch` downloads all files and
2 changes: 1 addition & 1 deletion content/docs/command-reference/gc.md
@@ -29,7 +29,7 @@ of commits (determined by reading the DVC-files in them). See the
[Options](#options) section for more details.

> Note that `dvc gc` tries to fetch any missing
> [`.dir` files](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> [`.dir` files](/doc/user-guide/concepts/dvc-cache#structure-of-the-cache-directory)
> from [remote storage](/doc/command-reference/remote) to the local
> <abbr>cache</abbr>, in order to determine which files should exist inside
> cached directories. These files may be missing if the cache directory was
10 changes: 5 additions & 5 deletions content/docs/command-reference/metrics/diff.md
@@ -20,13 +20,13 @@ positional arguments:

## Description

This command provides a quick way to compare metrics among experiments in the
repository history. All metrics defined in `dvc.yaml` are used by default. The
differences shown by this command include the new value, and numeric difference
(delta) from the previous value of metrics (rounded to 5 digits precision).
Provides a quick way to compare metrics among experiments in the repository
history. All metrics defined in `dvc.yaml` are used by default. The differences
shown by this command include the new value, and numeric difference (delta) from
the previous value of metrics (rounded to 5 digits precision).

`a_rev` and `b_rev` are Git commit hashes, tag, or branch names. If none are
specified, `dvc metrics diff` compares metrics currently present in the
specified, this command compares metrics currently present in the
<abbr>workspace</abbr> (uncommitted changes) with the latest committed versions
(required). A single specified revision results in comparing the workspace and
that version.
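
The delta described above can be sketched as follows. This assumes metrics are nested dictionaries addressed by dotted tree paths, and it reads "5 digits precision" as five decimal places. `flatten` and `metrics_diff` are hypothetical helpers and the values are made up, so treat this as an illustration rather than the command's actual implementation.

```python
from typing import Any, Dict


def flatten(tree: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Turn a nested metrics tree into {"train.loss": 0.25, ...}."""
    flat: Dict[str, float] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def metrics_diff(old: Dict[str, Any], new: Dict[str, Any], precision: int = 5):
    """Report the new value and rounded delta for metrics present in both."""
    old_flat, new_flat = flatten(old), flatten(new)
    return {
        path: {
            "value": new_value,
            "change": round(new_value - old_flat[path], precision),
        }
        for path, new_value in new_flat.items()
        if path in old_flat
    }


# Made-up numbers: "committed" stands for the latest commit,
# "workspace" for the uncommitted changes.
committed = {"train": {"accuracy": 0.95, "loss": 0.25}}
workspace = {"train": {"accuracy": 0.96, "loss": 0.125}}
print(metrics_diff(committed, workspace))
```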
67 changes: 6 additions & 61 deletions content/docs/command-reference/metrics/index.md
@@ -1,7 +1,7 @@
# metrics

A set of commands to display and compare _metrics_:
[show](/doc/command-reference/metrics/show), and
A set of commands to display and compare <abbr>metrics</abbr> (JSON, YAML
files): [show](/doc/command-reference/metrics/show), and
[diff](/doc/command-reference/metrics/diff).

## Synopsis
@@ -15,40 +15,10 @@ positional arguments:
diff Show changes in metrics between commits.
```

## Types of metrics

DVC has two concepts for metrics, that represent different results of machine
learning training or data processing:

1. `dvc metrics` represent **scalar numbers** such as AUC, _true positive rate_,
etc.
2. `dvc plots` can be used to visualize **data series** such as AUC curves, loss
functions, confusion matrices, etc.

## Description

In order to follow the performance of machine learning experiments, DVC has the
ability to mark a certain stage <abbr>outputs</abbr> as metrics. These metrics
are project-specific floating-point or integer values e.g. AUC, ROC, false
positives, etc.

This type of metrics files are typically generated by user data processing code,
and are tracked using the `-m` (`--metrics`) and `-M` (`--metrics-no-cache`)
options of `dvc run`.

In contrast to `dvc plots`, these metrics should be stored in hierarchical
files. Unlike its `dvc plots` counterpart, `dvc metrics diff` can report the
numeric difference between the metrics in different experiments, for example an
`AUC` metrics that is `0.801807` and gets increase by `+0.037826`:

```dvc
$ dvc metrics diff
Path Metric Value Change
summary.json AUC 0.801807 0.037826
```

`dvc metrics` subcommands by default use the metrics files specified in
`dvc.yaml` (if any), for example `summary.json` below:
`dvc metrics` subcommands by default use all metrics files found in `dvc.yaml`
(if any), for example `summary.json` below:

```yaml
stages:
@@ -63,33 +33,8 @@ stages:
cache: false
```

> `cache: false` above specifies that `summary.json` is not tracked or
> <abbr>cached</abbr> by DVC (`-M` option of `dvc run`). These metrics files are
> normally committed with Git instead. See `dvc.yaml` for more information on
> the file format above.

### Supported file formats

Metrics can be organized as tree hierarchies in JSON or YAML 1.2 files. DVC
addresses specific metrics by the tree path. In the JSON example below, five
metrics are presented: `train.accuracy`, `train.loss`, `train.TN`, `train.FP`
and `time_real`.

```json
{
"train": {
"accuracy": 0.9886999726295471,
"loss": 0.041855331510305405,
"TN": 473,
"FP": 845
},
"time_real": 344.61309599876404
}
```

DVC itself does not ascribe any specific meaning for these numbers. Usually they
are produced by the model training or model evaluation code and serve as a way
to compare and pick the best performing experiment.
Note that metrics files are normally committed with Git (that's what
`cache: false` above is for). See `dvc.yaml` for more information.

## Options

13 changes: 6 additions & 7 deletions content/docs/command-reference/params/diff.md
@@ -1,8 +1,7 @@
# params diff

Show changes in [parameter dependencies](/doc/command-reference/params) between
commits in the <abbr>DVC repository</abbr>, or between a commit and the
<abbr>workspace</abbr>.
Show changes in <abbr>parameter</abbr> dependencies between commits in the
<abbr>DVC repository</abbr>, or between a commit and the <abbr>workspace</abbr>.

## Synopsis

@@ -22,8 +21,8 @@ This command provides a quick way to compare parameter values among experiments
in the repository history. Requires that Git is being used to version the
project params.

> Parameter dependencies are defined with the `-p` option in `dvc run`. See also
> `dvc params`.
> <abbr>Parameter</abbr> dependencies are defined with the `-p` option in
> `dvc run`.

Run without arguments, this command compares parameters currently present in the
<abbr>workspace</abbr> (uncommitted changes) with the latest committed version.
@@ -51,8 +50,8 @@ itself does not ascribe any specific meaning for these values.

## Examples

Let's create a simple YAML parameters file named `params.yaml` (default params
file name, see `dvc params` to learn more):
Let's create a simple YAML parameters file named `params.yaml` (default
<abbr>params</abbr> file name):

```yaml
lr: 0.0041
83 changes: 18 additions & 65 deletions content/docs/command-reference/plots/index.md
@@ -1,7 +1,7 @@
# plots

A set of commands to visualize and compare _plot metrics_ in structured files
(JSON, YAML, CSV, or TSV): [show](/doc/command-reference/plots/show),
A set of commands to visualize and compare <abbr>plots</abbr> (JSON, YAML, CSV,
or TSV files): [show](/doc/command-reference/plots/show),
[diff](/doc/command-reference/plots/diff), and
[modify](/doc/command-reference/plots/modify).

@@ -17,73 +17,26 @@ positional arguments:
modify Modify plot properties associated with a target file.
```

## Types of metrics

DVC has two concepts for metrics, that represent different results of machine
learning training or data processing:

1. `dvc metrics` represent **scalar numbers** such as AUC, _true positive rate_,
etc.
2. `dvc plots` can be used to visualize **data series** such as AUC curves, loss
functions, confusion matrices, etc.

## Description

DVC provides a set of commands to visualize certain metrics of machine learning
experiments as plots. Usual plot examples are AUC curves, loss functions,
confusion matrices, among others.

This type of metrics files are created by users, or generated by user data
processing code, and can be defined in `dvc.yaml` (`plots` field) for tracking
(optional).

DVC generates plots as HTML files that can be open with a web browser. These
HTML files use [Vega-Lite](https://vega.github.io/vega-lite/). Vega is a
declarative grammar for defining plots using JSON. The plots can also be saved
as SVG or PNG image filed from the browser.

In contrast to `dvc metrics`, these metrics should be stored as data series.
Unlike its `dvc metrics` counterpart, `dvc plots diff` cannot calculate numeric
differences between the metrics in different experiments.

### Supported file formats

Plot metrics can be organized as data series in JSON, YAML 1.2, CSV, or TSV
files. DVC expects to see an array (or multiple arrays) of objects (usually
_float numbers_) in the file.
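
As a rough illustration of what a data series means for a tabular file, the sketch below reads CSV text and pulls out one column as a list of floats. The sample rows and the `column_series` helper are hypothetical; this is not how `dvc plots` parses files.

```python
import csv
import io
from typing import List

# Made-up sample data: one header row, then one row per epoch.
CSV_TEXT = """epoch,AUC,loss
34,0.91935,0.0317345
35,0.91913,0.0317829
"""


def column_series(text: str, column: str) -> List[float]:
    """Return the values of one named column as a data series."""
    # skipinitialspace tolerates "a, b, c" style headers and rows.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [float(row[column]) for row in reader]


print(column_series(CSV_TEXT, "AUC"))
```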

In tabular file formats such as CSV and TSV, each column is an array.
`dvc plots` subcommands can produce plots for a specified column or a set of
them. For example, `epoch`, `AUC`, and `loss` are the column names below:

`dvc plots` subcommands by default use all plots files found in `dvc.yaml` (if
any), for example `accuracy.json` below:

```yaml
stages:
train:
cmd: python train.py
deps:
- users.csv
outs:
- model.pkl
plots:
- accuracy.json:
cache: false
```
epoch, AUC, loss
34, 0.91935, 0.0317345
35, 0.91913, 0.0317829
36, 0.92256, 0.0304632
37, 0.92302, 0.0299015
```

In hierarchical file formats (JSON or YAML), an array of consistent objects is
expected: every object should have the same structure.

`dvc plots` subcommands can produce plots for a specified field or a set of
them, from the array's objects. For example, `val_loss` is one of the field
names in the `train` array below:

```
{
"train": [
{"val_accuracy": 0.9665, "val_loss": 0.10757},
{"val_accuracy": 0.9764, "val_loss": 0.07324},
{"val_accuracy": 0.8770, "val_loss": 0.08136},
{"val_accuracy": 0.8740, "val_loss": 0.09026},
{"val_accuracy": 0.8795, "val_loss": 0.07640},
{"val_accuracy": 0.8803, "val_loss": 0.07608},
{"val_accuracy": 0.8987, "val_loss": 0.08455}
]
}
```
Note that metrics files are normally committed with Git (that's what
`cache: false` above is for). See `dvc.yaml` for more information.

## Plot templates
