diff --git a/static/docs/commands-reference/fetch.md b/static/docs/commands-reference/fetch.md
index f4dff8707c..d0e3d65c90 100644
--- a/static/docs/commands-reference/fetch.md
+++ b/static/docs/commands-reference/fetch.md
@@ -100,7 +100,7 @@ specified in DVC-files currently in the workspace are considered by `dvc fetch`
   of a DVC-file ([experiments](/doc/get-started/experiments)), not just the
   current one.
 
-- `-T`, `--all-tags` - fetch cache for all tags. Similar to `-a` above
+- `-T`, `--all-tags` - fetch cache for all tags. Similar to `-a` above.
 
 - `--show-checksums` - show checksums instead of file names when printing the
   download progress.
diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md
index a021ef0096..c430232a75 100644
--- a/static/docs/commands-reference/import-url.md
+++ b/static/docs/commands-reference/import-url.md
@@ -23,10 +23,10 @@ In some cases it's convenient to add a data file or directory from a remote
 location into the workspace, such that it will be automatically updated (by
 `dvc repro`) when the external data source changes. Examples:
 
-- a remote system may produce occasional data files that are used in other
-  projects;
-- a batch process running regularly updates a data file to import; and
-- a shared dataset on a remote storage that is managed and updated outside DVC.
+- A remote system may produce occasional data files that are used in other
+  projects.
+- A batch process running regularly updates a data file to import.
+- A shared dataset on a remote storage that is managed and updated outside DVC.
 
 The `dvc import-url` command helps the user create such an external data
 dependency. The `url` argument specifies the external location of the data to be
diff --git a/static/docs/commands-reference/index.md b/static/docs/commands-reference/index.md
index 87f5243bbd..3ae493b0cb 100644
--- a/static/docs/commands-reference/index.md
+++ b/static/docs/commands-reference/index.md
@@ -1,17 +1,17 @@
 # Using DVC Commands
 
-DVC is a command-line tool. The typical use case for DVC goes as follows
+DVC is a command-line tool. The typical use case for DVC goes as follows:
 
-- In an existing Git repository, initialize a DVC repository with `dvc init`,
+- In an existing Git repository, initialize a DVC repository with `dvc init`.
 - Copy source code files for modeling into the repository and convert the files
-  into DVC data files with `dvc add` command;
+  into DVC data files with `dvc add` command.
 - Process raw data files through your data processing and modeling code using
-  the `dvc run` command;
+  the `dvc run` command.
 - Use `--outs` option to specify `dvc run` command outputs which will be
-  converted to DVC data files after the code runs;
+  converted to DVC data files after the code runs.
 - Clone a git repo with the code of your ML application pipeline. However, this
   will not copy your DVC cache. Use
   [data remotes](/doc/commands-reference/remote) and `dvc push` to share the
-  cache (data);
+  cache (data).
 - Use `dvc repro` to quickly reproduce your pipeline on a new iteration, after
   your data item files or source code of your ML application are modified.
diff --git a/static/docs/commands-reference/install.md b/static/docs/commands-reference/install.md
index 059b88099e..670596e3f4 100644
--- a/static/docs/commands-reference/install.md
+++ b/static/docs/commands-reference/install.md
@@ -46,9 +46,9 @@ The installed Git hook automates executing `dvc push`.
 ## Installed Git hooks
 
 - Git `pre-commit` hook executes `dvc status` before `git commit` to inform the
-  user about the workspace status;
+  user about the workspace status.
 - Git `post-checkout` hook executes `dvc checkout` after `git checkout` to
-  automatically synchronize the data files with the new workspace state;
+  automatically synchronize the data files with the new workspace state.
 - Git `pre-push` hook executes `dvc push` before `git push` to upload files and
   directories under DVC control to remote.
 
diff --git a/static/docs/commands-reference/pull.md b/static/docs/commands-reference/pull.md
index 0e6f244666..7f668957a7 100644
--- a/static/docs/commands-reference/pull.md
+++ b/static/docs/commands-reference/pull.md
@@ -200,4 +200,3 @@ the `model.p.dvc` stage occurs later, its data was not pulled. Then we ran
 `dvc pull` specifying the last stage, `model.p.dvc`, and its data was
 downloaded. Finally, we ran `dvc pull` with no options to make sure that all
 data was already pulled with the previous commands.
-
diff --git a/static/docs/commands-reference/push.md b/static/docs/commands-reference/push.md
index 4b04884686..da3507ecdd 100644
--- a/static/docs/commands-reference/push.md
+++ b/static/docs/commands-reference/push.md
@@ -339,4 +339,3 @@ Data and pipelines are up to date.
 
 And running `dvc status --cloud` verifies that indeed there are no more files to
 upload to the remote cache.
-
diff --git a/static/docs/commands-reference/run.md b/static/docs/commands-reference/run.md
index 1895f865e9..724fcfa047 100644
--- a/static/docs/commands-reference/run.md
+++ b/static/docs/commands-reference/run.md
@@ -58,7 +58,7 @@ pipeline.
   dependencies can be specified like this: `-d data.csv -d process.py`. Usually,
   each dependency is a file or a directory with data, or a code file, or a
   configuration file. DVC also supports certain
-  [external dependencies](/doc/user-guide/external-dependencies)
+  [external dependencies](/doc/user-guide/external-dependencies).
 
   DVC builds a computation graph and this list of dependencies is a way to
   connect different stages with each other. When you run `dvc repro` to
diff --git a/static/docs/commands-reference/status.md b/static/docs/commands-reference/status.md
index 99a0950093..d4e1f6f0aa 100644
--- a/static/docs/commands-reference/status.md
+++ b/static/docs/commands-reference/status.md
@@ -71,13 +71,13 @@ outputs described in it.
   commands like `dvc commit` or `dvc repro`, `dvc run` should be run to update
   the file. Possible states are:
 
-  - _new_: output exists in workspace, but there is no corresponding checksum
+  - _new_: Output exists in workspace, but there is no corresponding checksum
     calculated and saved in the DVC-file for this output yet.
-  - _modified_: output or dependency exists in workspace, but the corresponding
+  - _modified_: Output or dependency exists in workspace, but the corresponding
     checksum in the DVC-file is not up to date.
-  - _deleted_: output or dependency does not exist in workspace, but still
+  - _deleted_: Output or dependency does not exist in workspace, but still
     referred in the DVC-file.
-  - _not in cache_: output exists in workspace and the corresponding checksum in
+  - _not in cache_: Output exists in workspace and the corresponding checksum in
     the DVC-file is up to date, but there is no corresponding cache entry.
 
 
diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md
index 8baa7e1598..16a364329e 100644
--- a/static/docs/commands-reference/version.md
+++ b/static/docs/commands-reference/version.md
@@ -61,11 +61,11 @@ The detail of `Binary` depends on the way DVC was downloading and
 - **`Binary: True`** - displayed when DVC is downloaded/installed as one of:
 
   - Debian package (`.deb`) - file used to install packages in several Linux
-    distributions, like Ubuntu.
+    distributions, like Ubuntu
   - Red Hat package (`.rpm`) - file used to install packages in some Linux based
     distributions, such as Fedora, CentOS, etc.
-  - PKG file (`.pkg`) - file used to install apps on macOS.
-  - Windows executable (`.exe`) - file used to install applications on Windows.
+  - PKG file (`.pkg`) - file used to install apps on macOS
+  - Windows executable (`.exe`) - file used to install applications on Windows
 
   These downloads are available from our [home page](/). They ultimately contain
   a binary bundle, which is the executable version of a software program,
@@ -76,11 +76,11 @@ The detail of `Binary` depends on the way DVC was downloading and
 - **`Binary: False`** - shown when DVC is downloaded and installed from:
 
   - [DVC's GitHub repository](https://github.com/iterative/dvc) - where core
-    source code is hosted.
+    source code is hosted
   - [The Python Package Index (PyPI)](https://pypi.org/project/dvc/) - source
-    code is stored as a Python package.
+    code is stored as a Python package
   - [Homebrew package manager](https://github.com/iterative/homebrew-dvc) (for
-    macOS systems) - source code is stored as Python package.
+    macOS systems) - source code is stored as Python package
 
   This method of installation involves downloading DVC source code, and
   following certain setup instructions (See the
@@ -125,3 +125,4 @@ Platform: Linux-4.15.0-50-generic-x86_64-with-debian-buster-sid
 Binary: False
 Filesystem type (workspace): ('ext4', '/dev/sdb3')
 ```
+
diff --git a/static/docs/get-started/agenda.md b/static/docs/get-started/agenda.md
index 647654a5f6..f86f8a17c9 100644
--- a/static/docs/get-started/agenda.md
+++ b/static/docs/get-started/agenda.md
@@ -29,9 +29,9 @@ datasets and you want to:
 
 - Capture and save those data artifacts the same way we capture code
 - Track and switch between different versions of the data easily
-- Being able to answer the question of how data artifacts (e.g. ML models) were
+- Be able to answer the question of how data artifacts (e.g. ML models) were
   built in the first place
-- Being able to compare them
+- Be able to compare them
 - Bring best practices to your team and get everyone on the same page
 
 Then you are in a good place! Click the `Next` button below to start ↘
diff --git a/static/docs/understanding-dvc/how-it-works.md b/static/docs/understanding-dvc/how-it-works.md
index 6e27d394e0..a5f6dc6980 100644
--- a/static/docs/understanding-dvc/how-it-works.md
+++ b/static/docs/understanding-dvc/how-it-works.md
@@ -90,4 +90,4 @@
    -r-------- 2 501 staff 273M Jan 27 03:48 Posts-test.tsv
    ```
 
-8. DVC works on Mac, Linux ,and Windows.
+8. DVC works on Mac, Linux, and Windows.
diff --git a/static/docs/understanding-dvc/related-technologies.md b/static/docs/understanding-dvc/related-technologies.md
index 0ca460b1d8..506131e762 100644
--- a/static/docs/understanding-dvc/related-technologies.md
+++ b/static/docs/understanding-dvc/related-technologies.md
@@ -9,119 +9,119 @@ process.
 
 1. **Git**. The difference is:
 
-   - DVC extends Git by introducing the concept of _data files_ - large files
-     that should NOT be stored in a Git repository but still need to be tracked
-     and versioned.
+   - DVC extends Git by introducing the concept of _data files_ – large files
+     that should NOT be stored in a Git repository but still need to be tracked
+     and versioned.
 
 2. **Workflow management tools** (pipelines and DAGs): Airflow, Luigi, etc. The
    differences are:
 
-   - DVC is focused on data science and modeling. As a result, DVC pipelines are
-     lightweight, easy to create and modify. However, DVC lacks pipeline
-     execution features like execution monitoring, execution error handling, and
-     recovering.
+   - DVC is focused on data science and modeling. As a result, DVC pipelines are
+     lightweight, easy to create and modify. However, DVC lacks pipeline
+     execution features like execution monitoring, execution error handling, and
+     recovering.
 
-   - DVC is purely a command line tool without a graphical user interface (GUI)
-     and doesn't run any daemons or servers. Nevertheless, DVC can generate
-     images with pipeline and experiment workflow visualization.
+   - DVC is purely a command line tool without a graphical user interface (GUI)
+     and doesn't run any daemons or servers. Nevertheless, DVC can generate
+     images with pipeline and experiment workflow visualization.
 
 3. **Experiment management software** today is mostly designed for enterprise
    usage. An open-sourced experimentation tool example: http://studio.ml/. The
    differences are:
 
-   - DVC uses Git as the underlying platform for experiment tracking instead of
-     a web application.
+   - DVC uses Git as the underlying platform for experiment tracking instead of
+     a web application.
 
-   - DVC doesn't need to run any services. No graphical user interface as a
-     result, but we expect some GUI services will be created on top of DVC.
+   - DVC doesn't need to run any services. No graphical user interface as a
+     result, but we expect some GUI services will be created on top of DVC.
 
-   - DVC has transparent design:
-     [meta files and directories](/doc/user-guide/dvc-files-and-directories)
-     (including the data cache) have a human-readable format and can be easily
-     reused by external tools.
+   - DVC has transparent design:
+     [meta files and directories](/doc/user-guide/dvc-files-and-directories)
+     (including the data cache) have a human-readable format and can be easily
+     reused by external tools.
 
 4. **Git workflows** and Git usage methodologies such as Gitflow. The
    differences are:
 
-   - DVC supports a new experimentation methodology that integrates easily with
-     a Git workflow. A separate branch should be created for each experiment,
-     with a subsequent merge of this branch if it was successful.
+   - DVC supports a new experimentation methodology that integrates easily with
+     a Git workflow. A separate branch should be created for each experiment,
+     with a subsequent merge of this branch if it was successful.
 
-   - DVC innovates by giving experimenters the ability to easily navigate
-     through past experiments without recomputing them.
+   - DVC innovates by giving experimenters the ability to easily navigate
+     through past experiments without recomputing them.
 
 5) **Makefile** (and it's analogues). The differences are:
 
-   - DVC utilizes a DAG:
+   - DVC utilizes a DAG:
 
-     - The DAG is defined by [DVC-files](/doc/user-guide/dvc-file-format) (with
-       file names `.dvc` or `Dvcfile`).
+     - The DAG is defined by [DVC-files](/doc/user-guide/dvc-file-format) (with
+       file names `.dvc` or `Dvcfile`).
 
-     - One DVC-file defines one node in the DAG. All DVC-files in a repository
-       make up a single pipeline (think a single Makefile). All DVC-files (and
-       corresponding pipeline commands) are implicitly combined through their
-       inputs and outputs, to simplify conflict resolving during merges.
+     - One DVC-file defines one node in the DAG. All DVC-files in a repository
+       make up a single pipeline (think a single Makefile). All DVC-files (and
+       corresponding pipeline commands) are implicitly combined through their
+       inputs and outputs, to simplify conflict resolving during merges.
 
-     - DVC provides a simple command `dvc run CMD` to generate a DVC-file
-       automatically based on the provided command, dependencies, and outputs.
+     - DVC provides a simple command `dvc run CMD` to generate a DVC-file
+       automatically based on the provided command, dependencies, and outputs.
 
-   - File tracking:
+   - File tracking:
 
-     - DVC tracks files based on checksum (md5) instead of file timestamps. This
-       helps avoid running into heavy processes like model re-training when you
-       checkout a previous, trained version of a modeling code (Makefile will
-       retrain the model).
+     - DVC tracks files based on checksum (md5) instead of file timestamps. This
+       helps avoid running into heavy processes like model re-training when you
+       checkout a previous, trained version of a modeling code (Makefile will
+       retrain the model).
 
-     - DVC uses file timestamps and inodes for optimization. This allows DVC to
-       avoid recomputing all dependency files checksum, which would be highly
-       problematic when working with large files (10 GB+).
+     - DVC uses file timestamps and inodes for optimization. This allows DVC to
+       avoid recomputing all dependency files checksum, which would be highly
+       problematic when working with large files (10 GB+).
 
 6. **Git-annex**. The differences are:
 
-   - DVC uses the idea of storing the content of large files (that you don't
-     want to see in your Git repository) in a local key-value store and use file
-     symlinks instead of the actual files.
+   - DVC uses the idea of storing the content of large files (that you don't
+     want to see in your Git repository) in a local key-value store and use file
+     symlinks instead of the actual files.
 
-   - DVC can use reflinks\* or hardlinks (depending on the system) instead of
-     symlinks to improve performance and make the user experience better.
+   - DVC can use reflinks\* or hardlinks (depending on the system) instead of
+     symlinks to improve performance and make the user experience better.
 
-   - DVC optimizes checksum calculation.
+   - DVC optimizes checksum calculation.
 
-   - Git-annex is a datafile-centric system whereas DVC is focused on providing
-     a workflow for machine learning and reproducible experiments. When a DVC or
-     Git-annex repository is cloned via git clone, data files won't be copied to
-     the local machine as file content is stored in separate data remotes.
-     However, [DVC-files](/doc/user-guide/dvc-file-format) (which provide the
-     reproducible workflow) are always included in the cloned Git repository and
-     hence can be recreated locally with minimal effort.
+   - Git-annex is a datafile-centric system whereas DVC is focused on providing
+     a workflow for machine learning and reproducible experiments. When a DVC or
+     Git-annex repository is cloned via git clone, data files won't be copied to
+     the local machine as file content is stored in separate data remotes.
+     However, [DVC-files](/doc/user-guide/dvc-file-format) (which provide the
+     reproducible workflow) are always included in the cloned Git repository and
+     hence can be recreated locally with minimal effort.
 
-   - DVC is not fundamentally bound to Git, having the option of changing the
-     repository format.
+   - DVC is not fundamentally bound to Git, having the option of changing the
+     repository format.
 
 7) **Git-LFS** (Large File Storage). The differences are:
 
-   - DVC does not require special Git servers like Git-LFS demands. Any cloud
-     storage like S3, GCS, or on-premises SSH server can be used as a backend
-     for datasets and models, no additional databases, servers or infrastructure
-     are required.
+   - DVC does not require special Git servers like Git-LFS demands. Any cloud
+     storage like S3, GCS, or on-premises SSH server can be used as a backend
+     for datasets and models, no additional databases, servers or infrastructure
+     are required.
 
-   - DVC is not fundamentally bound to Git, having the option of changing the
-     repository format.
+   - DVC is not fundamentally bound to Git, having the option of changing the
+     repository format.
 
-   - DVC does not add any hooks to Git by default. To checkout data files, the
-     `dvc checkout` command has to be run after each `git checkout` and
-     `git clone` command. It gives more granularity on managing data and code
-     separately. Hooks could be configured to make workflow simpler.
+   - DVC does not add any hooks to Git by default. To checkout data files, the
+     `dvc checkout` command has to be run after each `git checkout` and
+     `git clone` command. It gives more granularity on managing data and code
+     separately. Hooks could be configured to make workflow simpler.
 
-   - DVC attempts to use reflinks\* and has other
-     [file linking options](/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache).
-     This way the `dvc checkout` command does not actually copy data files from
-     cache to the workspace, as copying files is a heavy operation for large
-     files (30 GB+).
+   - DVC attempts to use reflinks\* and has other
+     [file linking options](/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache).
+     This way the `dvc checkout` command does not actually copy data files from
+     cache to the workspace, as copying files is a heavy operation for large
+     files (30 GB+).
 
-   - `git-lfs` was not made with data science scenarios in mind, so it does not
-     provide related features (e.g. pipelines, metrics), and thus Github has a
-     limit of 2 GB per repository.
+   - `git-lfs` was not made with data science scenarios in mind, so it does not
+     provide related features (e.g. pipelines, metrics), and thus Github has a
+     limit of 2 GB per repository.
 
 ---
 
diff --git a/static/docs/understanding-dvc/resources.md b/static/docs/understanding-dvc/resources.md
index d63ce08f8a..80f21ae405 100644
--- a/static/docs/understanding-dvc/resources.md
+++ b/static/docs/understanding-dvc/resources.md
@@ -27,16 +27,16 @@
 
 ## Articles
 
-- [Using DVC to create an efficient version control system for data projects](https://medium.com/qonto-engineering/using-dvc-to-create-an-efficient-version-control-system-for-data-projects-96efd94355fe);
-- [Our Machine Learning Workflow: DVC, MLFlow and Training in Docker Containers](https://medium.com/ixorthink/our-machine-learning-workflow-dvc-mlflow-and-training-in-docker-containers-5b9c80cdf804);
-- [Principled Machine Learning: Practices and Tools for Efficient Collaboration](https://dev.to/robogeek/principled-machine-learning-4eho);
-- [Data version control with DVC. What do the authors have to say?](https://towardsdatascience.com/data-version-control-with-dvc-what-do-the-authors-have-to-say-3c3b10f27ee);
-- [Why Git and Git-LFS is not enough to solve the Machine Learning Reproducibility crisis](https://towardsdatascience.com/why-git-and-git-lfs-is-not-enough-to-solve-the-machine-learning-reproducibility-crisis-f733b49e96e8);
-- [My First Try at DVC](https://stdiff.net/MB2019051301.html);
-- [Machine Learning Reproducibility crisis](https://petewarden.com/2018/03/19/the-machine-learning-reproducibility-crisis/);
-- [Data Science Workflow](http://fouryears.eu/2018/11/29/the-data-science-workflow/);
-- [The Data Science Workflow](https://towardsdatascience.com/the-data-science-workflow-43859db0415);
-- [Data Versioning Notebook](https://www.kaggle.com/rtatman/kerneld4769833fe);
+- [Using DVC to create an efficient version control system for data projects](https://medium.com/qonto-engineering/using-dvc-to-create-an-efficient-version-control-system-for-data-projects-96efd94355fe)
+- [Our Machine Learning Workflow: DVC, MLFlow and Training in Docker Containers](https://medium.com/ixorthink/our-machine-learning-workflow-dvc-mlflow-and-training-in-docker-containers-5b9c80cdf804)
+- [Principled Machine Learning: Practices and Tools for Efficient Collaboration](https://dev.to/robogeek/principled-machine-learning-4eho)
+- [Data version control with DVC. What do the authors have to say?](https://towardsdatascience.com/data-version-control-with-dvc-what-do-the-authors-have-to-say-3c3b10f27ee)
+- [Why Git and Git-LFS is not enough to solve the Machine Learning Reproducibility crisis](https://towardsdatascience.com/why-git-and-git-lfs-is-not-enough-to-solve-the-machine-learning-reproducibility-crisis-f733b49e96e8)
+- [My First Try at DVC](https://stdiff.net/MB2019051301.html)
+- [Machine Learning Reproducibility crisis](https://petewarden.com/2018/03/19/the-machine-learning-reproducibility-crisis/)
+- [Data Science Workflow](http://fouryears.eu/2018/11/29/the-data-science-workflow/)
+- [The Data Science Workflow](https://towardsdatascience.com/the-data-science-workflow-43859db0415)
+- [Data Versioning Notebook](https://www.kaggle.com/rtatman/kerneld4769833fe)
 - [First Impressions of Data Science Version Control](https://medium.com/@christopher.samiullah/first-impressions-of-data-science-version-control-dvc-fe96ab29cdda?sk=05e1f1d1ba16c9037046f3568956f16c)
 
 ## Slides
diff --git a/static/docs/user-guide/analytics.md b/static/docs/user-guide/analytics.md
index cd06d47dfc..3b154a4b66 100644
--- a/static/docs/user-guide/analytics.md
+++ b/static/docs/user-guide/analytics.md
@@ -11,9 +11,9 @@ current work.
 Anonymous aggregate user analytics allow us to prioritize fixes and features
 based on how, where and when people use DVC. For example:
 - If reflinks (depends on a file system type) are supported for most users, we
-  can keep cache protected mode off by default (see `dvc unprotect`);
+  can keep cache protected mode off by default (see `dvc unprotect`).
 - Collecting the OS version and the way DVC was installed allows us to decide
-  what versions of OS to prioritize and support;
+  what versions of OS to prioritize and support.
 - If usage of some command is negligible small it makes us think about issues
   with a command or documentation.
 
diff --git a/static/docs/user-guide/autocomplete.md b/static/docs/user-guide/autocomplete.md
index e49dc5c4b6..ee36122961 100644
--- a/static/docs/user-guide/autocomplete.md
+++ b/static/docs/user-guide/autocomplete.md
@@ -44,7 +44,7 @@ In this case, follow the steps to configure Bash as it is your active shell.
 First, make sure Bash completion support is installed:
 
 - On a current Linux OS (in a non-minimal installation), bash completion should
-  be available;
+  be available.
 - On a Mac, install with `brew install bash-completion`.
 
 The DVC specific completion script is located in this path of our main
diff --git a/static/docs/user-guide/contributing-documentation.md b/static/docs/user-guide/contributing-documentation.md
index 01c854acc6..a7b2db3087 100644
--- a/static/docs/user-guide/contributing-documentation.md
+++ b/static/docs/user-guide/contributing-documentation.md
@@ -10,14 +10,14 @@ run the website.
 To contribute documentation you need to know these locations:
 
 - [Content](https://github.com/iterative/dvc.org/tree/master/static/docs)
-  (`/static/docs`) -
+  (`/static/docs`):
   [Markdown](https://guides.github.com/features/mastering-markdown/) files of
-  the different pages to render dynamically in the browser;
+  the different pages to render dynamically in the browser.
 - [Images](https://github.com/iterative/dvc.org/tree/master/static/img)
-  (`/static/img`) - add new images, gif, svg, etc here. Reference them from the
-  Markdown files like this: `![](/static/img/reproducibility.png)`;
+  (`/static/img`): Add new images, gif, svg, etc here. Reference them from the
+  Markdown files like this: `![](/static/img/reproducibility.png)`.
 - [Sections](https://github.com/iterative/dvc.org/tree/master/src/Documentation/sidebar.json)
-  (`.../sidebar.json`) - edit it to register a new section for the navigation
+  (`.../sidebar.json`): Edit it to register a new section for the navigation
   menu.
 
 Merging the appropriate changes to these files into the master branch is enough
diff --git a/static/docs/user-guide/contributing.md b/static/docs/user-guide/contributing.md
index f95b845ec0..67e02fad8d 100644
--- a/static/docs/user-guide/contributing.md
+++ b/static/docs/user-guide/contributing.md
@@ -17,13 +17,13 @@ to learn how to submit your changes.
 ## Submitting changes
 
 - Open a new issue in the
-  [issue tracker](https://github.com/iterative/dvc/issues);
+  [issue tracker](https://github.com/iterative/dvc/issues).
 - Setup the [development environment](#development-environment) if you need to
-  run tests or [run](#running-development-version) the DVC with your changes;
+  run tests or [run](#running-development-version) the DVC with your changes.
 - Fork [DVC](https://github.com/iterative/dvc.git) and prepare necessary
-  changes;
-- Add tests for your changes to `tests/test_*.py`;
-- [Run tests](#running-tests) and make sure all of them pass;
+  changes.
+- Add tests for your changes to `tests/test_*.py`.
+- [Run tests](#running-tests) and make sure all of them pass.
 - Submit a pull request, referencing any issues it addresses.
 
 We will review your pull request as soon as possible. Thank you for
@@ -288,11 +288,11 @@ Fixes #(Github issue id).
 Message types:
 
 - *component* - name of a component that this patch is affecting. Use `dvc` in a
-  general case;
-- _short description_ - short description of the patch;
-- _long description_ - If needed, longer message describing the patch in more
-  details;
-- _github issue id_ - An id of the Github issue that this patch is addressing
+  general case
+- _short description_ - short description of the patch
+- _long description_ - if needed, longer message describing the patch in more
+  details
+- _github issue id_ - id of the GitHub issue that this patch is addressing
 
 Example:
 
diff --git a/static/docs/user-guide/dvc-file-format.md b/static/docs/user-guide/dvc-file-format.md
index 57ad6bcc6b..dc6e1794fb 100644
--- a/static/docs/user-guide/dvc-file-format.md
+++ b/static/docs/user-guide/dvc-file-format.md
@@ -45,25 +45,25 @@ outs:
 
 On the top level, `.dvc` file consists of these fields:
 
-- `cmd`: a command that is being run in this stage
-- `deps`: a list of dependencies for this stage
-- `outs`: a list of outputs for this stage
+- `cmd`: Command that is being run in this stage
+- `deps`: List of dependencies for this stage
+- `outs`: List of outputs for this stage
 - `md5`: md5 checksum for this DVC-file
-- `locked`: whether or not this stage is locked from reproduction
-- `wdir`: directory to run command in (default `.`)
+- `locked`: Whether or not this stage is locked from reproduction
+- `wdir`: Directory to run command in (default `.`)
 
 A dependency entry consists of a pair of fields:
 
-- `path`: path to the dependency, relative to the `wdir` path (always present)
+- `path`: Path to the dependency, relative to the `wdir` path (always present)
 - `md5`: md5 checksum for the dependency (most
   [stages](/doc/commands-reference/run))
-- `etag`: strong ETag response header (only HTTP external
+- `etag`: Strong ETag response header (only HTTP external
   dependencies created with `dvc import-url`)
-- `repo`: this entry is only for DVC repository external dependencies created
+- `repo`: This entry is only for DVC repository external dependencies created
   with `dvc import`, and in itself contains the following fields:
 
   - `url`: URL of Git repository with source DVC project
-  - `rev_lock`: revision or version (Git commit hash) of the DVC repo at the
+  - `rev_lock`: Revision or version (Git commit hash) of the DVC repo at the
     time of importing the dependency
 
 > See the examples in
@@ -72,15 +72,15 @@ A dependency entry consists of a pair of fields:
 
 An output entry consists of these fields:
 
-- `path`: path to the output, relative to the `wdir` path
+- `path`: Path to the output, relative to the `wdir` path
 - `md5`: md5 checksum for the output
-- `cache`: whether or not dvc should cache the output
-- `metric`: whether or not this file is a metric file
+- `cache`: Whether or not dvc should cache the output
+- `metric`: Whether or not this file is a metric file
 
 A metric entry consists of these fields:
 
-- `type`: type of the metrics file (e.g. raw/json/tsv/htsv/csv/hcsv)
-- `xpath`: path within the metrics file to the metrics data(e.g. `AUC.value` for
+- `type`: Type of the metrics file (e.g. raw/json/tsv/htsv/csv/hcsv)
+- `xpath`: Path within the metrics file to the metrics data(e.g. `AUC.value` for
   `{"AUC": {"value": 0.624321}}`)
 
 A `meta` entry consists of `key: value` pairs such as `name: John`. A meta entry
diff --git a/static/docs/user-guide/dvc-files-and-directories.md b/static/docs/user-guide/dvc-files-and-directories.md
index dc43e98b0e..d5f69652ad 100644
--- a/static/docs/user-guide/dvc-files-and-directories.md
+++ b/static/docs/user-guide/dvc-files-and-directories.md
@@ -3,6 +3,8 @@
 Once initialized in a project, DVC populates its installation directory
 (`.dvc/`) with special DVC internal files and directories:
 
+### Special DVC internal files and directories
+
 - `.dvc/config` - this is a configuration file. The config file can be edited by
   hand or with a special command: `dvc config`.
 
@@ -36,9 +38,9 @@ Once initialized in a project, DVC populates its installation directory
 - `.dvc/updater` - this file is used store latest available version of dvc,
   which is used to remind user to upgrade.
 
-- `.dvc/updater.lock` - a lock file for `.dvc/updater`.
+- `.dvc/updater.lock` - lock file for `.dvc/updater`
 
-- `.dvc/lock` - a lock file for the whole dvc project.
+- `.dvc/lock` - lock file for the whole dvc project
 
 ## Structure of cache directory
 