treeverse · shcheklein · Aug 12, 2019 · Jul 22, 2019 · Jul 23, 2019 · Jul 26, 2019
diff --git a/static/docs/commands-reference/fetch.md b/static/docs/commands-reference/fetch.md
@@ -100,7 +100,7 @@ specified in DVC-files currently in the workspace are considered by `dvc fetch`
   of a DVC-file ([experiments](/doc/get-started/experiments)), not just the
   current one.
 
-- `-T`, `--all-tags` - fetch cache for all tags. Similar to `-a` above
+- `-T`, `--all-tags` - fetch cache for all tags. Similar to `-a` above.
 
 - `--show-checksums` - show checksums instead of file names when printing the
   download progress.

diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md
@@ -23,10 +23,10 @@ In some cases it's convenient to add a data file or directory from a remote
 location into the workspace, such that it will be automatically updated (by
 `dvc repro`) when the external data source changes. Examples:
 
-- a remote system may produce occasional data files that are used in other
-  projects;
-- a batch process running regularly updates a data file to import; and
-- a shared dataset on a remote storage that is managed and updated outside DVC.
+- A remote system may produce occasional data files that are used in other
+  projects.
+- A batch process running regularly updates a data file to import.
+- A shared dataset on a remote storage that is managed and updated outside DVC.
 
 The `dvc import-url` command helps the user create such an external data
 dependency. The `url` argument specifies the external location of the data to be

diff --git a/static/docs/commands-reference/index.md b/static/docs/commands-reference/index.md
@@ -1,17 +1,17 @@
 # Using DVC Commands
 
-DVC is a command-line tool. The typical use case for DVC goes as follows
+DVC is a command-line tool. The typical use case for DVC goes as follows:
 
-- In an existing Git repository, initialize a DVC repository with `dvc init`,
+- In an existing Git repository, initialize a DVC repository with `dvc init`.
 - Copy source code files for modeling into the repository and convert the files
-  into DVC data files with `dvc add` command;
+  into DVC data files with `dvc add` command.
 - Process raw data files through your data processing and modeling code using
-  the `dvc run` command;
+  the `dvc run` command.
 - Use `--outs` option to specify `dvc run` command outputs which will be
-  converted to DVC data files after the code runs;
+  converted to DVC data files after the code runs.
 - Clone a git repo with the code of your ML application pipeline. However, this
   will not copy your DVC cache. Use
   [data remotes](/doc/commands-reference/remote) and `dvc push` to share the
-  cache (data);
+  cache (data).
 - Use `dvc repro` to quickly reproduce your pipeline on a new iteration, after
   your data item files or source code of your ML application are modified.
diff --git a/static/docs/commands-reference/install.md b/static/docs/commands-reference/install.md
@@ -46,9 +46,9 @@ The installed Git hook automates executing `dvc push`.
 ## Installed Git hooks
 
 - Git `pre-commit` hook executes `dvc status` before `git commit` to inform the
-  user about the workspace status;
+  user about the workspace status.
 - Git `post-checkout` hook executes `dvc checkout` after `git checkout` to
-  automatically synchronize the data files with the new workspace state;
+  automatically synchronize the data files with the new workspace state.
 - Git `pre-push` hook executes `dvc push` before `git push` to upload files and
   directories under DVC control to remote.
 

diff --git a/static/docs/commands-reference/pull.md b/static/docs/commands-reference/pull.md
@@ -200,4 +200,3 @@ the `model.p.dvc` stage occurs later, its data was not pulled.
 Then we ran `dvc pull` specifying the last stage, `model.p.dvc`, and its data
 was downloaded. Finally, we ran `dvc pull` with no options to make sure that all
 data was already pulled with the previous commands.
-
diff --git a/static/docs/commands-reference/push.md b/static/docs/commands-reference/push.md
@@ -339,4 +339,3 @@ Data and pipelines are up to date.
 
 And running `dvc status --cloud` verifies that indeed there are no more files to
 upload to the remote cache.
-
diff --git a/static/docs/commands-reference/run.md b/static/docs/commands-reference/run.md
@@ -58,7 +58,7 @@ pipeline.
   dependencies can be specified like this: `-d data.csv -d process.py`. Usually,
   each dependency is a file or a directory with data, or a code file, or a
   configuration file. DVC also supports certain
-  [external dependencies](/doc/user-guide/external-dependencies)
+  [external dependencies](/doc/user-guide/external-dependencies).
 
   DVC builds a computation graph and this list of dependencies is a way to
   connect different stages with each other. When you run `dvc repro` to

diff --git a/static/docs/commands-reference/status.md b/static/docs/commands-reference/status.md
@@ -71,13 +71,13 @@ outputs described in it.
   commands like `dvc commit` or `dvc repro`, `dvc run` should be run to update
   the file. Possible states are:
 
-  - _new_: output exists in workspace, but there is no corresponding checksum
+  - _new_: Output exists in workspace, but there is no corresponding checksum
     calculated and saved in the DVC-file for this output yet.
-  - _modified_: output or dependency exists in workspace, but the corresponding
+  - _modified_: Output or dependency exists in workspace, but the corresponding
     checksum in the DVC-file is not up to date.
-  - _deleted_: output or dependency does not exist in workspace, but still
+  - _deleted_: Output or dependency does not exist in workspace, but still
     referred in the DVC-file.
-  - _not in cache_: output exists in workspace and the corresponding checksum in
+  - _not in cache_: Output exists in workspace and the corresponding checksum in
     the DVC-file is up to date, but there is no corresponding <abbr>cache</abbr>
     entry.
 

diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md
@@ -61,11 +61,11 @@ The detail of `Binary` depends on the way DVC was downloading and
 - **`Binary: True`** - displayed when DVC is downloaded/installed as one of:
 
   - Debian package (`.deb`) - file used to install packages in several Linux
-    distributions, like Ubuntu.
+    distributions, like Ubuntu
   - Red Hat package (`.rpm`) - file used to install packages in some Linux based
     distributions, such as Fedora, CentOS, etc.
-  - PKG file (`.pkg`) - file used to install apps on macOS.
-  - Windows executable (`.exe`) - file used to install applications on Windows.
+  - PKG file (`.pkg`) - file used to install apps on macOS
+  - Windows executable (`.exe`) - file used to install applications on Windows
 
   These downloads are available from our [home page](/). They ultimately contain
   a binary bundle, which is the executable version of a software program,
@@ -76,11 +76,11 @@ The detail of `Binary` depends on the way DVC was downloading and
 - **`Binary: False`** - shown when DVC is downloaded and installed from:
 
   - [DVC's GitHub repository](https://github.com/iterative/dvc) - where core
-    source code is hosted.
+    source code is hosted
   - [The Python Package Index (PyPI)](https://pypi.org/project/dvc/) - source
-    code is stored as a Python package.
+    code is stored as a Python package
   - [Homebrew package manager](https://github.com/iterative/homebrew-dvc) (for
-    macOS systems) - source code is stored as Python package.
+    macOS systems) - source code is stored as Python package
 
   This method of installation involves downloading DVC source code, and
   following certain setup instructions (See the
@@ -125,3 +125,4 @@ Platform: Linux-4.15.0-50-generic-x86_64-with-debian-buster-sid
 Binary: False
 Filesystem type (workspace): ('ext4', '/dev/sdb3')
 ```
+
diff --git a/static/docs/get-started/agenda.md b/static/docs/get-started/agenda.md
@@ -29,9 +29,9 @@ datasets and you want to:
 - Capture and save those <abbr>data artifacts</abbr> the same way we capture
   code
 - Track and switch between different versions of the data easily
-- Being able to answer the question of how data artifacts (e.g. ML models) were
+- Be able to answer the question of how data artifacts (e.g. ML models) were
   built in the first place
-- Being able to compare them
+- Be able to compare them
 - Bring best practices to your team and get everyone on the same page
 
 Then you are in a good place! Click the `Next` button below to start ↘
diff --git a/static/docs/understanding-dvc/how-it-works.md b/static/docs/understanding-dvc/how-it-works.md
@@ -90,4 +90,4 @@
    -r--------  2 501  staff   273M Jan 27 03:48 Posts-test.tsv
    ```
 
-8. DVC works on Mac, Linux ,and Windows.
+8. DVC works on Mac, Linux, and Windows.
diff --git a/static/docs/understanding-dvc/related-technologies.md b/static/docs/understanding-dvc/related-technologies.md
@@ -9,119 +9,119 @@ process.
 
 1. **Git**. The difference is:
 
-   - DVC extends Git by introducing the concept of _data files_ - large files
-     that should NOT be stored in a Git repository but still need to be tracked
-     and versioned.
+  - DVC extends Git by introducing the concept of _data files_ – large files
+    that should NOT be stored in a Git repository but still need to be tracked
+    and versioned.
 
 2. **Workflow management tools** (pipelines and DAGs): Airflow, Luigi, etc. The
    differences are:
 
-   - DVC is focused on data science and modeling. As a result, DVC pipelines are
-     lightweight, easy to create and modify. However, DVC lacks pipeline
-     execution features like execution monitoring, execution error handling, and
-     recovering.
+  - DVC is focused on data science and modeling. As a result, DVC pipelines are
+    lightweight, easy to create and modify. However, DVC lacks pipeline
+    execution features like execution monitoring, execution error handling, and
+    recovering.
 
-   - DVC is purely a command line tool without a graphical user interface (GUI)
-     and doesn't run any daemons or servers. Nevertheless, DVC can generate
-     images with pipeline and experiment workflow visualization.
+  - DVC is purely a command line tool without a graphical user interface (GUI)
+    and doesn't run any daemons or servers. Nevertheless, DVC can generate
+    images with pipeline and experiment workflow visualization.
 
 3. **Experiment management software** today is mostly designed for enterprise
    usage. An open-sourced experimentation tool example: http://studio.ml/. The
    differences are:
 
-   - DVC uses Git as the underlying platform for experiment tracking instead of
-     a web application.
+  - DVC uses Git as the underlying platform for experiment tracking instead of
+    a web application.
 
-   - DVC doesn't need to run any services. No graphical user interface as a
-     result, but we expect some GUI services will be created on top of DVC.
+  - DVC doesn't need to run any services. No graphical user interface as a
+    result, but we expect some GUI services will be created on top of DVC.
 
-   - DVC has transparent design:
-     [meta files and directories](/doc/user-guide/dvc-files-and-directories)
-     (including the data cache) have a human-readable format and can be easily
-     reused by external tools.
+  - DVC has transparent design:
+    [meta files and directories](/doc/user-guide/dvc-files-and-directories)
+    (including the data cache) have a human-readable format and can be easily
+    reused by external tools.
 
 4. **Git workflows** and Git usage methodologies such as Gitflow. The
    differences are:
 
-   - DVC supports a new experimentation methodology that integrates easily with
-     a Git workflow. A separate branch should be created for each experiment,
-     with a subsequent merge of this branch if it was successful.
+  - DVC supports a new experimentation methodology that integrates easily with
+    a Git workflow. A separate branch should be created for each experiment,
+    with a subsequent merge of this branch if it was successful.
 
-   - DVC innovates by giving experimenters the ability to easily navigate
-     through past experiments without recomputing them.
+  - DVC innovates by giving experimenters the ability to easily navigate
+    through past experiments without recomputing them.
 
 5) **Makefile** (and it's analogues). The differences are:
 
-   - DVC utilizes a DAG:
+  - DVC utilizes a DAG:
 
-     - The DAG is defined by [DVC-files](/doc/user-guide/dvc-file-format) (with
-       file names `<file>.dvc` or `Dvcfile`).
+    - The DAG is defined by [DVC-files](/doc/user-guide/dvc-file-format) (with
+      file names `<file>.dvc` or `Dvcfile`).
 
-     - One DVC-file defines one node in the DAG. All DVC-files in a repository
-       make up a single pipeline (think a single Makefile). All DVC-files (and
-       corresponding pipeline commands) are implicitly combined through their
-       inputs and outputs, to simplify conflict resolving during merges.
+    - One DVC-file defines one node in the DAG. All DVC-files in a repository
+      make up a single pipeline (think a single Makefile). All DVC-files (and
+      corresponding pipeline commands) are implicitly combined through their
+      inputs and outputs, to simplify conflict resolving during merges.
 
-     - DVC provides a simple command `dvc run CMD` to generate a DVC-file
-       automatically based on the provided command, dependencies, and outputs.
+    - DVC provides a simple command `dvc run CMD` to generate a DVC-file
+      automatically based on the provided command, dependencies, and outputs.
 
-   - File tracking:
+  - File tracking:
 
-     - DVC tracks files based on checksum (md5) instead of file timestamps. This
-       helps avoid running into heavy processes like model re-training when you
-       checkout a previous, trained version of a modeling code (Makefile will
-       retrain the model).
+    - DVC tracks files based on checksum (md5) instead of file timestamps. This
+      helps avoid running into heavy processes like model re-training when you
+      checkout a previous, trained version of a modeling code (Makefile will
+      retrain the model).
 
-     - DVC uses file timestamps and inodes for optimization. This allows DVC to
-       avoid recomputing all dependency files checksum, which would be highly
-       problematic when working with large files (10 GB+).
+    - DVC uses file timestamps and inodes for optimization. This allows DVC to
+      avoid recomputing all dependency files checksum, which would be highly
+      problematic when working with large files (10 GB+).
 
 6. **Git-annex**. The differences are:
 
-   - DVC uses the idea of storing the content of large files (that you don't
-     want to see in your Git repository) in a local key-value store and use file
-     symlinks instead of the actual files.
+  - DVC uses the idea of storing the content of large files (that you don't
+    want to see in your Git repository) in a local key-value store and use file
+    symlinks instead of the actual files.
 
-   - DVC can use reflinks\* or hardlinks (depending on the system) instead of
-     symlinks to improve performance and make the user experience better.
+  - DVC can use reflinks\* or hardlinks (depending on the system) instead of
+    symlinks to improve performance and make the user experience better.
 
-   - DVC optimizes checksum calculation.
+  - DVC optimizes checksum calculation.
 
-   - Git-annex is a datafile-centric system whereas DVC is focused on providing
-     a workflow for machine learning and reproducible experiments. When a DVC or
-     Git-annex repository is cloned via git clone, data files won't be copied to
-     the local machine as file content is stored in separate data remotes.
-     However, [DVC-files](/doc/user-guide/dvc-file-format) (which provide the
-     reproducible workflow) are always included in the cloned Git repository and
-     hence can be recreated locally with minimal effort.
+  - Git-annex is a datafile-centric system whereas DVC is focused on providing
+    a workflow for machine learning and reproducible experiments. When a DVC or
+    Git-annex repository is cloned via git clone, data files won't be copied to
+    the local machine as file content is stored in separate data remotes.
+    However, [DVC-files](/doc/user-guide/dvc-file-format) (which provide the
+    reproducible workflow) are always included in the cloned Git repository and
+    hence can be recreated locally with minimal effort.
 
-   - DVC is not fundamentally bound to Git, having the option of changing the
-     repository format.
+  - DVC is not fundamentally bound to Git, having the option of changing the
+    repository format.
 
 7) **Git-LFS** (Large File Storage). The differences are:
 
-   - DVC does not require special Git servers like Git-LFS demands. Any cloud
-     storage like S3, GCS, or on-premises SSH server can be used as a backend
-     for datasets and models, no additional databases, servers or infrastructure
-     are required.
+  - DVC does not require special Git servers like Git-LFS demands. Any cloud
+    storage like S3, GCS, or on-premises SSH server can be used as a backend
+    for datasets and models, no additional databases, servers or infrastructure
+    are required.
 
-   - DVC is not fundamentally bound to Git, having the option of changing the
-     repository format.
+  - DVC is not fundamentally bound to Git, having the option of changing the
+    repository format.
 
-   - DVC does not add any hooks to Git by default. To checkout data files, the
-     `dvc checkout` command has to be run after each `git checkout` and
-     `git clone` command. It gives more granularity on managing data and code
-     separately. Hooks could be configured to make workflow simpler.
+  - DVC does not add any hooks to Git by default. To checkout data files, the
+    `dvc checkout` command has to be run after each `git checkout` and
+    `git clone` command. It gives more granularity on managing data and code
+    separately. Hooks could be configured to make workflow simpler.
 
-   - DVC attempts to use reflinks\* and has other
-     [file linking options](/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache).
-     This way the `dvc checkout` command does not actually copy data files from
-     cache to the workspace, as copying files is a heavy operation for large
-     files (30 GB+).
+  - DVC attempts to use reflinks\* and has other
+    [file linking options](/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache).
+    This way the `dvc checkout` command does not actually copy data files from
+    cache to the workspace, as copying files is a heavy operation for large
+    files (30 GB+).
 
-   - `git-lfs` was not made with data science scenarios in mind, so it does not
-     provide related features (e.g. pipelines, metrics), and thus Github has a
-     limit of 2 GB per repository.
+  - `git-lfs` was not made with data science scenarios in mind, so it does not
+    provide related features (e.g. pipelines, metrics), and thus Github has a
+    limit of 2 GB per repository.
 
 ---
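
Note on the Makefile comparison in `related-technologies.md`: the context lines say DVC tracks files by md5 checksum instead of file timestamps, so checking out a previous version does not by itself trigger retraining. A minimal sketch of that idea follows. It is not DVC's actual implementation; the helper names (`md5_of`, `changed_by_timestamp`, `changed_by_checksum`, `demo`) and the sample file are hypothetical and exist only for illustration.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_by_timestamp(path, recorded_mtime_ns):
    """Make-style check: any change in modification time counts as a change."""
    return os.stat(path).st_mtime_ns != recorded_mtime_ns


def changed_by_checksum(path, recorded_md5):
    """Checksum-style check: only a change in content counts as a change."""
    return md5_of(path) != recorded_md5


def demo():
    """Rewrite a file with identical content and compare both checks."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.csv")  # hypothetical sample file
        with open(path, "wb") as f:
            f.write(b"a,b\n1,2\n")
        recorded_mtime_ns = os.stat(path).st_mtime_ns
        recorded_md5 = md5_of(path)

        # Simulate a checkout: same bytes are written again, mtime moves forward.
        with open(path, "wb") as f:
            f.write(b"a,b\n1,2\n")
        later = recorded_mtime_ns + 10**9
        os.utime(path, ns=(later, later))

        return (
            changed_by_timestamp(path, recorded_mtime_ns),
            changed_by_checksum(path, recorded_md5),
        )
```

Here `demo()` reports the file as changed under the timestamp check and unchanged under the checksum check, which is the distinction the bullet draws between Makefile and DVC.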