From caeb95ba4fa14583aa6ada4424ab5c9af3725223 Mon Sep 17 00:00:00 2001
From: "Mr. Outis"
Date: Mon, 27 Jan 2020 23:06:16 -0600
Subject: [PATCH 1/5] diff: update docs according to the new patch

---
 public/static/docs/command-reference/diff.md | 282 ++++++++-----------
 1 file changed, 113 insertions(+), 169 deletions(-)

diff --git a/public/static/docs/command-reference/diff.md b/public/static/docs/command-reference/diff.md
index 724b6c59b8..3963e6eb7d 100644
--- a/public/static/docs/command-reference/diff.md
+++ b/public/static/docs/command-reference/diff.md
@@ -1,196 +1,140 @@
 # diff

-Show changes between versions of the DVC project. It can be
-narrowed down to specific target files and directories under DVC control.
+Compare two different versions of your DVC project (tracked by Git) and shows a
+_list of outputs_ grouped in the following categories: _added, modified, or
+deleted_.

-> This command requires that the project is a [Git](https://git-scm.com/)
-> repository.
+> This feature is only supported when using DVC among
+> [Git](https://git-scm.com/).

-## Synopsis
-
-```usage
-usage: dvc diff [-h] [-q | -v] [-t TARGET] a_ref [b_ref]
-
-positional arguments:
-  a_ref                 Git reference from which diff calculates
-  b_ref                 Git reference until which diff calculates, if
-                        omitted diff shows the difference between
-                        current HEAD and a_ref
-```
-
-## Description
-
-Given two Git commit references (commit hash, branch or tag name, etc) `a_ref`
-and `b_ref`, this command shows a a summary of basic statistics: how many files
-were deleted/changed, and the file size differences.
-
-Note that `dvc diff` does not show the line-to-line comparison among the target
-files in each revision, like `git diff` does.
+Note that `dvc diff` does not show the line-to-line comparison of the outputs.

 > For an example on how to create line-to-line text file comparison, refer to
 > [issue #770](https://github.com/iterative/dvc/issues/770#issuecomment-512693256)
 > in our GitHub repository.
-If the `-t` option is used, the diff is limited to the `TARGET` file or
-directory specified.
-
-Note that `dvc diff` does not have an effect when the repository is not tracked
-by the Git SCM, for example when `dvc init` was used with the `--no-scm` option.
-
-## Options
-
-- `-t TARGET`, `--target TARGET` - path to a data file or directory. If not
-  specified, compares all files and directories that are under DVC control in
-  the workspace.
-
-- `-h`, `--help` - prints the usage/help message, and exit.
-
-- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
-  problems arise, otherwise 1.
-
-- `-v`, `--verbose` - displays detailed tracing information.
-
-## Examples
-
-For these examples we can use the chapters in our
-[Get Started](/doc/get-started) section, up to
-[Add Files](/doc/get-started/add-files).
-
-
-
-### Click and expand to setup example
-
-Start by cloning our example repo if you don't already have it. Then move into
-the repo and checkout the
-[version](https://github.com/iterative/example-get-started/releases/tag/3-add-file)
-corresponding to the _Add Files_ chapter:
-
-```dvc
-$ git clone https://github.com/iterative/example-get-started
-$ cd example-get-started
-$ git checkout 3-add-file
-```
-
-Download the precomputed data using:
-
-```dvc
-$ dvc pull
-Preparing to download data from 'https://remote.dvc.org/get-started'
-...
-```
-
-
-
-## Example: Previous version of the same branch
-
-The minimal `dvc diff` command only includes the A reference (`a_ref`) from
-which the difference is to be calculated. The B reference (`b_ref`) defaults to
-Git `HEAD` (the currently checked out version). To find the general differences
-with the very previous committed version of the project, we can use the `HEAD^`
-Git reference.
+## Synopsis

-```dvc
-$ dvc diff HEAD^
-dvc diff from df613bc to ed10968
+```usage
+usage: dvc diff [-h] [-q | -v] [--show-json] [--checksums] [a_ref] [b_ref]

-diff for 'data/data.xml'
-+data/data.xml with md5 a304afb96060aad90176268345e10355
+positional arguments:
+  a_ref        Git reference from which diff calculates (defaults to HEAD)
+  b_ref        Git reference until which diff calculates, if omitted diff
+               shows the difference between the working tree and a_ref

-added file with size 37.9 MB
+optional arguments:
+  --show-json  Format the output into a JSON
+  --checksums  Display checksums for each entry
 ```

-## Example: Specific targets across Git references
-
-We can base this example in the [Metrics](/doc/get-started/metrics) and
-[Compare Experiments](/doc/get-started/compare-experiments) chapters of our _Get
-Started_ section, that describe different experiments to produce the `model.pkl`
-file. Our example repository has the `bigrams-experiment` and
-`baseline-experiment`
-[tags](https://github.com/iterative/example-get-started/tags) respectively to
-reference these experiments.
-
-
+## Description

-### Click and expand to setup example
+By default, it compares the working tree with the last commit tree (`HEAD`).

-Having followed the previous example's setup, move into the
-`example-get-started/` directory. Then make sure that you have the latest code
-and data with the following commands.
+You can pass two different Git [revisions](https://git-scm.com/docs/revisions)
+(e.g. commit hash, branch name, tag, etc.) as arguments to specify which
+versions to compare.

-```dvc
-$ git checkout master
-$ dvc fetch -T
 ```
+Added:
+  d3b07384 file

-The `-T` flag passed to `dvc fetch` makes sure we have all the data files
-related to all existing tags in the repo. You take a look at the
-[available tags](https://github.com/iterative/example-get-started/tags) of our
-example repo.
-
-
-
-To see the difference in `model.pkl` among these versions, we can run the
-following command.
+Deleted:
+  dc635f02 dir/
+  85f55f75 dir/1
+  703a80e0 dir/2

-```dvc
-$ dvc diff -t model.pkl baseline-experiment bigrams-experiment
-dvc diff from bc1722d to 8c1169d
-
-diff for 'model.pkl'
--model.pkl with md5 a664896
-+model.pkl with md5 3863d0e
- ...
+Modified:
+  7fbff877..9cd599a3 data.csv
 ```

-The output from this command confirms that there's a difference in the
-`model.pkl` file between the 2 Git references we indicated.
-
-### What about directories?
-
-Unlike Git, DVC features controlling entire directories without having to add
-each individual file. See `dvc add` without `--recursive` for example. `dvc run`
-can also put whole directories under DVC control (when these are specified as
-command dependencies or outputs).
-
-We can use `dvc diff` to check for changes in a directory by specifying the
-directory as the target (with option `-t`). Note that we skip the `b_ref`
-argument this time, that defaults to `HEAD`.
-
-```dvc
-$ dvc diff -t data/features baseline-experiment
-dvc diff from bc1722d to 8c1169d
-
-diff for 'data/features'
--data/features with md5 3338d2c.dir
-+data/features with md5 42c7025.dir
-
-0 files not changed, 0 files modified, 0 files added,
-0 files deleted, size was increased by 2.9 MB
+You can use the following options to modify the output: `--checksums` and
+`--show-json`.
+The former will include checksums in the output, and the latter one generates a
+JSON like the following:
+
+```json
+{
+  "added": [
+    { "filename": "file", "checksum": "d3b07384d113edec49eaa6238ad5ff00" }
+  ],
+  "deleted": [
+    { "filename": "dir/", "checksum": "dc635f02c2886e2cd79736f4a56b631f.dir" },
+    { "filename": "dir/1", "checksum": "85f55f7530699d7470d4455e92981155" },
+    { "filename": "dir/2", "checksum": "703a80e05b4573db5100959403e4da08" }
+  ],
+  "modified": [
+    {
+      "filename": "data.csv",
+      "checksum": {
+        "old": "7fbff8771b9db1b495d2e404dec4334c",
+        "new": "9cd599a3523898e6a12e13ec787da50a"
+      }
+    }
+  ]
+}
 ```

-## Example: Confirming that a target has not changed
-
-Let's use our example repo once again, that has several
-[available tags](https://github.com/iterative/example-get-started/tags) for
-conveniency. The `5-preparation` tag corresponds to the
-[Connect Code and Data](/doc/get-started/connect-code-and-data) chapter of our
-_Get Started_ section, where the `dvc run` command is used to create a
-`prepare.dvc` stage file. This DVC-file tracks the `data/prepared` directory
-output.
+## Example

 ```dvc
-$ dvc diff -t data/prepared 5-preparation
-dvc diff from 3deeec1 to 8c1169d
-
-diff for 'data/prepared'
--data/prepared with md5 6836f79.dir
-+data/prepared with md5 6836f79.dir
-
-2 files not changed, 0 files modified, 0 files added,
-0 files deleted, size was not changed
+$ git init
+$ dvc init
+$ git commit -m "initial commit with Git and DVC"
+
+$ echo "first version" > file
+$ dvc add file
+$ dvc diff
+
+Added:
+  file
+
+$ git add -A
+$ git commit -m "file: first version"
+$ dvc diff HEAD~1
+
+Added:
+  file
+
+$ echo "second version" > file
+$ dvc add file
+$ dvc diff
+
+Modified:
+  file
+
+$ dvc diff --checksums
+
+Modified:
+  9f089b63..27f60b34 file
+
+$ dvc diff --checksums --show-json
+
+{
+  "added": [],
+  "deleted": [],
+  "modified": [
+    {
+      "filename": "file",
+      "checksum": {
+        "old": "9f089b639127e2f5a79c4eda189678d6",
+        "new": "27f60b341727cb8ed1de139b0da7c173"
+      }
+    }
+  ]
+}
+
+$ git add -A
+$ git commit -m "file: second version"
+
+$ mkdir data
+$ echo "some text" > data/1
+$ dvc add data
+$ dvc diff
+
+Added:
+  data/
+  data/1
 ```
-
-The command above checks whether there have been any changes to the
-`data/prepared` directory after the `5-preparation` version (since the `b_ref`
-is the current version, `HEAD` by default). The output tells us that there have
-been no changes to that directory (or to any other file).

From 0865454212e3b81164d49a84955665ab66367141 Mon Sep 17 00:00:00 2001
From: Ivan Shcheklein
Date: Sun, 23 Feb 2020 16:11:41 -0800
Subject: [PATCH 2/5] restore PR template

---
 .github/PULL_REQUEST_TEMPLATE.md | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index d821fa8d5b..6285eaf646 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,15 +1,9 @@
-> You may disregard these recommendations if you used the **Edit on GitHub**
-> button from dvc.org to improve a doc in place.
+> You may disregard these recommendations if you used the **Edit on GitHub** button from dvc.org to improve a doc in place.

-❗ Please read the guidelines in the
-[Contributing to the Documentation](https://dvc.org/doc/user-guide/contributing/docs)
-list if you make any substantial changes to the documentation or JS engine.
+❗ Please read the guidelines in the [Contributing to the Documentation](https://dvc.org/doc/user-guide/contributing/docs) list if you make any substantial changes to the documentation or JS engine.

-🐛 Please make sure to mention `Fix #issue` (if applicable) in the description
-of the PR. This causes GitHub to close it automatically when the PR is merged.
+🐛 Please make sure to mention `Fix #issue` (if applicable) in the description of the PR. This causes GitHub to close it automatically when the PR is merged.

-Please chose to
-[allow us](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
-to edit your branch when creating the PR.
+Please choose to [allow us](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to edit your branch when creating the PR.

 Thank you for the contribution - we'll try to review it as soon as possible.
🙏

From 5948eaa3ac8efd13a870d9048b51010423f915db Mon Sep 17 00:00:00 2001
From: Ivan Shcheklein
Date: Sun, 23 Feb 2020 19:06:27 -0800
Subject: [PATCH 3/5] address diff command review

---
 public/static/docs/command-reference/diff.md | 44 ++++++++++----------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/public/static/docs/command-reference/diff.md b/public/static/docs/command-reference/diff.md
index e895538f7b..05fe669cb1 100644
--- a/public/static/docs/command-reference/diff.md
+++ b/public/static/docs/command-reference/diff.md
@@ -18,16 +18,17 @@ positional arguments:

 ## Description

-Output files and directories added, modified, deleted in a Git commit `b_rev` as
+Prints files and directories added, modified, deleted in a Git commit `b_rev` as
 compared to another Git commit `a_rev`. Both `a_rev` and `b_rev` accept any
-[Git revision]("https://git-scm.com/docs/gitrevisions") - branch or tag name,
-Git commit hash, etc.
+[Git revision](https://git-scm.com/docs/gitrevisions) - branch or tag name, Git
+commit hash, etc.

 It defaults to comparing the current workspace and the last commit (`HEAD`), if
 arguments `a_rev` and `b_rev` are not specified.

-Options `--show-hash` and `--show-json` can be used to modify format and details
-of the output produced. See the details and example below.
+Options `--show-json` and `--show-hash` can be used to modify format and details
+of the output produced. See the [Options](#options) and [Examples](#examples)
+sections below for more details.

 `dvc diff` does not have an effect when the repository is not tracked by Git,
 for example when `dvc init` was used with the `--no-scm` option.
@@ -57,7 +58,7 @@ for example when `dvc init` was used with the `--no-scm` option.

 ## Examples

-For these examples we can use the [Get Started](/doc/get-started) repository.
+For these examples we can use the [Get Started](/doc/get-started) project.
@@ -79,25 +80,21 @@ Preparing to download data from 'https://remote.dvc.org/get-started'
 ```

 The `-T` flag passed to `dvc fetch` makes sure we have all the data files
-related to all existing tags in the repo. You take a look at the
-[available tags](https://github.com/iterative/example-get-started/tags) of our
-example repo.
+related to all existing tags in the repo. You may see the available tags of our
+example repo [here](https://github.com/iterative/example-get-started/tags).
 ## Example: Checking workspace changes

-The minimal `dvc diff`, run without arguments, defaults to comparing DVC-tacked
-files between `HEAD` (current Git commit) and the current workspace
+The minimal `dvc diff`, run without arguments, defaults to comparing DVC-tracked
+files between `HEAD` (last Git commit) and the current workspace
 (uncommitted changes, if any):

 ```dvc
 $ dvc diff
 ```

-> It produces an empty result if there are no changes to any files or
-> directories in the current workspace.
-
 ## Example: Comparing workspace with arbitrary commits
@@ -133,16 +130,17 @@ files summary: 1 added, 0 deleted, 0 modified

 ### Click and expand to setup the example

-Our example repository has the `bigrams-experiment` and `baseline-experiment`
-[tags](https://github.com/iterative/example-get-started/tags) respectively to
-reference these experiments.
+Our example repository has the `baseline-experiment` and `bigrams-experiment`
+[tags](https://github.com/iterative/example-get-started/tags), which
+reference two different modeling experiments.

-Having followed the previous example's setup, move into the
-`example-get-started/` directory. Then make sure that you have the latest code
-and data with the following commands.
+Having followed the example's setup, move into the `example-get-started/`
+directory. Then make sure that you have the latest code and data with the
+following commands:

 ```dvc
 $ git checkout master
+$ dvc checkout
 ```
@@ -164,15 +162,15 @@ between the tags `baseline-experiment` and `bigrams-experiment`.

 ## Example: Using different output formats

-Same command as above but using JSON output and include hash values for files
-and directories:
+Let's use the same command as above, but with JSON output and including hash
+values:

 ```dvc
 $ dvc diff --show-json --show-hash \
            baseline-experiment bigrams-experiment
 ```

-outputs:
+It outputs:

 ```json
 {

From f8f1e384f8a8e13a369fe9ceddbf50e9e098d014 Mon Sep 17 00:00:00 2001
From: Ivan Shcheklein
Date: Sun, 23 Feb 2020 19:08:09 -0800
Subject: [PATCH 4/5] address diff command review

---
 public/static/docs/command-reference/diff.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/public/static/docs/command-reference/diff.md b/public/static/docs/command-reference/diff.md
index 05fe669cb1..7efc695afa 100644
--- a/public/static/docs/command-reference/diff.md
+++ b/public/static/docs/command-reference/diff.md
@@ -62,7 +62,7 @@ For these examples we can use the [Get Started](/doc/get-started) project.
-### Click and expand to setup the examples project
+### Click and expand to setup the project to run examples

 Start by cloning our example repo if you don't already have it:

From 3f00e585c335caf644749bdbdfcc816c1741d5de Mon Sep 17 00:00:00 2001
From: Ivan Shcheklein
Date: Sun, 23 Feb 2020 23:34:28 -0800
Subject: [PATCH 5/5] minor change to diff descritption, address PR comment

---
 public/static/docs/command-reference/diff.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/public/static/docs/command-reference/diff.md b/public/static/docs/command-reference/diff.md
index 7efc695afa..942724d6a8 100644
--- a/public/static/docs/command-reference/diff.md
+++ b/public/static/docs/command-reference/diff.md
@@ -18,10 +18,10 @@ positional arguments:

 ## Description

-Prints files and directories added, modified, deleted in a Git commit `b_rev` as
-compared to another Git commit `a_rev`. Both `a_rev` and `b_rev` accept any
-[Git revision](https://git-scm.com/docs/gitrevisions) - branch or tag name, Git
-commit hash, etc.
+Prints a list of files and directories added, modified, deleted in a Git commit
+`b_rev` as compared to another Git commit `a_rev`. Both `a_rev` and `b_rev`
+accept any [Git revision](https://git-scm.com/docs/gitrevisions) - branch or tag
+name, Git commit hash, etc.

 It defaults to comparing the current workspace and the last commit (`HEAD`), if
 arguments `a_rev` and `b_rev` are not specified.
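
Reviewer note: the JSON report documented in PATCH 1/5 is meant to be consumed by scripts, so a small consumer may help when checking the docs against real output. The following is only a sketch, assuming the report keeps the shape shown in that patch (top-level `added`, `deleted`, `modified` lists whose entries carry a `filename` key). The helper name `summarize_diff` and the `SAMPLE` constant are hypothetical, and the key names may differ in later revisions of the command, which rename the option to `--show-hash`.

```python
import json


def summarize_diff(report_text):
    """Map each category to the filenames listed under it.

    Assumes the JSON shape shown in PATCH 1/5; this is an illustrative
    helper, not part of DVC.
    """
    report = json.loads(report_text)
    return {
        category: [entry["filename"] for entry in report.get(category, [])]
        for category in ("added", "deleted", "modified")
    }


# Sample copied from the example in PATCH 1/5.
SAMPLE = """
{
  "added": [],
  "deleted": [],
  "modified": [
    {
      "filename": "file",
      "checksum": {
        "old": "9f089b639127e2f5a79c4eda189678d6",
        "new": "27f60b341727cb8ed1de139b0da7c173"
      }
    }
  ]
}
"""

if __name__ == "__main__":
    for category, names in summarize_diff(SAMPLE).items():
        # Print a dash for categories with no entries.
        print(f"{category}: {', '.join(names) or '-'}")
```

In practice the text would come from capturing the command's standard output instead of a literal string; the parsing step stays the same as long as the keys match.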