diff: update docs according to the new patch #953
Changes from all commits
caeb95b
2535e84
8f73e54
0865454
5948eaa
f8f1e38
3f00e58
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,41 +1,53 @@ | ||
| # diff | ||
|
|
||
| Show changes between commits in the <abbr>DVC repository</abbr>, or between a | ||
| commit and the <abbr>workspace</abbr>. The comparison can be narrowed down to | ||
| specific target files/directories tracked by DVC. | ||
| Show added, modified, or deleted DVC-tracked files and directories between | ||
| commits in the <abbr>DVC repository</abbr>, or between a commit and the | ||
| workspace. | ||
|
|
||
| ## Synopsis | ||
|
|
||
| ```usage | ||
| usage: dvc diff [-h] [-q | -v] [-t TARGET] a_ref [b_ref] | ||
| usage: dvc diff [-h] [-q | -v] | ||
| [--show-json] [--show-hash] | ||
| [a_rev] [b_rev] | ||
|
|
||
| positional arguments: | ||
| a_rev Old Git commit to compare (defaults to HEAD) | ||
| b_rev New Git commit to compare (defaults to the | ||
| current workspace) | ||
| a_rev Old Git commit to compare (defaults to HEAD) | ||
| b_rev New Git commit to compare (defaults to the current workspace) | ||
|
shcheklein marked this conversation as resolved.
|
||
| ``` | ||
|
|
||
| ## Description | ||
|
|
||
| Given two commit hashes, branch or tag names, etc. | ||
| ([references](https://git-scm.com/docs/revisions)) `a_ref` and `b_ref`, this | ||
| command shows a comparative summary of basic statistics related to files tracked | ||
| by DVC: how many files were deleted/changed, and the file size differences. | ||
| Prints a list of files and directories added, modified, or deleted in a Git | ||
| commit `b_rev` as compared to another Git commit `a_rev`. Both `a_rev` and | ||
| `b_rev` accept any [Git revision](https://git-scm.com/docs/gitrevisions) (branch | ||
| or tag name, Git commit hash, etc.). | ||
|
|
||
| > Note that `dvc diff` does not show the line-to-line comparisons like | ||
| > `git diff` or [GNU `diff`](https://www.gnu.org/software/diffutils/) can. This | ||
| > is because the data data tracked by DVC comes in many formats such as | ||
| > structured text, binary blobs, etc. For an example on how to create | ||
| > line-to-line text file comparison, refer to | ||
| > [issue #770](https://github.com/iterative/dvc/issues/770#issuecomment-512693256). | ||
| It defaults to comparing the current workspace and the last commit (`HEAD`), if | ||
| arguments `a_rev` and `b_rev` are not specified. | ||
|
|
||
| Options `--show-json` and `--show-hash` can be used to modify the format and | ||
| level of detail of the output. See the [Options](#options) and | ||
| [Examples](#examples) sections below for more details. | ||
|
|
||
| `dvc diff` does not have an effect when the repository is not tracked by Git, | ||
| for example when `dvc init` was used with the `--no-scm` option. | ||
|
Contributor
Somewhat unrelated but actually, I checked and this isn't correct. You can create a git repo, then a Is this buggy behavior? If not, we just need to rewrite the paragraph above to something more correct.
Contributor
it feels more or less correct to me - the main thing here is
Contributor
Yes it's tracked by Git, but I initialized the DVC project with
Contributor
It's fine, I still don't see a problem. It's intuitively clear what does it mean, even if implementation is not 100% correct and allows to have a mix of Git and --no-scm.
Contributor
Hmmm OK but just removing that last incorrect statement wouldn't hurt either. |
||
|
|
||
| > Note that the current `dvc diff` implementation does not show line-to-line | ||
| > comparisons between the files in each revision, like `git diff` or | ||
| > [GNU `diff`](https://www.gnu.org/software/diffutils/) can. This is because the | ||
| > data tracked by DVC can come in many possible formats, e.g. structured text | ||
| > or binary blobs. For an example of how to create a line-to-line text file | ||
| > comparison, refer to this | ||
| > [comment](https://github.com/iterative/dvc/issues/770#issuecomment-512693256). | ||
|
|
||
| ## Options | ||
|
|
||
| - `-t TARGET`, `--target TARGET` - path to a data file or directory to limit | ||
| diff for. | ||
| - `--show-json` - generate output in JSON format, which is usually needed to | ||
| integrate DVC into scripts. | ||
|
|
||
| - `--show-hash` - print file and directory hash values along with their paths. | ||
| Useful for debugging. | ||
|
|
||
| - `-h`, `--help` - print the usage/help message, and exit. | ||
|
|
||
|
|
@@ -46,148 +58,139 @@ for example when `dvc init` was used with the `--no-scm` option. | |
|
|
||
| ## Examples | ||
|
|
||
| For these examples we can use the chapters in our | ||
| [Get Started](/doc/get-started) section, up to | ||
| [Add Files](/doc/get-started/add-files). | ||
| For these examples we can use the [Get Started](/doc/get-started) project. | ||
|
|
||
| <details> | ||
|
|
||
| ### Click and expand to setup example | ||
| ### Click and expand to set up the project to run examples | ||
|
|
||
| Start by cloning our example repo if you don't already have it. Then move into | ||
| the repo and checkout the | ||
| [3-add-file](https://github.com/iterative/example-get-started/releases/tag/3-add-file) | ||
| tag, corresponding to the [Add Files](/doc/get-started/add-files) _Get Started_ | ||
| chapter: | ||
| Start by cloning our example repo if you don't already have it: | ||
|
|
||
| ```dvc | ||
| $ git clone https://github.com/iterative/example-get-started | ||
| $ cd example-get-started | ||
| $ git checkout 3-add-file | ||
| ``` | ||
|
|
||
| Download the precomputed data using: | ||
| Download data using: | ||
|
|
||
| ```dvc | ||
| $ dvc pull | ||
| $ dvc fetch -T | ||
| Preparing to download data from 'https://remote.dvc.org/get-started' | ||
| ... | ||
| ``` | ||
|
|
||
| </details> | ||
| The `-T` flag passed to `dvc fetch` makes sure we have the data files for all | ||
| existing tags in the repo. See the | ||
| [available tags](https://github.com/iterative/example-get-started/tags) of our | ||
| example repo. | ||
|
|
||
| ## Example: Previous commit in the same branch | ||
| </details> | ||
|
|
||
| The minimal `dvc diff`, run without arguments, defaults to comparing DVC-tacked | ||
| files between `HEAD` (current Git commit) and the current <abbr>workspace</abbr> | ||
| (uncommitted changes, if any). | ||
| ## Example: Checking workspace changes | ||
|
|
||
| To see the difference between the very previous commit of the project and the | ||
| workspace, we can use `HEAD^` as `a_ref`: | ||
| The minimal `dvc diff`, run without arguments, defaults to comparing DVC-tracked | ||
| files between `HEAD` (last Git commit) and the current <abbr>workspace</abbr> | ||
| (uncommitted changes, if any): | ||
|
shcheklein marked this conversation as resolved.
|
||
|
|
||
| ```dvc | ||
| $ dvc diff HEAD^ | ||
| dvc diff from df613bc to ed10968 | ||
|
|
||
| diff for 'data/data.xml' | ||
| +data/data.xml with md5 a304afb96060aad90176268345e10355 | ||
|
|
||
| added file with size 37.9 MB | ||
| $ dvc diff | ||
| ``` | ||
|
|
||
| ## Example: Specific targets across Git commits | ||
|
|
||
| We can base this example in the [Metrics](/doc/get-started/metrics) and | ||
| [Compare Experiments](/doc/get-started/compare-experiments) chapters of our _Get | ||
| Started_ section, that describe different experiments to produce the `model.pkl` | ||
| file. Our example repository has the `bigrams-experiment` and | ||
| `baseline-experiment` | ||
| [tags](https://github.com/iterative/example-get-started/tags) respectively to | ||
| reference these experiments. | ||
| ## Example: Comparing workspace with arbitrary commits | ||
|
|
||
| <details> | ||
|
|
||
| ### Click and expand to setup example | ||
| ### Click and expand to set up the example | ||
|
|
||
| Having followed the previous example's setup, move into the | ||
| `example-get-started/` directory. Then make sure that you have the latest code | ||
| and data with the following commands. | ||
| Let's check out the | ||
| [3-add-file](https://github.com/iterative/example-get-started/releases/tag/3-add-file) | ||
| tag, corresponding to the [Add Files](/doc/get-started/add-files) _Get Started_ | ||
| chapter, right after we added the `data.xml` file with DVC: | ||
|
|
||
| ```dvc | ||
| $ git checkout master | ||
| $ dvc fetch -T | ||
| $ git checkout 3-add-file | ||
| $ dvc pull | ||
| ``` | ||
|
|
||
| The `-T` flag passed to `dvc fetch` makes sure we have all the data files | ||
| related to all existing tags in the repo. You take a look at the | ||
| [available tags](https://github.com/iterative/example-get-started/tags) of our | ||
| example repo. | ||
|
|
||
| </details> | ||
|
|
||
| To see the difference in `model.pkl` among these tags, we can run the following | ||
| command. | ||
| To see the difference between the previous commit of the project and the | ||
| workspace, we can use `HEAD^` as `a_rev`: | ||
|
|
||
| ```dvc | ||
| $ dvc diff -t model.pkl baseline-experiment bigrams-experiment | ||
| dvc diff from bc1722d to 8c1169d | ||
| $ dvc diff HEAD^ | ||
| Added: | ||
| data/data.xml | ||
|
|
||
| diff for 'model.pkl' | ||
| -model.pkl with md5 a664896 | ||
| +model.pkl with md5 3863d0e | ||
| ... | ||
| files summary: 1 added, 0 deleted, 0 modified | ||
| ``` | ||
|
|
||
| The output from this command confirms that there's a difference in the | ||
| `model.pkl` file between the 2 Git commits (tags `baseline-experiment` and | ||
| `bigrams-experiment`) we indicated. | ||
| ## Example: Comparing tags or branches | ||
|
|
||
| ### What about directories? | ||
| <details> | ||
|
|
||
| Unlike Git, DVC features controlling entire directories without having to add | ||
| each individual file. See `dvc add` without `--recursive` for example. `dvc run` | ||
| can track entire directories (when these are specified as command dependencies | ||
| or <abbr>outputs</abbr>). | ||
| ### Click and expand to set up the example | ||
|
|
||
| We can use `dvc diff` to check for changes in a directory by specifying the | ||
| directory as the target (with option `-t`). Note that we skip the `b_ref` | ||
| argument this time, that defaults to `HEAD`. | ||
| Our example repository has the `baseline-experiment` and `bigrams-experiment` | ||
| [tags](https://github.com/iterative/example-get-started/tags) that reference | ||
| two different modeling experiments. | ||
|
|
||
| Having followed the example's setup, move into the `example-get-started/` | ||
| directory. Then make sure that you have the latest code and data with the | ||
| following commands: | ||
|
|
||
| ```dvc | ||
| $ dvc diff -t data/features baseline-experiment | ||
| dvc diff from bc1722d to 8c1169d | ||
| $ git checkout master | ||
| $ dvc checkout | ||
| ``` | ||
|
|
||
| diff for 'data/features' | ||
| -data/features with md5 3338d2c.dir | ||
| +data/features with md5 42c7025.dir | ||
| </details> | ||
|
|
||
| 0 files not changed, 0 files modified, 0 files added, | ||
| 0 files deleted, size was increased by 2.9 MB | ||
| ```dvc | ||
| $ dvc diff baseline-experiment bigrams-experiment | ||
| Modified: | ||
| auc.metric | ||
| data/features/ | ||
| data/features/test.pkl | ||
| data/features/train.pkl | ||
| model.pkl | ||
|
|
||
| files summary: 0 added, 0 deleted, 4 modified | ||
| ``` | ||
|
|
||
| ## Example: Confirming that a target has not changed | ||
| The output from this command confirms that there's a difference in 4 files | ||
| between the tags `baseline-experiment` and `bigrams-experiment`. | ||
|
|
||
| Let's use our example repo once again, that has several | ||
| [available tags](https://github.com/iterative/example-get-started/tags) for | ||
| conveniency. The `5-preparation` tag corresponds to the | ||
| [Connect Code and Data](/doc/get-started/connect-code-and-data) chapter of our | ||
| _Get Started_ section, where the `dvc run` command is used to create a | ||
| `prepare.dvc` stage file. This DVC-file tracks the `data/prepared` directory | ||
| <abbr>output</abbr>. | ||
| ## Example: Using different output formats | ||
|
|
||
| ```dvc | ||
| $ dvc diff -t data/prepared 5-preparation | ||
| dvc diff from 3deeec1 to 8c1169d | ||
|
|
||
| diff for 'data/prepared' | ||
| -data/prepared with md5 6836f79.dir | ||
| +data/prepared with md5 6836f79.dir | ||
| Let's use the same command as above, but with JSON output and including hash | ||
| values: | ||
|
|
||
| 2 files not changed, 0 files modified, 0 files added, | ||
| 0 files deleted, size was not changed | ||
| ```dvc | ||
| $ dvc diff --show-json --show-hash \ | ||
| baseline-experiment bigrams-experiment | ||
| ``` | ||
|
|
||
| The command above checks whether there have been any changes to the | ||
| `data/prepared` directory after the `5-preparation` tag (since the `b_ref` is | ||
| `HEAD` by default). The output tells us that there have been no changes to that | ||
| directory (or to any other file). | ||
| It outputs: | ||
|
|
||
| ```json | ||
| { | ||
| "added": [], | ||
| "deleted": [], | ||
| "modified": [ | ||
| ...{ | ||
| "path": "data/features/", | ||
| "hash": { | ||
| "old": "3338d2c21bdb521cda0ba4add89e1cb0.dir", | ||
| "new": "42c7025fc0edeb174069280d17add2d4.dir" | ||
| } | ||
| }, | ||
| ...{ | ||
| "path": "model.pkl", | ||
| "hash": { | ||
| "old": "43630cce66a2432dcecddc9dd006d0a7", | ||
| "new": "662eb7f64216d9c2c1088d0a5e2c6951" | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
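Since `--show-json` is meant for integrating DVC into scripts, here is a minimal Python sketch of consuming that output. It assumes only the schema visible in the JSON example above (top-level `added`, `deleted`, and `modified` lists whose entries carry a `path`, plus an `old`/`new` `hash` object on modified entries when `--show-hash` is used). The sample data copies the two entries shown there, with the `...` elisions dropped so it parses. The `run_dvc_diff` helper and the other function names are hypothetical; `run_dvc_diff` assumes `dvc` is on `PATH` and prints only the JSON document to stdout.

```python
import json
import subprocess
from typing import Dict, List, Optional, Tuple

# Hypothetical sample modeled on the `dvc diff --show-json --show-hash` example
# above, trimmed to the two entries shown there ("..." elisions dropped).
SAMPLE_OUTPUT = """
{
  "added": [],
  "deleted": [],
  "modified": [
    {
      "path": "data/features/",
      "hash": {
        "old": "3338d2c21bdb521cda0ba4add89e1cb0.dir",
        "new": "42c7025fc0edeb174069280d17add2d4.dir"
      }
    },
    {
      "path": "model.pkl",
      "hash": {
        "old": "43630cce66a2432dcecddc9dd006d0a7",
        "new": "662eb7f64216d9c2c1088d0a5e2c6951"
      }
    }
  ]
}
"""

CATEGORIES = ("added", "deleted", "modified")


def run_dvc_diff(a_rev: str = "HEAD", b_rev: Optional[str] = None) -> dict:
    """Hypothetical helper: run `dvc diff --show-json` and parse its stdout.

    Assumes `dvc` is on PATH and that stdout contains only the JSON document.
    Omitting b_rev compares a_rev against the workspace, as the docs describe.
    """
    cmd = ["dvc", "diff", "--show-json", a_rev]
    if b_rev is not None:
        cmd.append(b_rev)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def count_changes(diff: dict) -> Dict[str, int]:
    """Number of entries per category; a missing category counts as 0."""
    return {kind: len(diff.get(kind, [])) for kind in CATEGORIES}


def changed_paths(diff: dict) -> List[str]:
    """Sorted list of every path reported as added, deleted, or modified."""
    return sorted(entry["path"] for kind in CATEGORIES for entry in diff.get(kind, []))


def hash_changes(diff: dict) -> Dict[str, Tuple[str, str]]:
    """Map path -> (old, new) hash for modified entries that include hashes."""
    return {
        entry["path"]: (entry["hash"]["old"], entry["hash"]["new"])
        for entry in diff.get("modified", [])
        if isinstance(entry.get("hash"), dict)
    }


if __name__ == "__main__":
    diff = json.loads(SAMPLE_OUTPUT)
    print(count_changes(diff))  # {'added': 0, 'deleted': 0, 'modified': 2}
    print(changed_paths(diff))  # ['data/features/', 'model.pkl']
```

Using `.get(kind, [])` keeps the helpers tolerant of a missing category; whether `dvc diff` ever omits one is not stated in the docs above, so this is only a defensive choice.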