diff --git a/config/prismjs/dvc-commands.js b/config/prismjs/dvc-commands.js
index 937f9da24e..4e877960f2 100644
--- a/config/prismjs/dvc-commands.js
+++ b/config/prismjs/dvc-commands.js
@@ -48,6 +48,7 @@ module.exports = [
   'config',
   'commit',
   'checkout',
+  'check-ignore',
   'cache dir',
   'cache',
   'add'
diff --git a/content/docs/command-reference/check-ignore.md b/content/docs/command-reference/check-ignore.md
new file mode 100644
index 0000000000..4a99f6d591
--- /dev/null
+++ b/content/docs/command-reference/check-ignore.md
@@ -0,0 +1,90 @@
+# check-ignore
+
+Check whether any given files or directories are excluded from DVC due to the
+patterns found in [`.dvcignore`](/doc/user-guide/dvcignore).
+
+## Synopsis
+
+```usage
+usage: usage: dvc check-ignore [-h] [-q | -v] [-d] [-n]
+                        targets [targets ...]
+
+positional arguments:
+  targets               File or directory paths to check (wildcards supported)
+```
+
+## Description
+
+This helper command checks whether the given `targets` are ignored by DVC
+according to the [`.dvcignore` file](/doc/user-guide/dvcignore) (if any). The
+ones that are ignored indeed are printed back.
+
+> Note that your shell may support path wildcards such as `dir/file*` and these
+> can be fed as `targets` to `dvc check-ignore`, as shown in the
+> [examples](#examples).
+
+## Options
+
+- `-d`, `--details` - show the exclude pattern together with each target path.
+
+- `-n`, `--non-matching` - show the target paths which don’t match any pattern.
+  Only usable when `--details` is also employed
+
+- `-h`, `--help` - prints the usage/help message, and exit.
+
+- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
+  problems arise, otherwise 1.
+
+- `-v`, `--verbose` - displays detailed tracing information.
+
+## Examples
+
+First, let's create a `.dvcignore` file with some patterns in it, and some files
+to check against it.
+
+```dvc
+$ echo "file*\n\!file2" >> .dvcignore
+$ cat .dvcignore
+file*
+!file2
+$ touch file1 file2 other
+$ ls
+file1 file2 other
+```
+
+Then, let's use `dvc check-ignore` to see which of these files would be excluded
+given our `.dvcignore` file:
+
+```dvc
+$ dvc check-ignore file1
+file1
+$ dvc check-ignore file1 file2
+file1
+file2
+$ dvc check-ignore other
+ # There's no command output, meaning `other` is not excluded.
+$ dvc check-ignore file*
+file1
+file2
+```
+
+If the `--details` option is used, a series of lines are printed using this
+format: `:: | `
+
+```dvc
+$ dvc check-ignore -d file1 file2
+.dvcignore:1:file* file1
+.dvcignore:2:!file2 file2
+$ dvc check-ignore -d other
+$ dvc check-ignore -d file*
+.dvcignore:1:file* file1
+.dvcignore:2:!file2 file2
+```
+
+With the `--non-matching` option, non-matching `targets` will also be included
+in the list. All fields in each line, except for ``, will be empty.
+
+```dvc
+$ dvc check-ignore -d -n other
+:: other
+```
diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index 7535efb2b0..9cf5e1b783 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -163,6 +163,10 @@
           }
         ]
       },
+      {
+        "label": "check-ignore",
+        "slug": "check-ignore"
+      },
       {
         "label": "checkout",
         "slug": "checkout"
diff --git a/content/docs/user-guide/contributing/docs.md b/content/docs/user-guide/contributing/docs.md
index 34cae8b53d..1b08f3c062 100644
--- a/content/docs/user-guide/contributing/docs.md
+++ b/content/docs/user-guide/contributing/docs.md
@@ -200,7 +200,7 @@ We also use "emoji" symbols sparingly for visibility on certain notes. Mainly:
 - 📖 For notes that link to other related documentation
 - ⚠️ Warnings about possible problems related to DVC usage (similar to **Note!**
   and "Note that..." notes)
-- 💡 Useful tips related to external tools/integrations
+- 💡 Useful tips related to related or external tools and integrations
 
 > Some other emojis currently in use here and there: ⚡✅🙏🐛⭐❗ (among
 > others).
diff --git a/content/docs/user-guide/dvcignore.md b/content/docs/user-guide/dvcignore.md
index 7305abe11b..a9572b271e 100644
--- a/content/docs/user-guide/dvcignore.md
+++ b/content/docs/user-guide/dvcignore.md
@@ -8,16 +8,17 @@ project. For example, when working in a workspace directory with a large
 number of data files, you might encounter extended execution time for operations
 as simple as `dvc status`. In other case you might want to omit files or folders
 unrelated to the project (like `.DS_Store` on MacOS). To address
-these scenarios, DVC supports optional `.dvcignore` files. `.dvcignore` works
-similar to `.gitignore` in Git.
+these scenarios, DVC supports optional `.dvcignore` files.
+
+`.dvcignore` is similar to `.gitignore` in Git, and can be tested with our
+helper command `dvc check-ignore`.
 
 ## How does it work?
 
-- You need to create the `.dvcignore` file. It can be placed in the root of the
-  project or inside any subdirectory (see also [remarks](#Remarks) below).
-- Populate it with [patterns](https://git-scm.com/docs/gitignore) that you would
-  like to ignore. You can find useful templates
-  [here](https://github.com/github/gitignore).
+- You need to create a `.dvcignore` file. These can be placed in the root of the
+  project, or in any subdirectory (see the [remarks](#Remarks) below).
+- Populate it with [.gitignore patterns](https://git-scm.com/docs/gitignore).
+  You can find useful templates [here](https://github.com/github/gitignore).
 - Each line should contain only one pattern.
 - During execution of commands that traverse directories, DVC will ignore
   matching paths.
@@ -28,87 +29,95 @@ Ignored files will not be saved in cache, they will be non-existent for DVC.
 It's worth to remember that, especially when ignoring files inside DVC-handled
 directories.
 
-**It is crucial to understand, that DVC might remove ignored files upon
-`dvc run` or `dvc repro`. If they are not produced by a
-[pipeline](/doc/command-reference/dag) [stage](/doc/command-reference/run), they
-can be deleted permanently.**
+⚠️ Important! Note that `dvc run` and `dvc repro` might remove ignored files. If
+they are not produced by a pipeline [stage](/doc/command-reference/run), they
+can be lost permanently.
+
+Keep in mind, that when you add `.dvcignore` patterns that affect an existing
+output, its status will change and DVC will behave as if that
+affected files were deleted.
 
-Keep in mind, that when you add to `.dvcignore` entries that affect one of the
-existing outputs, its status will change and DVC will behave as if
-that affected files were deleted.
+💡 Note that you can use the `dvc check-ignore` command to check whether given
+files or directories are ignored by the patterns in a `.dvcignore` file.
 
 If DVC finds a `.dvcignore` file inside a dependency or output directory, it
 raises an error. Ignoring files inside such directories should be handled from a
 `.dvcignore` in higher levels of the project tree.
 
-## Syntax
-
-The same as for [`.gitignore`](https://git-scm.com/docs/gitignore).
-
 ## Examples
 
-Let's see what happens when we add a file to `.dvcignore`.
+Let's see what happens when we add a file to `.dvcignore`:
 
 ```dvc
 $ mkdir data
-$ echo data1 >> data/data1
-$ echo data2 >> data/data2
-$ tree .
-
+$ echo 1 > data/data1
+$ echo 2 > data/data2
+$ tree
 .
 └── data
     ├── data1
     └── data2
 ```
 
-We created the `data/` directory with two files. Let's ignore one of them, and
-track the directory with DVC.
+We created the `data/` directory with two data files. Let's ignore one of them,
+and double check that it's being ignored by DVC:
 
 ```dvc
 $ echo data/data1 >> .dvcignore
 $ cat .dvcignore
-
 data/data1
+$ dvc check-ignore data/*
+data/data1
+```
 
-$ dvc add data
+> Refer to `dvc check-ignore` for more details on that command.
 
-$ tree .dvc/cache
+## Example: Skip specific files when adding directories
+
+Let's now track the directory with `dvc add`, and see what happens in the
+cache:
 
+```dvc
+$ dvc add data
+...
+$ tree .dvc/cache
 .dvc/cache
-├── 54
-│   └── 40cb5e4c57ab54af68127492334a23.dir
-└── ed
-    └── c3d3797971f12c7f5e1d106dd5cee2
+├── 26
+│   └── ab0db90d72e28ad0ba1e22ee510510
+└── ad
+    └── 8b0ddcf133a6e5833002ce28f97c5a.dir
+$ md5 data/*
+b026324c6904b2a9cb4b88d6d61c81d1 data/data1
+26ab0db90d72e28ad0ba1e22ee510510 data/data2
 ```
 
-Only the hash values of a directory (`data/`) and one file have been
-cached. This means that `dvc add` ignored one of the files
-(`data1`).
+Only the cache entries of the `data/` directory itself and one file have been
+stored. Checking the hash value of the data files manually, we can see that
+`data2` was cached. This means that `dvc add` did ignore `data1`.
 
 > Refer to
 > [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
 > for more info.
 
+## Example: Ignore file state changes
+
 Now, let's modify file `data1` and see if it affects `dvc status`.
 
 ```dvc
 $ dvc status
-
 Data and pipelines are up to date.
 
-$ echo "123" >> data/data1
+$ echo "2345" >> data/data1
 $ dvc status
-
 Data and pipelines are up to date.
 ```
 
-`dvc status` also ignores `data1`. The same modification on a tracked file will
-produce a different output:
+`dvc status` ignores `data1`. Modifications on a tracked file produce a
+different output:
 
 ```dvc
-$ echo "123" >> data/data2
+$ echo "345" >> data/data2
 $ dvc status
-
 data.dvc:
   changed outs:
     modified: data