diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md index 49fc6fc36b..0db7230f62 100644 --- a/content/docs/command-reference/add.md +++ b/content/docs/command-reference/add.md @@ -243,3 +243,6 @@ In this case, a DVC-file is generated for each file in the `pics/` directory tree. No top-level DVC-file is generated, which is typically less convenient. For example, we cannot use the directory structure as one unit with `dvc run` or other commands. + +To untrack a file or directory, add [patterns](https://git-scm.com/docs/gitignore) matching its location to the `.dvcignore` file.
+See [.dvcignore](docs/user-guide/.dvcignore) for more details. diff --git a/content/docs/command-reference/checkout.md b/content/docs/command-reference/checkout.md index 9f45abb47b..1b31205d36 100644 --- a/content/docs/command-reference/checkout.md +++ b/content/docs/command-reference/checkout.md @@ -230,3 +230,5 @@ MD5 (model.pkl) = 662eb7f64216d9c2c1088d0a5e2c6951 Previously this took two commands, `git checkout` followed by `dvc checkout`. We can now skip the second one, which is automatically run for us. The workspace is automatically synchronized accordingly. + +Note that `dvc checkout` does not affect the state of files and directories listed in `.dvcignore`: these are not tracked by DVC, and `dvc checkout` only synchronizes tracked files and directories with the versions specified in the current DVC-files.
See [.dvcignore](docs/user-guide/.dvcignore) for more details. diff --git a/content/docs/command-reference/commit.md b/content/docs/command-reference/commit.md index 6c1266a5b7..4013c0522f 100644 --- a/content/docs/command-reference/commit.md +++ b/content/docs/command-reference/commit.md @@ -70,6 +70,8 @@ force-update the [DVC-files](/doc/user-guide/dvc-file-format) and save data to cache. They are still useful, but keep in mind that DVC can't guarantee reproducibility in those cases. +Note that files matching the [patterns](https://git-scm.com/docs/gitignore) listed in `.dvcignore` are not updated by `dvc commit`, as they are not tracked by DVC. See [.dvcignore](docs/user-guide/.dvcignore) for more details. + ## Options - `-d`, `--with-deps` - determines files to commit by tracking dependencies to diff --git a/content/docs/command-reference/destroy.md b/content/docs/command-reference/destroy.md index 533ba4f3b5..fda7030103 100644 --- a/content/docs/command-reference/destroy.md +++ b/content/docs/command-reference/destroy.md @@ -21,6 +21,9 @@ directory.) If you were using cache, DVC will replace them with copies, so that your data is intact after the project's destruction. +Note that `dvc destroy` does not delete `.dvcignore`.
+See [.dvcignore](docs/user-guide/.dvcignore) for more details. + ## Options - `-f`, `--force` - do not prompt when destroying this project. diff --git a/content/docs/command-reference/init.md b/content/docs/command-reference/init.md index bb4613edb4..4eb5adf7fb 100644 --- a/content/docs/command-reference/init.md +++ b/content/docs/command-reference/init.md @@ -223,3 +223,6 @@ repo └── project-a └── .dvc ``` + +It's quite intuitive to add `.dvcignore` at the time of project initialization to manage which files are tracked throughout the project.
+See [.dvcignore](docs/user-guide/.dvcignore) for more details. diff --git a/content/docs/command-reference/move.md b/content/docs/command-reference/move.md index 74a72a9a8e..7e6fea2026 100644 --- a/content/docs/command-reference/move.md +++ b/content/docs/command-reference/move.md @@ -71,6 +71,17 @@ outs: md5: c8263e8422925b0872ee1fb7c953742a path: other.csv ``` +Note that running `dvc move` on a file that matches one of the patterns listed in `.dvcignore` raises an error, because the corresponding DVC-file is not tracked by DVC. + +```dvc +$ dvc add data.csv +$ echo "data.*" >> .dvcignore +$ dvc move data.csv other.csv +ERROR: failed to move 'data.csv' -> 'other.csv' - Unable to find DVC-file with output 'data.csv' + +Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help! +``` +See [.dvcignore](docs/user-guide/.dvcignore) for more details. ## Options diff --git a/content/docs/command-reference/pipeline/index.md b/content/docs/command-reference/pipeline/index.md index 6b4d6684e5..9d4291f984 100644 --- a/content/docs/command-reference/pipeline/index.md +++ b/content/docs/command-reference/pipeline/index.md @@ -35,6 +35,8 @@ interconnected by their dependencies and outputs later. (See `dvc repro`.) `dvc pipeline` commands help users display the existing project pipelines in different ways. +DVC might remove ignored files (files listed in `.dvcignore`) upon `dvc run` or `dvc repro`. If they are not produced by a pipeline stage, they can be deleted permanently.
+See [.dvcignore](docs/user-guide/.dvcignore) for more details. ## Options - `-h`, `--help` - prints the usage/help message, and exit. diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md index 195056d483..90ca308ccc 100644 --- a/content/docs/command-reference/pull.md +++ b/content/docs/command-reference/pull.md @@ -56,6 +56,9 @@ After a data file is in cache, `dvc pull` can use OS-specific mechanisms like reflinks or hardlinks to put it in the workspace without copying. See `dvc checkout` for more details. +Note that `dvc pull` downloads missing files whose corresponding DVC-files match those in remote storage. However, if a DVC-file is listed in `.dvcignore`, its corresponding file won't be downloaded.
+See [.dvcignore](docs/user-guide/.dvcignore) for more details. + ## Options - `-a`, `--all-branches` - determines the files to download by examining diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index 65f41494e4..fc74d313ae 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -70,6 +70,8 @@ backward from the target [stage files](/doc/command-reference/run), through the corresponding [pipelines](/doc/command-reference/pipeline), to find data files to push. +Note that as `dvc push` uploads tracked files and directories to remote storage, it won't upload files and directories listed under `.dvcignore`.
See [.dvcignore](docs/user-guide/.dvcignore) for more details. + ## Options - `-a`, `--all-branches` - determines the files to upload by examining DVC-files diff --git a/content/docs/command-reference/remote/index.md b/content/docs/command-reference/remote/index.md index c0ca9c2f7a..452df50fe2 100644 --- a/content/docs/command-reference/remote/index.md +++ b/content/docs/command-reference/remote/index.md @@ -58,6 +58,8 @@ be used or these files could be edited manually. For the typical process to share the project via remote, see [Sharing Data And Model Files](/doc/use-cases/sharing-data-and-model-files). +Only files that are not listed in `.dvcignore` will be present in remote storage.
See [.dvcignore](docs/user-guide/.dvcignore) for more details. + ## Options - `-h`, `--help` - prints the usage/help message, and exit. diff --git a/content/docs/tutorials/deep/define-ml-pipeline.md b/content/docs/tutorials/deep/define-ml-pipeline.md index 9db5a5ee93..ee7b26fd6c 100644 --- a/content/docs/tutorials/deep/define-ml-pipeline.md +++ b/content/docs/tutorials/deep/define-ml-pipeline.md @@ -403,5 +403,8 @@ data/eval.txt:AUC: 0.624652 > document, our focus is DVC, not ML modeling, so we use a relatively small > dataset without any advanced ML techniques. +Note that if a file is not produced by a pipeline stage and is listed in `.dvcignore`, DVC might remove it upon `dvc run`.
+See [.dvcignore](docs/user-guide/.dvcignore) for more details. + In the next chapter we will try to improve the metrics by changing our modeling code and using reproducibility in our pipeline. diff --git a/content/docs/tutorials/get-started/add-files.md b/content/docs/tutorials/get-started/add-files.md index 048aafa213..bad7fccc70 100644 --- a/content/docs/tutorials/get-started/add-files.md +++ b/content/docs/tutorials/get-started/add-files.md @@ -36,6 +36,8 @@ $ git commit -m "Add raw data to project" Committing DVC-files with Git allows us to track different versions of the project data as it evolves with the source code tracked by Git. +When we don't want DVC to track specific files and directories, we list them under `.dvcignore`. +
See [.dvcignore](docs/user-guide/.dvcignore) for more details.
### Expand to learn about DVC internals diff --git a/content/docs/tutorials/get-started/import-data.md b/content/docs/tutorials/get-started/import-data.md index 6900533d5c..58c99039b5 100644 --- a/content/docs/tutorials/get-started/import-data.md +++ b/content/docs/tutorials/get-started/import-data.md @@ -76,6 +76,8 @@ The `url` and `rev_lock` subfields under `repo` are used to save the origin and
+Suppose we want to work on only a subset of the imported files for now and need the remaining files in later stages. Since files are automatically tracked by DVC when imported, we can list the ones we don't want DVC to track in `.dvcignore`. See [.dvcignore](docs/user-guide/.dvcignore) for more details. + Since this is not an official part of this _Get Started_, bring everything back to normal with: diff --git a/content/docs/tutorials/pipelines.md b/content/docs/tutorials/pipelines.md index 1ba072eb11..7f72fea39d 100644 --- a/content/docs/tutorials/pipelines.md +++ b/content/docs/tutorials/pipelines.md @@ -396,3 +396,5 @@ DVC streamlines all of your experiments into a single, reproducible project, and it makes it easy to share it with Git, including dependencies. This collaboration feature provides the ability to review data science research. + + See also [.dvcignore](docs/user-guide/.dvcignore) to untrack specific files in a pipeline. diff --git a/content/docs/tutorials/versioning.md b/content/docs/tutorials/versioning.md index 20234ffddb..582e41e414 100644 --- a/content/docs/tutorials/versioning.md +++ b/content/docs/tutorials/versioning.md @@ -276,6 +276,9 @@ If you run `git status` you'll see that `data.dvc` is modified and currently points to the `v1.0` version of the dataset, while code and model files are from the `v2.0` tag. +Note that the contents of the `.dvcignore` file are not affected when switching between versions, so files that are untracked in one version remain untracked in other versions. +See [.dvcignore](docs/user-guide/.dvcignore) for more details. +
### Expand to learn more about DVC internals @@ -342,7 +345,8 @@ was a dependency change. It also updates outputs and puts them into the To make things a little simpler: if `dvc add` and `dvc checkout` provide a basic mechanism to version control large data files or models, `dvc run` and `dvc repro` provide a build system for ML models, which is similar to -[Make](https://www.gnu.org/software/make/) in software build automation. +[Make](https://www.gnu.org/software/make/) in software build automation. + ## What's next? diff --git a/content/docs/use-cases/data-registries.md b/content/docs/use-cases/data-registries.md index ea25233187..c1425f8bad 100644 --- a/content/docs/use-cases/data-registries.md +++ b/content/docs/use-cases/data-registries.md @@ -150,6 +150,9 @@ the data source (registry repo). This is achieved by creating a particular kind of [DVC-file](/doc/user-guide/dvc-file-format) (a.k.a. _import stage_). This file can be used staged and committed with Git. +As DVC automatically tracks the files downloaded via `dvc import`, we can list files which we don't want DVC to track under `.dvcignore`.
+See [.dvcignore](docs/user-guide/.dvcignore) for more details. + As an addition to the import workflow, and enabled the saved dependency, we can easily bring it up to date in our consumer project(s) with `dvc update` whenever the the dataset changes in the source repo (data registry): diff --git a/content/docs/user-guide/dvc-files-and-directories.md b/content/docs/user-guide/dvc-files-and-directories.md index 7518ee9422..d157872013 100644 --- a/content/docs/user-guide/dvc-files-and-directories.md +++ b/content/docs/user-guide/dvc-files-and-directories.md @@ -122,3 +122,5 @@ $ cat .dvc/cache/19/6a322c107c2572335158503c64bfba.dir ``` See also `dvc cache dir` to set the location of the cache directory. + + Refer to `.dvcignore` to add [patterns](https://git-scm.com/docs/gitignore) for files that DVC should ignore, as if they did not exist. diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index d84415aef9..cb94464ab0 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -165,6 +165,8 @@ Importing 'model.pkl (git@github.com:iterative/example-get-started)' The command above creates `model.pkl.dvc`, where the external dependency is specified (with the `repo` field). +See [.dvcignore](docs/user-guide/.dvcignore) for untracking unnecessary files that were automatically tracked by DVC when running `dvc import` or `dvc import-url`. +
### Expand to see resulting DVC-file