diff --git a/static/docs/command-reference/get.md b/static/docs/command-reference/get.md
index 120b3c98a3..8293686e66 100644
--- a/static/docs/command-reference/get.md
+++ b/static/docs/command-reference/get.md
@@ -1,28 +1,32 @@
 # get

-Download or copy file or directory from the
-[remote storage](/doc/command-reference/remote) of any DVC project
-in a Git repository (e.g. hosted on GitHub) into the current working directory.
+Obtain a file or directory from any DVC project or Git repository
+(e.g. hosted on GitHub) into the current working directory.

-> Unlike `dvc import`, this command does not track the downloaded data files
-> (does not create a DVC-file).
+> Unlike `dvc import`, this command does not track the obtained files (does not
+> create a DVC-file).

 ## Synopsis

 ```usage
 usage: dvc get [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path

+Download/copy files or directories from DVC repository.
+Documentation:
+
 positional arguments:
-  url         URL of Git repository with DVC project to download from.
-  path        Path to data within DVC repository.
+  url         URL of Git repository with DVC project to download
+              from.
+  path        Path to a file or directory within a DVC repository.
 ```

 ## Description

-Provides an easy way to download datasets, intermediate results, ML models, or
-other files and directories (any data artifact) tracked in another
-DVC repository, by downloading them into the current working
-directory. (It works like `wget`, but for DVC repositories.)
+Provides an easy way to obtain files or directories tracked in any DVC
+repository, both by Git (e.g. source code) and DVC (e.g. datasets, ML
+models). The file or directory in path is copied to the current working
+directory. (For remote URLs, it works like downloading with wget, but supporting
+DVC data artifacts.)

 Note that this command doesn't require an existing DVC project to run in. It's
 a single-purpose command that can be used out of the box after installing DVC.
@@ -32,15 +36,12 @@ external project.
 Both HTTP and SSH protocols are supported for
 online repositories (e.g. `[user@]server:project.git`). `url` can also be a
 local file system path to an "offline" repository.

-The `path` argument of this command is used to specify the location of the data
-to be downloaded within the source project. It should point to a data file or
-directory tracked by that project – specified in one of the
-[DVC-files](/doc/user-guide/dvc-file-format) of the repository at `url`. (You
-will not find these files directly in the source Git repository.) The source
-project should have a default [DVC remote](/doc/command-reference/remote)
-configured, containing them.)
+The `path` argument of this command is used to specify the location of the file
+or directory within the source project. If the file is a
+[DVC-file](/doc/user-guide/dvc-file-format) the source project must have a
+default [DVC remote](/doc/command-reference/remote) configured.

-> See `dvc get-url` to download data from other supported URLs.
+> See `dvc get-url` to obtain data from other supported URLs.

 After running this command successfully, the data found in the `url` `path` is
 created in the current working directory, with its original file name.
@@ -48,7 +49,7 @@ created in the current working directory, with its original file name.
 ## Options

 - `-o`, `--out` - specify a path (directory and/or file name) to the desired
-  location to place the downloaded data in. The default value (when this option
+  location to place the obtained file in. The default value (when this option
   isn't used) is the current working directory (`.`) and original file name. If
   an existing directory is specified, then the output will be placed inside of
   it.
@@ -56,7 +57,7 @@ created in the current working directory, with its original file name.
 - `--rev` - specific
   [Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
   (such as a branch name, a tag, or a commit hash) of the DVC repository to
-  download the data from. The tip of the default branch is used by default when
+  obtain the file from. The tip of the default branch is used by default when
   this option is not specified.

 - `-h`, `--help` - prints the usage/help message, and exit.
@@ -66,12 +67,12 @@ created in the current working directory, with its original file name.

 - `-v`, `--verbose` - displays detailed tracing information.

-## Examples
+## Example: Retrieve a model from a DVC remote

 > Note that `dvc get` can be used from anywhere in the file system, as long as
 > DVC is [installed](/doc/install).

-We can use `dvc get` to download the resulting model file from our
+We can use `dvc get` to obtain the resulting model file from our
 [get started example repo](https://github.com/iterative/example-get-started), a
 DVC project external to the current working directory. The desired
 output file would be located in the root of the external project
@@ -95,26 +96,36 @@ is found, that specifies `model.pkl` in its outputs (`outs`). DVC then
 its
 [config file](https://github.com/iterative/example-get-started/blob/master/.dvc/config)).

-> A recommended use for downloading binary files from DVC repositories, as done
-> in this example, is to place a ML model inside a wrapper application that
-> serves as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load)
-> pipeline or as an HTTP/RESTful API (web service) that provides predictions
-> upon request. This can be automated leveraging DVC with
+> A recommended use for obtaining binary files from DVC repositories, as done in
+> this example, is to place a ML model inside a wrapper application that serves
+> as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) pipeline
+> or as an HTTP/RESTful API (web service) that provides predictions upon
+> request. This can be automated leveraging DVC with
 > [CI/CD](https://en.wikipedia.org/wiki/CI/CD) tools.

 The same example applies to raw or intermediate data artifacts as
-well, of course, for cases where we want to download those files or directories
+well, of course, for cases where we want to obtain those files or directories
 and perform some analysis on them.

+## Example: Retrieve a file from a Git repository
+
+We can also use `dvc get` to retrieve any file or directory that exists in a Git
+repository.
+
+```dvc
+$ dvc get https://github.com/schacon/cowsay install.sh
+$ ls
+install.sh
+```
+
 ## Example: Compare different versions of data or model

 `dvc get` has the `--rev` option, to specify which version of the repository to
-download a data artifact from. It also has the `--out` option to
-specify the file or directory path and file name for the download. Combining
-these two options allows us to do something we can't achieve with the regular
-`git checkout` + `dvc checkout` process – see for example the
-[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get
-Started_ section.
+obtain a data artifact from. It also has the `--out` option to
+specify the target path. Combining these two options allows us to do something
+we can't achieve with the regular `git checkout` + `dvc checkout` process – see
+for example the [Get Older Data Version](/doc/get-started/older-versions)
+chapter of our _Get Started_ section.

 Let's use the
 [get started example repo](https://github.com/iterative/example-get-started)
@@ -148,7 +159,7 @@ get the most recent one, we use a similar command, but with
 `-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev`
 (since it's the latest version anyway). In fact, in this case using `dvc pull`
 with the corresponding [DVC-files](/doc/user-guide/dvc-file-format) should
-suffice, downloading the file as just `model.pkl`. We can then rename it to make
+suffice, obtaining the file as just `model.pkl`. We can then rename it to make
 its version explicit:

 ```dvc