From e6ab43ea20551bfbb05d48951bcb4c9c342058de Mon Sep 17 00:00:00 2001 From: Dani Hodovic Date: Tue, 26 Nov 2019 18:13:37 +0100 Subject: [PATCH 1/2] cmd ref: add examples on downloading normal git files depends on: https://github.com/iterative/dvc/pull/2837 --- static/docs/command-reference/get.md | 45 +++++++++++++++++++++------- 1 file changed, 34 insertions(+), 11 deletions(-) diff --git a/static/docs/command-reference/get.md b/static/docs/command-reference/get.md index 120b3c98a3..6f0436c27a 100644 --- a/static/docs/command-reference/get.md +++ b/static/docs/command-reference/get.md @@ -1,28 +1,40 @@ # get -Download or copy file or directory from the -[remote storage](/doc/command-reference/remote) of any DVC project -in a Git repository (e.g. hosted on GitHub) into the current working directory. +Download a file or directory from any DVC project or Git repository +(e.g. hosted on GitHub) into the current working directory. -> Unlike `dvc import`, this command does not track the downloaded data files -> (does not create a DVC-file). +> Unlike `dvc import`, this command does not track the downloaded files (does +> not create a DVC-file). ## Synopsis ```usage usage: dvc get [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path +Download files or directories from DVC repository. +Documentation: + positional arguments: - url URL of Git repository with DVC project to download from. - path Path to data within DVC repository. + url URL of Git repository with DVC project to + download from. + path Path to a file or directory within a + DVC repository. + +optional arguments: + -h, --help show this help message and exit + -q, --quiet Be quiet. + -v, --verbose Be verbose. + -o [OUT], --out [OUT] + Destination path. + --rev [REV] DVC repository git revision. 
``` ## Description Provides an easy way to download datasets, intermediate results, ML models, or -other files and directories (any data artifact) tracked in another -DVC repository, by downloading them into the current working -directory. (It works like `wget`, but for DVC repositories.) +other files and directories tracked in another DVC repository, by +downloading them into the current working directory. (It works like `wget`, but +for DVC repositories.) Note that this command doesn't require an existing DVC project to run in. It's a single-purpose command that can be used out of the box after installing DVC. @@ -66,7 +78,7 @@ created in the current working directory, with its original file name. - `-v`, `--verbose` - displays detailed tracing information. -## Examples +## Example: Retrieve a model from a DVC remote > Note that `dvc get` can be used from anywhere in the file system, as long as > DVC is [installed](/doc/install). @@ -106,6 +118,17 @@ The same example applies to raw or intermediate data artifacts as well, of course, for cases where we want to download those files or directories and perform some analysis on them. +## Examples: Retrieve a file from a git repository + +We can also use `dvc get` to retrieve any file or directory that exists in a git +repository. + +```dvc +$ dvc get https://github.com/schacon/cowsay install.sh +$ ls +install.sh +``` + ## Example: Compare different versions of data or model `dvc get` has the `--rev` option, to specify which version of the repository to From 1fc88e5a49e462b150797ddacce7a89bbf5e7187 Mon Sep 17 00:00:00 2001 From: Dani Hodovic Date: Thu, 28 Nov 2019 15:48:50 +0100 Subject: [PATCH 2/2] fixup! 
cmd ref: add examples on downloading normal git files --- static/docs/command-reference/get.md | 76 ++++++++++++---------------- 1 file changed, 32 insertions(+), 44 deletions(-) diff --git a/static/docs/command-reference/get.md b/static/docs/command-reference/get.md index 6f0436c27a..8293686e66 100644 --- a/static/docs/command-reference/get.md +++ b/static/docs/command-reference/get.md @@ -1,40 +1,32 @@ # get -Download a file or directory from any DVC project or Git repository +Obtain a file or directory from any DVC project or Git repository (e.g. hosted on GitHub) into the current working directory. -> Unlike `dvc import`, this command does not track the downloaded files (does -> not create a DVC-file). +> Unlike `dvc import`, this command does not track the obtained files (does not +> create a DVC-file). ## Synopsis ```usage usage: dvc get [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path -Download files or directories from DVC repository. +Download/copy files or directories from DVC repository. Documentation: positional arguments: - url URL of Git repository with DVC project to - download from. - path Path to a file or directory within a - DVC repository. - -optional arguments: - -h, --help show this help message and exit - -q, --quiet Be quiet. - -v, --verbose Be verbose. - -o [OUT], --out [OUT] - Destination path. - --rev [REV] DVC repository git revision. + url URL of Git repository with DVC project to download + from. + path Path to a file or directory within a DVC repository. ``` ## Description -Provides an easy way to download datasets, intermediate results, ML models, or -other files and directories tracked in another DVC repository, by -downloading them into the current working directory. (It works like `wget`, but -for DVC repositories.) +Provides an easy way to obtain files or directories tracked in any DVC +repository, both by Git (e.g. source code) and DVC (e.g. datasets, ML +models). 
The file or directory in path is copied to the current working +directory. (For remote URLs, it works like downloading with wget, but supporting +DVC data artifacts.) Note that this command doesn't require an existing DVC project to run in. It's a single-purpose command that can be used out of the box after installing DVC. @@ -44,15 +36,12 @@ external project. Both HTTP and SSH protocols are supported for online repositories (e.g. `[user@]server:project.git`). `url` can also be a local file system path to an "offline" repository. -The `path` argument of this command is used to specify the location of the data -to be downloaded within the source project. It should point to a data file or -directory tracked by that project – specified in one of the -[DVC-files](/doc/user-guide/dvc-file-format) of the repository at `url`. (You -will not find these files directly in the source Git repository.) The source -project should have a default [DVC remote](/doc/command-reference/remote) -configured, containing them.) +The `path` argument of this command is used to specify the location of the file +or directory within the source project. If the file is a +[DVC-file](/doc/user-guide/dvc-file-format) the source project must have a +default [DVC remote](/doc/command-reference/remote) configured. -> See `dvc get-url` to download data from other supported URLs. +> See `dvc get-url` to obtain data from other supported URLs. After running this command successfully, the data found in the `url` `path` is created in the current working directory, with its original file name. @@ -60,7 +49,7 @@ created in the current working directory, with its original file name. ## Options - `-o`, `--out` - specify a path (directory and/or file name) to the desired - location to place the downloaded data in. The default value (when this option + location to place the obtained file in. The default value (when this option isn't used) is the current working directory (`.`) and original file name. 
If an existing directory is specified, then the output will be placed inside of it. @@ -68,7 +57,7 @@ created in the current working directory, with its original file name. - `--rev` - specific [Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References) (such as a branch name, a tag, or a commit hash) of the DVC repository to - download the data from. The tip of the default branch is used by default when + obtain the file from. The tip of the default branch is used by default when this option is not specified. - `-h`, `--help` - prints the usage/help message, and exit. @@ -83,7 +72,7 @@ created in the current working directory, with its original file name. > Note that `dvc get` can be used from anywhere in the file system, as long as > DVC is [installed](/doc/install). -We can use `dvc get` to download the resulting model file from our +We can use `dvc get` to obtain the resulting model file from our [get started example repo](https://github.com/iterative/example-get-started), a DVC project external to the current working directory. The desired output file would be located in the root of the external project @@ -107,15 +96,15 @@ is found, that specifies `model.pkl` in its outputs (`outs`). DVC then its [config file](https://github.com/iterative/example-get-started/blob/master/.dvc/config)). -> A recommended use for downloading binary files from DVC repositories, as done -> in this example, is to place a ML model inside a wrapper application that -> serves as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) -> pipeline or as an HTTP/RESTful API (web service) that provides predictions -> upon request. 
This can be automated leveraging DVC with +> A recommended use for obtaining binary files from DVC repositories, as done in +> this example, is to place a ML model inside a wrapper application that serves +> as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) pipeline +> or as an HTTP/RESTful API (web service) that provides predictions upon +> request. This can be automated leveraging DVC with > [CI/CD](https://en.wikipedia.org/wiki/CI/CD) tools. The same example applies to raw or intermediate data artifacts as -well, of course, for cases where we want to download those files or directories +well, of course, for cases where we want to obtain those files or directories and perform some analysis on them. ## Examples: Retrieve a file from a git repository @@ -132,12 +121,11 @@ install.sh ## Example: Compare different versions of data or model `dvc get` has the `--rev` option, to specify which version of the repository to -download a data artifact from. It also has the `--out` option to -specify the file or directory path and file name for the download. Combining -these two options allows us to do something we can't achieve with the regular -`git checkout` + `dvc checkout` process – see for example the -[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get -Started_ section. +obtain a data artifact from. It also has the `--out` option to +specify the target path. Combining these two options allows us to do something +we can't achieve with the regular `git checkout` + `dvc checkout` process – see +for example the [Get Older Data Version](/doc/get-started/older-versions) +chapter of our _Get Started_ section. Let's use the [get started example repo](https://github.com/iterative/example-get-started) @@ -171,7 +159,7 @@ get the most recent one, we use a similar command, but with `-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev` (since it's the latest version anyway). 
In fact, in this case using `dvc pull` with the corresponding [DVC-files](/doc/user-guide/dvc-file-format) should -suffice, downloading the file as just `model.pkl`. We can then rename it to make +suffice, obtaining the file as just `model.pkl`. We can then rename it to make its version explicit: ```dvc