From 6fc5e8b7aafa4a5af4d67ee1dd6502e078521b94 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 23 Sep 2020 14:27:19 -0400 Subject: [PATCH 01/12] status: update sample outputs related to iterative/dvc/pull/4490 --- content/docs/command-reference/commit.md | 2 -- content/docs/command-reference/fetch.md | 4 ++-- content/docs/command-reference/install.md | 1 - .../docs/command-reference/metrics/diff.md | 2 +- content/docs/command-reference/pull.md | 1 + content/docs/command-reference/push.md | 8 ++++---- content/docs/command-reference/status.md | 19 ++++++++++--------- content/docs/user-guide/dvcignore.md | 2 -- 8 files changed, 18 insertions(+), 21 deletions(-) diff --git a/content/docs/command-reference/commit.md b/content/docs/command-reference/commit.md index 97cb30c2a3..10604461dd 100644 --- a/content/docs/command-reference/commit.md +++ b/content/docs/command-reference/commit.md @@ -249,7 +249,6 @@ $ git status -s M src/train.py $ dvc status - train.dvc: changed deps: modified: src/train.py @@ -275,7 +274,6 @@ dependencies ['src/train.py'] of 'train.dvc' changed. Are you sure you commit it? [y/n] y $ dvc status - Data and pipelines are up to date. ``` diff --git a/content/docs/command-reference/fetch.md b/content/docs/command-reference/fetch.md index 7e6c843796..8a3bbd22cf 100644 --- a/content/docs/command-reference/fetch.md +++ b/content/docs/command-reference/fetch.md @@ -154,8 +154,8 @@ into our local cache. ```dvc $ dvc status --cloud ... - deleted: data/features/train.pkl - deleted: model.pkl + deleted: data/features/train.pkl + deleted: model.pkl $ dvc fetch diff --git a/content/docs/command-reference/install.md b/content/docs/command-reference/install.md index 35461f2ed5..24dac7b7d1 100644 --- a/content/docs/command-reference/install.md +++ b/content/docs/command-reference/install.md @@ -247,7 +247,6 @@ M model.pkl M data/features/ $ dvc status - Data and pipelines are up to date. 
``` diff --git a/content/docs/command-reference/metrics/diff.md b/content/docs/command-reference/metrics/diff.md index daec243ab7..dd381aecc3 100644 --- a/content/docs/command-reference/metrics/diff.md +++ b/content/docs/command-reference/metrics/diff.md @@ -41,7 +41,7 @@ lists all the current metrics without comparisons. ## Options -- `--targets ` - limit command scope to these metric files. Using -R, +- `--targets ` - limit command scope to these metric files. Using `-R`, directories to search metric files in can also be given. When specifying arguments for `--targets` before `revisions`, you should use `--` after this option's arguments, e.g.: diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md index 1bcc61e8cc..4e5b65f640 100644 --- a/content/docs/command-reference/pull.md +++ b/content/docs/command-reference/pull.md @@ -192,6 +192,7 @@ such that the data in some of these stages should be updated in the ```dvc $ dvc status -c +... deleted: data/features/test.pkl deleted: data/features/train.pkl deleted: model.pkl diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index e2151ccfcd..09def79221 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -149,9 +149,10 @@ Imagine the project has been modified such that the ```dvc $ dvc status --cloud - new: data/model.p - new: data/matrix-test.p - new: data/matrix-train.p +... + new: data/model.p + new: data/matrix-test.p + new: data/matrix-train.p ``` One could do a simple `dvc push` to share all the data, but what if you only @@ -258,7 +259,6 @@ $ tree ~/vault/recursive 10 directories, 10 files $ dvc status --cloud - Data and pipelines are up to date. 
``` diff --git a/content/docs/command-reference/status.md b/content/docs/command-reference/status.md index f9a5766a0f..c45e63edf8 100644 --- a/content/docs/command-reference/status.md +++ b/content/docs/command-reference/status.md @@ -160,11 +160,11 @@ bar.dvc: modified: bar changed outs: not in cache: foo -foo.dvc +foo.dvc: changed outs: deleted: foo changed checksum -prepare.dvc +prepare.dvc: changed outs: new: bar always changed @@ -180,11 +180,11 @@ This shows that for stage `bar.dvc`, the dependency `foo` and the ```dvc $ dvc status foo.dvc dobar -foo.dvc +foo.dvc: changed outs: deleted: foo changed checksum -dobar +dobar: changed deps: modified: bar changed outs: @@ -220,7 +220,7 @@ $ dvc status model.p Data and pipelines are up to date. $ dvc status model.p --with-deps -matrix-train.p +matrix-train.p: changed deps: modified: code/featurization.py ``` @@ -243,10 +243,11 @@ remote yet: ```dvc $ dvc status --remote storage -new: data/model.p -new: data/eval.txt -new: data/matrix-train.p -new: data/matrix-test.p +... + new: data/model.p + new: data/eval.txt + new: data/matrix-train.p + new: data/matrix-test.p ``` The output shows where the location of the remote storage is, as well as any diff --git a/content/docs/user-guide/dvcignore.md b/content/docs/user-guide/dvcignore.md index cc0b0578b7..d826859e03 100644 --- a/content/docs/user-guide/dvcignore.md +++ b/content/docs/user-guide/dvcignore.md @@ -149,12 +149,10 @@ adding new file: ```dvc $ dvc status - Data and pipelines are up to date. 
$ mv data/data1 data/data3 $ dvc status - data.dvc: changed outs: modified: data From bd5e462898808ebcdd672b43169e9cc784d6f634 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 23 Sep 2020 17:08:27 -0400 Subject: [PATCH 02/12] docs: copy edits around term "ipmort" --- content/docs/command-reference/import.md | 7 ++++--- content/docs/command-reference/list.md | 2 +- .../docs/user-guide/basic-concepts/external-dependency.md | 8 ++++---- content/docs/user-guide/external-dependencies.md | 5 ++++- content/docs/user-guide/merge-conflicts.md | 2 +- 5 files changed, 14 insertions(+), 10 deletions(-) diff --git a/content/docs/command-reference/import.md b/content/docs/command-reference/import.md index 341d4fd587..e3af008998 100644 --- a/content/docs/command-reference/import.md +++ b/content/docs/command-reference/import.md @@ -112,9 +112,10 @@ Importing 'data/data.xml (git@github.com:iterative/example-get-started)' ``` In contrast with `dvc get`, this command doesn't just download the data file, -but it also creates an import stage (`.dvc` file) with a link to the data source -(as explained in the description above). (This import stage can later be used to -[update](/doc/command-reference/update) the import.) Check `data.xml.dvc`: +but it also creates an import stage (`.dvc` file) with a link to +the data source (as explained in the description above). (This import stage can +later be used to [update](/doc/command-reference/update) the import.) Check +`data.xml.dvc`: ```yaml md5: 7de90e7de7b432ad972095bc1f2ec0f8 diff --git a/content/docs/command-reference/list.md b/content/docs/command-reference/list.md index 7347303170..ba7878f84b 100644 --- a/content/docs/command-reference/list.md +++ b/content/docs/command-reference/list.md @@ -21,7 +21,7 @@ DVC, by effectively replacing data files, models, directories with `.dvc` files files when you browse a DVC repository on Git hosting (e.g. GitHub), you just see the `dvc.yaml` and `.dvc` files. 
This makes it hard to navigate the project to find data artifacts for use with `dvc get`, -`dvc import`, or `dvc.api`. +`dvc import`, or `dvc.api` functions. `dvc list` prints a virtual view of a DVC repository, as if files and directories tracked by DVC were found directly in the remote Git repo. Only the diff --git a/content/docs/user-guide/basic-concepts/external-dependency.md b/content/docs/user-guide/basic-concepts/external-dependency.md index aa79fe9e28..821114c3b4 100644 --- a/content/docs/user-guide/basic-concepts/external-dependency.md +++ b/content/docs/user-guide/basic-concepts/external-dependency.md @@ -3,8 +3,8 @@ name: 'External Dependency' match: ['external dependency', 'external dependencies'] --- -A stage dependency (`deps` field in `dvc.yaml` or in an -[import stage](/doc/command-reference/import) `.dvc` file) with origin in an -external source, for example HTTP, SSH, Amazon S3, Google Cloud Storage remote -locations, or even other DVC repositories. See +A stage dependency (`deps` field in `dvc.yaml` or in an import +stage `.dvc` file) with origin in an external source, for example HTTP, +SSH, Amazon S3, Google Cloud Storage remote locations, or even other DVC +repositories. See [External Dependencies](/doc/user-guide/external-dependencies). diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index 2a3846b616..8e3003a8dc 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -151,7 +151,7 @@ $ dvc run -n download_file \ If instead of a URL you'd like to use an alias that can be managed independently, or if the external dependency location requires access credentials, you may use `dvc remote add` to define this location as a DVC -Remote, and then use a special URL with format `remote://{remote_name}/{path}` +remote, and then use a special URL with format `remote://{remote_name}/{path}` to define an external dependency. 
For example, for an HTTPs remote/dependency: @@ -181,6 +181,9 @@ Importing 'https://data.dvc.org/get-started/data.xml' -> 'data.xml' The command above creates the import `.dvc` file `data.xml.dvc`, that contains an external dependency (in this case an HTTPs URL). +The only difference is that `dvc fetch` and `dvc pull` won't look in +[remote-storage]() for the data, but in it's original source. +
### Expand to see resulting `.dvc` file diff --git a/content/docs/user-guide/merge-conflicts.md b/content/docs/user-guide/merge-conflicts.md index 33d77948a6..f12319da6e 100644 --- a/content/docs/user-guide/merge-conflicts.md +++ b/content/docs/user-guide/merge-conflicts.md @@ -139,4 +139,4 @@ outs: - path: data.xml ``` -And then `dvc update` the `.dvc` file. +And then `dvc pull` the import (`dvc.xml` in the example above). From c75a68397bac28077a77113232ce2cd2f55fc123 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 23 Sep 2020 17:52:14 -0400 Subject: [PATCH 03/12] import: add notes about push/pull using original source (not DVC remotes) per #1792 --- content/docs/command-reference/get.md | 8 +-- content/docs/command-reference/import-url.md | 8 ++- content/docs/command-reference/import.md | 12 ++-- .../docs/user-guide/external-dependencies.md | 57 ++++++++++--------- 4 files changed, 49 insertions(+), 36 deletions(-) diff --git a/content/docs/command-reference/get.md b/content/docs/command-reference/get.md index 21e682703e..995c6250fa 100644 --- a/content/docs/command-reference/get.md +++ b/content/docs/command-reference/get.md @@ -36,16 +36,16 @@ the data source. Both HTTP and SSH protocols are supported for online repos to an "offline" repo (if it's a DVC repo without a default remote, instead of downloading, DVC will try to copy the target data from its cache). +⚠️ Online repos should have a default +[DVC remote](/doc/command-reference/remote) containing the actual data for this +command to work. + The `path` argument is used to specify the location of the target to download within the source repository at `url`. `path` can specify any file or directory in the source repo, either tracked by DVC (including paths inside tracked directories) or by Git. Note that DVC-tracked targets must be found in a `dvc.yaml` or `.dvc` file of the repo. 
-⚠️ The project should have a default -[DVC remote](/doc/command-reference/remote), containing the actual data for this -command to work. - > See `dvc get-url` to download data from other supported locations such as S3, > SSH, HTTP, etc. diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index d4b283be2d..0eebf78a39 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -109,8 +109,12 @@ $ dvc run -n download_data \ wget https://data.dvc.org/get-started/data.xml -O data.xml ``` -`dvc import-url` generates an import stage `.dvc` file and `dvc run` a regular -stage (in `dvc.yaml`). +`dvc import-url` generates an import stage `.dvc` file and +`dvc run` a regular stage (in `dvc.yaml`). + +⚠️ DVC won't push or pull imported data to +[remote storage](/doc/command-reference/remote), it will rely on it's original +source. ## Options diff --git a/content/docs/command-reference/import.md b/content/docs/command-reference/import.md index e3af008998..730788ac9c 100644 --- a/content/docs/command-reference/import.md +++ b/content/docs/command-reference/import.md @@ -39,6 +39,10 @@ the data source. Both HTTP and SSH protocols are supported for online repos to an "offline" repo (if it's a DVC repo without a default remote, instead of downloading, DVC will try to copy the target data from its cache). +⚠️ Online repos should have a default +[DVC remote](/doc/command-reference/remote) containing the actual data for this +command to work. + The `path` argument is used to specify the location of the target to download within the source repository at `url`. `path` can specify any file or directory in the source repo, either tracked by DVC (including paths inside tracked @@ -46,10 +50,6 @@ directories) or by Git. Note that DVC-tracked targets must be found in a `dvc.yaml` or `.dvc` file of the repo. 
Chained imports (importing data that was imported into the source repo at `url`) are not supported, however. -⚠️ The project should have a default -[DVC remote](/doc/command-reference/remote), containing the actual data for this -command to work. - > See `dvc import-url` to download and track data from other supported locations > such as S3, SSH, HTTP, etc. @@ -66,6 +66,10 @@ path in the workspace. It records enough metadata about the imported data to enable DVC efficiently determining whether the local copy is out of date. +⚠️ DVC won't push or pull imported data to +[remote storage](/doc/command-reference/remote), it will rely on it's original +source. + To actually [version the data](/doc/tutorials/get-started/data-versioning), `git add` (and `git commit`) the import stage. diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index 8e3003a8dc..301f85bb42 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -146,27 +146,6 @@ $ dvc run -n download_file \
-## Example: DVC remote aliases - -If instead of a URL you'd like to use an alias that can be managed -independently, or if the external dependency location requires access -credentials, you may use `dvc remote add` to define this location as a DVC -remote, and then use a special URL with format `remote://{remote_name}/{path}` -to define an external dependency. - -For example, for an HTTPs remote/dependency: - -```dvc -$ dvc remote add example https://example.com -$ dvc run -n download_file \ - -d remote://example/data.txt \ - -o data.txt \ - wget https://example.com/data.txt -O data.txt -``` - -Please refer to `dvc remote add` for more details like setting up access -credentials for the different remotes. - ## Example: `import-url` command In the previous examples, special downloading tools were used: `scp`, @@ -181,8 +160,9 @@ Importing 'https://data.dvc.org/get-started/data.xml' -> 'data.xml' The command above creates the import `.dvc` file `data.xml.dvc`, that contains an external dependency (in this case an HTTPs URL). -The only difference is that `dvc fetch` and `dvc pull` won't look in -[remote-storage]() for the data, but in it's original source. +⚠️ DVC won't push or pull imported data to +[remote storage](/doc/command-reference/remote), it will rely on it's original +source.
@@ -208,11 +188,11 @@ determine whether the source has changed and we need to download the file again.
-## Example: Using import +## Example: Imports `dvc import` can download a data artifact from any DVC -project or Git repository. It also creates an external dependency in its -import `.dvc` file. +project, or any file from a Git repository. It also creates an external +dependency in its import `.dvc` file. ```dvc $ dvc import git@github.com:iterative/example-get-started model.pkl @@ -223,6 +203,10 @@ Importing 'model.pkl (git@github.com:iterative/example-get-started)' The command above creates `model.pkl.dvc`, where the external dependency is specified (with the `repo` field). +⚠️ DVC won't push or pull imported data to +[remote storage](/doc/command-reference/remote), it will rely on it's original +source. +
### Expand to see resulting `.dvc` file @@ -246,3 +230,24 @@ The `url` and `rev_lock` subfields under `repo` are used to save the origin and [version](https://git-scm.com/docs/revisions) of the dependency, respectively.
+ +## Example: DVC remote aliases + +If instead of a URL you'd like to use an alias that can be managed +independently, or if the external dependency location requires access +credentials, you may use `dvc remote add` to define this location as a DVC +remote, and then use a special URL with format `remote://{remote_name}/{path}` +to define an external dependency. + +For example, for an HTTPs remote/dependency: + +```dvc +$ dvc remote add example https://example.com +$ dvc run -n download_file \ + -d remote://example/data.txt \ + -o data.txt \ + wget https://example.com/data.txt -O data.txt +``` + +Please refer to `dvc remote add` for more details like setting up access +credentials for the different remotes. From ef42cf8b7ae7e5b5060ec077555cb7195ebd2c75 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 23 Sep 2020 18:21:21 -0400 Subject: [PATCH 04/12] cmd: better move example titles --- content/docs/command-reference/move.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/docs/command-reference/move.md b/content/docs/command-reference/move.md index f2bbe24df5..e564b45653 100644 --- a/content/docs/command-reference/move.md +++ b/content/docs/command-reference/move.md @@ -109,7 +109,7 @@ $ dvc commit -f - `-v`, `--verbose` - displays detailed tracing information. -## Example: change the file name +## Example: Change the file name We first use `dvc add` to track file with DVC. Then, we change its name using `dvc move`. @@ -130,7 +130,7 @@ $ tree └── other.csv.dvc ``` -## Example: change the location +## Example: Change a file location We use `dvc add` to track a file with DVC, then we use `dvc move` to change its location. If the target path is a directory and already exists, the data file is @@ -166,7 +166,7 @@ $ tree └── foo.dvc ``` -## Example: change an imported directory name and location +## Example: Move a directory Let's try the same with an entire directory imported from an external DVC repository with `dvc import`. 
Note that, as in the previous cases, the From e4c2c87f933bd714c2b38e085819727df2ecc304 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 25 Sep 2020 17:16:49 -0400 Subject: [PATCH 05/12] guide: fix import merge section per https://github.com/iterative/dvc.org/pull/1801#pullrequestreview-495051810 --- content/docs/user-guide/merge-conflicts.md | 19 +++++++++---------- content/docs/user-guide/what-is-dvc.md | 2 +- 2 files changed, 10 insertions(+), 11 deletions(-) diff --git a/content/docs/user-guide/merge-conflicts.md b/content/docs/user-guide/merge-conflicts.md index f12319da6e..231cd3d9a2 100644 --- a/content/docs/user-guide/merge-conflicts.md +++ b/content/docs/user-guide/merge-conflicts.md @@ -103,11 +103,6 @@ To resolve conflicted `.dvc` files generated by `dvc import` or `dvc import-url`, remove the conflicted hashes altogether: ```yaml -< < < < < < < HEAD -md5: 263395583f35403c8e0b1b94b30bea32 -======= -md5: 520d2602f440d13372435d91d3bfa176 -> > > > > > > branch frozen: true deps: - path: get-started/data.xml @@ -115,15 +110,15 @@ deps: url: https://github.com/iterative/dataset-registry < < < < < < < HEAD rev_lock: f31f5c4cdae787b4bdeb97a717687d44667d9e62 -======= += = = = = = = rev_lock: 06be1104741f8a7c65449322a1fcc8c5f1070a1e ->>>>>>> branch +> > > > > > > branch outs: < < < < < < < HEAD - md5: a304afb96060aad90176268345e10355 -======= += = = = = = = - md5: 35dd1fda9cfb4b645ae431f4621fa324 -> > > > > > > +> > > > > > > branch path: data.xml ``` @@ -139,4 +134,8 @@ outs: - path: data.xml ``` -And then `dvc pull` the import (`dvc.xml` in the example above). +And then `dvc update` the `.dvc` file to download the latest data from its +original source. + +> Note that updating will bring in the latest version of the data found in its +> source, which may not correspond with any of the hashes that was removed. 
diff --git a/content/docs/user-guide/what-is-dvc.md b/content/docs/user-guide/what-is-dvc.md index 18f86f0acb..045de098a8 100644 --- a/content/docs/user-guide/what-is-dvc.md +++ b/content/docs/user-guide/what-is-dvc.md @@ -1,6 +1,6 @@ # What Is DVC? -**Data Version Control** is a new type of data versioning, workflow and +**Data Version Control** is a new type of data versioning, workflow, and experiment management software, that builds upon [Git](https://git-scm.com/) (although it can work stand-alone). DVC reduces the gap between established engineering tool sets and data science needs, allowing users to take advantage From e6359056a646a8db006f65cb1d3e4d842f069f3f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 25 Sep 2020 17:52:55 -0400 Subject: [PATCH 06/12] cmd: rewrite description of url and path for get/import per https://github.com/iterative/dvc.org/pull/1801#pullrequestreview-495102888 --- content/docs/command-reference/get.md | 21 ++++++++++---------- content/docs/command-reference/import.md | 25 ++++++++++++------------ content/docs/command-reference/list.md | 7 +++---- 3 files changed, 26 insertions(+), 27 deletions(-) diff --git a/content/docs/command-reference/get.md b/content/docs/command-reference/get.md index 995c6250fa..8bc4c2e8b3 100644 --- a/content/docs/command-reference/get.md +++ b/content/docs/command-reference/get.md @@ -31,20 +31,19 @@ directory. (Analogous to `wget`, but for repos.) > directories to download. The `url` argument specifies the address of the DVC or Git repository containing -the data source. Both HTTP and SSH protocols are supported for online repos -(e.g. `[user@]server:project.git`). `url` can also be a local file system path -to an "offline" repo (if it's a DVC repo without a default remote, instead of -downloading, DVC will try to copy the target data from its cache). - -⚠️ Online repos should have a default -[DVC remote](/doc/command-reference/remote) containing the actual data for this -command to work. 
+the data source. Both HTTP and SSH protocols are supported (e.g. +`[user@]server:project.git`). `url` can also be a local file system path. The `path` argument is used to specify the location of the target to download within the source repository at `url`. `path` can specify any file or directory -in the source repo, either tracked by DVC (including paths inside tracked -directories) or by Git. Note that DVC-tracked targets must be found in a -`dvc.yaml` or `.dvc` file of the repo. +tracked by either Git or DVC (including paths inside tracked directories). Note +that DVC-tracked targets must be found in a `dvc.yaml` or `.dvc` file of the +repo. + +⚠️ DVC repos should have a default [DVC remote](/doc/command-reference/remote) +containing the target actual for this command to work. The only exception is for +local repos, where DVC will try to copy the data from its cache +first. > See `dvc get-url` to download data from other supported locations such as S3, > SSH, HTTP, etc. diff --git a/content/docs/command-reference/import.md b/content/docs/command-reference/import.md index 730788ac9c..5ea1cbaa7e 100644 --- a/content/docs/command-reference/import.md +++ b/content/docs/command-reference/import.md @@ -34,21 +34,19 @@ updating the import later, if it has changed in its data source. (See > directories to import. The `url` argument specifies the address of the DVC or Git repository containing -the data source. Both HTTP and SSH protocols are supported for online repos -(e.g. `[user@]server:project.git`). `url` can also be a local file system path -to an "offline" repo (if it's a DVC repo without a default remote, instead of -downloading, DVC will try to copy the target data from its cache). - -⚠️ Online repos should have a default -[DVC remote](/doc/command-reference/remote) containing the actual data for this -command to work. +the data source. Both HTTP and SSH protocols are supported (e.g. +`[user@]server:project.git`). `url` can also be a local file system path. 
The `path` argument is used to specify the location of the target to download within the source repository at `url`. `path` can specify any file or directory -in the source repo, either tracked by DVC (including paths inside tracked -directories) or by Git. Note that DVC-tracked targets must be found in a -`dvc.yaml` or `.dvc` file of the repo. Chained imports (importing data that was -imported into the source repo at `url`) are not supported, however. +tracked by either Git or DVC (including paths inside tracked directories). Note +that DVC-tracked targets must be found in a `dvc.yaml` or `.dvc` file of the +repo. + +⚠️ DVC repos should have a default [DVC remote](/doc/command-reference/remote) +containing the target actual for this command to work. The only exception is for +local repos, where DVC will try to copy the data from its cache +first. > See `dvc import-url` to download and track data from other supported locations > such as S3, SSH, HTTP, etc. @@ -78,6 +76,9 @@ Note that import stages are considered always they won't be updated. Use `dvc update` to update the downloaded data artifact from the source repo. +Also note that chained imports (importing data that was imported into the source +repo at `url`) are not supported. + ## Options - `-o `, `--out ` - specify a path to the desired location in the diff --git a/content/docs/command-reference/list.md b/content/docs/command-reference/list.md index ba7878f84b..651c5537ff 100644 --- a/content/docs/command-reference/list.md +++ b/content/docs/command-reference/list.md @@ -36,10 +36,9 @@ $ dvc pull $ ls ``` -The `url` argument specifies the address of the Git repository containing the -data source. Both HTTP and SSH protocols are supported for online repos (e.g. -`[user@]server:project.git`). `url` can also be a local file system path to an -"offline" Git repo. +The `url` argument specifies the address of the DVC or Git repository containing +the data source. Both HTTP and SSH protocols are supported (e.g. 
+`[user@]server:project.git`). `url` can also be a local file system path. The optional `path` argument is used to specify a directory to list within the source repository at `url` (including paths inside tracked directories). It's From 9b4ea34bce010a1c3a80a30b39c3f4f858308bb8 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 25 Sep 2020 18:02:40 -0400 Subject: [PATCH 07/12] guide: remove notes about imports and remote storage in x deps per https://github.com/iterative/dvc.org/pull/1801#pullrequestreview-495103565 --- content/docs/user-guide/external-dependencies.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index 301f85bb42..e670fffa12 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -160,10 +160,6 @@ Importing 'https://data.dvc.org/get-started/data.xml' -> 'data.xml' The command above creates the import `.dvc` file `data.xml.dvc`, that contains an external dependency (in this case an HTTPs URL). -⚠️ DVC won't push or pull imported data to -[remote storage](/doc/command-reference/remote), it will rely on it's original -source. -
### Expand to see resulting `.dvc` file @@ -203,10 +199,6 @@ Importing 'model.pkl (git@github.com:iterative/example-get-started)' The command above creates `model.pkl.dvc`, where the external dependency is specified (with the `repo` field). -⚠️ DVC won't push or pull imported data to -[remote storage](/doc/command-reference/remote), it will rely on it's original -source. -
### Expand to see resulting `.dvc` file From 1e75862e826b5d0d6e74f3dbd429271492d93a4d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 25 Sep 2020 23:52:09 -0400 Subject: [PATCH 08/12] guide: rewrite section about remote aliases in x deps per https://github.com/iterative/dvc.org/pull/1801#pullrequestreview-495104228 --- .../docs/user-guide/external-dependencies.md | 45 ++++++++++--------- 1 file changed, 24 insertions(+), 21 deletions(-) diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index e670fffa12..93532143fe 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -146,6 +146,30 @@ $ dvc run -n download_file \
+## Example: Using DVC remote aliases + +You may want to encapsulate external locations as configurable entities that can +be managed independently. This is useful if multiple dependencies (or stages) +reuse the same location, or if the location is likely to change in the future. +And if the location requires authentication, you need a way to configuring it in +order to access the data. + +[DVC remotes](/doc/command-reference/remote) can do just this. You may use +`dvc remote add` to define them, and then use a special URL with format +`remote://{remote_name}/{path}` (remote alias) to define the external +dependency. For example (HTTPs location): + +```dvc +$ dvc remote add example https://example.com +$ dvc run -n download_file \ + -d remote://example/data.txt \ + -o data.txt \ + wget https://example.com/data.txt -O data.txt +``` + +> Please refer to `dvc remote add` for more details like setting up access +> credentials for the different remotes. + ## Example: `import-url` command In the previous examples, special downloading tools were used: `scp`, @@ -222,24 +246,3 @@ The `url` and `rev_lock` subfields under `repo` are used to save the origin and [version](https://git-scm.com/docs/revisions) of the dependency, respectively.
- -## Example: DVC remote aliases - -If instead of a URL you'd like to use an alias that can be managed -independently, or if the external dependency location requires access -credentials, you may use `dvc remote add` to define this location as a DVC -remote, and then use a special URL with format `remote://{remote_name}/{path}` -to define an external dependency. - -For example, for an HTTPs remote/dependency: - -```dvc -$ dvc remote add example https://example.com -$ dvc run -n download_file \ - -d remote://example/data.txt \ - -o data.txt \ - wget https://example.com/data.txt -O data.txt -``` - -Please refer to `dvc remote add` for more details like setting up access -credentials for the different remotes. From f150ccc9d043d5864e4d3beeb485a2c1c4241ede Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 25 Sep 2020 23:58:10 -0400 Subject: [PATCH 09/12] glossary: roll back wrong abbr tag in tooltip --- .../docs/user-guide/basic-concepts/external-dependency.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/docs/user-guide/basic-concepts/external-dependency.md b/content/docs/user-guide/basic-concepts/external-dependency.md index 821114c3b4..aa79fe9e28 100644 --- a/content/docs/user-guide/basic-concepts/external-dependency.md +++ b/content/docs/user-guide/basic-concepts/external-dependency.md @@ -3,8 +3,8 @@ name: 'External Dependency' match: ['external dependency', 'external dependencies'] --- -A stage dependency (`deps` field in `dvc.yaml` or in an import -stage `.dvc` file) with origin in an external source, for example HTTP, -SSH, Amazon S3, Google Cloud Storage remote locations, or even other DVC -repositories. See +A stage dependency (`deps` field in `dvc.yaml` or in an +[import stage](/doc/command-reference/import) `.dvc` file) with origin in an +external source, for example HTTP, SSH, Amazon S3, Google Cloud Storage remote +locations, or even other DVC repositories. 
See [External Dependencies](/doc/user-guide/external-dependencies). From 0dc8d2202700c712c3b4de5f97cf6e6ba0fe2040 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 25 Sep 2020 23:59:14 -0400 Subject: [PATCH 10/12] guide: small simplification --- content/docs/user-guide/external-dependencies.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index 93532143fe..59aa84ac6f 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -150,9 +150,9 @@ $ dvc run -n download_file \ You may want to encapsulate external locations as configurable entities that can be managed independently. This is useful if multiple dependencies (or stages) -reuse the same location, or if the location is likely to change in the future. -And if the location requires authentication, you need a way to configuring it in -order to access the data. +reuse the same location, or if its likely to change in the future. And if the +location requires authentication, you need a way to configuring it in order to +access the data. [DVC remotes](/doc/command-reference/remote) can do just this. 
You may use `dvc remote add` to define them, and then use a special URL with format From f23fc5ceb35fc7f24e8a18f91e49dbd71e3077eb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 26 Sep 2020 00:57:07 -0400 Subject: [PATCH 11/12] cmd: "pull/push to/from" fix per https://github.com/iterative/dvc.org/pull/1801#pullrequestreview-496938913 --- content/docs/command-reference/import-url.md | 2 +- content/docs/command-reference/import.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index 0eebf78a39..6eca56502b 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -112,7 +112,7 @@ $ dvc run -n download_data \ `dvc import-url` generates an import stage `.dvc` file and `dvc run` a regular stage (in `dvc.yaml`). -⚠️ DVC won't push or pull imported data to +⚠️ DVC won't push or pull imported data to/from [remote storage](/doc/command-reference/remote), it will rely on it's original source. diff --git a/content/docs/command-reference/import.md b/content/docs/command-reference/import.md index 5ea1cbaa7e..3f9d9f7a6b 100644 --- a/content/docs/command-reference/import.md +++ b/content/docs/command-reference/import.md @@ -64,7 +64,7 @@ path in the workspace. It records enough metadata about the imported data to enable DVC efficiently determining whether the local copy is out of date. -⚠️ DVC won't push or pull imported data to +⚠️ DVC won't push or pull imported data to/from [remote storage](/doc/command-reference/remote), it will rely on it's original source. 
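Reviewer note on the `remote://{remote_name}/{path}` alias format that the external-dependencies hunks in this series describe: the following is a minimal Python sketch of how such an alias could be expanded into a full URL. It is an illustration only, not DVC's implementation. The function name `resolve_remote_alias` and the `remotes` mapping are hypothetical and stand in for whatever `dvc remote add` has stored in the project config.

```python
from urllib.parse import urlsplit


def resolve_remote_alias(alias, remotes):
    """Expand remote://{remote_name}/{path} into a full URL.

    Hypothetical helper for illustration; `remotes` maps a remote name
    to its base URL, as if read from the DVC config.
    """
    parts = urlsplit(alias)
    if parts.scheme != "remote":
        # Not an alias: return the URL unchanged.
        return alias
    # Raises KeyError if the named remote was never defined.
    base = remotes[parts.netloc]
    return base.rstrip("/") + "/" + parts.path.lstrip("/")


# Assumed remotes, mirroring the names used in the patches.
remotes = {
    "example": "https://example.com",
    "myssh": "ssh://myserver.com",
}

print(resolve_remote_alias("remote://example/data.txt", remotes))
print(resolve_remote_alias("remote://myssh/path/to/data.txt", remotes))
```

Keeping the base URL in one mapping is what makes the alias useful: if the location changes, only the remote definition is updated, and every dependency that refers to it by name follows along.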
From d16cd33d0e6cd1f982b8be1211d76d33965eb39d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 26 Sep 2020 01:10:35 -0400 Subject: [PATCH 12/12] guide: hint on the use of remote modify --local for x deps per https://github.com/iterative/dvc.org/pull/1801#pullrequestreview-496939074 --- .../docs/user-guide/external-dependencies.md | 25 +++++++++++++------ 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index 59aa84ac6f..e89283bb47 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -151,25 +151,34 @@ $ dvc run -n download_file \ You may want to encapsulate external locations as configurable entities that can be managed independently. This is useful if multiple dependencies (or stages) reuse the same location, or if its likely to change in the future. And if the -location requires authentication, you need a way to configuring it in order to -access the data. +location requires authentication, you need a way to configure it in order to +connect. [DVC remotes](/doc/command-reference/remote) can do just this. You may use `dvc remote add` to define them, and then use a special URL with format `remote://{remote_name}/{path}` (remote alias) to define the external -dependency. For example (HTTPs location): +dependency. + +Let's see an example using SSH. First, register and configure the remote: + +```dvc +$ dvc remote add myssh ssh://myserver.com +$ dvc remote modify --local myssh user myuser +$ dvc remote modify --local myssh password mypassword +``` + +> Please refer to `dvc remote add` for more details like setting up access +> credentials for the different remote types. 
+ +Now, use an alias to this remote when defining the stage: ```dvc -$ dvc remote add example https://example.com $ dvc run -n download_file \ - -d remote://example/data.txt \ + -d remote://myssh/path/to/data.txt \ -o data.txt \ wget https://example.com/data.txt -O data.txt ``` -> Please refer to `dvc remote add` for more details like setting up access -> credentials for the different remotes. - ## Example: `import-url` command In the previous examples, special downloading tools were used: `scp`,