From 715b8b96a85cba5a9ccbe5728cc7560670eb07a6 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 1 Aug 2020 11:44:44 -0500 Subject: [PATCH 01/16] docs: recommend absolute paths for SSH and HDFS remote connections per https://github.com/iterative/dvc/issues/4167#issuecomment-663318837 --- content/docs/command-reference/import-url.md | 20 +++++++++---------- content/docs/command-reference/remote/add.md | 4 ++-- .../docs/command-reference/remote/modify.md | 2 +- .../docs/user-guide/external-dependencies.md | 8 ++++++-- .../docs/user-guide/managing-external-data.md | 16 +++++++++------ 5 files changed, 29 insertions(+), 21 deletions(-) diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index d3d84f0322..fdc5b13462 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -55,16 +55,16 @@ source. DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| -------- | --------------------------------------------------- | ------------------------------------------ | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | -| `gs` | Google Cloud Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com:/path/to/data` | -| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` | -| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | -| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | +| Type | Description | `url` format | +| -------- | --------------------------------------------------- | ---------------------------------------------- | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `azure` | 
Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | +| `gs` | Google Cloud Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/abs/path/to/data` | +| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/abs/path/to/data.csv` | +| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | +| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | > If you installed DVC via `pip` and plan to use cloud services as remote > storage, you might need to install these optional dependencies: `[s3]`, diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md index adc643d7aa..17445f7a77 100644 --- a/content/docs/command-reference/remote/add.md +++ b/content/docs/command-reference/remote/add.md @@ -300,7 +300,7 @@ $ export OSS_ACCESS_KEY_SECRET='AccessKeySecret' ### Click for SSH ```dvc -$ dvc remote add -d myremote ssh://user@example.com/path/to/dir +$ dvc remote add -d myremote ssh://user@example.com/absolute/path ``` > See also `dvc remote modify` for a full list of SSH parameters. @@ -323,7 +323,7 @@ like `ssh` and `sftp` (GNU/Linux). ### Click for HDFS ```dvc -$ dvc remote add -d myremote hdfs://user@example.com/path/to/dir +$ dvc remote add -d myremote hdfs://user@example.com/absolute/path ``` > See also `dvc remote modify` for a full list of HDFS parameters. diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index 279a791719..2dec9c470b 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -440,7 +440,7 @@ more information. - `url` - remote location URL. ```dvc - $ dvc remote modify myremote url ssh://user@example.com:1234/path/to/remote + $ dvc remote modify myremote url ssh://user@example.com:1234/absolute/path ``` - `user` - username to use to access a remote. 
The order in which dvc searches diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index f9ee3eda93..f019f86931 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -54,11 +54,13 @@ $ dvc run -n download_file ```dvc $ dvc run -n download_file - -d ssh://user@example.com:/home/shared/data.txt \ + -d ssh://user@example.com/home/shared/data.txt \ -o data.txt \ - scp user@example.com:/home/shared/data.txt data.txt + scp user@example.com/home/shared/data.txt data.txt ``` +> Please notice `/home/...` is an absolute path from the remote system's root. + ### Amazon S3 ```dvc @@ -101,6 +103,8 @@ $ dvc run -n download_file data.txt ``` +> Please notice `/home/...` is an absolute path from the remote system's root. + ### HTTP > Including HTTPs diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index 4449b620c0..d10efd7214 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -66,21 +66,23 @@ $ dvc run -d data.txt \ ```dvc # Add SSH remote to be used as cache location for SSH files -$ dvc remote add sshcache ssh://user@example.com:/cache +$ dvc remote add sshcache ssh://user@example.com/home/.../cache # Tell DVC to use the 'sshcache' remote as SSH cache location $ dvc config cache.ssh sshcache # Add data on SSH directly -$ dvc add --external ssh://user@example.com:/mydata +$ dvc add --external ssh://user@example.com/home/.../mydata # Create the stage with an external SSH output $ dvc run -d data.txt \ --external \ - -o ssh://user@example.com:/home/shared/data.txt \ - scp data.txt user@example.com:/home/shared/data.txt + -o ssh://user@example.com/home/shared/data.txt \ + scp data.txt user@example.com/home/shared/data.txt ``` +> Please notice `/home/...` are absolute paths from the remote system's root. 
+ ### Amazon S3 ```dvc @@ -123,13 +125,13 @@ $ dvc run -d data.txt \ ```dvc # Add HDFS remote to be used as cache location for HDFS files -$ dvc remote add hdfscache hdfs://user@example.com/cache +$ dvc remote add hdfscache hdfs://user@example.com/home/.../cache # Tell DVC to use the 'hdfscache' remote as HDFS cache location $ dvc config cache.hdfs hdfscache # Add data on HDFS directly -$ dvc add --external hdfs://user@example.com/mydata +$ dvc add --external hdfs://user@example.com/home/.../mydata # Create the stage with an external HDFS output $ dvc run -d data.txt \ @@ -140,5 +142,7 @@ $ dvc run -d data.txt \ hdfs://user@example.com/home/shared/data.txt ``` +> Please notice `/home/...` are absolute paths from the remote system's root. + Note that as long as there is a `hdfs://...` path for your data, DVC can handle it. So systems like Hadoop, Hive, and HBase are supported! From d27e07d12d1bfa07f12ca31674aafa8cbe0e1635 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 1 Aug 2020 13:19:50 -0500 Subject: [PATCH 02/16] cmd: forgot to save changes to get-url.md (see prev commit) --- content/docs/command-reference/get-url.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/content/docs/command-reference/get-url.md b/content/docs/command-reference/get-url.md index 02356500a1..d97f258e27 100644 --- a/content/docs/command-reference/get-url.md +++ b/content/docs/command-reference/get-url.md @@ -33,14 +33,14 @@ directory will be placed inside. 
DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| ------- | -------------- | ------------------------------------------ | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `gs` | Google Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com:/path/to/data` | -| `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` | -| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` | +| Type | Description | `url` format | +| ------- | -------------- | ---------------------------------------------- | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `gs` | Google Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/abs/path/to/data` | +| `hdfs` | HDFS to file\* | `hdfs://user@example.com/abs/path/to/data.csv` | +| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` | > If you installed DVC via `pip` and plan to use cloud services as remote > storage, you might need to install these optional dependencies: `[s3]`, From 94926eee98ea9c47db2966a887b21c7d4ad95e56 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 11 Aug 2020 00:45:45 -0500 Subject: [PATCH 03/16] guide: improve absolute paths (and note) for SSH x deps per https://github.com/iterative/dvc.org/pull/1649#pullrequestreview-459591916 and https://github.com/iterative/dvc.org/pull/1649#pullrequestreview-459592083 --- content/docs/user-guide/external-dependencies.md | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index f019f86931..d6fa94bcfd 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -54,12 +54,13 @@ $ dvc run -n download_file ```dvc $ dvc run -n download_file - -d 
ssh://user@example.com/home/shared/data.txt \ + -d ssh://user@example.com/abs/path/to/data.txt \ -o data.txt \ - scp user@example.com/home/shared/data.txt data.txt + scp user@example.com/abs/path/to/data.txt data.txt ``` -> Please notice `/home/...` is an absolute path from the remote system's root. +> ⚠️ Please notice `/abs/path/to/data` is an absolute path from the SFTP root +> (not always configured to be the system root). ### Amazon S3 @@ -96,15 +97,12 @@ $ dvc run -n download_file ```dvc $ dvc run -n download_file - -d hdfs://user@example.com/home/shared/data.txt \ + -d hdfs://user@example.com/abs/path/to/data.txt \ -o data.txt \ hdfs fs -copyToLocal \ - hdfs://user@example.com/home/shared/data.txt \ - data.txt + hdfs://user@example.com/abs/path/to/data.txt data.txt ``` -> Please notice `/home/...` is an absolute path from the remote system's root. - ### HTTP > Including HTTPs From 8c60723650d69e8d47b81e2862c65f0849e1d4e9 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 11 Aug 2020 01:01:02 -0500 Subject: [PATCH 04/16] guide: update SSH/HDFS abs paths (and notes) in x outs page per https://github.com/iterative/dvc.org/pull/1649#pullrequestreview-459592083 --- .../docs/user-guide/managing-external-data.md | 25 +++++++++---------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index d10efd7214..ccf8cc2325 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -66,22 +66,23 @@ $ dvc run -d data.txt \ ```dvc # Add SSH remote to be used as cache location for SSH files -$ dvc remote add sshcache ssh://user@example.com/home/.../cache +$ dvc remote add sshcache ssh://user@example.com/abs/path/to/cache # Tell DVC to use the 'sshcache' remote as SSH cache location $ dvc config cache.ssh sshcache # Add data on SSH directly -$ dvc add --external ssh://user@example.com/home/.../mydata 
+$ dvc add --external ssh://user@example.com/abs/path/to/mydata # Create the stage with an external SSH output $ dvc run -d data.txt \ --external \ - -o ssh://user@example.com/home/shared/data.txt \ - scp data.txt user@example.com/home/shared/data.txt + -o ssh://user@example.com/abs/path/to/data.txt \ + scp data.txt user@example.com/abs/path/to/data.txt ``` -> Please notice `/home/...` are absolute paths from the remote system's root. +> ⚠️ Please notice `/abs/path/to/...` are absolute paths from the SFTP root (not +> always configured to be the system root). ### Amazon S3 @@ -125,24 +126,22 @@ $ dvc run -d data.txt \ ```dvc # Add HDFS remote to be used as cache location for HDFS files -$ dvc remote add hdfscache hdfs://user@example.com/home/.../cache +$ dvc remote add hdfscache hdfs://user@example.com/abs/path/to/cache # Tell DVC to use the 'hdfscache' remote as HDFS cache location $ dvc config cache.hdfs hdfscache # Add data on HDFS directly -$ dvc add --external hdfs://user@example.com/home/.../mydata +$ dvc add --external hdfs://user@example.com/abs/path/to/mydata # Create the stage with an external HDFS output $ dvc run -d data.txt \ --external \ - -o hdfs://user@example.com/home/shared/data.txt \ + -o hdfs://user@example.com/abs/path/to/data.txt \ hdfs fs -copyFromLocal \ - data.txt \ - hdfs://user@example.com/home/shared/data.txt + data.txt \ + hdfs://user@example.com/abs/path/to/data.txt ``` -> Please notice `/home/...` are absolute paths from the remote system's root. - -Note that as long as there is a `hdfs://...` path for your data, DVC can handle +Note that as long as there is a `hdfs://...` URL for your data, DVC can handle it. So systems like Hadoop, Hive, and HBase are supported! 
From 87033625d0892ace01bed441ded943a0c8031da8 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 11 Aug 2020 01:04:08 -0500 Subject: [PATCH 05/16] guide: revert SCP URLs from external data guides per https://github.com/iterative/dvc.org/pull/1649#pullrequestreview-459592243 --- content/docs/user-guide/external-dependencies.md | 2 +- content/docs/user-guide/managing-external-data.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index d6fa94bcfd..e61457c7c1 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -56,7 +56,7 @@ $ dvc run -n download_file $ dvc run -n download_file -d ssh://user@example.com/abs/path/to/data.txt \ -o data.txt \ - scp user@example.com/abs/path/to/data.txt data.txt + scp user@example.com:/abs/path/to/data.txt data.txt ``` > ⚠️ Please notice `/abs/path/to/data` is an absolute path from the SFTP root diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index ccf8cc2325..bc41ed0308 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -78,7 +78,7 @@ $ dvc add --external ssh://user@example.com/abs/path/to/mydata $ dvc run -d data.txt \ --external \ -o ssh://user@example.com/abs/path/to/data.txt \ - scp data.txt user@example.com/abs/path/to/data.txt + scp data.txt user@example.com:/abs/path/to/data.txt ``` > ⚠️ Please notice `/abs/path/to/...` are absolute paths from the SFTP root (not From ce32ddfc46d8a1302b3ed67c317d725729b40fe0 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 11 Aug 2020 10:28:33 -0500 Subject: [PATCH 06/16] guide: update note about rel. 
SSH paths from SFTP root per https://github.com/iterative/dvc.org/pull/1649#pullrequestreview-464764783 --- content/docs/user-guide/external-dependencies.md | 4 ++-- content/docs/user-guide/managing-external-data.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index e61457c7c1..27c3ae09b1 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -59,8 +59,8 @@ $ dvc run -n download_file scp user@example.com:/abs/path/to/data.txt data.txt ``` -> ⚠️ Please notice `/abs/path/to/data` is an absolute path from the SFTP root -> (not always configured to be the system root). +> ⚠️ Please notice `/abs/path/to/data` is a path relative to the SFTP root +> (typically the system root, in which case it's an absolute path). ### Amazon S3 diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index bc41ed0308..06b52c918a 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -81,8 +81,8 @@ $ dvc run -d data.txt \ scp data.txt user@example.com:/abs/path/to/data.txt ``` -> ⚠️ Please notice `/abs/path/to/...` are absolute paths from the SFTP root (not -> always configured to be the system root). +> ⚠️ Please notice `/abs/path/to/...` are paths relative to the SFTP root +> (typically the system root, in which case they're absolute paths). 
### Amazon S3 From aa5d6fda68bd984ea5e76cb455d61bee3f7e6a0c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 13 Aug 2020 19:16:44 -0500 Subject: [PATCH 07/16] guide: update SSH sample paths again + note + refs per https://github.com/iterative/dvc.org/pull/1649#discussion_r468679544 --- content/docs/command-reference/get-url.md | 16 +-- content/docs/command-reference/import-url.md | 20 ++-- .../docs/user-guide/external-dependencies.md | 78 ++++++------- .../docs/user-guide/managing-external-data.md | 105 +++++++++--------- 4 files changed, 111 insertions(+), 108 deletions(-) diff --git a/content/docs/command-reference/get-url.md b/content/docs/command-reference/get-url.md index 7d4d2f16dc..377082e697 100644 --- a/content/docs/command-reference/get-url.md +++ b/content/docs/command-reference/get-url.md @@ -33,14 +33,14 @@ directory will be placed inside. DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| ------- | -------------- | ---------------------------------------------- | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `gs` | Google Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com/abs/path/to/data` | -| `hdfs` | HDFS to file\* | `hdfs://user@example.com/abs/path/to/data.csv` | -| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` | +| Type | Description | `url` format | +| ------- | -------------- | --------------------------------------------------- | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `gs` | Google Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/path/from/sftp/root` | +| `hdfs` | HDFS to file\* | `hdfs://user@example.com/absolute/path/to/data.csv` | +| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` | > If you installed DVC via `pip` and plan to use cloud services as remote > storage, 
you might need to install these optional dependencies: `[s3]`, diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index fdc5b13462..3d780a1e91 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -55,16 +55,16 @@ source. DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| -------- | --------------------------------------------------- | ---------------------------------------------- | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | -| `gs` | Google Cloud Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com/abs/path/to/data` | -| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/abs/path/to/data.csv` | -| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | -| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | +| Type | Description | `url` format | +| -------- | --------------------------------------------------- | --------------------------------------------------- | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | +| `gs` | Google Cloud Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/absolute/path/to/data` | +| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/absolute/path/to/data.csv` | +| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | +| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | > If you installed DVC via `pip` and plan to use cloud services as remote > 
storage, you might need to install these optional dependencies: `[s3]`, diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index 27c3ae09b1..42c5303a3a 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -41,66 +41,57 @@ stage to your list of stages in dvc.yaml. > Note that some of these commands use the `/home/shared` directory, typical in > Linux distributions. -### Local file system path - -```dvc -$ dvc run -n download_file - -d /home/shared/data.txt \ - -o data.txt \ - cp /home/shared/data.txt data.txt -``` - -### SSH - -```dvc -$ dvc run -n download_file - -d ssh://user@example.com/abs/path/to/data.txt \ - -o data.txt \ - scp user@example.com:/abs/path/to/data.txt data.txt -``` - -> ⚠️ Please notice `/abs/path/to/data` is a path relative to the SFTP root -> (typically the system root, in which case it's an absolute path). - ### Amazon S3 ```dvc $ dvc run -n download_file - -d s3://mybucket/data.txt \ - -o data.txt \ - aws s3 cp s3://mybucket/data.txt data.txt + -d s3://mybucket/data \ + -o data \ + aws s3 cp s3://mybucket/data data ``` ### Microsoft Azure Blob Storage ```dvc $ dvc run -n download_file - -d azure://my-container-name/data.txt \ - -o data.txt \ + -d azure://my-container-name/data \ + -o data \ az storage copy \ -d data.json \ --source-account-name my-account \ --source-container my-container-name \ - --source-blob data.txt + --source-blob data ``` ### Google Cloud Storage ```dvc $ dvc run -n download_file - -d gs://mybucket/data.txt \ - -o data.txt \ - gsutil cp gs://mybucket/data.txt data.txt + -d gs://mybucket/data \ + -o data \ + gsutil cp gs://mybucket/data data ``` +### SSH + +```dvc +$ dvc run -n download_file + -d ssh://user@example.com/path/from/sftp/root/to/data \ + -o data \ + scp user@example.com:/path/from/sftp/root/to/data data +``` + +> ⚠️ Please notice that the SFTP root typically is the system root, 
but doesn't +> have to be. + ### HDFS ```dvc $ dvc run -n download_file - -d hdfs://user@example.com/abs/path/to/data.txt \ - -o data.txt \ + -d hdfs://user@example.com/absolute/path/to/data \ + -o data \ hdfs fs -copyToLocal \ - hdfs://user@example.com/abs/path/to/data.txt data.txt + hdfs://user@example.com/absolute/path/to/data data ``` ### HTTP @@ -109,9 +100,18 @@ $ dvc run -n download_file ```dvc $ dvc run -n download_file - -d https://example.com/data.txt \ - -o data.txt \ - wget https://example.com/data.txt -O data.txt + -d https://example.com/data \ + -o data \ + wget https://example.com/data -O data +``` + +### Local file system path + +```dvc +$ dvc run -n download_file + -d /home/shared/data \ + -o data \ + cp /home/shared/data data ``` ## Example: DVC remote aliases @@ -127,9 +127,9 @@ For example, for an HTTPs remote/dependency: ```dvc $ dvc remote add example https://example.com $ dvc run -n download_file - -d remote://example/data.txt \ - -o data.txt \ - wget https://example.com/data.txt -O data.txt + -d remote://example/data \ + -o data \ + wget https://example.com/data -O data ``` Please refer to `dvc remote add` for more details like setting up access diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index 06b52c918a..a39afdc556 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -43,47 +43,9 @@ in the same external/remote file system first. ## Examples For the examples, let's take a look at a [stage](/doc/command-reference/run) -that simply moves local file to an external location, producing a `data.txt.dvc` +that simply moves local file to an external location, producing a `data.dvc` DVC-file. -### Local file system path - -The default local cache location is `.dvc/cache`, so there is no need to specify -it explicitly. 
- -```dvc -# Add data on an external location directly -$ dvc add --external /home/shared/mydata - -# Create the stage with an external location output -$ dvc run -d data.txt \ - --external \ - -o /home/shared/data.txt \ - cp data.txt /home/shared/data.txt -``` - -### SSH - -```dvc -# Add SSH remote to be used as cache location for SSH files -$ dvc remote add sshcache ssh://user@example.com/abs/path/to/cache - -# Tell DVC to use the 'sshcache' remote as SSH cache location -$ dvc config cache.ssh sshcache - -# Add data on SSH directly -$ dvc add --external ssh://user@example.com/abs/path/to/mydata - -# Create the stage with an external SSH output -$ dvc run -d data.txt \ - --external \ - -o ssh://user@example.com/abs/path/to/data.txt \ - scp data.txt user@example.com:/abs/path/to/data.txt -``` - -> ⚠️ Please notice `/abs/path/to/...` are paths relative to the SFTP root -> (typically the system root, in which case they're absolute paths). - ### Amazon S3 ```dvc @@ -97,10 +59,10 @@ $ dvc config cache.s3 s3cache $ dvc add --external s3://mybucket/mydata # Create the stage with an external S3 output -$ dvc run -d data.txt \ +$ dvc run -d data \ --external \ - -o s3://mybucket/data.txt \ - aws s3 cp data.txt s3://mybucket/data.txt + -o s3://mybucket/data \ + aws s3 cp data s3://mybucket/data ``` ### Google Cloud Storage @@ -116,32 +78,73 @@ $ dvc config cache.gs gscache $ dvc add --external gs://mybucket/mydata # Create the stage with an external GS output -$ dvc run -d data.txt \ +$ dvc run -d data \ --external \ - -o gs://mybucket/data.txt \ - gsutil cp data.txt gs://mybucket/data.txt + -o gs://mybucket/data \ + gsutil cp data gs://mybucket/data ``` +### SSH + +```dvc +# Add SSH remote to be used as cache location for SSH files +$ dvc remote add sshcache \ + ssh://user@example.com/path/from/sftp/root/to/cache + +# Tell DVC to use the 'sshcache' remote as SSH cache location +$ dvc config cache.ssh sshcache + +# Add data on SSH directly +$ dvc add --external \ + 
ssh://user@example.com/path/from/sftp/root/to/mydata + +# Create the stage with an external SSH output +$ dvc run -d data \ + --external \ + -o ssh://user@example.com/path/from/sftp/root/to/data \ + scp data user@example.com:/path/from/sftp/root/to/data +``` + +> ⚠️ Please notice the SFTP root typically is the system root, but doesn't have +> to be. + ### HDFS ```dvc # Add HDFS remote to be used as cache location for HDFS files -$ dvc remote add hdfscache hdfs://user@example.com/abs/path/to/cache +$ dvc remote add hdfscache \ + hdfs://user@example.com/absolute/path/to/cache # Tell DVC to use the 'hdfscache' remote as HDFS cache location $ dvc config cache.hdfs hdfscache # Add data on HDFS directly -$ dvc add --external hdfs://user@example.com/abs/path/to/mydata +$ dvc add --external hdfs://user@example.com/absolute/path/to/mydata # Create the stage with an external HDFS output -$ dvc run -d data.txt \ +$ dvc run -d data \ --external \ - -o hdfs://user@example.com/abs/path/to/data.txt \ + -o hdfs://user@example.com/absolute/path/to/data \ hdfs fs -copyFromLocal \ - data.txt \ - hdfs://user@example.com/abs/path/to/data.txt + data \ + hdfs://user@example.com/absolute/path/to/data ``` Note that as long as there is a `hdfs://...` URL for your data, DVC can handle it. So systems like Hadoop, Hive, and HBase are supported! + +### Local file system path + +The default local cache location is `.dvc/cache`, so there is no need to specify +it explicitly. 
+ +```dvc +# Add data on an external location directly +$ dvc add --external /home/shared/mydata + +# Create the stage with an external location output +$ dvc run -d data \ + --external \ + -o /home/shared/data \ + cp data /home/shared/data +``` From 7414ce6682c52469591843488cb6bb76d8340958 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 14 Aug 2020 18:47:09 -0500 Subject: [PATCH 08/16] docs: roll back absolute HDFS URLs per https://github.com/iterative/dvc.org/pull/1649#pullrequestreview-465515459 --- content/docs/command-reference/get-url.md | 16 +++++++-------- content/docs/command-reference/import-url.md | 20 +++++++++---------- content/docs/command-reference/remote/add.md | 2 +- .../docs/user-guide/external-dependencies.md | 4 ++-- .../docs/user-guide/managing-external-data.md | 8 ++++---- 5 files changed, 25 insertions(+), 25 deletions(-) diff --git a/content/docs/command-reference/get-url.md b/content/docs/command-reference/get-url.md index 377082e697..51462a209e 100644 --- a/content/docs/command-reference/get-url.md +++ b/content/docs/command-reference/get-url.md @@ -33,14 +33,14 @@ directory will be placed inside. 
DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| ------- | -------------- | --------------------------------------------------- | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `gs` | Google Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com/path/from/sftp/root` | -| `hdfs` | HDFS to file\* | `hdfs://user@example.com/absolute/path/to/data.csv` | -| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` | +| Type | Description | `url` format | +| ------- | -------------- | -------------------------------------------- | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `gs` | Google Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/path/from/sftp/root` | +| `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` | +| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` | > If you installed DVC via `pip` and plan to use cloud services as remote > storage, you might need to install these optional dependencies: `[s3]`, diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index 3d780a1e91..b2444ef4b7 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -55,16 +55,16 @@ source. 
DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| -------- | --------------------------------------------------- | --------------------------------------------------- | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | -| `gs` | Google Cloud Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com/absolute/path/to/data` | -| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/absolute/path/to/data.csv` | -| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | -| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | +| Type | Description | `url` format | +| -------- | --------------------------------------------------- | ---------------------------------------------- | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | +| `gs` | Google Cloud Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/absolute/path/to/data` | +| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` | +| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | +| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | > If you installed DVC via `pip` and plan to use cloud services as remote > storage, you might need to install these optional dependencies: `[s3]`, diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md index 343a440bf5..1ce3df3e89 100644 --- a/content/docs/command-reference/remote/add.md +++ b/content/docs/command-reference/remote/add.md @@ -322,7 +322,7 
@@ like `ssh` and `sftp` (GNU/Linux). ### Click for HDFS ```dvc -$ dvc remote add -d myremote hdfs://user@example.com/absolute/path +$ dvc remote add -d myremote hdfs://user@example.com/path/to/dir ``` > See also `dvc remote modify` for a full list of HDFS parameters. diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index 42c5303a3a..2bb56570b8 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -88,10 +88,10 @@ $ dvc run -n download_file ```dvc $ dvc run -n download_file - -d hdfs://user@example.com/absolute/path/to/data \ + -d hdfs://user@example.com/data \ -o data \ hdfs fs -copyToLocal \ - hdfs://user@example.com/absolute/path/to/data data + hdfs://user@example.com/data data ``` ### HTTP diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index a39afdc556..cc88ff4781 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -113,21 +113,21 @@ $ dvc run -d data \ ```dvc # Add HDFS remote to be used as cache location for HDFS files $ dvc remote add hdfscache \ - hdfs://user@example.com/absolute/path/to/cache + hdfs://user@example.com/cache # Tell DVC to use the 'hdfscache' remote as HDFS cache location $ dvc config cache.hdfs hdfscache # Add data on HDFS directly -$ dvc add --external hdfs://user@example.com/absolute/path/to/mydata +$ dvc add --external hdfs://user@example.com/mydata # Create the stage with an external HDFS output $ dvc run -d data \ --external \ - -o hdfs://user@example.com/absolute/path/to/data \ + -o hdfs://user@example.com/data \ hdfs fs -copyFromLocal \ data \ - hdfs://user@example.com/absolute/path/to/data + hdfs://user@example.com/data ``` Note that as long as there is a `hdfs://...` URL for your data, DVC can handle From 710720da81203406e6122e55c9c3e9e14b8950ae Mon Sep 17 00:00:00 
2001 From: Jorge Orpinel Date: Fri, 14 Aug 2020 18:53:24 -0500 Subject: [PATCH 09/16] cmd: change SSH example URL in import-url --- content/docs/command-reference/import-url.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index b2444ef4b7..e50ec55c59 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -55,16 +55,16 @@ source. DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| -------- | --------------------------------------------------- | ---------------------------------------------- | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | -| `gs` | Google Cloud Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com/absolute/path/to/data` | -| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` | -| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | -| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | +| Type | Description | `url` format | +| -------- | --------------------------------------------------- | -------------------------------------------- | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | +| `gs` | Google Cloud Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/path/from/sftp/root` | +| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` | +| `http` | HTTP to file with _strong ETag_ (explanation below) | 
`https://example.com/path/to/data.csv` | +| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | > If you installed DVC via `pip` and plan to use cloud services as remote > storage, you might need to install these optional dependencies: `[s3]`, From 6ec6717fb62adfa5271bc60584b64723bce38935 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 18 Aug 2020 01:04:07 -0600 Subject: [PATCH 10/16] cmd: remove "sftp" mention in SSH URLs of get/import-url per https://github.com/iterative/dvc.org/pull/1649#pullrequestreview-468029218 --- content/docs/command-reference/get-url.md | 16 ++++++++-------- content/docs/command-reference/import-url.md | 20 ++++++++++---------- 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/content/docs/command-reference/get-url.md b/content/docs/command-reference/get-url.md index 61ffc6cafc..5f4b1312ec 100644 --- a/content/docs/command-reference/get-url.md +++ b/content/docs/command-reference/get-url.md @@ -34,14 +34,14 @@ directory will be placed inside. 
DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| ------- | -------------- | -------------------------------------------- | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `gs` | Google Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com/path/from/sftp/root` | -| `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` | -| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` | +| Type | Description | `url` format | +| ------- | -------------- | ------------------------------------------ | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `gs` | Google Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/path/to/data` | +| `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` | +| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` | > If you installed DVC via `pip` and plan to use cloud services as remote > storage, you might need to install these optional dependencies: `[s3]`, diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index af848cee08..fe12923b97 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -56,16 +56,16 @@ source. 
DVC supports several types of (local or) remote locations (protocols): -| Type | Description | `url` format | -| -------- | --------------------------------------------------- | -------------------------------------------- | -| `local` | Local path | `/path/to/local/data` | -| `s3` | Amazon S3 | `s3://mybucket/data` | -| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | -| `gs` | Google Cloud Storage | `gs://mybucket/data` | -| `ssh` | SSH server | `ssh://user@example.com/path/from/sftp/root` | -| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` | -| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | -| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | +| Type | Description | `url` format | +| -------- | --------------------------------------------------- | ------------------------------------------ | +| `local` | Local path | `/path/to/local/data` | +| `s3` | Amazon S3 | `s3://mybucket/data` | +| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` | +| `gs` | Google Cloud Storage | `gs://mybucket/data` | +| `ssh` | SSH server | `ssh://user@example.com/path/to/data` | +| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` | +| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` | +| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` | > If you installed DVC via `pip` and plan to use cloud services as remote > storage, you might need to install these optional dependencies: `[s3]`, From f68eb406de63c813baa2f550e302117a2da3bf0b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 18 Aug 2020 01:09:47 -0600 Subject: [PATCH 11/16] cmd: use std. 
SFTP note from remote add in other SSH URL examples --- content/docs/command-reference/remote/add.md | 15 +++++---------- content/docs/command-reference/remote/modify.md | 8 +++++++- content/docs/user-guide/external-dependencies.md | 7 +++++-- content/docs/user-guide/managing-external-data.md | 7 +++++-- 4 files changed, 22 insertions(+), 15 deletions(-) diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md index ff400c33bc..0404c01c67 100644 --- a/content/docs/command-reference/remote/add.md +++ b/content/docs/command-reference/remote/add.md @@ -302,21 +302,16 @@ $ export OSS_ACCESS_KEY_SECRET='AccessKeySecret' ### Click for SSH ```dvc -$ dvc remote add -d myremote ssh://user@example.com/absolute/path +$ dvc remote add -d myremote ssh://user@example.com/path/from/sftp/root ``` > See also `dvc remote modify` for a full list of SSH parameters. -⚠️ DVC requires both SSH and SFTP access to work with SSH remote storage. Please -check that you are able to connect both ways to the remote location, with tools -like `ssh` and `sftp` (GNU/Linux). +⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. +Please check that you are able to connect both ways with tools like `ssh` and +`sftp` (GNU/Linux). -> Note that your server's SFTP root might differ from its physical root (`/`). -> (On Linux, see the `ChrootDirectory` setting in `/etc/ssh/sshd_config`.) In -> these cases, the path component in the SSH URL (e.g. `/path/to/dir` above) -> should be specified relative to the SFTP root instead. For example, on some -> Sinology NAS drives, the SFTP root might be in directory `/volume1`, in which -> case you should use path `/path/to/dir` instead of `/volume1/path/to/dir`. +> Note that the server's SFTP root might differ from its physical root (`/`). 
diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index b9e49eb8d9..b9e212869a 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -469,9 +469,15 @@ more information. ```dvc $ dvc remote modify myremote url \ - ssh://user@example.com:1234/absolute/path + ssh://user@example.com:1234/path/from/sftp/root ``` + ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. + Please check that you are able to connect both ways with tools like `ssh` and + `sftp` (GNU/Linux). + + > Note that your server's SFTP root might differ from its physical root (`/`). + - `user` - username to access the remote. ```dvc diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index 2bb56570b8..faccd7c7de 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -81,8 +81,11 @@ $ dvc run -n download_file scp user@example.com:/path/from/sftp/root/to/data data ``` -> ⚠️ Please notice that the SFTP root typically is the system root, but doesn't -> have to be. +⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. +Please check that you are able to connect both ways with tools like `ssh` and +`sftp` (GNU/Linux). + +> Note that your server's SFTP root might differ from its physical root (`/`). ### HDFS diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index cc88ff4781..ef628bcd7d 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -105,8 +105,11 @@ $ dvc run -d data \ scp data user@example.com:/path/from/sftp/root/to/data ``` -> ⚠️ Please notice the SFTP root typically is the system root, but doesn't have -> to be. 
+⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. +Please check that you are able to connect both ways with tools like `ssh` and +`sftp` (GNU/Linux). + +> Note that your server's SFTP root might differ from its physical root (`/`). ### HDFS From dfbc33e5d535f1ee625563f8d6489d0510d65f21 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 18 Aug 2020 01:12:13 -0600 Subject: [PATCH 12/16] cmd: remove "absolute" HDFS URL path from remote modify --- content/docs/command-reference/remote/modify.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index b9e212869a..b3b0802f3d 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -548,8 +548,7 @@ more information. - `url` - remote location: ```dvc - $ dvc remote modify myremote url \ - hdfs://user@example.com/absolute/path + $ dvc remote modify myremote url hdfs://user@example.com/path/to/dir ``` - `user` - username to access the remote. 
From 2879de274b0e8531a2b3328ab947392c17521472 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 18 Aug 2020 01:49:58 -0600 Subject: [PATCH 13/16] guide: fix SSH paths in external outputs doc --- content/docs/user-guide/managing-external-data.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index ef628bcd7d..31d8e60a23 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -88,21 +88,19 @@ $ dvc run -d data \ ```dvc # Add SSH remote to be used as cache location for SSH files -$ dvc remote add sshcache \ - ssh://user@example.com/path/from/sftp/root/to/cache +$ dvc remote add sshcache ssh://user@example.com/cache # Tell DVC to use the 'sshcache' remote as SSH cache location $ dvc config cache.ssh sshcache # Add data on SSH directly -$ dvc add --external \ - ssh://user@example.com/path/from/sftp/root/to/mydata +$ dvc add --external ssh://user@example.com/mydata # Create the stage with an external SSH output $ dvc run -d data \ --external \ - -o ssh://user@example.com/path/from/sftp/root/to/data \ - scp data user@example.com:/path/from/sftp/root/to/data + -o ssh://user@example.com/data \ + scp data user@example.com:/data ``` ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. From 990de5961b62aa4c3858615c42cbca6cca23b25c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 18 Aug 2020 01:51:50 -0600 Subject: [PATCH 14/16] ssh: remove 'sftp' from example URLs... 
per https://github.com/iterative/dvc.org/pull/1649#discussion_r471985512 --- content/docs/command-reference/remote/add.md | 2 +- content/docs/command-reference/remote/modify.md | 2 +- content/docs/user-guide/external-dependencies.md | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md index 0404c01c67..ec1b03c0e3 100644 --- a/content/docs/command-reference/remote/add.md +++ b/content/docs/command-reference/remote/add.md @@ -302,7 +302,7 @@ $ export OSS_ACCESS_KEY_SECRET='AccessKeySecret' ### Click for SSH ```dvc -$ dvc remote add -d myremote ssh://user@example.com/path/from/sftp/root +$ dvc remote add -d myremote ssh://user@example.com/path/to/dir ``` > See also `dvc remote modify` for a full list of SSH parameters. diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index b3b0802f3d..b11f63b73e 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -469,7 +469,7 @@ more information. ```dvc $ dvc remote modify myremote url \ - ssh://user@example.com:1234/path/from/sftp/root + ssh://user@example.com:1234/path/to/dir ``` ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index faccd7c7de..d1bfe1520b 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -76,9 +76,9 @@ $ dvc run -n download_file ```dvc $ dvc run -n download_file - -d ssh://user@example.com/path/from/sftp/root/to/data \ + -d ssh://user@example.com/path/to/data \ -o data \ - scp user@example.com:/path/from/sftp/root/to/data data + scp user@example.com:/path/to/data data ``` ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. 
From 0fb4b0db2944dc485372ac5f4dc69835eab00e97 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 18 Aug 2020 02:03:24 -0600 Subject: [PATCH 15/16] guide: roll-back unrelated changes to URLs in external data docs --- .../docs/user-guide/external-dependencies.md | 48 +++++++++---------- .../docs/user-guide/managing-external-data.md | 34 ++++++------- 2 files changed, 41 insertions(+), 41 deletions(-) diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md index d1bfe1520b..3241fa0be7 100644 --- a/content/docs/user-guide/external-dependencies.md +++ b/content/docs/user-guide/external-dependencies.md @@ -45,40 +45,40 @@ stage to your list of stages in dvc.yaml. ```dvc $ dvc run -n download_file - -d s3://mybucket/data \ - -o data \ - aws s3 cp s3://mybucket/data data + -d s3://mybucket/data.txt \ + -o data.txt \ + aws s3 cp s3://mybucket/data.txt data.txt ``` ### Microsoft Azure Blob Storage ```dvc $ dvc run -n download_file - -d azure://my-container-name/data \ - -o data \ + -d azure://my-container-name/data.txt \ + -o data.txt \ az storage copy \ -d data.json \ --source-account-name my-account \ --source-container my-container-name \ - --source-blob data + --source-blob data.txt ``` ### Google Cloud Storage ```dvc $ dvc run -n download_file - -d gs://mybucket/data \ - -o data \ - gsutil cp gs://mybucket/data data + -d gs://mybucket/data.txt \ + -o data.txt \ + gsutil cp gs://mybucket/data.txt data.txt ``` ### SSH ```dvc $ dvc run -n download_file - -d ssh://user@example.com/path/to/data \ - -o data \ - scp user@example.com:/path/to/data data + -d ssh://user@example.com/path/to/data.txt \ + -o data.txt \ + scp user@example.com:/path/to/data.txt data.txt ``` ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. 
@@ -91,10 +91,10 @@ Please check that you are able to connect both ways with tools like `ssh` and ```dvc $ dvc run -n download_file - -d hdfs://user@example.com/data \ - -o data \ + -d hdfs://user@example.com/data.txt \ + -o data.txt \ hdfs fs -copyToLocal \ - hdfs://user@example.com/data data + hdfs://user@example.com/data.txt data.txt ``` ### HTTP @@ -103,18 +103,18 @@ $ dvc run -n download_file ```dvc $ dvc run -n download_file - -d https://example.com/data \ - -o data \ - wget https://example.com/data -O data + -d https://example.com/data.txt \ + -o data.txt \ + wget https://example.com/data.txt -O data.txt ``` ### Local file system path ```dvc $ dvc run -n download_file - -d /home/shared/data \ - -o data \ - cp /home/shared/data data + -d /home/shared/data.txt \ + -o data.txt \ + cp /home/shared/data.txt data.txt ``` ## Example: DVC remote aliases @@ -130,9 +130,9 @@ For example, for an HTTPs remote/dependency: ```dvc $ dvc remote add example https://example.com $ dvc run -n download_file - -d remote://example/data \ - -o data \ - wget https://example.com/data -O data + -d remote://example/data.txt \ + -o data.txt \ + wget https://example.com/data.txt -O data.txt ``` Please refer to `dvc remote add` for more details like setting up access diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index 31d8e60a23..de3d472303 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -43,7 +43,7 @@ in the same external/remote file system first. ## Examples For the examples, let's take a look at a [stage](/doc/command-reference/run) -that simply moves local file to an external location, producing a `data.dvc` +that simply moves local file to an external location, producing a `data.txt.dvc` DVC-file. 
### Amazon S3 @@ -59,10 +59,10 @@ $ dvc config cache.s3 s3cache $ dvc add --external s3://mybucket/mydata # Create the stage with an external S3 output -$ dvc run -d data \ +$ dvc run -d data.txt \ --external \ - -o s3://mybucket/data \ - aws s3 cp data s3://mybucket/data + -o s3://mybucket/data.txt \ + aws s3 cp data.txt s3://mybucket/data.txt ``` ### Google Cloud Storage @@ -78,10 +78,10 @@ $ dvc config cache.gs gscache $ dvc add --external gs://mybucket/mydata # Create the stage with an external GS output -$ dvc run -d data \ +$ dvc run -d data.txt \ --external \ - -o gs://mybucket/data \ - gsutil cp data gs://mybucket/data + -o gs://mybucket/data.txt \ + gsutil cp data.txt gs://mybucket/data.txt ``` ### SSH @@ -97,10 +97,10 @@ $ dvc config cache.ssh sshcache $ dvc add --external ssh://user@example.com/mydata # Create the stage with an external SSH output -$ dvc run -d data \ +$ dvc run -d data.txt \ --external \ - -o ssh://user@example.com/data \ - scp data user@example.com:/data + -o ssh://user@example.com/data.txt \ + scp data.txt user@example.com:/data.txt ``` ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. @@ -123,12 +123,12 @@ $ dvc config cache.hdfs hdfscache $ dvc add --external hdfs://user@example.com/mydata # Create the stage with an external HDFS output -$ dvc run -d data \ +$ dvc run -d data.txt \ --external \ - -o hdfs://user@example.com/data \ + -o hdfs://user@example.com/data.txt \ hdfs fs -copyFromLocal \ - data \ - hdfs://user@example.com/data + data.txt \ + hdfs://user@example.com/data.txt ``` Note that as long as there is a `hdfs://...` URL for your data, DVC can handle @@ -144,8 +144,8 @@ it explicitly. 
$ dvc add --external /home/shared/mydata # Create the stage with an external location output -$ dvc run -d data \ +$ dvc run -d data.txt \ --external \ - -o /home/shared/data \ - cp data /home/shared/data + -o /home/shared/data.txt \ + cp data.txt /home/shared/data.txt ``` From 34bae5c77c84c624b09f6a0ed3f18ea15f5d20fc Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 18 Aug 2020 02:10:34 -0600 Subject: [PATCH 16/16] guide: roll-back unnecessary new line in x outs page --- content/docs/user-guide/managing-external-data.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md index de3d472303..3d18e1a4f1 100644 --- a/content/docs/user-guide/managing-external-data.md +++ b/content/docs/user-guide/managing-external-data.md @@ -113,8 +113,7 @@ Please check that you are able to connect both ways with tools like `ssh` and ```dvc # Add HDFS remote to be used as cache location for HDFS files -$ dvc remote add hdfscache \ - hdfs://user@example.com/cache +$ dvc remote add hdfscache hdfs://user@example.com/cache # Tell DVC to use the 'hdfscache' remote as HDFS cache location $ dvc config cache.hdfs hdfscache
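The SSH notes in this series say that the server's SFTP root might differ from its physical root (`/`). The removed `remote add` text gave a NAS example: if the SFTP root is `/volume1`, the URL path should be `/path/to/dir`, not `/volume1/path/to/dir`.

The sketch below illustrates that mapping. It assumes the SFTP server is chrooted at a known directory, for example via `ChrootDirectory` in `/etc/ssh/sshd_config`. `server_path` is a hypothetical helper written for illustration. It is not part of DVC and does not contact any server.

```python
import posixpath
from urllib.parse import urlsplit


def server_path(url: str, sftp_root: str = "/") -> str:
    """Hypothetical helper (not a DVC API): return where the path of an
    ssh:// URL lands on the server's physical filesystem, assuming the
    SFTP server is chrooted at `sftp_root`."""
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError(f"not an ssh:// URL: {url!r}")
    # The URL path (user, host and port are stripped by urlsplit) is
    # resolved against the SFTP root, not against the physical root.
    relative = parts.path.lstrip("/")
    return posixpath.normpath(posixpath.join(sftp_root, relative))


if __name__ == "__main__":
    # SFTP root is the physical root: the URL path is used as-is.
    print(server_path("ssh://user@example.com/path/to/dir"))
    # Chrooted at /volume1: the same URL path maps to /volume1/path/to/dir.
    print(server_path("ssh://user@example.com:1234/path/to/dir", "/volume1"))
```

In the chrooted case, the URL passed to `dvc remote add` keeps the short `/path/to/dir` form. Only the server-side location changes.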