diff --git a/content/docs/command-reference/get-url.md b/content/docs/command-reference/get-url.md
index 564e0963be..5f4b1312ec 100644
--- a/content/docs/command-reference/get-url.md
+++ b/content/docs/command-reference/get-url.md
@@ -39,7 +39,7 @@ DVC supports several types of (local or) remote locations (protocols):
 | `local` | Local path | `/path/to/local/data` |
 | `s3` | Amazon S3 | `s3://mybucket/data` |
 | `gs` | Google Storage | `gs://mybucket/data` |
-| `ssh` | SSH server | `ssh://user@example.com:/path/to/data` |
+| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
 | `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` |
 | `http` | HTTP to file\* | `https://example.com/path/to/data.csv` |
diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 012147ff1a..fe12923b97 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -62,7 +62,7 @@ DVC supports several types of (local or) remote locations (protocols):
 | `s3` | Amazon S3 | `s3://mybucket/data` |
 | `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` |
 | `gs` | Google Cloud Storage | `gs://mybucket/data` |
-| `ssh` | SSH server | `ssh://user@example.com:/path/to/data` |
+| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
 | `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` |
 | `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` |
 | `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` |
diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md
index 2cd4265dc3..ec1b03c0e3 100644
--- a/content/docs/command-reference/remote/add.md
+++ b/content/docs/command-reference/remote/add.md
@@ -307,16 +307,11 @@ $ dvc remote add -d myremote ssh://user@example.com/path/to/dir
 ```

 > See also `dvc remote modify` for a full list of SSH parameters.

-⚠️ DVC requires both SSH and SFTP access to work with SSH remote storage. Please
-check that you are able to connect both ways to the remote location, with tools
-like `ssh` and `sftp` (GNU/Linux).
-
-> Note that your server's SFTP root might differ from its physical root (`/`).
-> (On Linux, see the `ChrootDirectory` setting in `/etc/ssh/sshd_config`.) In
-> these cases, the path component in the SSH URL (e.g. `/path/to/dir` above)
-> should be specified relative to the SFTP root instead. For example, on some
-> Sinology NAS drives, the SFTP root might be in directory `/volume1`, in which
-> case you should use path `/path/to/dir` instead of `/volume1/path/to/dir`.
+⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
+Please check that you are able to connect both ways with tools like `ssh` and
+`sftp` (GNU/Linux).
+
+> Note that the server's SFTP root might differ from its physical root (`/`).
diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md
index b9e49eb8d9..b11f63b73e 100644
--- a/content/docs/command-reference/remote/modify.md
+++ b/content/docs/command-reference/remote/modify.md
@@ -469,9 +469,15 @@ more information.
   ```dvc
   $ dvc remote modify myremote url \
-      ssh://user@example.com:1234/absolute/path
+      ssh://user@example.com:1234/path/to/dir
   ```

+  ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
+  Please check that you are able to connect both ways with tools like `ssh` and
+  `sftp` (GNU/Linux).
+
+  > Note that your server's SFTP root might differ from its physical root (`/`).
+
 - `user` - username to access the remote.

   ```dvc
@@ -542,8 +548,7 @@ more information.
 - `url` - remote location:

   ```dvc
-  $ dvc remote modify myremote url \
-    hdfs://user@example.com/absolute/path
+  $ dvc remote modify myremote url hdfs://user@example.com/path/to/dir
   ```

 - `user` - username to access the remote.
diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md
index f9ee3eda93..3241fa0be7 100644
--- a/content/docs/user-guide/external-dependencies.md
+++ b/content/docs/user-guide/external-dependencies.md
@@ -41,24 +41,6 @@ stage to your list of stages in dvc.yaml.
 > Note that some of these commands use the `/home/shared` directory, typical in
 > Linux distributions.

-### Local file system path
-
-```dvc
-$ dvc run -n download_file
-  -d /home/shared/data.txt \
-  -o data.txt \
-  cp /home/shared/data.txt data.txt
-```
-
-### SSH
-
-```dvc
-$ dvc run -n download_file
-  -d ssh://user@example.com:/home/shared/data.txt \
-  -o data.txt \
-  scp user@example.com:/home/shared/data.txt data.txt
-```
-
 ### Amazon S3

 ```dvc
 $ dvc run -n download_file
@@ -90,15 +72,29 @@ $ dvc run -n download_file
   gsutil cp gs://mybucket/data.txt data.txt
 ```

+### SSH
+
+```dvc
+$ dvc run -n download_file \
+  -d ssh://user@example.com/path/to/data.txt \
+  -o data.txt \
+  scp user@example.com:/path/to/data.txt data.txt
+```
+
+⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
+Please check that you are able to connect both ways with tools like `ssh` and
+`sftp` (GNU/Linux).
+
+> Note that your server's SFTP root might differ from its physical root (`/`).
+
 ### HDFS

 ```dvc
 $ dvc run -n download_file
-  -d hdfs://user@example.com/home/shared/data.txt \
+  -d hdfs://user@example.com/data.txt \
   -o data.txt \
   hdfs fs -copyToLocal \
-    hdfs://user@example.com/home/shared/data.txt \
-    data.txt
+    hdfs://user@example.com/data.txt data.txt
 ```

 ### HTTP
@@ -112,6 +108,15 @@ $ dvc run -n download_file
   wget https://example.com/data.txt -O data.txt
 ```

+### Local file system path
+
+```dvc
+$ dvc run -n download_file \
+  -d /home/shared/data.txt \
+  -o data.txt \
+  cp /home/shared/data.txt data.txt
+```
+
 ## Example: DVC remote aliases

 If instead of a URL you'd like to use an alias that can be managed
diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md
index 4449b620c0..3d18e1a4f1 100644
--- a/content/docs/user-guide/managing-external-data.md
+++ b/content/docs/user-guide/managing-external-data.md
@@ -46,41 +46,6 @@ For the examples, let's take a look at a
 [stage](/doc/command-reference/run) that simply moves local file to an
 external location, producing a `data.txt.dvc` DVC-file.

-### Local file system path
-
-The default local cache location is `.dvc/cache`, so there is no need to specify
-it explicitly.
-
-```dvc
-# Add data on an external location directly
-$ dvc add --external /home/shared/mydata
-
-# Create the stage with an external location output
-$ dvc run -d data.txt \
-      --external \
-      -o /home/shared/data.txt \
-      cp data.txt /home/shared/data.txt
-```
-
-### SSH
-
-```dvc
-# Add SSH remote to be used as cache location for SSH files
-$ dvc remote add sshcache ssh://user@example.com:/cache
-
-# Tell DVC to use the 'sshcache' remote as SSH cache location
-$ dvc config cache.ssh sshcache
-
-# Add data on SSH directly
-$ dvc add --external ssh://user@example.com:/mydata
-
-# Create the stage with an external SSH output
-$ dvc run -d data.txt \
-      --external \
-      -o ssh://user@example.com:/home/shared/data.txt \
-      scp data.txt user@example.com:/home/shared/data.txt
-```
-
 ### Amazon S3

 ```dvc
@@ -119,6 +84,31 @@ $ dvc run -d data.txt \
       gsutil cp data.txt gs://mybucket/data.txt
 ```

+### SSH
+
+```dvc
+# Add SSH remote to be used as cache location for SSH files
+$ dvc remote add sshcache ssh://user@example.com/cache
+
+# Tell DVC to use the 'sshcache' remote as SSH cache location
+$ dvc config cache.ssh sshcache
+
+# Add data on SSH directly
+$ dvc add --external ssh://user@example.com/mydata
+
+# Create the stage with an external SSH output
+$ dvc run -d data.txt \
+      --external \
+      -o ssh://user@example.com/data.txt \
+      scp data.txt user@example.com:/data.txt
+```
+
+⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
+Please check that you are able to connect both ways with tools like `ssh` and
+`sftp` (GNU/Linux).
+
+> Note that your server's SFTP root might differ from its physical root (`/`).
+
 ### HDFS

 ```dvc
@@ -134,11 +124,27 @@ $ dvc add --external hdfs://user@example.com/mydata

 # Create the stage with an external HDFS output
 $ dvc run -d data.txt \
       --external \
-      -o hdfs://user@example.com/home/shared/data.txt \
+      -o hdfs://user@example.com/data.txt \
       hdfs fs -copyFromLocal \
-        data.txt \
-        hdfs://user@example.com/home/shared/data.txt
+          data.txt \
+          hdfs://user@example.com/data.txt
 ```

-Note that as long as there is a `hdfs://...` path for your data, DVC can handle
+Note that as long as there is a `hdfs://...` URL for your data, DVC can handle
 it. So systems like Hadoop, Hive, and HBase are supported!
+
+### Local file system path
+
+The default local cache location is `.dvc/cache`, so there is no need to specify
+it explicitly.
+
+```dvc
+# Add data on an external location directly
+$ dvc add --external /home/shared/mydata
+
+# Create the stage with an external location output
+$ dvc run -d data.txt \
+      --external \
+      -o /home/shared/data.txt \
+      cp data.txt /home/shared/data.txt
+```
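The recurring change across these hunks is dropping the stray colon from SSH URLs (`ssh://user@example.com/path/to/data`, not `ssh://user@example.com:/path/to/data`), while the `scp` command lines keep their colon, since in scp's `host:path` syntax the colon is the host/path separator rather than part of a URL. As a rough sanity check of the colon-free form (an illustration with Python's standard `urllib.parse`, not part of DVC; host and path are the hypothetical ones used in the examples):

```python
from urllib.parse import urlparse

# Hypothetical SSH remote in the colon-free form this diff standardizes on.
url = "ssh://user@example.com/path/to/data"

# urlparse splits any scheme://netloc/path URL into its components.
parts = urlparse(url)
print(parts.scheme)    # ssh
print(parts.username)  # user
print(parts.hostname)  # example.com
print(parts.path)      # /path/to/data
```

With the old colon form, a generic URL parser would treat the text after `:` in the netloc as a port number, which is one reason the colon-free spelling is the safer convention for `ssh://` URLs.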