Merged
18 commits
715b8b9
docs: recommend absolute paths for SSH and HDFS remote connections
jorgeorpinel Aug 1, 2020
d27e07d
cmd: forgot to save changes to get-url.md (see prev commit)
jorgeorpinel Aug 1, 2020
c27303b
Merge branch 'master' into remote/abspath
jorgeorpinel Aug 11, 2020
94926ee
guide: improve absolute paths (and note) for SSH x deps
jorgeorpinel Aug 11, 2020
8c60723
guide: update SSH/HDFS abs paths (and notes) in x outs page
jorgeorpinel Aug 11, 2020
8703362
guide: revert SCP URLs from external data guides
jorgeorpinel Aug 11, 2020
ce32ddf
guide: update note about rel. SSH paths from SFTP root
jorgeorpinel Aug 11, 2020
aa5d6fd
guide: update SSH sample paths again + note + refs
jorgeorpinel Aug 14, 2020
7414ce6
docs: roll back absolute HDFS URLs
jorgeorpinel Aug 14, 2020
710720d
cmd: change SSH example URL in import-url
jorgeorpinel Aug 14, 2020
3be66c1
Merge branch 'master' into remote/abspath
jorgeorpinel Aug 18, 2020
6ec6717
cmd: remove "sftp" mention in SSH URLs of get/import-url
jorgeorpinel Aug 18, 2020
f68eb40
cmd: use std. SFTP note from remote add in other SSH URL examples
jorgeorpinel Aug 18, 2020
dfbc33e
cmd: remove "absolute" HDFS URL path from remote modify
jorgeorpinel Aug 18, 2020
2879de2
guide: fix SSH paths in external outputs doc
jorgeorpinel Aug 18, 2020
990de59
ssh: remove 'sftp' from example URLs...
jorgeorpinel Aug 18, 2020
0fb4b0d
guide: roll-back unrelated changes to URLs in external data docs
jorgeorpinel Aug 18, 2020
34bae5c
guide: roll-back unnecessary new line in x outs page
jorgeorpinel Aug 18, 2020
2 changes: 1 addition & 1 deletion content/docs/command-reference/get-url.md
@@ -39,7 +39,7 @@ DVC supports several types of (local or) remote locations (protocols):
| `local` | Local path | `/path/to/local/data` |
| `s3` | Amazon S3 | `s3://mybucket/data` |
| `gs` | Google Storage | `gs://mybucket/data` |
| `ssh` | SSH server | `ssh://user@example.com:/path/to/data` |
| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
Contributor
do we need the trailing space?

Contributor Author
Yes. It's all formatted by Prettier this way.

| `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` |
| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` |

2 changes: 1 addition & 1 deletion content/docs/command-reference/import-url.md
@@ -62,7 +62,7 @@ DVC supports several types of (local or) remote locations (protocols):
| `s3` | Amazon S3 | `s3://mybucket/data` |
| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` |
| `gs` | Google Cloud Storage | `gs://mybucket/data` |
| `ssh` | SSH server | `ssh://user@example.com:/path/to/data` |
| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
Contributor
same

Contributor Author
Auto-formatted. To align columns I guess. Looks "prettier" tehehe

| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` |
| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` |
| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` |
15 changes: 5 additions & 10 deletions content/docs/command-reference/remote/add.md
@@ -307,16 +307,11 @@ $ dvc remote add -d myremote ssh://user@example.com/path/to/dir

> See also `dvc remote modify` for a full list of SSH parameters.

⚠️ DVC requires both SSH and SFTP access to work with SSH remote storage. Please
check that you are able to connect both ways to the remote location, with tools
like `ssh` and `sftp` (GNU/Linux).

> Note that your server's SFTP root might differ from its physical root (`/`).
> (On Linux, see the `ChrootDirectory` setting in `/etc/ssh/sshd_config`.) In
> these cases, the path component in the SSH URL (e.g. `/path/to/dir` above)
> should be specified relative to the SFTP root instead. For example, on some
> Synology NAS drives, the SFTP root might be in directory `/volume1`, in which
> case you should use path `/path/to/dir` instead of `/volume1/path/to/dir`.
Comment on lines -315 to -319
Contributor Author
I decided to remove these details. They seem more like tips unrelated to DVC; users can easily research SFTP details on other sites, I think. I left a quick note: "Note that the server's SFTP root might differ from its physical root (`/`)." That should be enough of a tip for the few users running into that issue.

Contributor
Sounds good! And we can elaborate when we have a separate page per remote, I guess.

Contributor Author
We're planning to do separate pages per remote? I don't recall that decision. It sounds like a good plan, except that the remote types are already buried in docs -> cmd ref -> remote -> add/modify -> expandable sections, so I'm not sure adding yet another level to that tree is a good idea.

Maybe the remote types should all be in the remote ref. index? And the remote concept explanation in Basic Concepts.

⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
Please check that you are able to connect both ways with tools like `ssh` and
`sftp` (GNU/Linux).

> Note that the server's SFTP root might differ from its physical root (`/`).
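
The warning above asks readers to confirm that both SSH and SFTP connections work. A minimal sketch of such a check follows, assuming key-based (non-interactive) login; the function `check_ssh_and_sftp` and its arguments are hypothetical and not part of DVC. It relies only on the standard OpenSSH `ssh` and `sftp` clients.

```shell
# Hypothetical helper (not part of DVC): check that both SSH and SFTP
# connections to a host work. Assumes key-based, non-interactive login.
check_ssh_and_sftp() {
  target=$1   # e.g. user@example.com
  dir=$2      # e.g. /path/to/dir (SFTP may resolve it against its own root)

  # 1. Plain SSH: run a no-op command on the server.
  if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
    echo "SSH connection to $target failed" >&2
    return 1
  fi

  # 2. SFTP: list the directory in batch mode ("-b -" reads commands from
  #    stdin); sftp exits non-zero if the connection or the ls command fails.
  if ! printf 'ls %s\n' "$dir" |
      sftp -b - -o BatchMode=yes -o ConnectTimeout=5 "$target" >/dev/null; then
    echo "SFTP access to $dir on $target failed" >&2
    return 1
  fi

  echo "SSH and SFTP access to $target OK"
}
```

For example, `check_ssh_and_sftp user@example.com /path/to/dir` reports success only if SSH can run a command on the server and SFTP can list the given directory.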

</details>

11 changes: 8 additions & 3 deletions content/docs/command-reference/remote/modify.md
@@ -469,9 +469,15 @@ more information.

```dvc
$ dvc remote modify myremote url \
ssh://user@example.com:1234/absolute/path
ssh://user@example.com:1234/path/to/dir
```

⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
Please check that you are able to connect both ways with tools like `ssh` and
`sftp` (GNU/Linux).

> Note that your server's SFTP root might differ from its physical root (`/`).

- `user` - username to access the remote.

```dvc
@@ -542,8 +548,7 @@ more information.
- `url` - remote location:

```dvc
$ dvc remote modify myremote url \
hdfs://user@example.com/absolute/path
$ dvc remote modify myremote url hdfs://user@example.com/path/to/dir
```

- `user` - username to access the remote.
47 changes: 26 additions & 21 deletions content/docs/user-guide/external-dependencies.md
@@ -41,24 +41,6 @@ stage to your list of stages in dvc.yaml.
> Note that some of these commands use the `/home/shared` directory, typical in
> Linux distributions.

### Local file system path

```dvc
$ dvc run -n download_file \
-d /home/shared/data.txt \
-o data.txt \
cp /home/shared/data.txt data.txt
```

### SSH

```dvc
$ dvc run -n download_file \
-d ssh://user@example.com:/home/shared/data.txt \
-o data.txt \
scp user@example.com:/home/shared/data.txt data.txt
```

### Amazon S3

```dvc
@@ -90,15 +72,29 @@ $ dvc run -n download_file
gsutil cp gs://mybucket/data.txt data.txt
```

### SSH

```dvc
$ dvc run -n download_file \
-d ssh://user@example.com/path/to/data.txt \
-o data.txt \
scp user@example.com:/path/to/data.txt data.txt
```

⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
Please check that you are able to connect both ways with tools like `ssh` and
`sftp` (GNU/Linux).

> Note that your server's SFTP root might differ from its physical root (`/`).
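
In the SSH example above, the dependency URL `ssh://user@example.com/path/to/data.txt` and the `scp` source `user@example.com:/path/to/data.txt` name the same file; only the notation differs. A minimal sketch of that mapping follows; `ssh_url_to_scp` is a hypothetical helper (not part of DVC or OpenSSH), and it assumes URLs without IPv6 literals or `user:password@` credentials.

```shell
# Hypothetical helper (not part of DVC): convert an SSH URL of the form
# ssh://user@host[:port]/path into the scp-style target user@host:/path.
# Limitations: any port is dropped (scp takes it via -P instead), and IPv6
# literals such as [::1] or user:password@ credentials are not handled.
ssh_url_to_scp() {
  rest=${1#ssh://}                # strip the scheme -> user@host[:port]/path
  case $rest in
    */*) ;;                       # a path component is present
    *) echo "ssh_url_to_scp: no path in '$1'" >&2; return 1 ;;
  esac
  authority=${rest%%/*}           # text before the first "/" -> user@host[:port]
  path=/${rest#*/}                # text after the first "/", re-rooted at "/"
  host=${authority%%:*}           # drop an optional ":port" (or a bare ":")
  printf '%s:%s\n' "$host" "$path"
}

ssh_url_to_scp ssh://user@example.com/path/to/data.txt
# prints user@example.com:/path/to/data.txt
```

Passing the older `ssh://user@example.com:/path/to/data.txt` form to this helper yields the same target, since the empty port after the colon is dropped.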

### HDFS

```dvc
$ dvc run -n download_file \
-d hdfs://user@example.com/home/shared/data.txt \
-d hdfs://user@example.com/data.txt \
-o data.txt \
hdfs fs -copyToLocal \
hdfs://user@example.com/home/shared/data.txt \
data.txt
hdfs://user@example.com/data.txt data.txt
```

### HTTP
@@ -112,6 +108,15 @@ $ dvc run -n download_file
wget https://example.com/data.txt -O data.txt
```

### Local file system path

```dvc
$ dvc run -n download_file \
-d /home/shared/data.txt \
-o data.txt \
cp /home/shared/data.txt data.txt
```

## Example: DVC remote aliases

If instead of a URL you'd like to use an alias that can be managed
84 changes: 45 additions & 39 deletions content/docs/user-guide/managing-external-data.md
@@ -46,41 +46,6 @@ For the examples, let's take a look at a [stage](/doc/command-reference/run)
that simply moves a local file to an external location, producing a `data.txt.dvc`
DVC-file.

### Local file system path

The default local cache location is `.dvc/cache`, so there is no need to specify
it explicitly.

```dvc
# Add data on an external location directly
$ dvc add --external /home/shared/mydata

# Create the stage with an external location output
$ dvc run -d data.txt \
--external \
-o /home/shared/data.txt \
cp data.txt /home/shared/data.txt
```

### SSH

```dvc
# Add SSH remote to be used as cache location for SSH files
$ dvc remote add sshcache ssh://user@example.com:/cache

# Tell DVC to use the 'sshcache' remote as SSH cache location
$ dvc config cache.ssh sshcache

# Add data on SSH directly
$ dvc add --external ssh://user@example.com:/mydata

# Create the stage with an external SSH output
$ dvc run -d data.txt \
--external \
-o ssh://user@example.com:/home/shared/data.txt \
scp data.txt user@example.com:/home/shared/data.txt
```

### Amazon S3

```dvc
@@ -119,6 +84,31 @@ $ dvc run -d data.txt \
gsutil cp data.txt gs://mybucket/data.txt
```

### SSH

```dvc
# Add SSH remote to be used as cache location for SSH files
$ dvc remote add sshcache ssh://user@example.com/cache

# Tell DVC to use the 'sshcache' remote as SSH cache location
$ dvc config cache.ssh sshcache

# Add data on SSH directly
$ dvc add --external ssh://user@example.com/mydata

# Create the stage with an external SSH output
$ dvc run -d data.txt \
--external \
-o ssh://user@example.com/data.txt \
scp data.txt user@example.com:/data.txt
```

⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
Please check that you are able to connect both ways with tools like `ssh` and
`sftp` (GNU/Linux).

> Note that your server's SFTP root might differ from its physical root (`/`).

### HDFS

```dvc
@@ -134,11 +124,27 @@ $ dvc add --external hdfs://user@example.com/mydata
# Create the stage with an external HDFS output
$ dvc run -d data.txt \
--external \
-o hdfs://user@example.com/home/shared/data.txt \
-o hdfs://user@example.com/data.txt \
hdfs fs -copyFromLocal \
data.txt \
hdfs://user@example.com/home/shared/data.txt
data.txt \
hdfs://user@example.com/data.txt
```

Note that as long as there is a `hdfs://...` path for your data, DVC can handle
Note that as long as there is a `hdfs://...` URL for your data, DVC can handle
it. So systems like Hadoop, Hive, and HBase are supported!

### Local file system path

The default local cache location is `.dvc/cache`, so there is no need to specify
it explicitly.

```dvc
# Add data on an external location directly
$ dvc add --external /home/shared/mydata

# Create the stage with an external location output
$ dvc run -d data.txt \
--external \
-o /home/shared/data.txt \
cp data.txt /home/shared/data.txt
```