Merged
61 changes: 35 additions & 26 deletions content/docs/command-reference/get-url.md
@@ -12,9 +12,8 @@ Download a file or directory from a supported URL (for example `s3://`,
usage: dvc get-url [-h] [-q | -v] url [out]

positional arguments:
url Location of the data to download.
See supported URLs below.
out Destination path to put files in
url (See supported URLs in the description.)
out Destination path to put files in.
```

## Description
@@ -34,23 +33,33 @@ directory will be placed inside.

DVC supports several types of local and remote locations (protocols):

| Type | Description | `url` format |
| ------- | -------------- | ------------------------------------------ |
| `local` | Local path | `/path/to/local/data` |
| `s3` | Amazon S3 | `s3://mybucket/data` |
| `gs` | Google Storage | `gs://mybucket/data` |
| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
| `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` |
| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` |
| Type | Description | `url` format example |
| -------- | ---------------------------- | ---------------------------------------------------------- |
| `s3` | Amazon S3 | `s3://bucket/data` |
| `azure` | Microsoft Azure Blob Storage | `azure://container/data` |
| `gdrive` | Google Drive | `gdrive://<folder-id>/data` |
| `gs` | Google Cloud Storage | `gs://bucket/data` |
| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
| `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` |
| `http` | HTTP to file\* | `https://example.com/path/to/data.csv` |
| `webdav` | WebDAV to file\*             | `webdavs://example.com/public.php/webdav/path/to/data.csv` |
| `local`  | Local path                   | `/path/to/local/data`                                       |

> **Contributor:** Is `public.php` a very common thingy for WebDAV?

> **Contributor Author:** I'm not familiar TBH, but apparently yes, there's some sort of endpoint. Not sure if it's typically PHP. I took this from https://dvc.org/doc/command-reference/remote/add#supported-storage-types. But let me check, maybe it's quick to figure this out. ⌛

> **Contributor Author** (@jorgeorpinel, Aug 21, 2020): From https://linuxconfig.org/webdav-server-setup-on-ubuntu-linux and https://docs.microsoft.com/en-us/iis/install/installing-publishing-technologies/installing-and-configuring-webdav-on-iis, WebDAV is set up like any HTTP server, so there's no special need for PHP here... We should probably review all these in a separate PR though, so merging this for now (will create a ticket).

> **Collaborator** (@skshetry, Aug 21, 2020): `public.php` is a public endpoint for Nextcloud and Owncloud. There's `remote.php` as a private endpoint. It can be removed from most of the places, but it's good to keep one as an example, as most of our WebDAV users are either using Owncloud or Nextcloud.

> **Contributor Author:** Interesting, thanks @skshetry! Will use Owncloud and Nextcloud examples then. See #1706.

> If you installed DVC via `pip` and plan to use cloud services as remote
> storage, you might need to install these optional dependencies: `[s3]`,
> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to
> include them all. The command should look like this: `pip install "dvc[s3]"`.
> (This example installs `boto3` library along with DVC to support S3 storage.)
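
For instance, a minimal sketch of the S3 case (bucket and paths are
hypothetical):

```dvc
$ pip install "dvc[s3]"
$ dvc get-url s3://bucket/data ./data
```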

\* HDFS and HTTP **do not** support downloading entire directories, only single
files.
\* Notes on remote locations:

- HDFS, HTTP, and WebDAV **do not** support downloading entire directories, only
  single files.

- The `remote://myremote/path/to/file` notation simply means that a DVC
  [remote](/doc/command-reference/remote) named `myremote` is defined. When DVC
  runs, it automatically expands this URL into a regular S3, SSH, GS, etc. URL
  by appending `/path/to/file` to `myremote`'s configured base path.
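
For example, a sketch of that expansion (remote name and paths hypothetical):

```dvc
$ dvc remote add myremote s3://bucket/path
$ dvc get-url remote://myremote/data.csv
# DVC expands the URL above to s3://bucket/path/data.csv
```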

Another way to understand the `dvc get-url` command is as a tool for downloading
data files. On GNU/Linux systems for example, instead of `dvc get-url` with
@@ -73,19 +82,6 @@ $ wget https://example.com/path/to/data.csv
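
Roughly, the analogy looks like this (hypothetical URL):

```dvc
$ dvc get-url https://example.com/path/to/data.csv
# comparable to:
$ wget https://example.com/path/to/data.csv
```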

<details>

### Click and expand for a local example

```dvc
$ dvc get-url /local/path/to/data
```

The above command will copy the `/local/path/to/data` file or directory into
`./dir`.

</details>

<details>

### Click for Amazon S3 example

This command will copy an S3 object into the current working directory with the
@@ -157,3 +153,16 @@ $ dvc get-url https://example.com/path/to/file
```

</details>

### Click and expand for a local example

```dvc
$ dvc get-url /local/path/to/data
```

The above command will copy the `/local/path/to/data` file or directory into
the current working directory (as `data`, since no destination was given).

</details>

<details>
33 changes: 17 additions & 16 deletions content/docs/command-reference/import-url.md
@@ -14,9 +14,8 @@ usage: dvc import-url [-h] [-q | -v] [--file <filename>] [--no-exec]
url [out]

positional arguments:
url Location of the data to import.
See supported URLs below.
out Destination path to put files in
url (See supported URLs in the description.)
out Destination path to put files in.
```

## Description
@@ -56,27 +55,29 @@ source.

DVC supports several types of local and remote locations (protocols):

| Type | Description | `url` format |
| -------- | --------------------------------------------------- | ------------------------------------------ |
| `local` | Local path | `/path/to/local/data` |
| `s3` | Amazon S3 | `s3://mybucket/data` |
| `azure` | Microsoft Azure Blob Storage | `azure://my-container-name/path/to/data` |
| `gs` | Google Cloud Storage | `gs://mybucket/data` |
| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
| `hdfs` | HDFS to file (explanation below) | `hdfs://user@example.com/path/to/data.csv` |
| `http` | HTTP to file with _strong ETag_ (explanation below) | `https://example.com/path/to/data.csv` |
| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/data` |
| Type | Description | `url` format example |
| -------- | --------------------------------- | ---------------------------------------------------------- |
| `s3` | Amazon S3 | `s3://bucket/data` |
| `azure` | Microsoft Azure Blob Storage | `azure://container/data` |
| `gdrive` | Google Drive | `gdrive://<folder-id>/data` |
| `gs` | Google Cloud Storage | `gs://bucket/data` |
| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
| `hdfs` | HDFS to file\* | `hdfs://user@example.com/path/to/data.csv` |
| `http` | HTTP to file with _strong ETag_\* | `https://example.com/path/to/data.csv` |
| `webdav` | WebDAV to file\*                  | `webdavs://example.com/public.php/webdav/path/to/data.csv` |
| `local` | Local path | `/path/to/local/data` |
| `remote` | Remote path\* | `remote://remote-name/data` |

> If you installed DVC via `pip` and plan to use cloud services as remote
> storage, you might need to install these optional dependencies: `[s3]`,
> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to
> include them all. The command should look like this: `pip install "dvc[s3]"`.
> (This example installs `boto3` library along with DVC to support S3 storage.)

Specific explanations:
\* Notes on remote locations:

- HDFS and HTTP **do not** support downloading entire directories, only single
files.
- HDFS, HTTP, and WebDAV **do not** support downloading entire directories, only
  single files.

- In case of HTTP,
[strong ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation)
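
As a minimal sketch of the HTTP case (hypothetical URL; the server must return
a strong ETag so DVC can detect changes to the source):

```dvc
$ dvc import-url https://example.com/path/to/data.csv data.csv
```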
25 changes: 12 additions & 13 deletions content/docs/command-reference/remote/add.md
@@ -12,9 +12,8 @@ usage: dvc remote add [-h] [--global | --system | --local] [-q | -v]
[-d] [-f] name url

positional arguments:
name Name of the remote
url Remote location.
See full list of supported URLs below.
name Name of the remote.
url (See supported URLs in the examples below.)
```

## Description
@@ -94,7 +93,7 @@ The following are the types of remote storage (protocols) supported:
> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).

```dvc
$ dvc remote add -d s3remote url s3://my-bucket/my-key
$ dvc remote add -d s3remote s3://mybucket/path
```

By default, DVC expects your AWS CLI is already
@@ -134,7 +133,7 @@ configure the remote's `endpointurl` explicitly:
For example:

```dvc
$ dvc remote add -d myremote s3://my-bucket/path/to/dir
$ dvc remote add -d myremote s3://mybucket/path/to/dir
$ dvc remote modify myremote endpointurl \
      https://object-storage.example.com
```

> **Contributor Author** (on lines -137 to +136): Cosmetic: I removed `-` in URLs in this file and in `remote modify`, mainly.
@@ -146,7 +145,7 @@ S3 remotes can also be configured entirely via environment variables:
```dvc
$ export AWS_ACCESS_KEY_ID="<my-access-key>"
$ export AWS_SECRET_ACCESS_KEY="<my-secret-key>"
$ dvc remote add -d myremote s3://my-bucket/my/key
$ dvc remote add -d myremote s3://mybucket/my/path
```

@@ -159,7 +158,7 @@ For more information about the variables DVC supports, please visit
### Click for Microsoft Azure Blob Storage

```dvc
$ dvc remote add -d myremote azure://my-container-name/path
$ dvc remote add -d myremote azure://mycontainer/path
$ dvc remote modify --local myremote connection_string \
'my-connection-string'
```
@@ -173,7 +172,7 @@ variables:

```dvc
$ export AZURE_STORAGE_CONNECTION_STRING='<my-connection-string>'
$ export AZURE_STORAGE_CONTAINER_NAME='my-container-name'
$ export AZURE_STORAGE_CONTAINER_NAME='mycontainer'
$ dvc remote add -d myremote 'azure://'
```

@@ -410,7 +409,7 @@ region.
> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).

```dvc
$ dvc remote add -d myremote s3://mybucket/myproject
$ dvc remote add -d myremote s3://mybucket/path
Setting 'myremote' as a default remote.

$ dvc remote modify myremote region us-east-2
@@ -420,7 +419,7 @@ The <abbr>project</abbr>'s config file (`.dvc/config`) now looks like this:

```ini
['remote "myremote"']
url = s3://mybucket/myproject
url = s3://mybucket/path
region = us-east-2
[core]
remote = myremote
@@ -430,19 +429,19 @@ The list of remotes should now be:

```dvc
$ dvc remote list
myremote s3://mybucket/myproject
myremote s3://mybucket/path
```

You can overwrite existing remotes using `-f` with `dvc remote add`:

```dvc
$ dvc remote add -f myremote s3://mybucket/mynewproject
$ dvc remote add -f myremote s3://mybucket/another-path
```

List remotes again to view the updated remote:

```dvc
$ dvc remote list

myremote s3://mybucket/mynewproject
myremote s3://mybucket/another-path
```
6 changes: 3 additions & 3 deletions content/docs/command-reference/remote/index.md
@@ -103,7 +103,7 @@ remote = myremote
> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).

```dvc
$ dvc remote add newremote s3://mybucket/myproject
$ dvc remote add newremote s3://mybucket/path
$ dvc remote modify newremote endpointurl https://object-storage.example.com
```

> **Contributor Author:** Changed possibly confusing concepts like "project" or "key" to a more generic "path" in URLs (mostly S3 URLs had this problem).

> **Contributor:** Since we change this, we might want to use `-d`?

> **Contributor Author:** Probably not in this one, under the _Customize an additional S3 remote_ example (meant as a continuation of the previous example, where a default local remote is added).

@@ -115,7 +115,7 @@ url = /path/to/remote
[core]
remote = myremote
['remote "newremote"']
url = s3://mybucket/myproject
url = s3://mybucket/path
endpointurl = https://object-storage.example.com
```

@@ -124,7 +124,7 @@ endpointurl = https://object-storage.example.com
```dvc
$ dvc remote list
myremote /path/to/remote
newremote s3://mybucket/myproject
newremote s3://mybucket/path
```

## Example: Change the name of a remote
14 changes: 7 additions & 7 deletions content/docs/command-reference/remote/modify.md
@@ -66,7 +66,7 @@ The following config options are available for all remote types:
below):

```dvc
$ dvc remote modify s3remote url s3://my-bucket/my/key
$ dvc remote modify s3remote url s3://mybucket/path
```

Or a _local remote_ (a directory in the file system):
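
A sketch (remote name and path are hypothetical):

```dvc
$ dvc remote modify mylocalremote url /new/path/in/the/filesystem
```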
@@ -105,7 +105,7 @@ these settings, you could use the following options.
- `url` - remote location, in the `s3://<bucket>/<key>` format:

```dvc
$ dvc remote modify myremote url s3://my-bucket/my/key
$ dvc remote modify myremote url s3://mybucket/my/path
```

- `region` - change S3 remote region:
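
For example (the same region value used in the examples further down this
page):

```dvc
$ dvc remote modify myremote region us-east-2
```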
@@ -240,7 +240,7 @@ To communicate with a remote object storage that supports an S3 compatible API
configure the remote's `endpointurl` explicitly:

```dvc
$ dvc remote add -d myremote s3://my-bucket/path/to/dir
$ dvc remote add -d myremote s3://mybucket/path/to/dir
$ dvc remote modify myremote endpointurl \
https://object-storage.example.com
```
@@ -250,7 +250,7 @@ S3 remotes can also be configured entirely via environment variables:
```dvc
$ export AWS_ACCESS_KEY_ID='<my-access-key>'
$ export AWS_SECRET_ACCESS_KEY='<my-secret-key>'
$ dvc remote add -d myremote s3://my-bucket/my/key
$ dvc remote add -d myremote s3://mybucket/my/path
```

For more information about the variables DVC supports, please visit
@@ -265,7 +265,7 @@ For more information about the variables DVC supports, please visit
- `url` - remote location, in the `azure://<container>/<object>` format:

```dvc
$ dvc remote modify myremote url azure://my-container-name/path
$ dvc remote modify myremote url azure://mycontainer/path
```

- `connection_string` - connection string:
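
A sketch, keeping the secret out of Git with `--local` as in the `remote add`
examples (the connection string is a placeholder):

```dvc
$ dvc remote modify --local myremote connection_string \
      'my-connection-string'
```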
@@ -717,7 +717,7 @@ Let's first set up a _default_ S3 remote.
> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).

```dvc
$ dvc remote add -d myremote s3://mybucket/myproject
$ dvc remote add -d myremote s3://mybucket/path
Setting 'myremote' as a default remote.
```

@@ -731,7 +731,7 @@ Now the project config file should look like this:

```ini
['remote "myremote"']
url = s3://mybucket/storage
url = s3://mybucket/path
profile = myusername
[core]
remote = myremote
2 changes: 1 addition & 1 deletion content/docs/command-reference/remote/remove.md
@@ -46,7 +46,7 @@ The `name` argument is required.
Add Amazon S3 remote:

```dvc
$ dvc remote add myremote s3://mybucket/myproject
$ dvc remote add myremote s3://mybucket/path
```

Remove it:
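A sketch consistent with the usage above (only the `name` argument is needed):

```dvc
$ dvc remote remove myremote
```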
2 changes: 1 addition & 1 deletion content/docs/command-reference/remote/rename.md
@@ -50,7 +50,7 @@ DVC remote, respectively.
Add Amazon S3 remote:

```dvc
$ dvc remote add myremote s3://mybucket/myproject
$ dvc remote add myremote s3://mybucket/path
```

Rename it:
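A sketch (the new name is hypothetical):

```dvc
$ dvc remote rename myremote newremote
```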
2 changes: 1 addition & 1 deletion content/docs/command-reference/status.md
@@ -227,7 +227,7 @@ what files we have generated but haven't pushed to the remote yet:

```dvc
$ dvc remote list
storage s3://dvc-remote
storage s3://bucket/path
```

And would like to check what files we have generated but haven't pushed to the
remote:
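
A sketch of that check (`--cloud`, also `-c`, compares the local cache against
the default remote):

```dvc
$ dvc status --cloud
```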
4 changes: 2 additions & 2 deletions content/docs/use-cases/sharing-data-and-model-files.md
@@ -31,7 +31,7 @@ to the bucket where the data should be stored to the `dvc remote add` command.
For example:

```dvc
$ dvc remote add -d myremote s3://mybucket/myproject
$ dvc remote add -d myremote s3://mybucket/path
Setting 'myremote' as a default remote.
```

@@ -43,7 +43,7 @@ remote section for it:

```ini
['remote "myremote"']
url = s3://mybucket/myproject
url = s3://mybucket/path
[core]
remote = myremote
```
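
With the remote in place, sharing boils down to pushing from one machine and
pulling on another; a minimal sketch:

```dvc
$ dvc push    # upload tracked data to s3://mybucket/path
$ dvc pull    # download it in a teammate's copy of the project
```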
4 changes: 2 additions & 2 deletions content/docs/user-guide/external-dependencies.md
@@ -54,12 +54,12 @@ $ dvc run -n download_file

```dvc
$ dvc run -n download_file \
-d azure://my-container-name/data.txt \
-d azure://mycontainer/data.txt \
-o data.txt \
az storage copy \
-d data.txt \
--source-account-name my-account \
--source-container my-container-name \
--source-container mycontainer \
--source-blob data.txt
```
