From c853e79359c6702cc38049718c5f1f71eecc3095 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 5 Dec 2019 11:34:04 -0800 Subject: [PATCH 01/18] wrap /api/comments file doc paragraph per https://github.com/iterative/dvc.org/pull/819#pullrequestreview-327795707 --- pages/api/comments.js | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/pages/api/comments.js b/pages/api/comments.js index ba602b8662..14e4aaf1de 100644 --- a/pages/api/comments.js +++ b/pages/api/comments.js @@ -1,12 +1,10 @@ /* - * This API endpoint is used by https://blog.dvc.org - * to get comments count for the post, it gets - * discuss.dvc.org topic url as a param and returns - * comments count or error. + * This API endpoint is used by our blog to get comments count for the post, it + * gets discuss.dvc.org topic url as a param and returns comments count or + * error. * - * It made this way to configure CORS, reduce user's payload - * and to add potential ability to cache comments count - * in the future. + * It's done this way to configure CORS, reduce the user's payload, and allow + * caching the comments count in the future.
*/ import Cors from 'micro-cors' From 7a5c344266cafbdeda0e8725c41fb1fccabca280 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 9 Dec 2019 11:58:12 -0800 Subject: [PATCH 02/18] user-guide: improve doc contrib guide instructions for #843 --- static/docs/user-guide/contributing/docs.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/static/docs/user-guide/contributing/docs.md b/static/docs/user-guide/contributing/docs.md index 9abc6be6d8..aaf91507a1 100644 --- a/static/docs/user-guide/contributing/docs.md +++ b/static/docs/user-guide/contributing/docs.md @@ -64,17 +64,22 @@ $ git clone git@github.com:/dvc.org.git $ cd dvc.org ``` -Make sure you have the latest version of [Node.js](https://nodejs.org/en/) and -[Yarn](https://yarnpkg.com/) are installed: +Make sure you have the latest version of [Node.js](https://nodejs.org/en/), and +install [Yarn](https://yarnpkg.com/): ```dvc $ npm install -g yarn ``` -Install the dependencies by running `yarn` and launch the server locally: +Install the project dependencies with Yarn: ```dvc $ yarn +``` + +Launch the server locally with: + +```dvc $ yarn dev ``` From d9ab97f8cb236439e412a2b48362752902aecc0b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 9 Dec 2019 12:46:18 -0800 Subject: [PATCH 03/18] user-guide: change note about GDrive access token per https://github.com/iterative/dvc.org/pull/833#pullrequestreview-328534908 --- static/docs/user-guide/contributing/core.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/static/docs/user-guide/contributing/core.md b/static/docs/user-guide/contributing/core.md index a19f5f7f35..0fdfa99df6 100644 --- a/static/docs/user-guide/contributing/core.md +++ b/static/docs/user-guide/contributing/core.md @@ -265,11 +265,12 @@ $ export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountN ### Click for Google Drive testing instructions -❗Do not share Google Drive access token with anyone to avoid 
unauthorized usage -of your Google Drive. +> Please remember that Google Drive access tokens are personal credentials and +> should not be shared with anyone, otherwise risking unauthorized usage of the +> Google account. -To avoid tests flow interruption by manual login, do authorization once and -backup obtained Google Drive access token which is stored by default under +To avoid test flow interruption by manual login, perform authorization once and +back up the obtained Google Drive access token, which is stored by default under `.dvc/tmp/gdrive-user-credentials.json`. Restore `gdrive-user-credentials.json` from backup for any new DVC repo setup to avoid manual login. From 63a4ac9c3512e256b4cc4ce0d6cd263024f00bf2 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 9 Dec 2019 15:07:15 -0800 Subject: [PATCH 04/18] remote: use consistent order and terminology for remote types per https://github.com/iterative/dvc.org/pull/833#pullrequestreview-328534913 --- pages/features.js | 7 +- src/Diagram/index.js | 4 +- static/docs/command-reference/config.md | 2 +- static/docs/command-reference/get-url.md | 9 +- static/docs/command-reference/import-url.md | 9 +- static/docs/command-reference/remote/add.md | 190 +++++++++--------- static/docs/command-reference/remote/index.md | 10 +- .../docs/command-reference/remote/modify.md | 66 +++--- static/docs/get-started/configure.md | 22 +- static/docs/install/linux.md | 4 +- static/docs/install/macos.md | 4 +- static/docs/install/windows.md | 4 +- .../docs/understanding-dvc/core-features.md | 4 +- static/docs/understanding-dvc/how-it-works.md | 2 +- .../use-cases/sharing-data-and-model-files.md | 7 +- .../versioning-data-and-model-files.md | 6 +- static/docs/user-guide/contributing/core.md | 91 +++++---- 17 files changed, 221 insertions(+), 220 deletions(-) diff --git a/pages/features.js b/pages/features.js index 12d4d87b44..6e4d8b8117 100644 --- a/pages/features.js +++ b/pages/features.js @@ -53,9 +53,10 @@ export default
function FeaturesPage() { Storage agnostic - Use S3, Azure, Google Drive, GCP, SSH, SFTP, Aliyun OSS rsync or - any network-attached storage to store data. The list of supported - protocols is constantly expanding. + Use Amazon S3, Microsoft Azure Blob Storage, Google Drive, Google + Cloud Storage, Aliyun OSS, SSH/SFTP, HDFS, HTTP, network-attached + storage, or rsync to store data. The list of supported remote + storage is constantly expanding. diff --git a/src/Diagram/index.js b/src/Diagram/index.js index d027ae3f0d..7a0ea4ae12 100644 --- a/src/Diagram/index.js +++ b/src/Diagram/index.js @@ -41,8 +41,8 @@ const ColumnOne = () => (

Version control machine learning models, data sets and intermediate - files. DVC connects them with code and uses S3, Azure, Google Drive, - GCP, SSH, Aliyun OSS or to store file contents. + files. DVC connects them with code, and uses cloud storage, SSH, NAS, + etc. to store file contents.

Full code and data provenance help track the complete evolution of every diff --git a/static/docs/command-reference/config.md b/static/docs/command-reference/config.md index afb1415cd6..0bdd86116a 100644 --- a/static/docs/command-reference/config.md +++ b/static/docs/command-reference/config.md @@ -164,7 +164,7 @@ for more details.) - `cache.hdfs` - name of an [HDFS remote to use as external cache](/doc/user-guide/managing-external-data#hdfs). -- `cache.azure` - name of an Azure remote to use as +- `cache.azure` - name of a Microsoft Azure Blob Storage remote to use as [external cache](/doc/user-guide/managing-external-data). ### state diff --git a/static/docs/command-reference/get-url.md b/static/docs/command-reference/get-url.md index 2f6ef0a027..b0f6359562 100644 --- a/static/docs/command-reference/get-url.md +++ b/static/docs/command-reference/get-url.md @@ -45,10 +45,11 @@ DVC supports several types of (local or) remote locations (protocols): | `hdfs` | HDFS | `hdfs://user@example.com/path/to/data.csv` | | `http` | HTTP to file | `https://example.com/path/to/data.csv` | -> Depending on the remote locations type you plan to download data from you -> might need to specify one of the optional dependencies: `[s3]`, `[ssh]`, -> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]` (or `[all]` to include them all) -> when [installing DVC](/doc/install) with `pip`. +> If you installed DVC via `pip` and plan to use cloud services as remote +> storage, you might need to install these optional dependencies: `[s3]`, +> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to +> include them all. The command should look like this: `pip install "dvc[s3]"`. +> (This example installs `boto3` library along with DVC to support S3 storage.) Another way to understand the `dvc get-url` command is as a tool for downloading data files. 
diff --git a/static/docs/command-reference/import-url.md b/static/docs/command-reference/import-url.md index 3de23da252..15854a0fd1 100644 --- a/static/docs/command-reference/import-url.md +++ b/static/docs/command-reference/import-url.md @@ -58,10 +58,11 @@ DVC supports several types of (local or) remote locations (protocols): | `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` | | `remote` | Remote path (see explanation below) | `remote://myremote/path/to/file` | -> Depending on the remote locations type you plan to download data from you -> might need to specify one of the optional dependencies: `[s3]`, `[ssh]`, -> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]` (or `[all]` to include them all) -> when [installing DVC](/doc/install) with `pip`. +> If you installed DVC via `pip` and plan to use cloud services as remote +> storage, you might need to install these optional dependencies: `[s3]`, +> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to +> include them all. The command should look like this: `pip install "dvc[s3]"`. +> (This example installs `boto3` library along with DVC to support S3 storage.) diff --git a/static/docs/command-reference/remote/add.md b/static/docs/command-reference/remote/add.md index 891d150db0..05f617b7ae 100644 --- a/static/docs/command-reference/remote/add.md +++ b/static/docs/command-reference/remote/add.md @@ -23,20 +23,20 @@ positional arguments: ## Description -`name` and `url` are required. `url` specifies a location to store your data. It -can be an SSH, S3 path, Azure, Google Drive path, Google Cloud path, Aliyun OSS, -local directory, etc. (See all the supported remote storage types in the -examples below.) If `url` is a local relative path, it will be resolved relative -to the current working directory but saved **relative to the config file -location** (see LOCAL example below). 
Whenever possible DVC will create a remote -directory if it doesn't exists yet. It won't create an S3 bucket though and will -rely on default access settings. - -> If you installed DVC via `pip`, depending on the remote storage type you plan -> to use you might need to install optional dependencies: `[s3]`, `[ssh]`, -> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all. -> The command should look like this: `pip install "dvc[s3]"`. This installs -> `boto3` library along with DVC to support Amazon S3 storage. +`name` and `url` are required. `url` specifies a location (path, address, +endpoint) to store your data. It can represent a cloud storage service, an SSH +server, network-attached storage, or even a directory in the local file system. +(See all the supported remote storage types in the examples below.) If `url` is +a relative path, it will be resolved against the current working directory, but +saved **relative to the config file location** (see LOCAL example below). +Whenever possible, DVC will create a remote directory if it doesn't exist yet. +(It won't create an S3 bucket though, and will rely on default access settings.) + +> If you installed DVC via `pip` and plan to use cloud services as remote +> storage, you might need to install these optional dependencies: `[s3]`, +> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to +> include them all. The command should look like this: `pip install "dvc[s3]"`. +> (This example installs `boto3` library along with DVC to support S3 storage.) This command creates a section in the DVC project's [config file](/doc/command-reference/config) and optionally assigns a default @@ -89,46 +89,6 @@ These are the possible remote storage (protocols) DVC can work with:

-### Click for local remote - -A "local remote" is a directory in the machine's file system. - -> While the term may seem contradictory, it doesn't have to be. The "local" part -> refers to the machine where the project is stored, so it can be any directory -> accessible to the same system. The "remote" part refers specifically to the -> project/repository itself. - -Using an absolute path (recommended): - -```dvc -$ dvc remote add myremote /tmp/my-dvc-storage -$ cat .dvc/config - ... - ['remote "myremote"'] - url = /tmp/my-dvc-storage - ... -``` - -> Note that the absolute path `/tmp/my-dvc-storage` is saved as is. - -Using a relative path: - -```dvc -$ dvc remote add myremote ../my-dvc-storage -$ cat .dvc/config - ... - ['remote "myremote"'] - url = ../../my-dvc-storage - ... -``` - -> Note that `../my-dvc-storage` has been resolved relative to the `.dvc/` dir, -> resulting in `../../my-dvc-storage`. - -
- -
- ### Click for Amazon S3 > **Note!** Before adding a new remote be sure to login into AWS services and @@ -196,7 +156,7 @@ For more information about the variables DVC supports, please visit
-### Click for Azure +### Click for Microsoft Azure Blob Storage ```dvc $ dvc remote add myremote azure://my-container-name/path @@ -282,6 +242,61 @@ $ dvc remote add myremote gs://bucket/path
+### Click for Aliyun OSS + +First you need to set up OSS storage on Aliyun Cloud. Then use an S3-style URL +to point to your OSS bucket; the endpoint value is configured separately. An +example is shown below: + +```dvc +$ dvc remote add myremote oss://my-bucket/path +``` + +To set key id, key secret and endpoint you need to use `dvc remote modify`. +Example usage is shown below. Make sure to use the `--local` option to avoid +committing your secrets into Git: + +```dvc +$ dvc remote modify myremote --local oss_key_id my-key-id +$ dvc remote modify myremote --local oss_key_secret my-key-secret +$ dvc remote modify myremote oss_endpoint endpoint +``` + +Alternatively, you can set these values through the following environment +variables: + +```dvc +$ export OSS_ACCESS_KEY_ID="my-key-id" +$ export OSS_ACCESS_KEY_SECRET="my-key-secret" +$ export OSS_ENDPOINT="endpoint" +``` + +#### Test your OSS storage using Docker + +Start a container running an OSS emulator. + +```dvc +$ git clone https://github.com/nanaya-tachibana/oss-emulator.git +$ docker image build -t oss:1.0 oss-emulator +$ docker run --detach -p 8880:8880 --name oss-emulator oss:1.0 +``` + +Set up environment variables. + +```dvc +$ export OSS_BUCKET='my-bucket' +$ export OSS_ENDPOINT='localhost:8880' +$ export OSS_ACCESS_KEY_ID='AccessKeyID' +$ export OSS_ACCESS_KEY_SECRET='AccessKeySecret' +``` + +> If no key id and key secret are given, default ones are used. These give read +> access to public-read and public buckets. + +
+ +
+ ### Click for SSH ```dvc @@ -289,8 +304,8 @@ $ dvc remote add myremote ssh://user@example.com/path/to/dir ``` > **Note!** DVC requires both SSH and SFTP access to work with SSH remote -> storage. Please check that you are able to connect to the remote location with -> tools like `ssh` and `sftp` (GNU/Linux). +> storage. Please check that you are able to connect both ways to the remote +> location, with tools like `ssh` and `sftp` (GNU/Linux). @@ -336,56 +351,41 @@ $ dvc remote add myremote https://example.com/path/to/dir
-### Click for Aliyun OSS - -First you need to setup OSS storage on Aliyun Cloud and then use an S3 style URL -for OSS storage and make the endpoint value configurable. An example is shown -below: - -```dvc -$ dvc remote add myremote oss://my-bucket/path -``` +### Click for local remote -To set key id, key secret and endpoint you need to use `dvc remote modify`. -Example usage is show below. Make sure to use the `--local` option to avoid -committing your secrets into Git: +A "local remote" is a directory in the machine's file system. -```dvc -$ dvc remote modify myremote --local oss_key_id my-key-id -$ dvc remote modify myremote --local oss_key_secret my-key-secret -$ dvc remote modify myremote oss_endpoint endpoint -``` +> While the term may seem contradictory, it doesn't have to be. The "local" part +> refers to the machine where the project is stored, so it can be any directory +> accessible to the same system. The "remote" part refers specifically to the +> project/repository itself. -You can also set environment variables and use them later, to set environment -variables use following environment variables: +Using an absolute path (recommended): ```dvc -$ export OSS_ACCESS_KEY_ID="my-key-id" -$ export OSS_ACCESS_KEY_SECRET="my-key-secret" -$ export OSS_ENDPOINT="endpoint" +$ dvc remote add myremote /tmp/my-dvc-storage +$ cat .dvc/config + ... + ['remote "myremote"'] + url = /tmp/my-dvc-storage + ... ``` -#### Test your OSS storage using docker - -Start a container running an OSS emulator. - -```dvc -$ git clone https://github.com/nanaya-tachibana/oss-emulator.git -$ docker image build -t oss:1.0 oss-emulator -$ docker run --detach -p 8880:8880 --name oss-emulator oss:1.0 -``` +> Note that the absolute path `/tmp/my-dvc-storage` is saved as is. -Setup environment variables. 
+Using a relative path: ```dvc -$ export OSS_BUCKET='my-bucket' -$ export OSS_ENDPOINT='localhost:8880' -$ export OSS_ACCESS_KEY_ID='AccessKeyID' -$ export OSS_ACCESS_KEY_SECRET='AccessKeySecret' +$ dvc remote add myremote ../my-dvc-storage +$ cat .dvc/config + ... + ['remote "myremote"'] + url = ../../my-dvc-storage + ... ``` -> Uses default key id and key secret when they are not given, which gives read -> access to public read bucket and public bucket. +> Note that `../my-dvc-storage` has been resolved relative to the `.dvc/` dir, +> resulting in `../../my-dvc-storage`.
diff --git a/static/docs/command-reference/remote/index.md b/static/docs/command-reference/remote/index.md index ec1c504230..e02052d7f2 100644 --- a/static/docs/command-reference/remote/index.md +++ b/static/docs/command-reference/remote/index.md @@ -37,11 +37,11 @@ DVC supports several types of remote storage: local file system, SSH, Amazon S3, Google Cloud Storage, HTTP, HDFS, among others. Refer to `dvc remote add` for more details. -> If you installed DVC via `pip`, depending on the remote storage type you plan -> to use you might need to install optional dependencies: `[s3]`, `[ssh]`, -> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all. -> The command should look like this: `pip install "dvc[s3]"`. This installs -> `boto3` library along with DVC to support S3 storage. +> If you installed DVC via `pip` and plan to use cloud services as remote +> storage, you might need to install these optional dependencies: `[s3]`, +> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to +> include them all. The command should look like this: `pip install "dvc[s3]"`. +> (This example installs `boto3` library along with DVC to support S3 storage.) Using DVC with a remote data storage is optional. By default, DVC is configured to use a local data storage only (usually the `.dvc/cache` directory). This diff --git a/static/docs/command-reference/remote/modify.md b/static/docs/command-reference/remote/modify.md index 094679466a..d88e7480ae 100644 --- a/static/docs/command-reference/remote/modify.md +++ b/static/docs/command-reference/remote/modify.md @@ -27,8 +27,8 @@ positional arguments: ## Description Remote `name` and `option` name are required. Option names are remote type -specific. See below examples and a list of remote storage types: Amazon S3, -Google Cloud, Azure, Google Drive, SSH, ALiyun OSS, among others. +specific. See `dvc remote add` and **Available settings** section below for a +list of remote storage types. 
This command modifies a `remote` section in the project's [config file](/doc/command-reference/config). Alternatively, `dvc config` or @@ -64,7 +64,7 @@ The following are the types of remote storage (protocols) supported:
-### Click for Amazon S3 available options +### Click for Amazon S3 options By default DVC expects your AWS CLI is already [configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html). @@ -132,7 +132,7 @@ these settings, you could use the following options:
-### Click for S3 API compatible storage available options +### Click for S3 API compatible storage options To communicate with a remote object storage that supports an S3 compatible API (e.g. [Minio](https://min.io/), @@ -162,7 +162,7 @@ For more information about the variables DVC supports, please visit
-### Click for Azure available options +### Click for Microsoft Azure Blob Storage options - `url` - remote location URL. @@ -187,7 +187,7 @@ For more information on configuring Azure Storage connection strings, visit
-### Click for Google Drive available options +### Click for Google Drive options - `url` - remote location URL. @@ -211,7 +211,7 @@ For more information on configuring Azure Storage connection strings, visit
-### Click for Google Cloud Storage available options +### Click for Google Cloud Storage options - `projectname` - project name to use. @@ -236,7 +236,31 @@ For more information on configuring Azure Storage connection strings, visit
-### Click for SSH available options +### Click for Aliyun OSS options + +- `oss_key_id` - OSS key id to use to access a remote. + + ```dvc + $ dvc remote modify myremote --local oss_key_id my-key-id + ``` + +- `oss_key_secret` - OSS secret key for authorizing access into a remote. + + ```dvc + $ dvc remote modify myremote --local oss_key_secret my-key-secret + ``` + +- `oss_endpoint endpoint` - OSS endpoint values for accessing remote container. + + ```dvc + $ dvc remote modify myremote oss_endpoint endpoint + ``` + +
+ +
+ +### Click for SSH options - `url` - remote location URL. @@ -304,7 +328,7 @@ For more information on configuring Azure Storage connection strings, visit
-### Click for HDFS available options +### Click for HDFS options - `user` - username to use to access a remote. @@ -314,30 +338,6 @@ For more information on configuring Azure Storage connection strings, visit
-
- -### Click for Aliyun OSS available options - -- `oss_key_id` - OSS key id to use to access a remote. - - ```dvc - $ dvc remote modify myremote --local oss_key_id my-key-id - ``` - -- `oss_key_secret` - OSS secret key for authorizing access into a remote. - - ```dvc - $ dvc remote modify myremote --local oss_key_secret my-key-secret - ``` - -- `oss_endpoint endpoint` - OSS endpoint values for accessing remote container. - - ```dvc - $ dvc remote modify myremote oss_endpoint endpoint - ``` - -
- ## Example: Customize an S3 remote Let's first set up a _default_ S3 remote: diff --git a/static/docs/get-started/configure.md b/static/docs/get-started/configure.md index 1d84e22092..09de1420cd 100644 --- a/static/docs/get-started/configure.md +++ b/static/docs/get-started/configure.md @@ -31,23 +31,23 @@ $ git commit .dvc/config -m "Configure local remote" > to use DVC. For most [use cases](/doc/use-cases), other "more remote" types of > remotes will be required. -Adding a remote should be specified by both its type (protocol) and its path. -DVC currently supports seven types of remotes: +[Adding a remote](/doc/command-reference/remote/add) should be specified by both +its type (protocol) and its path. DVC currently supports these types of remotes: -- `local`: Local Directory - `s3`: Amazon Simple Storage Service -- `gs`: Google Cloud Storage -- `azure`: Azure Blob Storage +- `azure`: Microsoft Azure Blob Storage - `gdrive` : Google Drive -- `ssh`: Secure Shell +- `gs`: Google Cloud Storage +- `ssh`: Secure Shell (requires SFTP) - `hdfs`: Hadoop Distributed File System - `http`: HTTP and HTTPS protocols +- `local`: Directory in the local file system -> If you installed DVC via `pip`, depending on the remote type you plan to use -> you might need to install optional dependencies: `[s3]`, `[ssh]`, `[gs]`, -> `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all. The -> command should look like this: `pip install "dvc[s3]"`. This installs `boto3` -> library along with DVC to support Amazon S3 storage. +> If you installed DVC via `pip` and plan to use cloud services as remote +> storage, you might need to install these optional dependencies: `[s3]`, +> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to +> include them all. The command should look like this: `pip install "dvc[s3]"`. +> (This example installs `boto3` library along with DVC to support S3 storage.) 
For example, to setup an S3 remote we would use something like this (make sure that `mybucket` exists): diff --git a/static/docs/install/linux.md b/static/docs/install/linux.md index 5d779f03ba..297389845b 100644 --- a/static/docs/install/linux.md +++ b/static/docs/install/linux.md @@ -13,8 +13,8 @@ $ pip install dvc ``` Depending on the type of the [remote storage](/doc/command-reference/remote) you -plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`, -`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all. +plan to use, you might need to install optional dependencies: `[s3]`, `[azure]`, +`[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Use `[all]` to include them all.
diff --git a/static/docs/install/macos.md b/static/docs/install/macos.md index 428a26816a..c5ef6274c0 100644 --- a/static/docs/install/macos.md +++ b/static/docs/install/macos.md @@ -36,8 +36,8 @@ $ pip install dvc ``` Depending on the type of the [remote storage](/doc/command-reference/remote) you -plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`, -`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all. +plan to use, you might need to install optional dependencies: `[s3]`, `[azure]`, +`[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Use `[all]` to include them all.
diff --git a/static/docs/install/windows.md b/static/docs/install/windows.md index 47fffd24a9..95a799edb4 100644 --- a/static/docs/install/windows.md +++ b/static/docs/install/windows.md @@ -37,8 +37,8 @@ $ pip install dvc ``` Depending on the type of the [remote storage](/doc/command-reference/remote) you -plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`, -`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all. +plan to use, you might need to install optional dependencies: `[s3]`, `[azure]`, +`[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Use `[all]` to include them all.
diff --git a/static/docs/understanding-dvc/core-features.md b/static/docs/understanding-dvc/core-features.md index 8b0fa2be09..4e3abcdc7c 100644 --- a/static/docs/understanding-dvc/core-features.md +++ b/static/docs/understanding-dvc/core-features.md @@ -15,5 +15,5 @@ - It's **Open-source** and **Self-serve**: DVC is free and doesn't require any additional services. -- DVC supports cloud storage (Amazon S3, Azure Blob Storage, Google Drive, and - Google Cloud Storage) for **data sources and pre-trained model sharing**. +- DVC supports cloud storage (Amazon S3, Microsoft Azure Blob Storage, Google + Cloud Storage, etc.) for **data sources and pre-trained model sharing**. diff --git a/static/docs/understanding-dvc/how-it-works.md b/static/docs/understanding-dvc/how-it-works.md index 32a525e763..2701aff59f 100644 --- a/static/docs/understanding-dvc/how-it-works.md +++ b/static/docs/understanding-dvc/how-it-works.md @@ -73,7 +73,7 @@ ``` - The cache of a DVC project can be shared with colleagues through Amazon S3, - Azure Blob Storage, Google Drive, and Google Cloud Storage, among others: + Microsoft Azure Blob Storage, Google Cloud Storage, among others: ```dvc $ git push diff --git a/static/docs/use-cases/sharing-data-and-model-files.md b/static/docs/use-cases/sharing-data-and-model-files.md index c351fc3519..ce16ceadf8 100644 --- a/static/docs/use-cases/sharing-data-and-model-files.md +++ b/static/docs/use-cases/sharing-data-and-model-files.md @@ -5,10 +5,9 @@ easy to consistently get all your data files and directories into any machine, along with matching source code. All you need to do is to setup [remote storage](/doc/command-reference/remote) for your DVC project, and push the data there, so others can reach it. Currently DVC -supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, Google -Drive, SSH, HDFS, and other remote locations, and the list is constantly -growing. 
(For a complete list and configuration instructions, take a look at the -examples in `dvc remote add`.) +supports Amazon S3, Microsoft Azure Blob Storage, Google Drive, Google Cloud +Storage, SSH, HDFS, and other remote locations. The list is constantly growing. +(For a complete list and configuration instructions, refer to `dvc remote add`.) ![](/static/img/model-sharing-digram.png) diff --git a/static/docs/use-cases/versioning-data-and-model-files.md b/static/docs/use-cases/versioning-data-and-model-files.md index 24e5be449e..00fcee8a36 100644 --- a/static/docs/use-cases/versioning-data-and-model-files.md +++ b/static/docs/use-cases/versioning-data-and-model-files.md @@ -19,9 +19,9 @@ In this basic scenario, DVC is a better replacement for `git-lfs` (see [Related Technologies](/doc/understanding-dvc/related-technologies)) and for ad-hoc scripts on top of Amazon S3 (or any other cloud) used to manage ML data artifacts like raw data, models, etc. Unlike `git-lfs`, DVC -doesn't require installing a dedicated server; It can be used on-premises (NAS, -SSH, for example) or with any major cloud provider (S3, Google Cloud, Azure, -Google Drive). +doesn't require installing a dedicated server; it can be used on-premises (e.g. +SSH, NAS) or with any major cloud storage provider (Amazon S3, Microsoft Azure +Blob Storage, Google Drive, Google Cloud Storage, etc.).
Let's say you already have a Git repository that uses a bunch of images stored in the `images/` directory and has a `model.pkl` file – a model file deployed to diff --git a/static/docs/user-guide/contributing/core.md b/static/docs/user-guide/contributing/core.md index 0fdfa99df6..22a748c4de 100644 --- a/static/docs/user-guide/contributing/core.md +++ b/static/docs/user-guide/contributing/core.md @@ -153,10 +153,9 @@ Install requirements for whatever remotes you are going to test: ```dvc $ pip install -e ".[s3]" -$ pip install -e ".[gs]" $ pip install -e ".[azure]" $ pip install -e ".[gdrive]" -$ pip install -e ".[ssh]" +$ pip install -e ".[gs]" # or $ pip install -e ".[all]" ``` @@ -182,7 +181,7 @@ manipulations below.
-### Click for S3 testing instructions +### Click for Amazon S3 instructions Install [aws cli](https://docs.aws.amazon.com/en_us/cli/latest/userguide/cli-chap-install.html) @@ -201,47 +200,7 @@ $ export DVC_TEST_AWS_REPO_BUCKET="...TEST-S3-BUCKET..."
-### Click for Google Cloud Storage testing instructions - -Go through the [quick start](https://cloud.google.com/sdk/docs/quickstarts) for -your OS. After that you should have `gcloud` command line tool available and -authenticated with your google account. - -You then need to create a bucket, a service account and get its credentials. You -can do this via web UI or terminal. Then you need to put your keys to -`scripts/ci/gcp-creds.json` and add these to your env vars: - -```dvc -$ export GOOGLE_APPLICATION_CREDENTIALS=".gcp-creds.json" -$ export GCP_CREDS="yes" -$ export DVC_TEST_GCP_REPO_BUCKET="dvc-test-xyz" -``` - -Here are some command examples to do this: - -```dvc -# This name needs to be globally unique -$ export GCP_NAME="dvc-test-xyz" -$ gcloud projects create $GCP_NAME -$ gcloud iam service-accounts create $GCP_NAME --project=$GCP_NAME -$ gcloud iam service-accounts keys create \ - scripts/ci/gcp-creds.json \ - --iam-account=$GCP_NAME@$GCP_NAME.iam.gserviceaccount.com - -$ gcloud auth activate-service-account \ - --key-file=scripts/ci/gcp-creds.json -$ gcloud config set project $GCP_NAME -$ gsutil mb gs://$GCP_NAME/ -``` - -I used the same name for project, service account and bucket for simplicity. You -may use different names. - -
- -
- -### Click for Azure testing instructions +### Click for Microsoft Azure Blob Storage instructions Install [Node.js](https://nodejs.org/en/download/) and then install and run Azurite: @@ -263,7 +222,7 @@ $ export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountN
-### Click for Google Drive testing instructions +### Click for Google Drive instructions > Please remember that Google Drive access tokens are personal credentials and > should not be shared with anyone, otherwise risking unauthorized usage of the @@ -284,7 +243,47 @@ $ export GDRIVE_USER_CREDENTIALS_DATA='CONTENT_of_gdrive-user-credentials.json'
-### Click for HDFS testing instructions +### Click for Google Cloud Storage instructions + +Go through the [quick start](https://cloud.google.com/sdk/docs/quickstarts) for +your OS. After that you should have `gcloud` command line tool available and +authenticated with your google account. + +You then need to create a bucket, a service account and get its credentials. You +can do this via web UI or terminal. Then you need to put your keys to +`scripts/ci/gcp-creds.json` and add these to your env vars: + +```dvc +$ export GOOGLE_APPLICATION_CREDENTIALS=".gcp-creds.json" +$ export GCP_CREDS="yes" +$ export DVC_TEST_GCP_REPO_BUCKET="dvc-test-xyz" +``` + +Here are some command examples to do this: + +```dvc +# This name needs to be globally unique +$ export GCP_NAME="dvc-test-xyz" +$ gcloud projects create $GCP_NAME +$ gcloud iam service-accounts create $GCP_NAME --project=$GCP_NAME +$ gcloud iam service-accounts keys create \ + scripts/ci/gcp-creds.json \ + --iam-account=$GCP_NAME@$GCP_NAME.iam.gserviceaccount.com + +$ gcloud auth activate-service-account \ + --key-file=scripts/ci/gcp-creds.json +$ gcloud config set project $GCP_NAME +$ gsutil mb gs://$GCP_NAME/ +``` + +I used the same name for project, service account and bucket for simplicity. You +may use different names. + +
+ +
+ +### Click for HDFS instructions Tests currently only work on Linux. First you need to set up passwordless ssh access to localhost: From b7992487c3053acff4dd396c6168c5c3784de1d2 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 9 Dec 2019 15:13:18 -0800 Subject: [PATCH 05/18] term: improve note about "local remote" --- static/docs/command-reference/remote/add.md | 2 +- static/docs/command-reference/remote/index.md | 2 +- static/docs/command-reference/remote/list.md | 2 +- static/docs/get-started/configure.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/static/docs/command-reference/remote/add.md b/static/docs/command-reference/remote/add.md index 05f617b7ae..0a64331ef6 100644 --- a/static/docs/command-reference/remote/add.md +++ b/static/docs/command-reference/remote/add.md @@ -358,7 +358,7 @@ A "local remote" is a directory in the machine's file system. > While the term may seem contradictory, it doesn't have to be. The "local" part > refers to the machine where the project is stored, so it can be any directory > accessible to the same system. The "remote" part refers specifically to the -> project/repository itself. +> project/repository itself. Read "local, but external" storage. Using an absolute path (recommended): diff --git a/static/docs/command-reference/remote/index.md b/static/docs/command-reference/remote/index.md index e02052d7f2..e3f3b24d7e 100644 --- a/static/docs/command-reference/remote/index.md +++ b/static/docs/command-reference/remote/index.md @@ -76,7 +76,7 @@ For the typical process to share the project via remote, see While the term may seem contradictory, it doesn't have to be. The "local" part refers to the machine where the project is stored, so it can be any directory accessible to the same system. The "remote" part refers specifically to the -project/repository itself. +project/repository itself. Read "local, but external" storage.
diff --git a/static/docs/command-reference/remote/list.md b/static/docs/command-reference/remote/list.md index 37b4a4081e..67e9a6fd46 100644 --- a/static/docs/command-reference/remote/list.md +++ b/static/docs/command-reference/remote/list.md @@ -47,7 +47,7 @@ Let's for simplicity add a default local remote: While the term may seem contradictory, it doesn't have to be. The "local" part refers to the machine where the project is stored, so it can be any directory accessible to the same system. The "remote" part refers specifically to the -project/repository itself. +project/repository itself. Read "local, but external" storage.
diff --git a/static/docs/get-started/configure.md b/static/docs/get-started/configure.md index 09de1420cd..99e8ca9279 100644 --- a/static/docs/get-started/configure.md +++ b/static/docs/get-started/configure.md @@ -18,7 +18,7 @@ For simplicity, let's setup a local remote: While the term may seem contradictory, it doesn't have to be. The "local" part refers to the machine where the project is stored, so it can be any directory accessible to the same system. The "remote" part refers specifically to the -project/repository itself. +project/repository itself. Read "local, but external" storage.
From 884968d9244de2d59503623118b171ee971cbd81 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 9 Dec 2019 15:25:22 -0800 Subject: [PATCH 06/18] cmd ref: link to settings header in remote modify text per https://github.com/iterative/dvc.org/pull/846#pullrequestreview-329440161 --- static/docs/command-reference/remote/modify.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/static/docs/command-reference/remote/modify.md b/static/docs/command-reference/remote/modify.md index d88e7480ae..0a70dbc21e 100644 --- a/static/docs/command-reference/remote/modify.md +++ b/static/docs/command-reference/remote/modify.md @@ -27,7 +27,8 @@ positional arguments: ## Description Remote `name` and `option` name are required. Option names are remote type -specific. See `dvc remote add` and **Available settings** section below for a +specific. See `dvc remote add` and +[Available settings](#available-settings-per-storage-type) section below for a list of remote storage types. This command modifies a `remote` section in the project's From a9a9bda8843767805ffaf2be630365d9bae7023d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 9 Dec 2019 17:24:42 -0800 Subject: [PATCH 07/18] use-cases: address pending feedback from iterative/dvc.org/pull/821 Specifically https://github.com/iterative/dvc.org/pull/821#pullrequestreview-324417528 --- static/docs/command-reference/get.md | 67 +++++++++++++------------- static/docs/use-cases/data-registry.md | 2 +- 2 files changed, 34 insertions(+), 35 deletions(-) diff --git a/static/docs/command-reference/get.md b/static/docs/command-reference/get.md index 8293686e66..fbe5aea3ab 100644 --- a/static/docs/command-reference/get.md +++ b/static/docs/command-reference/get.md @@ -1,10 +1,10 @@ # get -Obtain a file or directory from any DVC project or Git repository +Download a file or directory from any DVC project or Git repository (e.g. hosted on GitHub) into the current working directory. 
-> Unlike `dvc import`, this command does not track the obtained files (does not -> create a DVC-file). +> Unlike `dvc import`, this command does not track the downloaded files (does +> not create a DVC-file). ## Synopsis @@ -15,14 +15,13 @@ Download/copy files or directories from DVC repository. Documentation: positional arguments: - url URL of Git repository with DVC project to download - from. - path Path to a file or directory within a DVC repository. + url URL of Git repository with DVC project to download from. + path Path to a file or directory within a DVC repository. ``` ## Description -Provides an easy way to obtain files or directories tracked in any DVC +Provides an easy way to download files or directories tracked in any DVC repository, both by Git (e.g. source code) and DVC (e.g. datasets, ML models). The file or directory in path is copied to the current working directory. (For remote URLs, it works like downloading with wget, but supporting @@ -36,12 +35,15 @@ external project. Both HTTP and SSH protocols are supported for online repositories (e.g. `[user@]server:project.git`). `url` can also be a local file system path to an "offline" repository. -The `path` argument of this command is used to specify the location of the file -or directory within the source project. If the file is a -[DVC-file](/doc/user-guide/dvc-file-format) the source project must have a -default [DVC remote](/doc/command-reference/remote) configured. +The `path` argument of this command is used to specify the location, within the +source repository at `url`, of the target(s) to be downloaded. It can point to +any file or directory in the source project, including all files tracked by Git. +Note that data tracked by DVC should be specified in one of the +[DVC-files](/doc/user-guide/dvc-file-format) of the source repository. (In this +case, a default [DVC remote](/doc/command-reference/remote) needs to be +configured in the project, containing the actual data.) 
-> See `dvc get-url` to obtain data from other supported URLs. +> See `dvc get-url` to download data from other supported URLs. After running this command successfully, the data found in the `url` `path` is created in the current working directory, with its original file name. @@ -49,7 +51,7 @@ created in the current working directory, with its original file name. ## Options - `-o`, `--out` - specify a path (directory and/or file name) to the desired - location to place the obtained file in. The default value (when this option + location to place the downloaded file in. The default value (when this option isn't used) is the current working directory (`.`) and original file name. If an existing directory is specified, then the output will be placed inside of it. @@ -57,7 +59,7 @@ created in the current working directory, with its original file name. - `--rev` - specific [Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References) (such as a branch name, a tag, or a commit hash) of the DVC repository to - obtain the file from. The tip of the default branch is used by default when + download the file from. The tip of the default branch is used by default when this option is not specified. - `-h`, `--help` - prints the usage/help message, and exit. @@ -67,18 +69,14 @@ created in the current working directory, with its original file name. - `-v`, `--verbose` - displays detailed tracing information. -## Example: Retrieve a model from a DVC remote +## Example: Get a DVC-tracked model file > Note that `dvc get` can be used from anywhere in the file system, as long as > DVC is [installed](/doc/install). -We can use `dvc get` to obtain the resulting model file from our +We can use `dvc get` to download the resulting model file from our [get started example repo](https://github.com/iterative/example-get-started), a -DVC project external to the current working directory.
The desired -output file would be located in the root of the external project -(if the -[`train.dvc` stage](https://github.com/iterative/example-get-started/blob/master/train.dvc) -was reproduced) and named `model.pkl`. +DVC project hosted on Github: ```dvc $ dvc get https://github.com/iterative/example-get-started model.pkl @@ -96,18 +94,18 @@ is found, that specifies `model.pkl` in its outputs (`outs`). DVC then its [config file](https://github.com/iterative/example-get-started/blob/master/.dvc/config)). -> A recommended use for obtaining binary files from DVC repositories, as done in -> this example, is to place a ML model inside a wrapper application that serves -> as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) pipeline -> or as an HTTP/RESTful API (web service) that provides predictions upon -> request. This can be automated leveraging DVC with +> A recommended use for downloading binary files from DVC repositories, as done +> in this example, is to place a ML model inside a wrapper application that +> serves as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) +> pipeline or as an HTTP/RESTful API (web service) that provides predictions +> upon request. This can be automated leveraging DVC with > [CI/CD](https://en.wikipedia.org/wiki/CI/CD) tools. The same example applies to raw or intermediate data artifacts as -well, of course, for cases where we want to obtain those files or directories +well, of course, for cases where we want to download those files or directories and perform some analysis on them. -## Examples: Retrieve a file from a git repository +## Examples: Get a Git-tracked model file We can also use `dvc get` to retrieve any file or directory that exists in a git repository. @@ -121,11 +119,12 @@ install.sh ## Example: Compare different versions of data or model `dvc get` has the `--rev` option, to specify which version of the repository to -obtain a data artifact from. 
It also has the `--out` option to -specify the target path. Combining these two options allows us to do something -we can't achieve with the regular `git checkout` + `dvc checkout` process – see -for example the [Get Older Data Version](/doc/get-started/older-versions) -chapter of our _Get Started_ section. +download a data artifact from. It also has the `--out` option to +specify the location to place the artifact within the workspace. Combining these +two options allows us to do something we can't achieve with the regular +`git checkout` + `dvc checkout` process – see for example the +[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get +Started_ section. Let's use the [get started example repo](https://github.com/iterative/example-get-started) @@ -159,7 +158,7 @@ get the most recent one, we use a similar command, but with `-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev` (since it's the latest version anyway). In fact, in this case using `dvc pull` with the corresponding [DVC-files](/doc/user-guide/dvc-file-format) should -suffice, obtaining the file as just `model.pkl`. We can then rename it to make +suffice, downloading the file as just `model.pkl`. 
We can then rename it to make its version explicit: ```dvc diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index a5eead5b21..412e7ea32c 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -85,7 +85,7 @@ use-cases/cats-dogs └── dogs [400 image files] ``` -In a local DVC project, we could have obtained this dataset at this point with +In a local DVC project, we could have downloaded this dataset at this point with the following command: ```dvc From 735cae1c1ef8c1a26a6420f8a2e7153ae61ebc9b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 9 Dec 2019 17:34:52 -0800 Subject: [PATCH 08/18] cmd ref: clarify around term "download" in get and import Fix #825 --- static/docs/command-reference/get.md | 4 +++- static/docs/command-reference/import.md | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/static/docs/command-reference/get.md b/static/docs/command-reference/get.md index fbe5aea3ab..e9f1c7528c 100644 --- a/static/docs/command-reference/get.md +++ b/static/docs/command-reference/get.md @@ -33,7 +33,9 @@ single-purpose command that can be used out of the box after installing DVC. The `url` argument specifies the address of the Git repository containing the external project. Both HTTP and SSH protocols are supported for online repositories (e.g. `[user@]server:project.git`). `url` can also be a -local file system path to an "offline" repository. +local file system path to an "offline" repository (in this case, instead of +downloading, DVC may copy the target data from the external source project or +its cache). The `path` argument of this command is used to specify the location, within the source repository at `url`, of the target(s) to be downloaded.
It can point to diff --git a/static/docs/command-reference/import.md b/static/docs/command-reference/import.md index c7260ccc40..dd7980efef 100644 --- a/static/docs/command-reference/import.md +++ b/static/docs/command-reference/import.md @@ -30,7 +30,9 @@ the data source changes. (See `dvc update`.) The `url` argument specifies the address of the Git repository containing the source project. Both HTTP and SSH protocols are supported for online repositories (e.g. `[user@]server:project.git`). `url` can also be a -local file system path to an "offline" repository. +local file system path to an "offline" repository (in this case, instead of +downloading, DVC may copy the target data from the external source project or +its cache). The `path` argument of this command is used to specify the location of the data to be downloaded within the source project. It should point to a data file or From 3f1524bc372d9c59b167751a438afd1d70ff69ef Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 9 Dec 2019 17:37:36 -0800 Subject: [PATCH 09/18] SEO: expand list of remotes in main landing page meta info per https://github.com/iterative/dvc.org/pull/846#pullrequestreview-329431223 --- src/Diagram/index.js | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/Diagram/index.js b/src/Diagram/index.js index 7a0ea4ae12..668220adb4 100644 --- a/src/Diagram/index.js +++ b/src/Diagram/index.js @@ -41,8 +41,9 @@ const ColumnOne = () => (

Version control machine learning models, data sets and intermediate - files. DVC connects them with code, and uses cloud storage, SSH, NAS, - etc. to store file contents. + files. DVC connects them with code, and uses Amazon S3, Microsoft Azure + Blob Storage, Google Drive, Google Cloud Storage, Aliyun OSS, SSH/SFTP, + HDFS, HTTP, network-attached storage, or rsync to store file contents.

Full code and data provenance help track the complete evolution of every From c68f2fae03f264109fcdef1330b882841b36ae28 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 10 Dec 2019 10:49:33 -0800 Subject: [PATCH 10/18] term: remove some bold notes in quotes, add some emojis for #848 and #855 --- static/docs/command-reference/config.md | 22 +++++++-------- static/docs/command-reference/remote/add.md | 30 +++++++++------------ static/docs/user-guide/contributing/core.md | 6 ++--- 3 files changed, 26 insertions(+), 32 deletions(-) diff --git a/static/docs/command-reference/config.md b/static/docs/command-reference/config.md index 0bdd86116a..9241d5068b 100644 --- a/static/docs/command-reference/config.md +++ b/static/docs/command-reference/config.md @@ -124,10 +124,10 @@ for more details.) effective of those two. DVC avoids `symlink` and `hardlink` types by default to protect user from accidental cache and repository corruption. - > **Note!** If you manually set `cache.type` to `hardlink` or `symlink`, **you - > will corrupt the cache** if you modify tracked data files in the workspace. - > See the `cache.protected` config option above and corresponding - > `dvc unprotect` command to modify files safely. + **Note** ⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you + will corrupt the cache** if you modify tracked data files in the workspace. + See the `cache.protected` config option above and corresponding + `dvc unprotect` command to modify files safely. There are pros and cons to different link types. Refer to [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache) @@ -185,20 +185,18 @@ more about the state file (database) that is used for optimization. so that when it needs to cleanup the database it could sort them by the timestamp and remove the oldest ones. Default quota is set to 50(percent). 
-## Example: Core config options - -Set the `dvc` log level to `debug`: +## Example: Set the debug level ```dvc $ dvc config core.loglevel debug ``` -Add an S3 remote and set it as the project default: +## Example: Add an S3 remote + +> 💡 Before adding an S3 remote, be sure to +> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html). -> **Note!** Before adding a new remote be sure to login into AWS services and -> follow instructions at -> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) -> to create your bucket. +This also sets the remote as the project default: ```dvc $ dvc remote add myremote s3://bucket/path diff --git a/static/docs/command-reference/remote/add.md b/static/docs/command-reference/remote/add.md index 0a64331ef6..9d4b17155d 100644 --- a/static/docs/command-reference/remote/add.md +++ b/static/docs/command-reference/remote/add.md @@ -91,10 +91,8 @@ These are the possible remote storage (protocols) DVC can work with: ### Click for Amazon S3 -> **Note!** Before adding a new remote be sure to login into AWS services and -> follow instructions at -> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) -> to create your bucket. +> 💡 Before adding an S3 remote, be sure to +> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html). ```dvc $ dvc remote add myremote s3://bucket/path @@ -303,11 +301,9 @@ $ export OSS_ACCESS_KEY_SECRET='AccessKeySecret' $ dvc remote add myremote ssh://user@example.com/path/to/dir ``` -> **Note!** DVC requires both SSH and SFTP access to work with SSH remote -> storage. Please check that you are able to connect both ways to the remote -> location, with tools like `ssh` and `sftp` (GNU/Linux). - - +**Note** ⚠️ DVC requires both SSH and SFTP access to work with SSH remote +storage. 
Please check that you are able to connect both ways to the remote +location, with tools like `ssh` and `sftp` (GNU/Linux). > Note that your server's SFTP root might differ from its physical root (`/`). > (On Linux, see the `ChrootDirectory` config option in `/etc/ssh/sshd_config`.) @@ -326,8 +322,8 @@ $ dvc remote add myremote ssh://user@example.com/path/to/dir $ dvc remote add myremote hdfs://user@example.com/path/to/dir ``` -> **Note!** If you are seeing `Unable to load libjvm` error on ubuntu with -> openjdk-8, try setting JAVA_HOME env variable. This issue is solved in the +> If you are seeing an `Unable to load libjvm` error on Ubuntu with openjdk-8, +> try setting the `JAVA_HOME` environment variable. This issue is solved in the > [upstream version of pyarrow](https://github.com/apache/arrow/pull/4907) and > the fix will be included into the next pyarrow release. @@ -337,16 +333,16 @@ $ dvc remote add myremote hdfs://user@example.com/path/to/dir ### Click for HTTP -> **Note!** Currently HTTP remotes only support downloads operations: -> -> - `pull` and `fetch` -> - `import-url` and `get-url` -> - As an [external dependency](/doc/user-guide/external-dependencies) - ```dvc $ dvc remote add myremote https://example.com/path/to/dir ``` +**Note** ⚠️ HTTP remotes only support downloads operations: + +- `pull` and `fetch` +- `import-url` and `get-url` +- As an [external dependency](/doc/user-guide/external-dependencies) +

diff --git a/static/docs/user-guide/contributing/core.md b/static/docs/user-guide/contributing/core.md index 22a748c4de..885b81ce5a 100644 --- a/static/docs/user-guide/contributing/core.md +++ b/static/docs/user-guide/contributing/core.md @@ -224,9 +224,9 @@ $ export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountN ### Click for Google Drive instructions -> Please remember that Google Drive access tokens are personal credentials and -> should not be shared with anyone, otherwise risking unauthorized usage of the -> Google account. +> 💡 Please remember that Google Drive access tokens are personal credentials +> and should not be shared with anyone, otherwise risking unauthorized usage of +> the Google account. To avoid tests flow interruption by manual login, perform authorization once and backup the obtained Google Drive access token, which is stored by default under From 15cf7c5f2ccc1f8ff0b9b3d76c2edbad4a67cc91 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 10 Dec 2019 10:55:03 -0800 Subject: [PATCH 11/18] term: more note and emojis reviews --- static/docs/command-reference/remote/index.md | 2 +- static/docs/command-reference/version.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/static/docs/command-reference/remote/index.md b/static/docs/command-reference/remote/index.md index e3f3b24d7e..9856561927 100644 --- a/static/docs/command-reference/remote/index.md +++ b/static/docs/command-reference/remote/index.md @@ -97,7 +97,7 @@ remote = myremote ## Example: Add Amazon S3 remote and modify its region -> **Note!** Before adding a new remote be sure follow the instructions at +> 💡 Before adding an S3 remote, be sure to > [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html). 
```dvc diff --git a/static/docs/command-reference/version.md b/static/docs/command-reference/version.md index 339a012400..5f6d839053 100644 --- a/static/docs/command-reference/version.md +++ b/static/docs/command-reference/version.md @@ -29,8 +29,8 @@ system/environment: -> **Note** that if you've installed dvc using pip, you will need to install -> `psutil` by yourself with `pip install psutil` in order for `dvc version` to +> Note that if you've installed DVC using `pip`, you will need to install +> `psutil` manually with `pip install psutil` in order for `dvc version` to > report file system information. Please see the original > [issue on GitHub](https://github.com/iterative/dvc/issues/2284) for more info. From 66d1e52541d26466a62c74bd5ea84b1db2af9330 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 10 Dec 2019 11:15:13 -0800 Subject: [PATCH 12/18] term: finish reviewing bold notes in docs, adds some more emojis to finish #848, and advance #855 --- static/docs/install/windows.md | 2 +- static/docs/tutorials/pipelines.md | 4 ++-- static/docs/user-guide/dvc-files-and-directories.md | 2 +- static/docs/user-guide/large-dataset-optimization.md | 6 +++--- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/static/docs/install/windows.md b/static/docs/install/windows.md index 95a799edb4..34b605c027 100644 --- a/static/docs/install/windows.md +++ b/static/docs/install/windows.md @@ -1,6 +1,6 @@ # Installation on Windows -> **Note!** Please review +> 💡 Please review > [Running DVC on Windows](/doc/user-guide/running-dvc-on-windows) for important > tips to improve your experience using DVC on Windows. 
diff --git a/static/docs/tutorials/pipelines.md b/static/docs/tutorials/pipelines.md index 2a920d2546..134e64805c 100644 --- a/static/docs/tutorials/pipelines.md +++ b/static/docs/tutorials/pipelines.md @@ -198,8 +198,8 @@ is similar to Git's [objects database](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects), but made specifically to handle large data files. -> **Note!** For performance with large datasets, DVC can use file links from the -> cache to the workspace to avoid copying actual file contents. Refer to +> Note that for performance with large datasets, DVC can use file links from the +> cache to the workspace. This avoids copying actual file contents. Refer to > [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache) > to learn which options exist and how to enable them. diff --git a/static/docs/user-guide/dvc-files-and-directories.md b/static/docs/user-guide/dvc-files-and-directories.md index 54e3e0ed8d..3fd291b735 100644 --- a/static/docs/user-guide/dvc-files-and-directories.md +++ b/static/docs/user-guide/dvc-files-and-directories.md @@ -53,7 +53,7 @@ example, if a data file `Posts.xml.zip` has checksum `ec1d2935f811b77cc49b031b999cbf17`, its cache entry will be `.dvc/cache/ec/1d2935f811b77cc49b031b999cbf17` locally. -> **Note!** File checksums are calculated from file contents only. 2 or more +> Note that file checksums are calculated from file contents only. 2 or more > files with different names but the same contents can exist in the workspace > and be tracked by DVC, but only one copy is stored in the cache. This helps > avoid data duplication in cache and remotes. diff --git a/static/docs/user-guide/large-dataset-optimization.md b/static/docs/user-guide/large-dataset-optimization.md index 323ac4711e..5c29ed0f09 100644 --- a/static/docs/user-guide/large-dataset-optimization.md +++ b/static/docs/user-guide/large-dataset-optimization.md @@ -25,7 +25,7 @@ supported by the file system. 
File links are entries in the file system that don't necessarily hold the file contents, but point to where the file is actually stored. File links are more common in file systems used with UNIX-like operating systems and come in -different kinds, that differ in how they connect file names to inodes in the +different kinds, that differ in how they connect file names to _inodes_ in the system. > **Inodes** are metadata file records to locate and store permissions to the @@ -117,8 +117,8 @@ $ dvc config cache.protected true Setting `cache.protected` is important with `hardlink` and/or `symlink` cache file link types. Please refer to the -[Update a Tracked File](/doc/user-guide/updating-tracked-files) on how to -manage tracked files under these cache configurations. +[Update a Tracked File](/doc/user-guide/updating-tracked-files) on how to manage +tracked files under these cache configurations. --- From 2570a1863320ca4bcdef60fe3586a44bfb04e44f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 10 Dec 2019 14:54:18 -0800 Subject: [PATCH 13/18] rewrap server.js comment --- server.js | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/server.js b/server.js index d5e452f950..f7b7e0e7ad 100644 --- a/server.js +++ b/server.js @@ -1,10 +1,9 @@ /* eslint-env node */ -// This file doesn't go through babel or webpack transformation. -// Make sure the syntax and sources this file requires are compatible with the -// current node version you are running. -// See https://github.com/zeit/next.js/issues/1245 for discussions on Universal -// Webpack or universal Babel. +// This file doesn't go through babel or webpack transformation. Make sure the +// syntax and sources this file requires are compatible with the current Node.js +// version you are running. (See https://github.com/zeit/next.js/issues/1245 for +// discussions on universal Webpack vs universal Babel.) 
const { createServer } = require('http') const { parse } = require('url') From 99558335322de467c2c2eaafac3a302e2effd833 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 11 Dec 2019 15:32:55 -0800 Subject: [PATCH 14/18] back ticks for `dvc` and H for "GitHub" per https://github.com/iterative/dvc.org/pull/831#discussion_r356882725 --- .github/PULL_REQUEST_TEMPLATE.md | 14 ++++++++++---- static/docs/command-reference/get.md | 2 +- static/docs/user-guide/running-dvc-on-windows.md | 2 +- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index d0a66124ad..053a8746f3 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,7 +1,13 @@ -Disregard the recommendations below if you use **Edit on Github** button to improve the docs in place. +Disregard the recommendations below if you use **Edit on GitHub** button to +improve the docs in place. -❗ Please read the guidelines in the [Contributing to the Documentation](https://dvc.org/doc/user-guide/contributing/docs) list if you make any substantial changes to the documentation or JS engine. +❗ Please read the guidelines in the +[Contributing to the Documentation](https://dvc.org/doc/user-guide/contributing/docs) +list if you make any substantial changes to the documentation or JS engine. -🐛 Please make sure to mention `Fix #issue` (if applicable) in the description of the PR. This enables GitHub to link the PR to the corresponding bug and close it automatically when PR is merged. +🐛 Please make sure to mention `Fix #issue` (if applicable) in the description +of the PR. This enables GitHub to link the PR to the corresponding bug and close +it automatically when PR is merged. -Thank you for the contribution - we'll try to review and merge it as soon as possible. 🙏 +Thank you for the contribution - we'll try to review and merge it as soon as +possible. 
🙏 diff --git a/static/docs/command-reference/get.md b/static/docs/command-reference/get.md index e9f1c7528c..f0e37cdd64 100644 --- a/static/docs/command-reference/get.md +++ b/static/docs/command-reference/get.md @@ -78,7 +78,7 @@ created in the current working directory, with its original file name. We can use `dvc get` to download the resulting model file from our [get started example repo](https://github.com/iterative/example-get-started), a -DVC project hosted on Github: +DVC project hosted on GitHub: ```dvc $ dvc get https://github.com/iterative/example-get-started model.pkl diff --git a/static/docs/user-guide/running-dvc-on-windows.md b/static/docs/user-guide/running-dvc-on-windows.md index a896dd7019..f2b79af9cb 100644 --- a/static/docs/user-guide/running-dvc-on-windows.md +++ b/static/docs/user-guide/running-dvc-on-windows.md @@ -60,5 +60,5 @@ $ choco install less ``` `less` can be installed in other ways, just make sure it's available in -`cmd`/Powershell, where you run dvc. (This usually means adding the directory +`cmd`/PowerShell, where you run `dvc`. (This usually means adding the directory where `less` is installed to the `PATH` environment variable.) From dc6a03e73d1064700742a7a176449e31c1eaa922 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 11 Dec 2019 15:50:32 -0800 Subject: [PATCH 15/18] cmd ref: remove outdated note about pyarrow in remote add per https://github.com/iterative/dvc.org/pull/846#pullrequestreview-330064610 --- static/docs/command-reference/remote/add.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/static/docs/command-reference/remote/add.md b/static/docs/command-reference/remote/add.md index 9d4b17155d..5d2297d6f7 100644 --- a/static/docs/command-reference/remote/add.md +++ b/static/docs/command-reference/remote/add.md @@ -322,11 +322,6 @@ location, with tools like `ssh` and `sftp` (GNU/Linux). 
$ dvc remote add myremote hdfs://user@example.com/path/to/dir ``` -> If you are seeing an `Unable to load libjvm` error on Ubuntu with openjdk-8, -> try setting the `JAVA_HOME` environment variable. This issue is solved in the -> [upstream version of pyarrow](https://github.com/apache/arrow/pull/4907) and -> the fix will be included into the next pyarrow release. -
From 408073c1d7efa09eef9fb2e7e072758790f4c270 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 11 Dec 2019 15:54:42 -0800 Subject: [PATCH 16/18] cmd ref: revise yellow ! notes per https://github.com/iterative/dvc.org/pull/846#pullrequestreview-330249467 --- pages/api/comments.js | 2 +- static/docs/command-reference/config.md | 8 ++++---- static/docs/command-reference/remote/add.md | 8 ++++---- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/pages/api/comments.js b/pages/api/comments.js index 14e4aaf1de..405901f132 100644 --- a/pages/api/comments.js +++ b/pages/api/comments.js @@ -1,6 +1,6 @@ /* * This API endpoint is used by our blog to get comments count for the post, it - * gets discuss.dvc.org topic url as a param and returns comments count or + * gets discuss.dvc.org topic URL as a param and returns comments count or * error. * * It made this way to configure CORS, reduce user's payload and to add diff --git a/static/docs/command-reference/config.md b/static/docs/command-reference/config.md index 9241d5068b..d2abc81f2c 100644 --- a/static/docs/command-reference/config.md +++ b/static/docs/command-reference/config.md @@ -124,10 +124,10 @@ for more details.) effective of those two. DVC avoids `symlink` and `hardlink` types by default to protect user from accidental cache and repository corruption. - **Note** ⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you - will corrupt the cache** if you modify tracked data files in the workspace. - See the `cache.protected` config option above and corresponding - `dvc unprotect` command to modify files safely. + ⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you will + corrupt the cache** if you modify tracked data files in the workspace. See the + `cache.protected` config option above and corresponding `dvc unprotect` + command to modify files safely. There are pros and cons to different link types. 
Refer to [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache) diff --git a/static/docs/command-reference/remote/add.md b/static/docs/command-reference/remote/add.md index 5d2297d6f7..6dec91c96b 100644 --- a/static/docs/command-reference/remote/add.md +++ b/static/docs/command-reference/remote/add.md @@ -301,9 +301,9 @@ $ export OSS_ACCESS_KEY_SECRET='AccessKeySecret' $ dvc remote add myremote ssh://user@example.com/path/to/dir ``` -**Note** ⚠️ DVC requires both SSH and SFTP access to work with SSH remote -storage. Please check that you are able to connect both ways to the remote -location, with tools like `ssh` and `sftp` (GNU/Linux). +⚠️ DVC requires both SSH and SFTP access to work with SSH remote storage. Please +check that you are able to connect both ways to the remote location, with tools +like `ssh` and `sftp` (GNU/Linux). > Note that your server's SFTP root might differ from its physical root (`/`). > (On Linux, see the `ChrootDirectory` config option in `/etc/ssh/sshd_config`.) @@ -332,7 +332,7 @@ $ dvc remote add myremote hdfs://user@example.com/path/to/dir $ dvc remote add myremote https://example.com/path/to/dir ``` -**Note** ⚠️ HTTP remotes only support downloads operations: +⚠️ HTTP remotes only support downloads operations: - `pull` and `fetch` - `import-url` and `get-url` From 4455fb056b7e758d8c1fca9d762cf3448a89d8ae Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 11 Dec 2019 15:58:56 -0800 Subject: [PATCH 17/18] cmd ref: addressed misc. feedback for PR #846 --- static/docs/command-reference/remote/add.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/static/docs/command-reference/remote/add.md b/static/docs/command-reference/remote/add.md index 6dec91c96b..3ef036363f 100644 --- a/static/docs/command-reference/remote/add.md +++ b/static/docs/command-reference/remote/add.md @@ -23,14 +23,14 @@ positional arguments: ## Description -`name` and `url` are required. 
`url` specifies a location (path, address, -endpoint) to store your data. It can represent a cloud storage service, an SSH -server, network-attached storage, or even a directory in the local file system. -(See all the supported remote storage types in the examples below.) If `url` is -a relative path, it will be resolved against the current working directory, but -saved **relative to the config file location** (see LOCAL example below). -Whenever possible, DVC will create a remote directory if it doesn't exists yet. -(It won't create an S3 bucket though, and will rely on default access settings.) +`name` and `url` are required. `url` specifies a location to store your data. It +can point to a cloud storage service, an SSH server, network-attached storage, +or even a directory in the local file system. (See all the supported remote +storage types in the examples below.) If `url` is a relative path, it will be +resolved against the current working directory, but saved **relative to the +config file location** (see LOCAL example below). Whenever possible, DVC will +create a remote directory if it doesn't exists yet. (It won't create an S3 +bucket though, and will rely on default access settings.) > If you installed DVC via `pip` and plan to use cloud services as remote > storage, you might need to install these optional dependencies: `[s3]`, From 53f840c65bfcf987397e880a7b4e67ff23e76b4e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 11 Dec 2019 16:10:31 -0800 Subject: [PATCH 18/18] revert .github/PULL_REQUEST_TEMPLATE.md auto formatting --- .github/PULL_REQUEST_TEMPLATE.md | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 053a8746f3..13096e33d0 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,13 +1,7 @@ -Disregard the recommendations below if you use **Edit on GitHub** button to -improve the docs in place. 
+Disregard the recommendations below if you use **Edit on GitHub** button to improve the docs in place. -❗ Please read the guidelines in the -[Contributing to the Documentation](https://dvc.org/doc/user-guide/contributing/docs) -list if you make any substantial changes to the documentation or JS engine. +❗ Please read the guidelines in the [Contributing to the Documentation](https://dvc.org/doc/user-guide/contributing/docs) list if you make any substantial changes to the documentation or JS engine. -🐛 Please make sure to mention `Fix #issue` (if applicable) in the description -of the PR. This enables GitHub to link the PR to the corresponding bug and close -it automatically when PR is merged. +🐛 Please make sure to mention `Fix #issue` (if applicable) in the description of the PR. This enables GitHub to link the PR to the corresponding bug and close it automatically when PR is merged. -Thank you for the contribution - we'll try to review and merge it as soon as -possible. 🙏 +Thank you for the contribution - we'll try to review and merge it as soon as possible. 🙏