Merged
2 changes: 1 addition & 1 deletion TOC.md
@@ -167,7 +167,7 @@
+ [BR Tool Overview](/br/backup-and-restore-tool.md)
+ [Use BR Command-line for Backup and Restoration](/br/backup-and-restore-tool.md)
+ [BR Use Cases](/br/backup-and-restore-use-cases.md)
+ [BR Storages](/br/backup-and-restore-storages.md)
+ [External Storages](/br/backup-and-restore-storages.md)
+ [BR FAQ](/br/backup-and-restore-faq.md)
+ TiDB Binlog
+ [Overview](/tidb-binlog/tidb-binlog-overview.md)
20 changes: 10 additions & 10 deletions benchmark/benchmark-tidb-using-tpcc.md
@@ -182,7 +182,7 @@ This process might last for several hours depending on the machine configuration

### Use TiDB Lightning to load data

The amount of loaded data increases as the number of warehouses increases. When you need to load more than 1000 warehouses of data, you can first use BenchmarkSQL to generate CSV files, and then quickly load the CSV files through TiDB Lightning (hereinafter referred to as Lightning). The CSV files can be reused multiple times, which saves the time required for each generation.
The amount of loaded data increases as the number of warehouses increases. When you need to load more than 1000 warehouses of data, you can first use BenchmarkSQL to generate CSV files, and then quickly load the CSV files through TiDB Lightning. The CSV files can be reused multiple times, which saves the time required for each generation.

Follow the steps below to use TiDB Lightning to load data:

@@ -194,7 +194,7 @@ Follow the steps below to use TiDB Lightning to load data:
fileLocation=/home/user/csv/ # The absolute path of the directory where your CSV files are stored
```

It is recommended that the CSV file names adhere to the naming rules in Lightning, that is, `{database}.{table}.csv`, because eventually you'll use Lightning to load data. Here you can modify the above configuration as follows:
It is recommended that the CSV file names adhere to the naming rules in TiDB Lightning, that is, `{database}.{table}.csv`, because eventually you'll use TiDB Lightning to load data. Here you can modify the above configuration as follows:

```text
fileLocation=/home/user/csv/tpcc. # The absolute path of the directory where your CSV files are stored + the file name prefix (database)
@@ -210,9 +210,9 @@ Follow the steps below to use TiDB Lightning to load data:
./runLoader.sh props.mysql
```

3. Use Lightning to load data.
3. Use TiDB Lightning to load data.

To load data using Lightning, see [TiDB Lightning Deployment](/tidb-lightning/deploy-tidb-lightning.md). The following steps introduce how to use TiDB Ansible to deploy Lightning and use Lightning to load data.
To load data using TiDB Lightning, see [TiDB Lightning Deployment](/tidb-lightning/deploy-tidb-lightning.md). The following steps introduce how to use TiDB Ansible to deploy TiDB Lightning and use TiDB Lightning to load data.

1. Edit `inventory.ini`.

@@ -240,22 +240,22 @@ Follow the steps below to use TiDB Lightning to load data:
trim-last-separator: false
```

3. Deploy Lightning and Importer.
3. Deploy TiDB Lightning and TiKV Importer.

{{< copyable "shell-regular" >}}

```shell
ansible-playbook deploy.yml --tags=lightning
```

4. Start Lightning and Importer.
4. Start TiDB Lightning and TiKV Importer.

* Log into the server where Lightning and Importer are deployed.
* Log into the server where TiDB Lightning and TiKV Importer are deployed.
* Enter the deployment directory.
* Execute `scripts/start_importer.sh` under the Importer directory to start Importer.
* Execute `scripts/start_lightning.sh` under the Lightning directory to begin to load data.
* Execute `scripts/start_importer.sh` under the TiKV Importer directory to start Importer.
* Execute `scripts/start_lightning.sh` under the TiDB Lightning directory to begin to load data.

Because you've used TiDB Ansible deployment method, you can see the loading progress of Lightning on the monitoring page, or check whether the loading process is completed through the log.
Because you've used the TiDB Ansible deployment method, you can see the loading progress of TiDB Lightning on the monitoring page, or check whether the loading process is completed through the log.

Fourth, after successfully loading data, you can run `sql.common/test.sql` to validate the correctness of the data. If all SQL statements return an empty result, then the data is correctly loaded.

109 changes: 87 additions & 22 deletions br/backup-and-restore-storages.md
@@ -1,12 +1,12 @@
---
title: BR Storages
summary: Describes the storage URL format used in BR.
title: External Storages
summary: Describes the storage URL format used in BR, TiDB Lightning, and Dumpling.
aliases: ['/docs/stable/br/backup-and-restore-storages/','/docs/v4.0/br/backup-and-restore-storages/']
---

# BR Storages
# External Storages

BR supports reading and writing data on the local filesystem, as well as on Amazon S3 and Google Cloud Storage. These are distinguished by the URL scheme in the `--storage` parameter passed into BR.
Backup & Restore (BR), TiDB Lightning, and Dumpling support reading and writing data on the local filesystem and on Amazon S3. BR also supports reading and writing data on Google Cloud Storage (GCS). These are distinguished by the URL scheme in the `--storage` parameter passed into BR, in the `-d` parameter passed into TiDB Lightning, and in the `--output` (`-o`) parameter passed into Dumpling.

## Schemes

@@ -19,19 +19,40 @@ The following services are supported:
| Google Cloud Storage (GCS) | gcs, gs | `gcs://bucket-name/prefix/of/dest/` |
| Write to nowhere (for benchmarking only) | noop | `noop://` |

## Parameters
## URL parameters

Cloud storages such as S3 and GCS sometimes require additional configuration for connection. You can specify parameters for such configuration. For example:

{{< copyable "shell-regular" >}}
+ Use Dumpling to export data to S3:

```shell
./br backup full -u 127.0.0.1:2379 -s 's3://bucket-name/prefix?region=us-west-2'
```
{{< copyable "shell-regular" >}}

```bash
./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
-o 's3://my-bucket/sql-backup?region=us-west-2'
```

+ Use TiDB Lightning to import data from S3:

{{< copyable "shell-regular" >}}

```bash
./tidb-lightning --tidb-port=4000 --pd-urls=127.0.0.1:2379 --backend=local --sorted-kv-dir=/tmp/sorted-kvs \
-d 's3://my-bucket/sql-backup?region=us-west-2'
```

+ Use BR to back up data to GCS:

### S3 parameters
{{< copyable "shell-regular" >}}

| Parameter | Description |
```bash
./br backup full -u 127.0.0.1:2379 \
-s 'gcs://bucket-name/prefix'
```
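
The three commands above differ only in the scheme and query string of the storage URL. As a minimal sketch, the URL can be assembled in the shell before it is passed to a tool. The helper name `build_storage_url` and the bucket and prefix values are illustrative assumptions and are not part of BR, TiDB Lightning, or Dumpling:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of BR, TiDB Lightning, or Dumpling):
# builds "<scheme>://<bucket>/<prefix>" and appends "?region=<region>"
# only when a region is given.
build_storage_url() {
  local scheme="$1" bucket="$2" prefix="$3" region="${4:-}"
  local url="${scheme}://${bucket}/${prefix}"
  if [ -n "$region" ]; then
    url="${url}?region=${region}"
  fi
  printf '%s\n' "$url"
}

s3_url="$(build_storage_url s3 my-bucket sql-backup us-west-2)"
gcs_url="$(build_storage_url gcs bucket-name prefix)"
echo "$s3_url"
echo "$gcs_url"
```

The resulting value is what follows `-o` (Dumpling), `-d` (TiDB Lightning), or `-s` (BR). Quote it when passing it on, so that the shell does not interpret `?` or `&`.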

### S3 URL parameters

| URL parameter | Description |
|----------:|---------|
| `access-key` | The access key |
| `secret-access-key` | The secret access key |
@@ -46,37 +67,81 @@ Cloud storages such as S3 and GCS sometimes require additional configuration for

> **Note:**
>
> It is not recommended to pass in the access key and secret access key directly in the storage URL, because these keys are logged in plain text. BR tries to infer these keys from the environment in the following order:
> It is not recommended to pass in the access key and secret access key directly in the storage URL, because these keys are logged in plain text. The migration tools try to infer these keys from the environment in the following order:

1. `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables
2. `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY` environment variables
3. Shared credentials file on the BR node at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
4. Shared credentials file on the BR node at `~/.aws/credentials`
3. Shared credentials file on the tool node at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
4. Shared credentials file on the tool node at `~/.aws/credentials`
5. Current IAM role of the Amazon EC2 container
6. Current IAM role of the Amazon ECS task
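
To see which of the sources listed above would likely apply on a given node, a rough check can be scripted. This is only a sketch of the documented lookup order, not the tools' actual implementation. The function name `aws_credential_source` is hypothetical, and the two IAM role sources are not probed because they depend on the cloud environment:

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: reports the first credential source that is
# present, following the documented order for sources 1 to 4.
aws_credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env:AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY"
  elif [ -n "${AWS_ACCESS_KEY:-}" ] && [ -n "${AWS_SECRET_KEY:-}" ]; then
    echo "env:AWS_ACCESS_KEY/AWS_SECRET_KEY"
  elif [ -n "${AWS_SHARED_CREDENTIALS_FILE:-}" ] && [ -f "$AWS_SHARED_CREDENTIALS_FILE" ]; then
    echo "file:$AWS_SHARED_CREDENTIALS_FILE"
  elif [ -f "$HOME/.aws/credentials" ]; then
    echo "file:$HOME/.aws/credentials"
  else
    # Sources 5 and 6 (IAM roles) cannot be checked from the shell alone.
    echo "iam-role-or-none"
  fi
}

aws_credential_source
```

Running this before a backup or export helps confirm that the keys do not need to appear in the storage URL.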

### GCS parameters
### GCS URL parameters

| Parameter | Description |
| URL parameter | Description |
|----------:|---------|
| `credentials-file` | The path to the credentials JSON file on the TiDB node |
| `credentials-file` | The path to the credentials JSON file on the tool node |
| `storage-class` | Storage class of the uploaded objects (for example, `STANDARD`, `COLDLINE`) |
| `predefined-acl` | Predefined ACL of the uploaded objects (for example, `private`, `project-private`) |

When `credentials-file` is not specified, BR will try to infer the credentials from the environment, in the following order:
When `credentials-file` is not specified, the migration tool will try to infer the credentials from the environment, in the following order:

1. Content of the file on the BR node at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable
2. Content of the file on the BR node at `~/.config/gcloud/application_default_credentials.json`
1. Content of the file on the tool node at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable
2. Content of the file on the tool node at `~/.config/gcloud/application_default_credentials.json`
3. When running in GCE or GAE, the credentials fetched from the metadata server.

## Sending credentials to TiKV
## Command-line parameters

In addition to the URL parameters, BR and Dumpling also support specifying these configurations using command-line parameters. For example:

{{< copyable "shell-regular" >}}

```bash
./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
-o 's3://my-bucket/sql-backup' \
--s3.region 'us-west-2'
```

If you have specified URL parameters and command-line parameters at the same time, the URL parameters are overwritten by the command-line parameters.
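
The precedence rule can be illustrated with a small shell sketch. The function `effective_region` is a hypothetical illustration and is not part of any tool. It assumes the region is the only setting of interest and falls back to `us-east-1`, the default documented for `--s3.region`:

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the precedence rule; the tools resolve
# this internally and this function is not part of them.
effective_region() {
  local url="$1" cli_region="${2:-}"
  local query="" url_region="" pair
  local -a pairs=()

  # A command-line value, when present, overrides the URL parameter.
  if [ -n "$cli_region" ]; then
    printf '%s\n' "$cli_region"
    return 0
  fi

  # Otherwise look for region=... in the URL query string.
  case "$url" in
    *\?*) query="${url#*\?}" ;;
  esac
  IFS='&' read -r -a pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    case "$pair" in
      region=*) url_region="${pair#region=}" ;;
    esac
  done

  # Fall back to the documented default region.
  printf '%s\n' "${url_region:-us-east-1}"
}

effective_region 's3://my-bucket/sql-backup?region=us-west-2' 'ap-northeast-1'
effective_region 's3://my-bucket/sql-backup?region=us-west-2'
effective_region 's3://my-bucket/sql-backup'
```

The first call models `-o 's3://...?region=us-west-2' --s3.region 'ap-northeast-1'`, where the command-line value is the one that takes effect.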

### S3 command-line parameters

| Command-line parameter | Description |
|----------:|------|
| `--s3.region` | Amazon S3's service region, which defaults to `us-east-1`. |
| `--s3.endpoint` | The URL of custom endpoint for S3-compatible services. For example, `https://s3.example.com/`. |
| `--s3.storage-class` | The storage class of the upload object. For example, `STANDARD` and `STANDARD_IA`. |
| `--s3.sse` | The server-side encryption algorithm used to encrypt the upload. The value options are empty, `AES256` and `aws:kms`. |
| `--s3.sse-kms-key-id` | If `--s3.sse` is configured as `aws:kms`, this parameter is used to specify the KMS ID. |
| `--s3.acl` | The canned ACL of the upload object. For example, `private` and `authenticated-read`. |
| `--s3.provider` | The type of the S3-compatible service. The supported types are `aws`, `alibaba`, `ceph`, `netease` and `other`. |

### GCS command-line parameters

| Command-line parameter | Description |
|----------:|---------|
| `--gcs.credentials-file` | The path of the JSON-formatted credential on the tool node. |
| `--gcs.storage-class` | The storage type of the upload object, such as `STANDARD` and `COLDLINE`. |
| `--gcs.predefined-acl` | The pre-defined ACL of the upload object, such as `private` and `project-private`. |

## BR sending credentials to TiKV

By default, when using S3 and GCS destinations, BR will send the credentials to every TiKV node to reduce setup complexity.

However, this is unsuitable in cloud environments, where every node has its own role and permissions. In such cases, you need to disable credentials sending with `--send-credentials-to-tikv=false` (or the short form `-c=0`):

{{< copyable "shell-regular" >}}

```shell
```bash
./br backup full -c=0 -u pd-service:2379 -s 's3://bucket-name/prefix'
```

When using SQL statements to [back up](/sql-statements/sql-statement-backup.md) and [restore](/sql-statements/sql-statement-restore.md) data, you can add the `SEND_CREDENTIALS_TO_TIKV = FALSE` option:

{{< copyable "sql" >}}

```sql
BACKUP DATABASE * TO 's3://bucket-name/prefix' SEND_CREDENTIALS_TO_TIKV = FALSE;
```

This option is not supported in TiDB Lightning and Dumpling, because the two applications are currently standalone.
4 changes: 2 additions & 2 deletions br/backup-and-restore-tool.md
@@ -180,7 +180,7 @@ In the Kubernetes environment, you can use the BR tool to back up TiDB cluster d

> **Note:**
>
> For Amazon S3 and Google Cloud Storage parameter descriptions, see the [BR Storages](/br/backup-and-restore-storages.md) document.
> For Amazon S3 and Google Cloud Storage parameter descriptions, see the [External Storages](/br/backup-and-restore-storages.md#url-parameters) document.

- [Back up Data to S3-Compatible Storage Using BR](https://docs.pingcap.com/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br)
- [Restore Data from S3-Compatible Storage Using BR](https://docs.pingcap.com/tidb-in-kubernetes/stable/restore-from-aws-s3-using-br)
@@ -194,4 +194,4 @@ In the Kubernetes environment, you can use the BR tool to back up TiDB cluster d
- [Use BR Command-line](/br/use-br-command-line-tool.md)
- [BR Use Cases](/br/backup-and-restore-use-cases.md)
- [BR FAQ](/br/backup-and-restore-faq.md)
- [BR Storages](/br/backup-and-restore-storages.md)
- [External Storages](/br/backup-and-restore-storages.md)
6 changes: 3 additions & 3 deletions dumpling-overview.md
@@ -159,7 +159,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey}
export AWS_SECRET_ACCESS_KEY=${SecretKey}
```

Dumpling also supports reading credential files from `~/.aws/credentials`. For more Dumpling configuration, see the configuration of [BR storages](/br/backup-and-restore-storages.md), which is consistent with the Dumpling configuration.
Dumpling also supports reading credential files from `~/.aws/credentials`. For more Dumpling configuration, see the configuration of [External storages](/br/backup-and-restore-storages.md).

When you back up data using Dumpling, explicitly specify the `--s3.region` parameter, which means the region of the S3 storage:

@@ -311,7 +311,7 @@ After your operation is completed, set the GC time back (the default value is `1
update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';
```

Finally, all the exported data can be imported back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-backends.md).
Finally, all the exported data can be imported back to TiDB using [TiDB Lightning](/tidb-lightning/tidb-lightning-backends.md).

## Option list of Dumpling

@@ -335,7 +335,7 @@ Finally, all the exported data can be imported back to TiDB using [Lightning](/t
| `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes |
| `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. |
| `--filetype` | Exported file type (csv/sql) | "sql" |
| `-o` or `--output` | Exported file path | "./export-${time}" |
| `-o` or `--output` | The path of exported local files or [the URL of the external storage](/br/backup-and-restore-storages.md) | "./export-${time}" |
| `-S` or `--sql` | Export data according to the specified SQL statement. This command does not support concurrent export. |
| `--consistency` | flush: use FTWRL before the dump <br/> snapshot: dump the TiDB data of a specific snapshot of a TSO <br/> lock: execute `lock tables read` on all tables to be dumped <br/> none: dump without adding locks, which cannot guarantee consistency <br/> auto: use --consistency flush for MySQL; use --consistency snapshot for TiDB | "auto" |
| `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` |
6 changes: 3 additions & 3 deletions faq/migration-tidb-faq.md
@@ -90,9 +90,9 @@ See [Syncer User Guide](/syncer-overview.md).
Download and import [Syncer Json](https://github.com/pingcap/docs/blob/master/etc/Syncer.json) to Grafana. Edit the Prometheus configuration file and add the following content:

```
- job_name: 'syncer_ops' // task name
- job_name: 'syncer_ops' # task name
static_configs:
- targets: [’10.10.1.1:10096’] // Syncer monitoring address and port, informing Prometheus to pull the data of Syncer
- targets: ['10.10.1.1:10096'] # Syncer monitoring address and port, informing Prometheus to pull the data of Syncer
```

Restart Prometheus.
@@ -169,7 +169,7 @@ If the amount of data that needs to be deleted at a time is very large, this loo

### How to improve the data loading speed in TiDB?

- The [Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. It should be noted that the data import process does not perform a complete transaction process for performance reasons. Therefore, the ACID constraint of the data being imported during the import process cannot be guaranteed. The ACID constraint of the imported data can only be guaranteed after the entire import process ends. Therefore, the applicable scenarios mainly include importing new data (such as a new table or a new index) or the full backup and restoring (truncate the original table and then import data).
- The [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. It should be noted that the data import process does not perform a complete transaction process for performance reasons. Therefore, the ACID constraint of the data being imported during the import process cannot be guaranteed. The ACID constraint of the imported data can only be guaranteed after the entire import process ends. Therefore, the applicable scenarios mainly include importing new data (such as a new table or a new index) or the full backup and restoring (truncate the original table and then import data).
- Data loading in TiDB is related to the status of disks and the whole cluster. When loading data, pay attention to metrics like the disk usage rate of the host, TiClient Error, Backoff, Thread CPU and so on. You can analyze the bottlenecks using these metrics.

### What should I do if it is slow to reclaim storage space after deleting data?