diff --git a/TOC.md b/TOC.md
index d9bf343c70bcf..59b4f00b43d10 100644
--- a/TOC.md
+++ b/TOC.md
@@ -167,7 +167,7 @@
+ [BR Tool Overview](/br/backup-and-restore-tool.md)
+ [Use BR Command-line for Backup and Restoration](/br/backup-and-restore-tool.md)
+ [BR Use Cases](/br/backup-and-restore-use-cases.md)
- + [BR Storages](/br/backup-and-restore-storages.md)
+ + [External Storages](/br/backup-and-restore-storages.md)
+ [BR FAQ](/br/backup-and-restore-faq.md)
+ TiDB Binlog
+ [Overview](/tidb-binlog/tidb-binlog-overview.md)
diff --git a/benchmark/benchmark-tidb-using-tpcc.md b/benchmark/benchmark-tidb-using-tpcc.md
index 836d4175d0a4c..9d64aeda652eb 100644
--- a/benchmark/benchmark-tidb-using-tpcc.md
+++ b/benchmark/benchmark-tidb-using-tpcc.md
@@ -182,7 +182,7 @@ This process might last for several hours depending on the machine configuration
### Use TiDB Lightning to load data
-The amount of loaded data increases as the number of warehouses increases. When you need to load more than 1000 warehouses of data, you can first use BenchmarkSQL to generate CSV files, and then quickly load the CSV files through TiDB Lightning (hereinafter referred to as Lightning). The CSV files can be reused multiple times, which saves the time required for each generation.
+The amount of loaded data increases as the number of warehouses increases. When you need to load more than 1000 warehouses of data, you can first use BenchmarkSQL to generate CSV files, and then quickly load the CSV files through TiDB Lightning. The CSV files can be reused multiple times, which saves the time required for each generation.
Follow the steps below to use TiDB Lightning to load data:
@@ -194,7 +194,7 @@ Follow the steps below to use TiDB Lightning to load data:
fileLocation=/home/user/csv/ # The absolute path of the directory where your CSV files are stored
```
- It is recommended that the CSV file names adhere to the naming rules in Lightning, that is, `{database}.{table}.csv`, because eventually you'll use Lightning to load data. Here you can modify the above configuration as follows:
+ It is recommended that the CSV file names adhere to the naming rules in TiDB Lightning, that is, `{database}.{table}.csv`, because eventually you'll use TiDB Lightning to load data. Here you can modify the above configuration as follows:
```text
fileLocation=/home/user/csv/tpcc. # The absolute path of the directory where your CSV files are stored + the file name prefix (database)
@@ -210,9 +210,9 @@ Follow the steps below to use TiDB Lightning to load data:
./runLoader.sh props.mysql
```
-3. Use Lightning to load data.
+3. Use TiDB Lightning to load data.
- To load data using Lightning, see [TiDB Lightning Deployment](/tidb-lightning/deploy-tidb-lightning.md). The following steps introduce how to use TiDB Ansible to deploy Lightning and use Lightning to load data.
+ To load data using TiDB Lightning, see [TiDB Lightning Deployment](/tidb-lightning/deploy-tidb-lightning.md). The following steps describe how to use TiDB Ansible to deploy TiDB Lightning and then use it to load data.
1. Edit `inventory.ini`.
@@ -240,7 +240,7 @@ Follow the steps below to use TiDB Lightning to load data:
trim-last-separator: false
```
- 3. Deploy Lightning and Importer.
+ 3. Deploy TiDB Lightning and TiKV Importer.
{{< copyable "shell-regular" >}}
@@ -248,14 +248,14 @@ Follow the steps below to use TiDB Lightning to load data:
ansible-playbook deploy.yml --tags=lightning
```
- 4. Start Lightning and Importer.
+ 4. Start TiDB Lightning and TiKV Importer.
- * Log into the server where Lightning and Importer are deployed.
+ * Log into the server where TiDB Lightning and TiKV Importer are deployed.
* Enter the deployment directory.
- * Execute `scripts/start_importer.sh` under the Importer directory to start Importer.
- * Execute `scripts/start_lightning.sh` under the Lightning directory to begin to load data.
+ * Execute `scripts/start_importer.sh` under the TiKV Importer directory to start TiKV Importer.
+ * Execute `scripts/start_lightning.sh` under the TiDB Lightning directory to start loading data.
- Because you've used TiDB Ansible deployment method, you can see the loading progress of Lightning on the monitoring page, or check whether the loading process is completed through the log.
+ Because you have deployed TiDB Lightning using TiDB Ansible, you can view the loading progress of TiDB Lightning on the monitoring page, or check the log to see whether loading is complete.
Fourth, after successfully loading data, you can run `sql.common/test.sql` to validate the correctness of the data. If all SQL statements return an empty result, then the data is correctly loaded.
diff --git a/br/backup-and-restore-storages.md b/br/backup-and-restore-storages.md
index 0bae43d6a8cce..c4587cfb4ef33 100644
--- a/br/backup-and-restore-storages.md
+++ b/br/backup-and-restore-storages.md
@@ -1,12 +1,12 @@
---
-title: BR Storages
-summary: Describes the storage URL format used in BR.
+title: External Storages
+summary: Describes the storage URL format used in BR, TiDB Lightning, and Dumpling.
aliases: ['/docs/stable/br/backup-and-restore-storages/','/docs/v4.0/br/backup-and-restore-storages/']
---
-# BR Storages
+# External Storages
-BR supports reading and writing data on the local filesystem, as well as on Amazon S3 and Google Cloud Storage. These are distinguished by the URL scheme in the `--storage` parameter passed into BR.
+Backup & Restore (BR), TiDB Lightning, and Dumpling support reading and writing data on the local filesystem and on Amazon S3. BR also supports reading and writing data on Google Cloud Storage (GCS). These are distinguished by the URL scheme in the `--storage` parameter passed into BR, in the `-d` parameter passed into TiDB Lightning, and in the `--output` (`-o`) parameter passed into Dumpling.
## Schemes
@@ -19,19 +19,40 @@ The following services are supported:
| Google Cloud Storage (GCS) | gcs, gs | `gcs://bucket-name/prefix/of/dest/` |
| Write to nowhere (for benchmarking only) | noop | `noop://` |
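The URL scheme decides which storage backend a tool uses. As a rough illustration only, the following hypothetical `storage_backend` shell function maps a URL to a backend using the schemes that appear in this document (`local`, `s3`, `gcs`/`gs`, and `noop`). Treating a value without a scheme as a local path is an assumption of this sketch, not documented behavior, and the real tools might accept schemes that this sketch reports as unsupported.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a storage URL by its scheme.
# Only the schemes named in this document are recognized.
storage_backend() {
  local url="$1"
  case "$url" in
    local://*)      echo "local" ;;
    s3://*)         echo "s3" ;;
    gcs://*|gs://*) echo "gcs" ;;
    noop://*)       echo "noop" ;;
    *://*)          echo "unsupported" ;;  # a scheme not covered by this sketch
    *)              echo "local" ;;        # assumption: no scheme means a local path
  esac
}

# Usage: prints "s3".
storage_backend 's3://my-bucket/sql-backup?region=us-west-2'
```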
-## Parameters
+## URL parameters
Cloud storages such as S3 and GCS sometimes require additional configuration for connection. You can specify parameters for such configuration. For example:
-{{< copyable "shell-regular" >}}
++ Use Dumpling to export data to S3:
-```shell
-./br backup full -u 127.0.0.1:2379 -s 's3://bucket-name/prefix?region=us-west-2'
-```
+ {{< copyable "shell-regular" >}}
+
+ ```bash
+ ./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
+ -o 's3://my-bucket/sql-backup?region=us-west-2'
+ ```
+
++ Use TiDB Lightning to import data from S3:
+
+ {{< copyable "shell-regular" >}}
+
+ ```bash
+ ./tidb-lightning --tidb-port=4000 --pd-urls=127.0.0.1:2379 --backend=local --sorted-kv-dir=/tmp/sorted-kvs \
+ -d 's3://my-bucket/sql-backup?region=us-west-2'
+ ```
+
++ Use BR to back up data to GCS:
-### S3 parameters
+ {{< copyable "shell-regular" >}}
-| Parameter | Description |
+ ```bash
+ ./br backup full -u 127.0.0.1:2379 \
+ -s 'gcs://bucket-name/prefix'
+ ```
+
+### S3 URL parameters
+
+| URL parameter | Description |
|----------:|---------|
| `access-key` | The access key |
| `secret-access-key` | The secret access key |
@@ -46,30 +67,64 @@ Cloud storages such as S3 and GCS sometimes require additional configuration for
> **Note:**
>
-> It is not recommended to pass in the access key and secret access key directly in the storage URL, because these keys are logged in plain text. BR tries to infer these keys from the environment in the following order:
+> It is not recommended to pass in the access key and secret access key directly in the storage URL, because these keys are logged in plain text. The migration tools try to infer these keys from the environment in the following order:
1. `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables
2. `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY` environment variables
-3. Shared credentials file on the BR node at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
-4. Shared credentials file on the BR node at `~/.aws/credentials`
+3. Shared credentials file on the tool node at the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable
+4. Shared credentials file on the tool node at `~/.aws/credentials`
5. Current IAM role of the Amazon EC2 container
6. Current IAM role of the Amazon ECS task
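To check which of the sources above would be consulted first on a given tool node, you can use a sketch like the following hypothetical `aws_credential_source` function. It only mirrors the documented order by checking whether the variables are set and whether the files exist. It does not validate the credentials, and it cannot tell whether an EC2 or ECS IAM role is actually available, so it reports steps 5 and 6 together as `iam-role`.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented S3 credential lookup order.
# It reports the first source that appears to be present; it does not verify it.
aws_credential_source() {
  if [[ -n "${AWS_ACCESS_KEY_ID:-}" && -n "${AWS_SECRET_ACCESS_KEY:-}" ]]; then
    echo "env:AWS_ACCESS_KEY_ID"
  elif [[ -n "${AWS_ACCESS_KEY:-}" && -n "${AWS_SECRET_KEY:-}" ]]; then
    echo "env:AWS_ACCESS_KEY"
  elif [[ -n "${AWS_SHARED_CREDENTIALS_FILE:-}" && -f "${AWS_SHARED_CREDENTIALS_FILE}" ]]; then
    echo "file:${AWS_SHARED_CREDENTIALS_FILE}"
  elif [[ -f "${HOME}/.aws/credentials" ]]; then
    echo "file:${HOME}/.aws/credentials"
  else
    # Steps 5 and 6: fall back to the EC2 or ECS IAM role, if any.
    echo "iam-role"
  fi
}

# Usage: run on the tool node to see which source is likely to be used.
aws_credential_source
```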
-### GCS parameters
+### GCS URL parameters
-| Parameter | Description |
+| URL parameter | Description |
|----------:|---------|
-| `credentials-file` | The path to the credentials JSON file on the TiDB node |
+| `credentials-file` | The path to the credentials JSON file on the tool node |
| `storage-class` | Storage class of the uploaded objects (for example, `STANDARD`, `COLDLINE`) |
| `predefined-acl` | Predefined ACL of the uploaded objects (for example, `private`, `project-private`) |
-When `credentials-file` is not specified, BR will try to infer the credentials from the environment, in the following order:
+When `credentials-file` is not specified, the tool tries to infer the credentials from the environment in the following order:
-1. Content of the file on the BR node at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable
-2. Content of the file on the BR node at `~/.config/gcloud/application_default_credentials.json`
+1. Content of the file on the tool node at the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable
+2. Content of the file on the tool node at `~/.config/gcloud/application_default_credentials.json`
3. When running in GCE or GAE, the credentials fetched from the metadata server.
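The GCS lookup order can be sketched the same way. The following hypothetical `gcs_credential_source` function takes the value of the `credentials-file` URL parameter as its optional first argument and reports which source would be tried first. It only checks whether the variable is set and whether the default file exists; it does not contact the metadata server, which is only meaningful in GCE or GAE.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented GCS credential lookup order.
# Pass the value of the `credentials-file` URL parameter as $1, if any.
gcs_credential_source() {
  local credentials_file="${1:-}"
  local default_file="${HOME}/.config/gcloud/application_default_credentials.json"
  if [[ -n "$credentials_file" ]]; then
    echo "param:${credentials_file}"
  elif [[ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]]; then
    echo "file:${GOOGLE_APPLICATION_CREDENTIALS}"
  elif [[ -f "$default_file" ]]; then
    echo "file:${default_file}"
  else
    # Only usable when running in GCE or GAE.
    echo "metadata-server"
  fi
}
```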
-## Sending credentials to TiKV
+## Command-line parameters
+
+In addition to the URL parameters, BR and Dumpling also support specifying these configurations using command-line parameters. For example:
+
+{{< copyable "shell-regular" >}}
+
+```bash
+./dumpling -u root -h 127.0.0.1 -P 3306 -B mydb -F 256MiB \
+ -o 's3://my-bucket/sql-backup' \
+ --s3.region 'us-west-2'
+```
+
+If you specify both URL parameters and command-line parameters, the command-line parameters override the URL parameters.
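The precedence rule can be illustrated with the S3 region. The following sketch uses two hypothetical helpers that are not part of BR or Dumpling: `url_param` reads one query parameter from a storage URL, and `effective_s3_region` lets a command-line value win over the `region` URL parameter, falling back to `us-east-1`, the default listed for `--s3.region`. It assumes simple `key=value` pairs separated by `&` and does not decode URL-encoded values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one query parameter in a storage URL.
url_param() {
  local url="$1" key="$2" pair
  local -a pairs
  [[ "$url" == *\?* ]] || return 0        # no query string at all
  IFS='&' read -r -a pairs <<< "${url#*\?}"
  for pair in "${pairs[@]}"; do
    if [[ "${pair%%=*}" == "$key" ]]; then
      echo "${pair#*=}"
      return 0
    fi
  done
}

# Hypothetical helper: a command-line value (as with --s3.region) wins over
# the `region` URL parameter; us-east-1 is used when neither is given.
effective_s3_region() {
  local url="$1" cli_region="${2:-}" url_region
  url_region="$(url_param "$url" region)"
  if [[ -n "$cli_region" ]]; then
    echo "$cli_region"
  elif [[ -n "$url_region" ]]; then
    echo "$url_region"
  else
    echo "us-east-1"
  fi
}

# Usage: prints ap-southeast-1, because the command-line value takes precedence.
effective_s3_region 's3://my-bucket/sql-backup?region=us-west-2' 'ap-southeast-1'
```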
+
+### S3 command-line parameters
+
+| Command-line parameter | Description |
+|----------:|------|
+| `--s3.region` | Amazon S3's service region, which defaults to `us-east-1`. |
+| `--s3.endpoint` | The URL of a custom endpoint for S3-compatible services. For example, `https://s3.example.com/`. |
+| `--s3.storage-class` | The storage class of the uploaded object. For example, `STANDARD` and `STANDARD_IA`. |
+| `--s3.sse` | The server-side encryption algorithm used to encrypt the upload. The value can be empty, `AES256`, or `aws:kms`. |
+| `--s3.sse-kms-key-id` | If `--s3.sse` is set to `aws:kms`, this parameter specifies the KMS key ID. |
+| `--s3.acl` | The canned ACL of the uploaded object. For example, `private` and `authenticated-read`. |
+| `--s3.provider` | The type of the S3-compatible service. The supported types are `aws`, `alibaba`, `ceph`, `netease` and `other`. |
+
+### GCS command-line parameters
+
+| Command-line parameter | Description |
+|----------:|---------|
+| `--gcs.credentials-file` | The path to the credentials JSON file on the tool node. |
+| `--gcs.storage-class` | The storage class of the uploaded object. For example, `STANDARD` and `COLDLINE`. |
+| `--gcs.predefined-acl` | The predefined ACL of the uploaded object. For example, `private` and `project-private`. |
+
+## BR sending credentials to TiKV
By default, when using S3 and GCS destinations, BR will send the credentials to every TiKV node to reduce setup complexity.
@@ -77,6 +132,16 @@ However, this is unsuitable on cloud environment, where every node has their own
{{< copyable "shell-regular" >}}
-```shell
+```bash
./br backup full -c=0 -u pd-service:2379 -s 's3://bucket-name/prefix'
```
+
+When using SQL statements to [back up](/sql-statements/sql-statement-backup.md) and [restore](/sql-statements/sql-statement-restore.md) data, you can add the `SEND_CREDENTIALS_TO_TIKV = FALSE` option:
+
+{{< copyable "sql" >}}
+
+```sql
+BACKUP DATABASE * TO 's3://bucket-name/prefix' SEND_CREDENTIALS_TO_TIKV = FALSE;
+```
+
+This option is not supported in TiDB Lightning and Dumpling, because the two applications are currently standalone.
diff --git a/br/backup-and-restore-tool.md b/br/backup-and-restore-tool.md
index 4ad58ba3a979b..55cb5703685e8 100644
--- a/br/backup-and-restore-tool.md
+++ b/br/backup-and-restore-tool.md
@@ -180,7 +180,7 @@ In the Kubernetes environment, you can use the BR tool to back up TiDB cluster d
> **Note:**
>
-> For Amazon S3 and Google Cloud Storage parameter descriptions, see the [BR Storages](/br/backup-and-restore-storages.md) document.
+> For Amazon S3 and Google Cloud Storage parameter descriptions, see the [External Storages](/br/backup-and-restore-storages.md#url-parameters) document.
- [Back up Data to S3-Compatible Storage Using BR](https://docs.pingcap.com/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br)
- [Restore Data from S3-Compatible Storage Using BR](https://docs.pingcap.com/tidb-in-kubernetes/stable/restore-from-aws-s3-using-br)
@@ -194,4 +194,4 @@ In the Kubernetes environment, you can use the BR tool to back up TiDB cluster d
- [Use BR Command-line](/br/use-br-command-line-tool.md)
- [BR Use Cases](/br/backup-and-restore-use-cases.md)
- [BR FAQ](/br/backup-and-restore-faq.md)
-- [BR Storages](/br/backup-and-restore-storages.md)
+- [External Storages](/br/backup-and-restore-storages.md)
diff --git a/dumpling-overview.md b/dumpling-overview.md
index 9de667a7c0397..2c54d0c52dfa8 100644
--- a/dumpling-overview.md
+++ b/dumpling-overview.md
@@ -159,7 +159,7 @@ export AWS_ACCESS_KEY_ID=${AccessKey}
export AWS_SECRET_ACCESS_KEY=${SecretKey}
```
-Dumpling also supports reading credential files from `~/.aws/credentials`. For more Dumpling configuration, see the configuration of [BR storages](/br/backup-and-restore-storages.md), which is consistent with the Dumpling configuration.
+Dumpling also supports reading credential files from `~/.aws/credentials`. For more Dumpling configuration, see [External Storages](/br/backup-and-restore-storages.md).
When you back up data using Dumpling, explicitly specify the `--s3.region` parameter, which means the region of the S3 storage:
@@ -311,7 +311,7 @@ After your operation is completed, set the GC time back (the default value is `1
update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';
```
-Finally, all the exported data can be imported back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-backends.md).
+Finally, all the exported data can be imported back to TiDB using [TiDB Lightning](/tidb-lightning/tidb-lightning-backends.md).
## Option list of Dumpling
@@ -335,7 +335,7 @@ Finally, all the exported data can be imported back to TiDB using [Lightning](/t
| `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes |
| `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. |
| `--filetype` | Exported file type (csv/sql) | "sql" |
-| `-o` or `--output` | Exported file path | "./export-${time}" |
+| `-o` or `--output` | The path of exported local files or [the URL of the external storage](/br/backup-and-restore-storages.md) | "./export-${time}" |
| `-S` or `--sql` | Export data according to the specified SQL statement. This command does not support concurrent export. |
| `--consistency` | flush: use FTWRL before the dump<br/>snapshot: dump the TiDB data of a specific snapshot of a TSO<br/>lock: execute `lock tables read` on all tables to be dumped<br/>none: dump without adding locks, which cannot guarantee consistency<br/>auto: use --consistency flush for MySQL; use --consistency snapshot for TiDB | "auto" |
| `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` |
diff --git a/faq/migration-tidb-faq.md b/faq/migration-tidb-faq.md
index fce3d14ac41be..4b1c3de5b349a 100644
--- a/faq/migration-tidb-faq.md
+++ b/faq/migration-tidb-faq.md
@@ -90,9 +90,9 @@ See [Syncer User Guide](/syncer-overview.md).
Download and import [Syncer Json](https://github.com/pingcap/docs/blob/master/etc/Syncer.json) to Grafana. Edit the Prometheus configuration file and add the following content:
```
-- job_name: 'syncer_ops' // task name
+- job_name: 'syncer_ops' # task name
static_configs:
- - targets: [’10.10.1.1:10096’] // Syncer monitoring address and port, informing Prometheus to pull the data of Syncer
+ - targets: ['10.10.1.1:10096'] # Syncer monitoring address and port, informing Prometheus to pull the data of Syncer
```
Restart Prometheus.
@@ -169,7 +169,7 @@ If the amount of data that needs to be deleted at a time is very large, this loo
### How to improve the data loading speed in TiDB?
-- The [Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. It should be noted that the data import process does not perform a complete transaction process for performance reasons. Therefore, the ACID constraint of the data being imported during the import process cannot be guaranteed. The ACID constraint of the imported data can only be guaranteed after the entire import process ends. Therefore, the applicable scenarios mainly include importing new data (such as a new table or a new index) or the full backup and restoring (truncate the original table and then import data).
+- The [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. Note that, for performance reasons, the import process does not run complete transactions. Therefore, ACID guarantees cannot be ensured for the data while it is being imported; they are only guaranteed after the entire import process ends. As a result, the applicable scenarios mainly include importing new data (such as a new table or a new index) and full backup and restore (truncating the original table and then importing data).
- Data loading in TiDB is related to the status of disks and the whole cluster. When loading data, pay attention to metrics like the disk usage rate of the host, TiClient Error, Backoff, Thread CPU and so on. You can analyze the bottlenecks using these metrics.
### What should I do if it is slow to reclaim storage space after deleting data?
diff --git a/sql-statements/sql-statement-backup.md b/sql-statements/sql-statement-backup.md
index ce84c08fd5464..ebfb1db92e576 100644
--- a/sql-statements/sql-statement-backup.md
+++ b/sql-statements/sql-statement-backup.md
@@ -98,7 +98,7 @@ BACKUP DATABASE * TO 'local:///mnt/backup/full/';
Note that the system tables (`mysql.*`, `INFORMATION_SCHEMA.*`, `PERFORMANCE_SCHEMA.*`, …) will not be included into the backup.
-### Remote destinations
+### External storages
BR supports backing up data to S3 or GCS:
@@ -108,7 +108,7 @@ BR supports backing up data to S3 or GCS:
BACKUP DATABASE `test` TO 's3://example-bucket-2020/backup-05/?region=us-west-2&access-key={YOUR_ACCESS_KEY}&secret-access-key={YOUR_SECRET_KEY}';
```
-The URL syntax is further explained in [BR storages](/br/backup-and-restore-storages.md).
+The URL syntax is further explained in [External Storages](/br/backup-and-restore-storages.md).
When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`:
diff --git a/sql-statements/sql-statement-restore.md b/sql-statements/sql-statement-restore.md
index 4ed0309a3e7d4..75d9efd8db126 100644
--- a/sql-statements/sql-statement-restore.md
+++ b/sql-statements/sql-statement-restore.md
@@ -89,7 +89,7 @@ RESTORE DATABASE `test` FROM 'local:///mnt/backup/2020/04/';
RESTORE TABLE `test`.`sbtest01`, `test`.`sbtest02` FROM 'local:///mnt/backup/2020/04/';
```
-### Remote destinations
+### External storages
BR supports restoring data from S3 or GCS:
@@ -99,7 +99,7 @@ BR supports restoring data from S3 or GCS:
RESTORE DATABASE * FROM 's3://example-bucket-2020/backup-05/?region=us-west-2';
```
-The URL syntax is further explained in [BR storages](/br/backup-and-restore-storages.md).
+The URL syntax is further explained in [External Storages](/br/backup-and-restore-storages.md).
When running on cloud environment where credentials should not be distributed, set the `SEND_CREDENTIALS_TO_TIKV` option to `FALSE`:
diff --git a/table-filter.md b/table-filter.md
index 5ce306ad0c8fb..d3db62a5c568d 100644
--- a/table-filter.md
+++ b/table-filter.md
@@ -36,7 +36,7 @@ Table filters can be applied to the tools using multiple `-f` or `--filter` comm
# ^~~~~~~~~~~~~~~~~~~~~~~
```
-* [Lightning](/tidb-lightning/tidb-lightning-overview.md):
+* [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md):
{{< copyable "shell-regular" >}}
@@ -49,7 +49,7 @@ Table filters can be applied to the tools using multiple `-f` or `--filter` comm
Table filters in TOML files are specified as [array of strings](https://toml.io/en/v1.0.0-rc.1#section-15). The following lists the example usage in each tool.
-* Lightning:
+* TiDB Lightning:
```toml
[mydumper]
diff --git a/tidb-lightning/migrate-from-csv-using-tidb-lightning.md b/tidb-lightning/migrate-from-csv-using-tidb-lightning.md
index 0f8cffa6134d7..2d4b2f7c15e23 100644
--- a/tidb-lightning/migrate-from-csv-using-tidb-lightning.md
+++ b/tidb-lightning/migrate-from-csv-using-tidb-lightning.md
@@ -138,9 +138,9 @@ TiDB Lightning does not support every option supported by the `LOAD DATA` statem
## Strict format
-Lightning works the best when the input files have uniform size around 256 MB. When the input is a single huge CSV file, Lightning can only use one thread to process it, which slows down import speed a lot.
+TiDB Lightning works best when the input files have a uniform size of around 256 MB. When the input is a single huge CSV file, TiDB Lightning can only use one thread to process it, which significantly slows down the import.
-This can be fixed by splitting the CSV into multiple files first. For the generic CSV format, there is no way to quickly identify when a row starts and ends without reading the whole file. Therefore, Lightning by default does *not* automatically split a CSV file. However, if you are certain that the CSV input adheres to certain restrictions, you can enable the `strict-format` setting to allow Lightning to split the file into multiple 256 MB-sized chunks for parallel processing.
+This can be fixed by splitting the CSV into multiple files first. For the generic CSV format, there is no way to quickly identify when a row starts and ends without reading the whole file. Therefore, TiDB Lightning by default does *not* automatically split a CSV file. However, if you are certain that the CSV input adheres to certain restrictions, you can enable the `strict-format` setting to allow TiDB Lightning to split the file into multiple 256 MB-sized chunks for parallel processing.
```toml
[mydumper]
diff --git a/tidb-lightning/monitor-tidb-lightning.md b/tidb-lightning/monitor-tidb-lightning.md
index c64e1cbc9b2f7..e968a45304b44 100644
--- a/tidb-lightning/monitor-tidb-lightning.md
+++ b/tidb-lightning/monitor-tidb-lightning.md
@@ -47,8 +47,7 @@ scrape_configs:
[Grafana](https://grafana.com/) is a web interface to visualize Prometheus metrics as dashboards.
-If TiDB Lightning is installed using TiDB Ansible, its dashboard is already installed.
-Otherwise, the dashboard JSON can be imported from .
+If TiDB Lightning is installed using TiDB Ansible, its dashboard is already installed. Otherwise, the dashboard JSON can be imported from .
### Row 1: Speed
@@ -56,7 +55,7 @@ Otherwise, the dashboard JSON can be imported from Ctrl+C to exit. Otherwise, obtain the process ID using the `ps aux | grep tidb-lightning` command and then terminate the process using the `kill -2 «pid»` command.
@@ -170,7 +170,7 @@ With the default settings of 3 replicas, the space requirement of the target TiK
## Can TiKV Importer be restarted while TiDB Lightning is running?
-No. Importer stores some information of engines in memory. If `tikv-importer` is restarted, `tidb-lightning` will be stopped due to lost connection. At this point, you need to [destroy the failed checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-destroy) as those Importer-specific information is lost. You can restart Lightning afterwards.
+No. TiKV Importer stores some information about engines in memory. If `tikv-importer` is restarted, `tidb-lightning` stops because the connection is lost. At this point, you need to [destroy the failed checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md#--checkpoint-error-destroy) because the TiKV Importer-specific information is lost. You can restart TiDB Lightning afterwards.
See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb-lightning) for the correct sequence.
@@ -192,7 +192,7 @@ See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb
## Why does TiDB Lightning report the `could not find first pair, this shouldn't happen` error?
-This error occurs possibly because the number of files opened by TiDB Lightning exceeds the system limit when TiDB Lightning reads the sorted local files. In the Linux system, you can use the `ulimit -n` command to confirm whether the value of this system limit is too small. It is recommended that you adjust this value to `1000000` (`ulimit -n 1000000`) during TiDB Lightning import.
+This error occurs possibly because the number of files opened by TiDB Lightning exceeds the system limit when TiDB Lightning reads the sorted local files. In the Linux system, you can use the `ulimit -n` command to confirm whether the value of this system limit is too small. It is recommended that you adjust this value to `1000000` (`ulimit -n 1000000`) during the import.
## Import speed is too slow
@@ -256,7 +256,7 @@ Try the latest version! Maybe there is new speed improvement.
## `Checkpoint for … has invalid status:` (error code)
-**Cause**: [Checkpoint](/tidb-lightning/tidb-lightning-checkpoints.md) is enabled, and TiDB Lightning or TiKV Importer has previously abnormally exited. To prevent accidental data corruption, Lightning will not start until the error is addressed.
+**Cause**: [Checkpoint](/tidb-lightning/tidb-lightning-checkpoints.md) is enabled, and TiDB Lightning or TiKV Importer has previously exited abnormally. To prevent accidental data corruption, TiDB Lightning will not start until the error is addressed.
The error code is an integer smaller than 25, with possible values of 0, 3, 6, 9, 12, 14, 15, 17, 18, 20, and 21. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later step the exit occurs at.
@@ -282,7 +282,7 @@ See the [Checkpoints control](/tidb-lightning/tidb-lightning-checkpoints.md#chec
2. Decrease the value of `table-concurrency` + `index-concurrency` so it is less than `max-open-engines`.
-3. Restart `tikv-importer` to forcefully remove all engine files (default to `./data.import/`). This also removes all partially imported tables, which requires Lightning to clear the outdated checkpoints.
+3. Restart `tikv-importer` to forcefully remove all engine files (defaults to `./data.import/`). This also removes all partially imported tables, which requires TiDB Lightning to clear the outdated checkpoints.
```sh
tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all
@@ -306,9 +306,9 @@ See the [Checkpoints control](/tidb-lightning/tidb-lightning-checkpoints.md#chec
**Solutions**:
-1. Ensure Lightning and the source database are using the same time zone.
+1. Ensure TiDB Lightning and the source database are using the same time zone.
- When executing Lightning directly, the time zone can be forced using the `$TZ` environment variable.
+ When executing TiDB Lightning directly, the time zone can be forced using the `$TZ` environment variable.
```sh
# Manual deployment, and force Asia/Shanghai.
diff --git a/tidb-lightning/tidb-lightning-glossary.md b/tidb-lightning/tidb-lightning-glossary.md
index 805e0b8002383..c8aee759eff33 100644
--- a/tidb-lightning/tidb-lightning-glossary.md
+++ b/tidb-lightning/tidb-lightning-glossary.md
@@ -58,7 +58,7 @@ See also the [FAQs](/tidb-lightning/tidb-lightning-faq.md#checksum-failed-checks
A continuous range of source data, normally equivalent to a single file in the data source.
-When a file is too large, Lightning may split a file into multiple chunks.
+When a file is too large, TiDB Lightning might split a file into multiple chunks.
### Compaction
diff --git a/tidb-lightning/tidb-lightning-overview.md b/tidb-lightning/tidb-lightning-overview.md
index 9c773000849db..3f77f710e398e 100644
--- a/tidb-lightning/tidb-lightning-overview.md
+++ b/tidb-lightning/tidb-lightning-overview.md
@@ -8,11 +8,16 @@ aliases: ['/docs/stable/tidb-lightning/tidb-lightning-overview/','/docs/v4.0/tid
[TiDB Lightning](https://github.com/pingcap/tidb-lightning) is a tool used for fast full import of large amounts of data into a TiDB cluster. You can download TiDB Lightning from [here](/download-ecosystem-tools.md#tidb-lightning).
-Currently, TiDB Lightning supports reading SQL dump exported via Dumpling or CSV data source. You can use it in the following two scenarios:
+Currently, TiDB Lightning can mainly be used in the following two scenarios:
- Importing **large amounts** of **new** data **quickly**
- Restore all backup data
+TiDB Lightning supports:
+
+- Data sources in the [Dumpling](/dumpling-overview.md), CSV, or [Amazon Aurora Parquet](/migrate-from-aurora-using-lightning.md) export formats.
+- Reading data from a local disk or from Amazon S3. For details, see [External Storages](/br/backup-and-restore-storages.md).
+
## TiDB Lightning architecture

diff --git a/tidb-troubleshooting-map.md b/tidb-troubleshooting-map.md
index e0126b7af9155..5fbee8f635655 100644
--- a/tidb-troubleshooting-map.md
+++ b/tidb-troubleshooting-map.md
@@ -526,7 +526,7 @@ Check the specific cause for busy by viewing the monitor **Grafana** -> **TiKV**
- 6.3.4 `Checkpoint for … has invalid status:(error code)`
- - Cause: Checkpoint is enabled, and Lightning/Importer has previously abnormally exited. To prevent accidental data corruption, Lightning will not start until the error is addressed. The error code is an integer less than 25, with possible values as `0, 3, 6, 9, 12, 14, 15, 17, 18, 20 and 21`. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later the exit occurs.
+ - Cause: Checkpoint is enabled, and TiDB Lightning or TiKV Importer has previously exited abnormally. To prevent accidental data corruption, TiDB Lightning will not start until the error is addressed. The error code is an integer less than 25, with possible values `0, 3, 6, 9, 12, 14, 15, 17, 18, 20 and 21`. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later the exit occurs.
- Solution: See [Troubleshooting Solution](/tidb-lightning/tidb-lightning-faq.md#checkpoint-for--has-invalid-status-error-code).
diff --git a/tiflash/use-tiflash.md b/tiflash/use-tiflash.md
index 1bf8f9168715f..2474bd665c622 100644
--- a/tiflash/use-tiflash.md
+++ b/tiflash/use-tiflash.md
@@ -56,7 +56,7 @@ ALTER TABLE `tpch50`.`lineitem` SET TIFLASH REPLICA 0
* For versions earlier than v4.0.6, if you create the TiFlash replica before using TiDB Lightning to import the data, the data import will fail. You must import data to the table before creating the TiFlash replica for the table.
+* If TiDB and TiDB Lightning are both v4.0.6 or later, you can import data to a table using TiDB Lightning regardless of whether the table has TiFlash replicas. Note that this might slow down the import, depending on the NIC bandwidth on the TiDB Lightning host, the CPU and disk load of the TiFlash node, and the number of TiFlash replicas.
+* If TiDB and TiDB Lightning are both v4.0.6 or later, no matter a table has TiFlash replica(s) or not, you can import data to that table using TiDB Lightning. Note that this might slow the TiDB Lightning procedure, which depends on the NIC bandwidth on the TiDB Lightning host, the CPU and disk load of the TiFlash node, and the number of TiFlash replicas.
* It is recommended that you do not replicate more than 1,000 tables because this lowers the PD scheduling performance. This limit will be removed in later versions.