From e40564b6ffdddd7413764c3ac08e3cb249b3b91c Mon Sep 17 00:00:00 2001 From: Keke Yi <40977455+yikeke@users.noreply.github.com> Date: Fri, 31 Jul 2020 16:44:19 +0800 Subject: [PATCH 1/3] cherry pick #3453 to release-4.0 Signed-off-by: ti-srebot --- TOC.md | 7 +- ...up-and-restore-using-dumpling-lightning.md | 86 ++++++ ...up-and-restore-using-mydumper-lightning.md | 20 +- br/backup-and-restore-tool.md | 2 +- download-ecosystem-tools.md | 14 + dumpling-overview.md | 255 ++++++++++++++++++ ecosystem-tool-user-case.md | 6 +- ecosystem-tool-user-guide.md | 4 +- faq/deploy-and-maintain-faq.md | 2 +- migrate-from-mysql-mydumper-files.md | 2 +- mydumper-overview.md | 4 + table-filter.md | 2 +- 12 files changed, 382 insertions(+), 22 deletions(-) create mode 100644 backup-and-restore-using-dumpling-lightning.md create mode 100644 dumpling-overview.md diff --git a/TOC.md b/TOC.md index 7c1f680415b56..c43d2c9ad2e98 100644 --- a/TOC.md +++ b/TOC.md @@ -60,12 +60,12 @@ + [Use TiDB Ansible](/scale-tidb-using-ansible.md) + [Use TiDB Operator](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/scale-a-tidb-cluster) + Backup and Restore - + [Use Mydumper and TiDB Lightning](/backup-and-restore-using-mydumper-lightning.md) - + [Use Dumpling for Export or Backup](/export-or-backup-using-dumpling.md) - + Use BR Tool + + Use BR Tool (Recommended) + [Use BR Tool](/br/backup-and-restore-tool.md) + [BR Use Cases](/br/backup-and-restore-use-cases.md) + [BR storages](/br/backup-and-restore-storages.md) + + [Use Dumpling and TiDB Lightning (Recommended)](/backup-and-restore-using-dumpling-lightning.md) + + [Use Mydumper and TiDB Lightning](/backup-and-restore-using-mydumper-lightning.md) + [Read Historical Data](/read-historical-data.md) + [Configure Time Zone](/configure-time-zone.md) + [Daily Checklist](/daily-check.md) @@ -184,6 +184,7 @@ + [FAQ](/tidb-lightning/tidb-lightning-faq.md) + [Glossary](/tidb-lightning/tidb-lightning-glossary.md) + [TiCDC](/ticdc/ticdc-overview.md) + + 
[Dumpling](/dumpling-overview.md) + sync-diff-inspector + [Overview](/sync-diff-inspector/sync-diff-inspector-overview.md) + [Data Check for Tables with Different Schema/Table Names](/sync-diff-inspector/route-diff.md) diff --git a/backup-and-restore-using-dumpling-lightning.md b/backup-and-restore-using-dumpling-lightning.md new file mode 100644 index 0000000000000..ecdb947074526 --- /dev/null +++ b/backup-and-restore-using-dumpling-lightning.md @@ -0,0 +1,86 @@ +--- +title: Use Dumpling and TiDB Lightning for Data Backup and Restoration +summary: Introduce how to use Dumpling and TiDB Lightning to back up and restore full data of TiDB. +aliases: ['/docs-cn/dev/export-or-backup-using-dumpling/','/zh/tidb/dev/export-or-backup-using-dumpling'] +--- + +# Use Dumpling and TiDB Lightning for Data Backup and Restoration + +This document introduces in detail how to use Dumpling and TiDB Lightning to back up and restore full data of TiDB. For incremental backup and replication to the downstream, refer to [TiDB Binlog](/tidb-binlog/tidb-binlog-overview.md). + +Suppose that the TiDB server information is as follows: + +|Server Name|Server Address|Port|User|Password| +|----|-------|----|----|--------| +|TiDB|127.0.0.1|4000|root|*| + +Use the following tools for data backup and restoration: + +- [Dumpling](/dumpling-overview.md): to export data from TiDB +- [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md): to import data into TiDB + +## Best practices for full backup and restoration using Dumpling/TiDB Lightning + +To quickly back up and restore data (especially large amounts of data), refer to the following recommendations: + +* Keep the exported data files as small as possible. It is recommended to use the `-F` option of Dumpling to set the file size. If you use TiDB Lightning to restore data, it is recommended that you set the value of `-F` to `256m`. +* If some of the exported tables have many rows, you can enable concurrency within a table by setting the `-r` option.
+ +## Back up data from TiDB + +Use the following `dumpling` command to back up data from TiDB: + +{{< copyable "shell-regular" >}} + +```bash +./bin/dumpling -h 127.0.0.1 -P 4000 -u root -t 32 -F 256m -T test.t1 -T test.t2 -o ./var/test +``` + +In this command: + +- `-T test.t1 -T test.t2` means that only the two tables `test`.`t1` and `test`.`t2` are exported. For more methods to filter exported data, refer to [Filter exported data](/dumpling-overview.md#filter-the-exported-data). +- `-t 32` means that 32 threads are used to export the data. +- `-F 256m` means that each exported file is limited to about 256 MB. + +Starting from v4.0.0, Dumpling can automatically extend the GC time if it can access the PD address of the TiDB cluster. For TiDB versions earlier than v4.0.0, you need to modify the GC time manually. Otherwise, you might encounter the following error: + +```log +Could not read data from testSchema.testTable: GC life time is shorter than transaction duration, transaction starts at 2019-08-05 21:10:01.451 +0800 CST, GC safe point is 2019-08-05 21:14:53.801 +0800 CST +``` + +The steps to manually modify the GC time are as follows: + +1. 
Before executing the `dumpling` command, query the [GC](/garbage-collection-overview.md) value of the TiDB cluster and execute the following statement in the MySQL client to adjust it to a suitable value: + + {{< copyable "sql" >}} + + ```sql + SELECT * FROM mysql.tidb WHERE VARIABLE_NAME = 'tikv_gc_life_time'; + ``` + + ```sql + +-----------------------+------------------------------------------------------------------------------------------------+ + | VARIABLE_NAME | VARIABLE_VALUE | + +-----------------------+------------------------------------------------------------------------------------------------+ + | tikv_gc_life_time | 10m0s | + +-----------------------+------------------------------------------------------------------------------------------------+ + 1 rows in set (0.02 sec) + ``` + + {{< copyable "sql" >}} + + ```sql + update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time'; + ``` + +2. After executing the `dumpling` command, restore the GC value of the TiDB cluster to the initial value in step 1: + + {{< copyable "sql" >}} + + ```sql + update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time'; + ``` + +## Restore data into TiDB + +To restore data into TiDB, use TiDB Lightning to import the exported data. See [TiDB Lightning Tutorial](/tidb-lightning/tidb-lightning-tidb-backend.md). diff --git a/backup-and-restore-using-mydumper-lightning.md b/backup-and-restore-using-mydumper-lightning.md index a2447e6f7fb3e..0da0b0adb271d 100644 --- a/backup-and-restore-using-mydumper-lightning.md +++ b/backup-and-restore-using-mydumper-lightning.md @@ -7,9 +7,9 @@ aliases: ['/docs/stable/backup-and-restore-using-mydumper-lightning/','/docs/v4. This document describes how to perform full backup and restoration of the TiDB data using Mydumper and TiDB Lightning. For incremental backup and restoration, refer to [TiDB Binlog](/tidb-binlog/tidb-binlog-overview.md). 
-Suppose that the TiDB service information is as follows: +Suppose that the TiDB server information is as follows: -|Name|Address|Port|User|Password| +|Server Name|Server Address|Port|User|Password| |:----|:-------|:----|:----|:--------| |TiDB|127.0.0.1|4000|root|*| @@ -32,11 +32,11 @@ Use [Mydumper](/mydumper-overview.md) to export data from TiDB and use [TiDB Lig To quickly backup and restore data (especially large amounts of data), refer to the following recommendations: -* Keep the exported data file as small as possible. It is recommended to use the `-F` parameter to set the file size. If you use TiDB Lightning to restore data, it is recommended that you set the value of `-F` to `256` (MB). If you use `loader` for restoration, it is recommended to set the value to `64` (MB). +* Keep the exported data file as small as possible. It is recommended to use the `-F` option of Mydumper to set the file size. If you use TiDB Lightning to restore data, it is recommended that you set the value of `-F` to `256` (MB). If you use `loader` for restoration, it is recommended to set the value to `64` (MB). ## Backup data from TiDB -Use `mydumper` to backup data from TiDB. +Use the following `mydumper` command to back up data from TiDB: {{< copyable "shell-regular" >}} @@ -44,13 +44,13 @@ Use `mydumper` to backup data from TiDB. ./bin/mydumper -h 127.0.0.1 -P 4000 -u root -t 32 -F 256 -B test -T t1,t2 --skip-tz-utc -o ./var/test ``` -In this command, +In this command: -`-B test` means that the data is exported from the `test` database. -`-T t1,t2` means that only the `t1` and `t2` tables are exported. -`-t 32` means that 32 threads are used to export the data. -`-F 256` means that a table is partitioned into chunks, and one chunk is 256MB. -`--skip-tz-utc` means to ignore the inconsistency of time zone setting between MySQL and the data exporting machine and to disable automatic conversion. +- `-B test` means that the data is exported from the `test` database. 
+- `-T t1,t2` means that only the `t1` and `t2` tables are exported. +- `-t 32` means that 32 threads are used to export the data. +- `-F 256` means that a table is partitioned into chunks, and one chunk is 256MB. +- `--skip-tz-utc` means to ignore the inconsistency of time zone setting between MySQL and the data exporting machine and to disable automatic conversion. If `mydumper` returns the following error: diff --git a/br/backup-and-restore-tool.md b/br/backup-and-restore-tool.md index ca5ccfd97d55e..ef0d4c7f23c59 100644 --- a/br/backup-and-restore-tool.md +++ b/br/backup-and-restore-tool.md @@ -6,7 +6,7 @@ aliases: ['/docs/stable/br/backup-and-restore-tool/','/docs/v4.0/br/backup-and-r # Use BR to Back up and Restore Data -[Backup & Restore](http://github.com/pingcap/br) (BR) is a command-line tool for distributed backup and restoration of the TiDB cluster data. Compared with [`dumpling`](/export-or-backup-using-dumpling.md) and [`mydumper`/`loader`](/backup-and-restore-using-mydumper-lightning.md), BR is more suitable for scenarios of huge data volume. This document describes the BR command line, detailed use examples, best practices, restrictions, and introduces the implementation principles of BR. +[Backup & Restore](http://github.com/pingcap/br) (BR) is a command-line tool for distributed backup and restoration of the TiDB cluster data. Compared with [`dumpling`](/backup-and-restore-using-dumpling-lightning.md) and [`mydumper`/`loader`](/backup-and-restore-using-mydumper-lightning.md), BR is more suitable for scenarios of huge data volume. This document describes the BR command line, detailed use examples, best practices, restrictions, and introduces the implementation principles of BR. 
## Usage restrictions diff --git a/download-ecosystem-tools.md b/download-ecosystem-tools.md index 4d230ada83144..40a3d22e72ca2 100644 --- a/download-ecosystem-tools.md +++ b/download-ecosystem-tools.md @@ -59,6 +59,20 @@ Download [DM](https://docs.pingcap.com/tidb-data-migration/v1.0/overview) by usi > > `{version}` in the above download link indicates the version number of DM. For example, the download link for `v1.0.1` is `https://download.pingcap.org/dm-v1.0.1-linux-amd64.tar.gz`. You can check the published DM versions in the [DM Release](https://github.com/pingcap/dm/releases) page. +## Dumpling + +Download [Dumpling](/dumpling-overview.md) from the links below: + +| Installation package | Operating system | Architecture | SHA256 checksum | +|:---|:---|:---|:---| +| `https://download.pingcap.org/tidb-toolkit-{version}-linux-amd64.tar.gz` | Linux | amd64 | `https://download.pingcap.org/tidb-toolkit-{version}-linux-amd64.sha256` | + +> **Note:** +> +> The `{version}` in the download link is the version number of Dumpling. For example, the link for downloading the `v4.0.2` version of Dumpling is `https://download.pingcap.org/tidb-toolkit-v4.0.2-linux-amd64.tar.gz`. You can view the currently released versions in [Dumpling Releases](https://github.com/pingcap/dumpling/releases). +> +> Dumpling also supports Linux on arm64. To download the arm64 version of Dumpling, replace `amd64` in the download link with `arm64`. + ## Syncer, Loader, and Mydumper If you want to download the latest version of [Syncer](/syncer-overview.md), [Loader](/loader-overview.md), or [Mydumper](/mydumper-overview.md), directly download the tidb-enterprise-tools package, because all these tools are included in this package. diff --git a/dumpling-overview.md b/dumpling-overview.md new file mode 100644 index 0000000000000..ddfbc210408ae --- /dev/null +++ b/dumpling-overview.md @@ -0,0 +1,255 @@ +--- +title: Dumpling Overview +summary: Use the Dumpling tool to export data from TiDB. 
+--- + +# Dumpling Overview + +This document introduces the data export tool [Dumpling](https://github.com/pingcap/dumpling). Dumpling exports data stored in TiDB/MySQL as SQL or CSV data files and can be used to make a logical full backup or export. + +For backups of SST files (key-value pairs) or backups of incremental data that are not sensitive to latency, refer to [BR](/br/backup-and-restore-tool.md). For real-time backups of incremental data, refer to [TiCDC](/ticdc/ticdc-overview.md). + +## Improvements of Dumpling compared with Mydumper + +1. Support exporting data in multiple formats, including SQL and CSV +2. Support the [table-filter](https://github.com/pingcap/tidb-tools/blob/master/pkg/table-filter/README.md) feature, which makes it easier to filter data +3. More optimizations are made for TiDB: + - Support configuring the memory limit of a single TiDB SQL statement + - Support automatic adjustment of TiDB GC time for TiDB v4.0.0 and above + - Use TiDB's hidden column `_tidb_rowid` to optimize the performance of concurrent data export from a single table + - For TiDB, you can set the value of [`tidb_snapshot`](/read-historical-data.md#how-tidb-reads-data-from-history-versions) to specify the time point of the data backup. This ensures the consistency of the backup without using `FLUSH TABLES WITH READ LOCK`. + +## Dumpling introduction + +Dumpling is written in Go. The GitHub project is [pingcap/dumpling](https://github.com/pingcap/dumpling). + +For detailed usage of Dumpling, use the `--help` option or refer to [Option list of Dumpling](#option-list-of-dumpling). + +When using Dumpling, you need to execute the export command on a running cluster. This document assumes that there is a TiDB instance on the `127.0.0.1:4000` host and that this TiDB instance has a root user without a password. + +Dumpling is included in the tidb-toolkit installation package and can be [downloaded here](/download-ecosystem-tools.md#dumpling).
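The package and checksum links on the [download page](/download-ecosystem-tools.md#dumpling) follow a fixed pattern in which only `{version}` and the architecture (`amd64` or `arm64`) change. As a minimal sketch, the following Bash function fills in that pattern. The function name `toolkit_urls` is hypothetical, it only prints URLs and does not download anything, and the assumption that the arm64 checksum file is named like the amd64 one is not confirmed by the download page.

{{< copyable "shell-regular" >}}

```shell
# Hypothetical helper: print the tidb-toolkit package URL and its SHA256
# checksum URL for a given version and architecture.
toolkit_urls() {
  local version="$1"        # for example, v4.0.2
  local arch="${2:-amd64}"  # amd64 (default) or arm64
  local base="https://download.pingcap.org/tidb-toolkit-${version}-linux-${arch}"
  echo "${base}.tar.gz"
  echo "${base}.sha256"     # assumed to follow the same naming for arm64
}

toolkit_urls v4.0.2
toolkit_urls v4.0.2 arm64
```

The first call prints the two amd64 links shown in the download table with `{version}` replaced by `v4.0.2`; the second prints the arm64 variants.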
+ +## Export data from TiDB/MySQL + +### Required privileges + +- SELECT +- RELOAD +- LOCK TABLES +- REPLICATION CLIENT + +### Export to SQL files + +Dumpling exports data to SQL files by default. You can also export data to SQL files by adding the `--filetype sql` flag: + +{{< copyable "shell-regular" >}} + +```shell +./dumpling \ + -u root \ + -P 4000 \ + -h 127.0.0.1 \ + --filetype sql \ + --threads 32 \ + -o /tmp/test \ + -F 256 +``` + +In the above command, `-h`, `-P`, and `-u` specify the address, port, and user, respectively. If password authentication is required, you can pass the password to Dumpling with `-p $YOUR_SECRET_PASSWORD`. + +### Export to CSV files + +When exporting data to CSV files (with `--filetype csv`), you can also use the `--sql` option to export only the records selected by a specified SQL statement. + +For example, you can export all records that match `id < 100` in `test.sbtest1` using the following command: + +{{< copyable "shell-regular" >}} + +```shell +./dumpling \ + -u root \ + -P 4000 \ + -h 127.0.0.1 \ + -o /tmp/test \ + --filetype csv \ + --sql 'select * from `test`.`sbtest1` where id < 100' +``` + +> **Note:** +> +> - Currently, the `--sql` option can be used only for exporting to CSV files. +> +> - Here the `id < 100` condition must be valid for all tables to be exported. If some tables do not have the specified field, the export fails. + +### Filter the exported data + +#### Use the `--where` option to filter data + +By default, Dumpling exports all tables except those in the system databases. You can use the `--where` option to specify a condition that selects the records to be exported. + +{{< copyable "shell-regular" >}} + +```shell +./dumpling \ + -u root \ + -P 4000 \ + -h 127.0.0.1 \ + -o /tmp/test \ + --where "id < 100" +``` + +The above command exports the data that matches `id < 100` from each table.
+ +#### Use the `--filter` option to filter data + +Dumpling can filter specific databases or tables by specifying the table filter with the `--filter` option. The syntax of table filters is similar to that of `.gitignore`. For details, see [Table Filter](/table-filter.md). + +{{< copyable "shell-regular" >}} + +```shell +./dumpling \ + -u root \ + -P 4000 \ + -h 127.0.0.1 \ + -o /tmp/test \ + --filter "employees.*" \ + --filter "*.WorkOrder" +``` + +The above command exports all the tables in the `employees` database and the `WorkOrder` tables in all databases. + +#### Use the `-B` or `-T` option to filter data + +Dumpling can also export specific databases with the `-B` option or specific tables with the `-T` option. + +> **Note:** +> +> - The `--filter` option and the `-T` option cannot be used at the same time. +> - The `-T` option accepts only complete inputs in the form of `database-name.table-name`; a table name alone is not accepted. For example, Dumpling cannot recognize `-T WorkOrder`. + +Examples: + +- `-B employees` exports the `employees` database. +- `-T employees.WorkOrder` exports the `employees.WorkOrder` table. + +### Improve export efficiency through concurrency + +By default, the exported files are stored in the `./export-${time}` directory. Commonly used options are as follows: + +- `-o` specifies the directory where the exported files are stored. +- `-F` specifies the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable). +- `-r` specifies the maximum number of records (rows) in a single file. When this option is set, Dumpling enables concurrency within a table to speed up the export of large tables. + +With the above options specified, Dumpling can export data with a higher degree of parallelism.
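To get a rough idea of how many data files a single table produces, you can divide the table size by the `-F` limit, or the row count by the `-r` limit, and round up. The following Bash sketch does only that arithmetic. The function names and sample numbers are hypothetical, and the actual number of files produced by Dumpling is likely to differ somewhat from these estimates.

{{< copyable "shell-regular" >}}

```shell
# Hypothetical helpers for rough estimates only; they do not call Dumpling.

# Ceiling division for positive integers.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Estimated number of files when -F limits each file to file_mib MiB.
files_by_size() {
  local table_mib="$1" file_mib="$2"
  ceil_div "$table_mib" "$file_mib"
}

# Estimated number of files when -r limits each file to rows_per_file rows.
files_by_rows() {
  local table_rows="$1" rows_per_file="$2"
  ceil_div "$table_rows" "$rows_per_file"
}

files_by_size 10240 256       # a 10 GiB table with -F 256MiB: prints 40
files_by_rows 1000000 200000  # a 1,000,000-row table with -r 200000: prints 5
```

Because `-r` splits a table into several files, a larger estimate from `files_by_rows` also suggests more pieces that can be exported concurrently.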
+ +### Adjust Dumpling's data consistency options + +> **Note:** +> +> In most scenarios, you do not need to adjust the default data consistency options of Dumpling. + +Dumpling uses the `--consistency` option to control how data consistency is ensured during the export. For TiDB, data consistency is guaranteed by default by exporting a snapshot at a certain timestamp (that is, `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` option to specify the timestamp to be backed up. The following consistency levels are available: + +- `flush`: Use [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) to ensure consistency. +- `snapshot`: Get a consistent snapshot of the specified timestamp and export it. +- `lock`: Add read locks on all tables to be exported. +- `none`: No guarantee for consistency. +- `auto`: Use `flush` for MySQL and `snapshot` for TiDB. + +After the export is completed, you can see the exported files in `/tmp/test`: + +{{< copyable "shell-regular" >}} + +```shell +ls -lh /tmp/test | awk '{print $5 "\t" $9}' +``` + +``` +140B metadata +66B test-schema-create.sql +300B test.sbtest1-schema.sql +190K test.sbtest1.0.sql +300B test.sbtest2-schema.sql +190K test.sbtest2.0.sql +300B test.sbtest3-schema.sql +190K test.sbtest3.0.sql +``` + +### Export historical data snapshot of TiDB + +Dumpling can export the data at a specific [tidb_snapshot](/read-historical-data.md#how-tidb-reads-data-from-history-versions) with the `--snapshot` option.
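A TSO, one of the two forms that `--snapshot` accepts, is a timestamp allocated by PD whose high bits hold a physical Unix time in milliseconds and whose low 18 bits hold a logical counter. Assuming that layout, the following Bash sketch converts between a TSO and a point in time, which can help you check which moment a TSO such as `417773951312461825` refers to. The function names are hypothetical, and the `date -d @...` call assumes GNU coreutils.

{{< copyable "shell-regular" >}}

```shell
# Hypothetical helpers, assuming the PD TSO layout:
#   TSO = (physical Unix time in milliseconds << 18) | logical counter

# Physical part of a TSO, in milliseconds since the Unix epoch.
tso_physical_ms() {
  echo $(( $1 >> 18 ))
}

# Logical part of a TSO (the low 18 bits).
tso_logical() {
  echo $(( $1 & ((1 << 18) - 1) ))
}

# Build a TSO with a zero logical part from a Unix time in milliseconds.
ms_to_tso() {
  echo $(( $1 << 18 ))
}

tso=417773951312461825
ms=$(tso_physical_ms "$tso")
echo "physical: ${ms} ms, logical: $(tso_logical "$tso")"
# Print the physical part as a UTC time (GNU date).
date -u -d "@$(( ms / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
```

For this TSO, the sketch reports a physical time of `1593681149721` ms (2020-07-02 09:12:29 UTC, that is, 17:12:29 in UTC+8) and a logical part of `1`.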
+ +The `--snapshot` option can be set to a TSO (the `Position` field output by the `SHOW MASTER STATUS` command) or a valid time of the `datetime` data type, for example: + +{{< copyable "shell-regular" >}} + +```shell +./dumpling --snapshot 417773951312461825 +./dumpling --snapshot "2020-07-02 17:12:45" +``` + +The above commands export the TiDB historical data snapshot at the TSO `417773951312461825` and at the time `2020-07-02 17:12:45`, respectively. + +### TiDB GC settings when exporting a large volume of data + +When exporting data from TiDB, if the TiDB version is v4.0.0 or later and Dumpling can access the PD address of the TiDB cluster, Dumpling automatically extends the GC time without affecting the original cluster. For TiDB versions earlier than v4.0.0, you need to modify the GC time manually. + +In other scenarios, if the data size is very large, you can extend the GC time in advance to avoid export failure caused by GC during the export process: + +{{< copyable "sql" >}} + +```sql +update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time'; +``` + +After the export is completed, set the GC time back (the default value is `10m`): + +{{< copyable "sql" >}} + +```sql +update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time'; +``` + +Finally, all the exported data can be imported back to TiDB using [TiDB Lightning](/tidb-lightning/tidb-lightning-tidb-backend.md). + +## Option list of Dumpling + +| Options | Usage | Default value | +| --------| --- | --- | +| `-V` or `--version` | Output the Dumpling version and exit directly | +| `-B` or `--database` | Export specified databases | +| `-T` or `--tables-list` | Export specified tables | +| `-f` or `--filter` | Export tables that match the filter pattern. For the filter syntax, see [table-filter](/table-filter.md). | `"\*.\*"` (export all databases or tables) | +| `--case-sensitive` | Whether table-filter is case-sensitive | false (case-insensitive) | +| `-h` or `--host` | The IP address of the connected database host | "127.0.0.1" | +| `-t` or `--threads` | The number of concurrent backup threads | 4 | +| `-r` or `--rows` | Split the table into chunks of the specified number of rows (generally used to export a large table concurrently into multiple files) | +| `-L` or `--logfile` | Log output address. If it is empty, the log will be output to the console | "" | +| `--loglevel` | Log level {debug,info,warn,error,dpanic,panic,fatal} | "info" | +| `--logfmt` | Log output format {text,json} | "text" | +| `-d` or `--no-data` | Do not export data (suitable for scenarios where only the schema is exported) | +| `--no-header` | Export CSV files of the tables without generating header | +| `-W` or `--no-views` | Do not export the views | true | +| `-m` or `--no-schemas` | Do not export the schema with only the data exported | +| `-s` or `--statement-size` | Control the size of the `INSERT` statements; the unit is bytes | +| `-F` or `--filesize` | The file size of the divided tables. The unit must be specified such as `128B`, `64KiB`, `32MiB`, and `1.5GiB`. | +| `--filetype` | Exported file type (csv/sql) | "sql" | +| `-o` or `--output` | Exported file path | "./export-${time}" | +| `-S` or `--sql` | Export data according to the specified SQL statement. This option does not support concurrent export. | +| `--consistency` | flush: use FTWRL before the dump <br/> snapshot: dump the TiDB data of a specific snapshot of a TSO <br/> lock: execute `lock tables read` on all tables to be dumped <br/> none: dump without adding locks, which cannot guarantee consistency <br/> auto: MySQL defaults to using flush, TiDB defaults to using snapshot | "auto" | +| `--snapshot` | Snapshot TSO; valid only when `consistency=snapshot` | +| `--where` | Specify the scope of the table backup through the `where` condition | +| `-p` or `--password` | The password of the connected database host | +| `-P` or `--port` | The port of the connected database host | 4000 | +| `-u` or `--user` | The username of the connected database host | "root" | +| `--dump-empty-database` | Export the `CREATE DATABASE` statements of the empty databases | true | +| `--ca` | The address of the certificate authority file for TLS connection | +| `--cert` | The address of the client certificate file for TLS connection | +| `--key` | The address of the client private key file for TLS connection | +| `--csv-delimiter` | Delimiter of character type variables in CSV files | '"' | +| `--csv-separator` | Separator of each value in CSV files | ',' | +| `--csv-null-value` | Representation of null values in CSV files | "\\N" | +| `--escape-backslash` | Use backslash (`\`) to escape special characters in the export file | true | +| `--output-filename-template` | The filename template in the format of [Go template](https://golang.org/pkg/text/template/#hdr-Arguments) <br/> Supports the `{{.DB}}`, `{{.Table}}`, and `{{.Index}}` arguments <br/> The three arguments represent the database name, table name, and chunk ID of the data file | '{{.DB}}.{{.Table}}.{{.Index}}' | +| `--status-addr` | Dumpling's service address, including the address for Prometheus to pull metrics and for pprof debugging | ":8281" | +| `--tidb-mem-quota-query` | The memory limit of a single SQL statement executed by Dumpling during the export, in bytes. The default value is 32 GiB | 34359738368 | diff --git a/ecosystem-tool-user-case.md b/ecosystem-tool-user-case.md index 08bd88929437b..3772202a78996 100644 --- a/ecosystem-tool-user-case.md +++ b/ecosystem-tool-user-case.md @@ -14,13 +14,13 @@ If you need to import the compatible CSV files exported by other tools to TiDB, ## Import full data from MySQL/Aurora -If you need to import full data from MySQL or Aurora, use [Dumpling](/export-or-backup-using-dumpling.md) first to export data as SQL dump files, and then use [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to import data into the TiDB cluster. +If you need to import full data from MySQL or Aurora, use [Dumpling](/dumpling-overview.md) first to export data as SQL dump files, and then use [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to import data into the TiDB cluster. ## Migrate data from MySQL/Aurora If you need to migrate both full data and incremental data from MySQL/Aurora, use [TiDB Data Migration](https://docs.pingcap.com/tidb-data-migration/v1.0/overview) (DM) to perform the full and incremental data migration. -If the full data volume is large (at the TB level), you can first use [Dumpling](/export-or-backup-using-dumpling.md) and [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to perform the full data migration, and then use DM to perform the incremental data migration. 
+If the full data volume is large (at the TB level), you can first use [Dumpling](/dumpling-overview.md) and [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to perform the full data migration, and then use DM to perform the incremental data migration. ## Back up and restore TiDB cluster @@ -30,7 +30,7 @@ In addition, BR can also be used to perform [incremental backup](/br/backup-and- ## Migrate data from TiDB -If you need to migrate data from a TiDB cluster to MySQL or to another TiDB cluster, use [Dumpling](/export-or-backup-using-dumpling.md) to export full data from TiDB as SQL dump files, and then use [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to import data to MySQL or another TiDB cluster. +If you need to migrate data from a TiDB cluster to MySQL or to another TiDB cluster, use [Dumpling](/dumpling-overview.md) to export full data from TiDB as SQL dump files, and then use [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to import data to MySQL or another TiDB cluster. If you also need to migrate incremental data, use [TiDB Binlog](/tidb-binlog/tidb-binlog-overview.md). diff --git a/ecosystem-tool-user-guide.md b/ecosystem-tool-user-guide.md index daa946e1d096e..5c147b1b712e6 100644 --- a/ecosystem-tool-user-guide.md +++ b/ecosystem-tool-user-guide.md @@ -9,7 +9,7 @@ This document introduces the functionalities of TiDB ecosystem tools and their r ## Full data export -[Dumpling](/export-or-backup-using-dumpling.md) is a tool for the logical full data export from MySQL or TiDB. +[Dumpling](/dumpling-overview.md) is a tool for the logical full data export from MySQL or TiDB. The following are the basics of Dumpling: @@ -76,7 +76,7 @@ If the data volume is below the TB level, it is recommended to migrate data from If the data volume is at the TB level, take the following steps: -1. Use [Dumpling](/export-or-backup-using-dumpling.md) to export the full data from MySQL/MariaDB. +1. 
Use [Dumpling](/dumpling-overview.md) to export the full data from MySQL/MariaDB. 2. Use [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to import the data exported in Step 1 to the TiDB cluster. 3. Use DM to migrate the incremental data from MySQL/MariaDB to TiDB. diff --git a/faq/deploy-and-maintain-faq.md b/faq/deploy-and-maintain-faq.md index b9395d6868675..8e6953023170e 100644 --- a/faq/deploy-and-maintain-faq.md +++ b/faq/deploy-and-maintain-faq.md @@ -513,7 +513,7 @@ TiDB is not suitable for tables of small size (such as below ten million level), #### How to back up data in TiDB? -Currently, for the backup of a large volume of data, the preferred method is using [BR](/br/backup-and-restore-tool.md). Otherwise, the recommended tool is [Dumpling](/export-or-backup-using-dumpling.md). Although the official MySQL tool `mysqldump` is also supported in TiDB to back up and restore data, its performance is worse than [BR](/br/backup-and-restore-tool.md) and it needs much more time to back up and restore large volumes of data. +Currently, for the backup of a large volume of data, the preferred method is using [BR](/br/backup-and-restore-tool.md). Otherwise, the recommended tool is [Dumpling](/backup-and-restore-using-dumpling-lightning.md). Although the official MySQL tool `mysqldump` is also supported in TiDB to back up and restore data, its performance is worse than [BR](/br/backup-and-restore-tool.md) and it needs much more time to back up and restore large volumes of data. ## Monitoring diff --git a/migrate-from-mysql-mydumper-files.md b/migrate-from-mysql-mydumper-files.md index 6b13d1de74491..4b9f09f545cec 100644 --- a/migrate-from-mysql-mydumper-files.md +++ b/migrate-from-mysql-mydumper-files.md @@ -6,7 +6,7 @@ aliases: ['/docs/stable/migrate-from-mysql-mydumper-files/','/docs/v4.0/migrate- # Migrate Data from MySQL SQL Files -This document describes how to migrate data from MySQL SQL files to TiDB using TiDB Lightning. 
For details on how to generate MySQL SQL files, refer to [Mydumper](/mydumper-overview.md) or [Dumpling](/export-or-backup-using-dumpling.md). +This document describes how to migrate data from MySQL SQL files to TiDB using TiDB Lightning. For details on how to generate MySQL SQL files, refer to [Mydumper](/mydumper-overview.md) or [Dumpling](/dumpling-overview.md). The data migration process described in this document uses TiDB Lightning. The steps are as follows. diff --git a/mydumper-overview.md b/mydumper-overview.md index 4e8065298a64e..01ba722659a12 100644 --- a/mydumper-overview.md +++ b/mydumper-overview.md @@ -6,6 +6,10 @@ aliases: ['/docs/stable/mydumper-overview/','/docs/v4.0/mydumper-overview/','/do # Mydumper Instructions +> **Warning:** +> +> The maintainers have stopped developing new features for Mydumper, and most of its features have been replaced by [Dumpling](/dumpling-overview.md). It is strongly recommended that you switch to Dumpling. + ## What is Mydumper? [Mydumper](https://github.com/pingcap/mydumper) is a fork project optimized for TiDB. You can use this tool for logical backups of **MySQL** or **TiDB**. 
diff --git a/table-filter.md b/table-filter.md index 0038f74801752..5ce306ad0c8fb 100644 --- a/table-filter.md +++ b/table-filter.md @@ -27,7 +27,7 @@ Table filters can be applied to the tools using multiple `-f` or `--filter` comm # ^~~~~~~~~~~~~~~~~~~~~~~ ``` -* [Dumpling](/export-or-backup-using-dumpling.md): +* [Dumpling](/dumpling-overview.md): {{< copyable "shell-regular" >}} From dc4c86daa89e17e80b2622d7236cb3630528ce9b Mon Sep 17 00:00:00 2001 From: yikeke Date: Fri, 31 Jul 2020 16:46:41 +0800 Subject: [PATCH 2/3] Delete export-or-backup-using-dumpling.md --- export-or-backup-using-dumpling.md | 177 ----------------------------- 1 file changed, 177 deletions(-) delete mode 100644 export-or-backup-using-dumpling.md diff --git a/export-or-backup-using-dumpling.md b/export-or-backup-using-dumpling.md deleted file mode 100644 index 7381a690907f1..0000000000000 --- a/export-or-backup-using-dumpling.md +++ /dev/null @@ -1,177 +0,0 @@ ---- -title: Export or Backup Data Using Dumpling -summary: Use the Dumpling tool to export or backup data in TiDB. -aliases: ['/docs/stable/export-or-backup-using-dumpling/','/docs/v4.0/export-or-backup-using-dumpling/'] ---- - -# Export or Backup Data Using Dumpling - -This document introduces how to use the [Dumpling](https://github.com/pingcap/dumpling) tool to export or backup data in TiDB. Dumpling exports data stored in TiDB as SQL or CSV data files and can be used to make a logical full backup or export. - -For backups of SST files (KV pairs) or backups of incremental data that are not sensitive to latency, refer to [BR](/br/backup-and-restore-tool.md). For real-time backups of incremental data, refer to [TiCDC](/ticdc/ticdc-overview.md). - -For detailed usage of Dumpling, use the `--help` command or refer to [Dumpling User Guide](https://github.com/pingcap/dumpling/blob/master/docs/en/user-guide.md). - -When using Dumpling, you need to execute the export command on a running cluster. 
This document assumes that there is a TiDB instance on the `127.0.0.1:4000` host and that this TiDB instance has a root user without a password. - -## Download Dumpling - -To download the latest version of Dumpling, click the [download link](https://download.pingcap.org/dumpling-nightly-linux-amd64.tar.gz). - -## Export data from TiDB - -### Export to SQL files - -Dumpling exports data to SQL files by default. You can also export data to SQL files by adding the `--filetype sql` flag: - -{{< copyable "shell-regular" >}} - -```shell -dumpling \ - -u root \ - -P 4000 \ - -h 127.0.0.1 \ - --filetype sql \ - --threads 32 \ - -o /tmp/test \ - -F 256 -``` - -In the above command, `-h`, `-P` and `-u` mean address, port and user, respectively. If password authentication is required, you can pass it to Dumpling with `-p $YOUR_SECRET_PASSWORD`. - -### Export to CSV files - -If Dumpling exports data to CSV files (use `--filetype csv` to export to CSV files), you can also use `--sql ` to export the records selected by the specified SQL statement. - -For example, you can export all records that match `id < 100` in `test.sbtest1` using the following command: - -{{< copyable "shell-regular" >}} - -```shell -./dumpling \ - -u root \ - -P 4000 \ - -h 127.0.0.1 \ - -o /tmp/test \ - --filetype csv \ - --sql 'select * from `test`.`sbtest1` where id < 100' -``` - -> **Note:** -> -> - Currently, the `--sql` option can be used only for exporting to CSV files. -> -> - Here you need to execute the `select * from where id <100` statement on all tables to be exported. If some tables do not have specified fields, the export fails. - -### Filter the exported data - -#### Use the `--where` command to filter data - -By default, Dumpling exports the tables of the entire database except the tables in the system databases. You can use `--where ` to select the records to be exported. 
- -{{< copyable "shell-regular" >}} - -```shell -./dumpling \ - -u root \ - -P 4000 \ - -h 127.0.0.1 \ - -o /tmp/test \ - --where "id < 100" -``` - -The above command exports the data that matches `id < 100` from each table. - -#### Use the `--filter` command to filter data - -Dumpling can filter specific databases or tables by specifying the table filter with the `--filter` command. The syntax of table filters is similar to that of `.gitignore`. For details, see [Table Filter](/table-filter.md). - -{{< copyable "shell-regular" >}} - -```shell -./dumpling \ - -u root \ - -P 4000 \ - -h 127.0.0.1 \ - -o /tmp/test \ - --filter "employees.*" \ - --filter "*.WorkOrder" -``` - -The above command exports all the tables in the `employees` database and the `WorkOrder` tables in all databases. - -#### Use the `-B` or `-T` command to filter data - -Dumpling can also export specific databases with the `-B` command or specific tables with the `-T` command. - -> **Note:** -> -> - The `--filter` command and the `-T` command cannot be used at the same time. -> -> - The `-T` command can only accept a complete form of inputs like `database-name.table-name`, and inputs with only the table name are not accepted. Example: Dumpling cannot recognize `-T WorkOrder`. - -Examples: - --`-B employees` exports the `employees` database --`-T employees.WorkOrder` exports the `employees.WorkOrder` table - -### Improve export efficiency through concurrency - -The exported file is stored in the `./export-` directory by default. Commonly used parameters are as follows: - -- `-o` is used to select the directory where the exported files are stored. -- `-F` option is used to specify the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable). -- `-r` option is used to specify the maximum number of records (or the number of rows in the database) for a single file. 
When it is enabled, Dumpling enables concurrency in the table to improve the speed of exporting large tables. - -You can use the above parameters to provide Dumpling with a higher degree of concurrency. - -### Adjust Dumpling's data consistency options - -> **Note:** -> -> In most scenarios, you do not need to adjust the default data consistency options of Dumpling. - -Dumpling uses the `--consistency ` option to control the way in which data is exported for "consistency assurance". For TiDB, data consistency is guaranteed by getting a snapshot of a certain timestamp by default (i.e. `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` parameter to specify the timestamp to be backed up. You can also use the following levels of consistency: - -- `flush`: Use [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) to ensure consistency. -- `snapshot`: Get a consistent snapshot of the specified timestamp and export it. -- `lock`: Add read locks on all tables to be exported. -- `none`: No guarantee for consistency. -- `auto`: Use `flush` for MySQL and `snapshot` for TiDB. 
- -After everything is done, you can see the exported file in `/tmp/test`: - -{{< copyable "shell-regular" >}} - -```shell -ls -lh /tmp/test | awk '{print $5 "\t" $9}' -``` - -``` -140B metadata -66B test-schema-create.sql -300B test.sbtest1-schema.sql -190K test.sbtest1.0.sql -300B test.sbtest2-schema.sql -190K test.sbtest2.0.sql -300B test.sbtest3-schema.sql -190K test.sbtest3.0.sql -``` - -In addition, if the data volume is very large, to avoid export failure due to GC during the export process, you can extend the GC time in advance: - -{{< copyable "sql" >}} - -```sql -update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time'; -``` - -After your operation is completed, set the GC time back (the default value is `10m`): - -{{< copyable "sql" >}} - -```sql -update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time'; -``` - -Finally, all the exported data can be imported back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-tidb-backend.md). From 256fbf5d6e75665e81f202937ad02b6821c6115a Mon Sep 17 00:00:00 2001 From: yikeke Date: Fri, 31 Jul 2020 16:50:42 +0800 Subject: [PATCH 3/3] update aliases --- backup-and-restore-using-dumpling-lightning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/backup-and-restore-using-dumpling-lightning.md b/backup-and-restore-using-dumpling-lightning.md index ecdb947074526..833af38d0d2bc 100644 --- a/backup-and-restore-using-dumpling-lightning.md +++ b/backup-and-restore-using-dumpling-lightning.md @@ -1,7 +1,7 @@ --- title: Use Dumpling and TiDB Lightning for Data Backup and Restoration summary: Introduce how to use Dumpling and TiDB Lightning to backup and restore full data of TiDB. 
-aliases: ['/docs-cn/dev/export-or-backup-using-dumpling/','/zh/tidb/dev/export-or-backup-using-dumpling'] +aliases: ['/docs/stable/export-or-backup-using-dumpling/','/docs/v4.0/export-or-backup-using-dumpling/','/tidb/stable/export-or-backup-using-dumpling'] --- # Use Dumpling and TiDB Lightning for Data Backup and Restoration