From 255b11923de6eb4d1b06bb96c017ad181dd2071e Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 10:45:28 +0800 Subject: [PATCH 01/12] TOC: add relay log entry --- TOC.md | 1 + 1 file changed, 1 insertion(+) diff --git a/TOC.md b/TOC.md index 196da5ad49213..1fefc28f970fd 100644 --- a/TOC.md +++ b/TOC.md @@ -304,6 +304,7 @@ - [Upgrade](/reference/tidb-binlog/upgrade.md) - [Reparo](/reference/tidb-binlog/reparo.md) - [Binlog Slave Client](/reference/tidb-binlog/binlog-slave-client.md) + - [TiDB Binlog Relay Log](/reference/tidb-binlog/relay-log.md) - [Glossary](/reference/tidb-binlog/glossary.md) + Troubleshoot - [Troubleshooting](/reference/tidb-binlog/troubleshoot/binlog.md) From 5ffcb76e2063c453c764fa3527d6b45c842233c6 Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 11:54:00 +0800 Subject: [PATCH 02/12] reference/tidb-binlog: update deploy --- reference/tidb-binlog/deploy.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/reference/tidb-binlog/deploy.md b/reference/tidb-binlog/deploy.md index ada46e212c906..63f0eb065ab0a 100644 --- a/reference/tidb-binlog/deploy.md +++ b/reference/tidb-binlog/deploy.md @@ -551,6 +551,13 @@ The following part shows how to use Pump and Drainer based on the nodes above. # replicate-do-db = ["~^b.*","s1"] + # [syncer.relay] + # It saves the TOC of relay log. It is not enabled if the value is empty. + # The configuration only comes to effect if the downstream service is TiDB or MySQL. + # log-dir = "" + # the maximum size of each file + # max-file-size = 10485760 + # [[syncer.replicate-do-table]] # db-name ="test" # tbl-name = "log" From 04b75026d007cee26d9bf95cf377d90de1caf08f Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 12:00:28 +0800 Subject: [PATCH 03/12] reference/tidb-binlog: add relay log in overview This commit only includes the updates that apply to dev, v3.1, and v3.0. Other updates that do not apply to v3.1 and v3.0 will be made in other PRs.
--- reference/tidb-binlog/overview.md | 1 + 1 file changed, 1 insertion(+) diff --git a/reference/tidb-binlog/overview.md b/reference/tidb-binlog/overview.md index 36cd4738dd226..6a340bde1fa85 100644 --- a/reference/tidb-binlog/overview.md +++ b/reference/tidb-binlog/overview.md @@ -47,6 +47,7 @@ The TiDB Binlog cluster is composed of Pump and Drainer. * TiDB uses the built-in Pump Client to send the binlog to each Pump * Pump stores binlogs and sends the binlogs to Drainer in order * Drainer reads binlogs of each Pump, merges and sorts the binlogs, and sends the binlogs downstream +* Drainer supports [relay log](/reference/tidb-binlog/relay-log.md). It ensures the consistence of downstream clusters by relay log. ## Notes From dfd71cf439c6d52500e16bfe08bbf6a8ac5ef1fc Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 14:56:33 +0800 Subject: [PATCH 04/12] reference/tidb-binlog: add relay log doc --- reference/tidb-binlog/deploy.md | 4 +- reference/tidb-binlog/relay-log.md | 60 ++++++++++++++++++++++++++++++ 2 files changed, 62 insertions(+), 2 deletions(-) create mode 100644 reference/tidb-binlog/relay-log.md diff --git a/reference/tidb-binlog/deploy.md b/reference/tidb-binlog/deploy.md index 63f0eb065ab0a..ec867fbfebbb7 100644 --- a/reference/tidb-binlog/deploy.md +++ b/reference/tidb-binlog/deploy.md @@ -552,8 +552,8 @@ The following part shows how to use Pump and Drainer based on the nodes above. # replicate-do-db = ["~^b.*","s1"] # [syncer.relay] - # It saves the TOC of relay log. It is not enabled if the value is empty. - # The configuration only comes to effect if the downstream service is TiDB or MySQL. + # It saves the catalog of relay log. Relay log is not enabled if the value is empty. + # The configuration only comes to effect if the downstream is TiDB or MySQL. 
# log-dir = "" # the maximum size of each file # max-file-size = 10485760 diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md new file mode 100644 index 0000000000000..9d1f5b7b58bc6 --- /dev/null +++ b/reference/tidb-binlog/relay-log.md @@ -0,0 +1,60 @@ +--- +title: TiDB Binlog Relay Log +category: reference +aliases: ['/docs-cn/dev/reference/tools/tidb-binlog/relay-log/'] +--- + +# TiDB Binlog Relay Log + +When replicating binlogs, Drainer splits transactions from the upstream and replicates the split transactions concurrently to the downstream. + +In extreme cases where the upstream clusters is not available and Drainer exits abnormally, the downstream clusters (MySQL or TiDB) may be in the intermediate states with inconsistent data. In such cases, Drainer can use relay log to make sure the downstream clusters are in a consistent state. + +## Consistent state during Drainer replication + +The downstream clusters reaching consistency means the data of the downstream clusters are the same as the snapshot of the upstream which sets `tidb_snapshot = ts`. + +The checkpoint consistency means Drainer checkpoint saves the consistent state of replication by `consistent`. When Drainer runs, `consistent` is `false`. After Drainer exits normally, `consistent` is set to `true`. 
+ +You can query the downstream table of checkpoint as follows: + +``` +mysql> select * from tidb_binlog.checkpoint; ++---------------------+----------------------------------------------------------------+ +| clusterID | checkPoint | ++---------------------+----------------------------------------------------------------+ +| 6791641053252586769 | {"consistent":false,"commitTS":414529105591271429,"ts-map":{}} | ++---------------------+----------------------------------------------------------------+ +``` + +## Implementation principles + +After Drainer enables the relay log, it first writes the binlog events to the disks and then replicates the events to the downstream clusters. + +If the upstream clusters are not available, Drainer can restore the downstream clusters to a consistent state by reading the relay log. + +> **Note:** +> +> If the relay log data is lost at the same time, this method does not work, but its incidence is very low. +> Besides, you can use the Network File System to ensure data security of the relay log. + +### Trigger scenarios of Drainer consuming binlog from relay log + +If Drainer fails to connect to the Placement Drivers (PD) of the upstream clusters when Drainer is started, and if it detects that `consistent = false` in checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process set the checkpoint `status` to `0` and then exit. + +### GC mechanism of relay log + +While Drainer is running, if it confirms that the whole data of a relay log file has been successfully replicated to the downstream, the file is deleted immediately. Therefore, the relay log does not occupy too much space. If the size of a relay log file reaches 10MB (by default), the file is split, and data is written in the new relay log file. 
+ +## Configuration + +To enable the relay log, add the following configuration in Drainer: + +{{< copyable "" >}} + +``` +[syncer.relay] +# It saves the catalog of relay log. Relay log is not enabled if the value is empty. +# The configuration only comes to effect if the downstream is TiDB or MySQL. +log-dir = "/dir/to/save/log" +``` From fb9965ea83282b07eda122ac4c92626717c1802d Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 15:03:33 +0800 Subject: [PATCH 05/12] fix typos --- reference/tidb-binlog/deploy.md | 2 +- reference/tidb-binlog/relay-log.md | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/reference/tidb-binlog/deploy.md b/reference/tidb-binlog/deploy.md index ec867fbfebbb7..cb2938caa7fa4 100644 --- a/reference/tidb-binlog/deploy.md +++ b/reference/tidb-binlog/deploy.md @@ -552,7 +552,7 @@ The following part shows how to use Pump and Drainer based on the nodes above. # replicate-do-db = ["~^b.*","s1"] # [syncer.relay] - # It saves the catalog of relay log. Relay log is not enabled if the value is empty. + # It saves the catalog of the relay log. The relay log is not enabled if the value is empty. # The configuration only comes to effect if the downstream is TiDB or MySQL. # log-dir = "" # the maximum size of each file diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md index 9d1f5b7b58bc6..4906ce91ae913 100644 --- a/reference/tidb-binlog/relay-log.md +++ b/reference/tidb-binlog/relay-log.md @@ -8,7 +8,7 @@ aliases: ['/docs-cn/dev/reference/tools/tidb-binlog/relay-log/'] When replicating binlogs, Drainer splits transactions from the upstream and replicates the split transactions concurrently to the downstream. -In extreme cases where the upstream clusters is not available and Drainer exits abnormally, the downstream clusters (MySQL or TiDB) may be in the intermediate states with inconsistent data. 
In such cases, Drainer can use relay log to make sure the downstream clusters are in a consistent state. +In extreme cases where the upstream clusters are not available and Drainer exits abnormally, the downstream clusters (MySQL or TiDB) may be in the intermediate states with inconsistent data. In such cases, Drainer can use the relay log to make sure the downstream clusters are in a consistent state. ## Consistent state during Drainer replication @@ -38,9 +38,9 @@ If the upstream clusters are not available, Drainer can restore the downstream c > If the relay log data is lost at the same time, this method does not work, but its incidence is very low. > Besides, you can use the Network File System to ensure data security of the relay log. -### Trigger scenarios of Drainer consuming binlog from relay log +### Trigger scenarios of Drainer consuming binlogs from the relay log -If Drainer fails to connect to the Placement Drivers (PD) of the upstream clusters when Drainer is started, and if it detects that `consistent = false` in checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process set the checkpoint `status` to `0` and then exit. +If Drainer fails to connect to the Placement Drivers (PD) of the upstream clusters when Drainer is started, and if it detects that `consistent = false` in checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process sets the checkpoint `status` to `0` and then exit. ### GC mechanism of relay log @@ -54,7 +54,7 @@ To enable the relay log, add the following configuration in Drainer: ``` [syncer.relay] -# It saves the catalog of relay log. Relay log is not enabled if the value is empty. +# It saves the catalog of the relay log. The relay log is not enabled if the value is empty. # The configuration only comes to effect if the downstream is TiDB or MySQL. 
log-dir = "/dir/to/save/log" ``` From 765b45622fcded3fea9bb45d22487646bf958afa Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 15:29:04 +0800 Subject: [PATCH 06/12] ensure term consistency --- reference/tidb-binlog/overview.md | 2 +- reference/tidb-binlog/relay-log.md | 16 +++++++++------- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/reference/tidb-binlog/overview.md b/reference/tidb-binlog/overview.md index 6a340bde1fa85..4bd2b26ca8597 100644 --- a/reference/tidb-binlog/overview.md +++ b/reference/tidb-binlog/overview.md @@ -47,7 +47,7 @@ The TiDB Binlog cluster is composed of Pump and Drainer. * TiDB uses the built-in Pump Client to send the binlog to each Pump * Pump stores binlogs and sends the binlogs to Drainer in order * Drainer reads binlogs of each Pump, merges and sorts the binlogs, and sends the binlogs downstream -* Drainer supports [relay log](/reference/tidb-binlog/relay-log.md). It ensures the consistence of downstream clusters by relay log. +* Drainer supports [relay log](/reference/tidb-binlog/relay-log.md). By the relay log, Drainer ensures that the downstream clusters are in a consistent state. ## Notes diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md index 4906ce91ae913..1a99bce243493 100644 --- a/reference/tidb-binlog/relay-log.md +++ b/reference/tidb-binlog/relay-log.md @@ -8,15 +8,15 @@ aliases: ['/docs-cn/dev/reference/tools/tidb-binlog/relay-log/'] When replicating binlogs, Drainer splits transactions from the upstream and replicates the split transactions concurrently to the downstream. -In extreme cases where the upstream clusters are not available and Drainer exits abnormally, the downstream clusters (MySQL or TiDB) may be in the intermediate states with inconsistent data. In such cases, Drainer can use the relay log to make sure the downstream clusters are in a consistent state. 
+In extreme cases where the upstream clusters are not available and Drainer exits abnormally, the downstream clusters (MySQL or TiDB) may be in the intermediate states with inconsistent data. In such cases, Drainer can use the relay log to ensure that the downstream clusters are in a consistent state. ## Consistent state during Drainer replication -The downstream clusters reaching consistency means the data of the downstream clusters are the same as the snapshot of the upstream which sets `tidb_snapshot = ts`. +The downstream clusters reaching a consistent state means the data of the downstream clusters are the same as the snapshot of the upstream which sets `tidb_snapshot = ts`. -The checkpoint consistency means Drainer checkpoint saves the consistent state of replication by `consistent`. When Drainer runs, `consistent` is `false`. After Drainer exits normally, `consistent` is set to `true`. +The checkpoint consistency means Drainer checkpoint saves the consistent state of replication in `consistent`. When Drainer runs, `consistent` is `false`. After Drainer exits normally, `consistent` is set to `true`. -You can query the downstream table of checkpoint as follows: +You can query the downstream checkpoint table as follows: ``` mysql> select * from tidb_binlog.checkpoint; @@ -38,13 +38,15 @@ If the upstream clusters are not available, Drainer can restore the downstream c > If the relay log data is lost at the same time, this method does not work, but its incidence is very low. > Besides, you can use the Network File System to ensure data security of the relay log. 
-### Trigger scenarios of Drainer consuming binlogs from the relay log +### Trigger scenarios where Drainer consumes binlogs from the relay log -If Drainer fails to connect to the Placement Drivers (PD) of the upstream clusters when Drainer is started, and if it detects that `consistent = false` in checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process sets the checkpoint `status` to `0` and then exit. +Where Drainer is started, if it fails to connect to the Placement Drivers (PD) of the upstream clusters, and if it detects that `consistent = false` in checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process sets the checkpoint `status` to `0` and then exits. ### GC mechanism of relay log -While Drainer is running, if it confirms that the whole data of a relay log file has been successfully replicated to the downstream, the file is deleted immediately. Therefore, the relay log does not occupy too much space. If the size of a relay log file reaches 10MB (by default), the file is split, and data is written in the new relay log file. +While Drainer is running, if it confirms that the whole data of a relay log file has been successfully replicated to the downstream, the file is deleted immediately. Therefore, the relay log does not occupy too much space. + +If the size of a relay log file reaches 10MB (by default), the file is split, and data is written in the new relay log file. 
## Configuration From c6d501b215a6ad077869b6a75f956fc7cd3c1ad6 Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 16:04:14 +0800 Subject: [PATCH 07/12] fix typo --- reference/tidb-binlog/relay-log.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md index 1a99bce243493..b838ff3a90569 100644 --- a/reference/tidb-binlog/relay-log.md +++ b/reference/tidb-binlog/relay-log.md @@ -1,7 +1,7 @@ --- title: TiDB Binlog Relay Log category: reference -aliases: ['/docs-cn/dev/reference/tools/tidb-binlog/relay-log/'] +aliases: ['/docs/dev/reference/tools/tidb-binlog/relay-log/'] --- # TiDB Binlog Relay Log From 59cac091f2e4422c6e960ec004f96728413f5f1e Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 16:47:10 +0800 Subject: [PATCH 08/12] delete alias from relay-log --- reference/tidb-binlog/relay-log.md | 1 - 1 file changed, 1 deletion(-) diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md index b838ff3a90569..a6c929eb1d3e9 100644 --- a/reference/tidb-binlog/relay-log.md +++ b/reference/tidb-binlog/relay-log.md @@ -1,7 +1,6 @@ --- title: TiDB Binlog Relay Log category: reference -aliases: ['/docs/dev/reference/tools/tidb-binlog/relay-log/'] --- # TiDB Binlog Relay Log From 24779248eb985d419f97c14dfc444fbe4856d84e Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 27 Feb 2020 18:37:47 +0800 Subject: [PATCH 09/12] minor update Align with this PR: https://github.com/pingcap/docs-cn/pull/2308 --- reference/tidb-binlog/relay-log.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md index a6c929eb1d3e9..205432f9c793f 100644 --- a/reference/tidb-binlog/relay-log.md +++ b/reference/tidb-binlog/relay-log.md @@ -39,7 +39,7 @@ If the upstream clusters are not available, Drainer can restore the downstream c ### Trigger scenarios where Drainer consumes binlogs from the relay 
log -Where Drainer is started, if it fails to connect to the Placement Drivers (PD) of the upstream clusters, and if it detects that `consistent = false` in checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process sets the checkpoint `status` to `0` and then exits. +Where Drainer is started, if it fails to connect to the Placement Drivers (PD) of the upstream clusters, and if it detects that `consistent = false` in checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process sets the checkpoint `consistent` to `true` and then exits. ### GC mechanism of relay log From 973e2bde80e3a9aec6ed1957bb7c51467f9579a8 Mon Sep 17 00:00:00 2001 From: Ran Date: Mon, 9 Mar 2020 09:56:36 +0800 Subject: [PATCH 10/12] Apply suggestions from code review Co-Authored-By: Lilian Lee --- reference/tidb-binlog/deploy.md | 2 +- reference/tidb-binlog/relay-log.md | 17 +++++++++-------- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/reference/tidb-binlog/deploy.md b/reference/tidb-binlog/deploy.md index cb2938caa7fa4..97deb8f230c11 100644 --- a/reference/tidb-binlog/deploy.md +++ b/reference/tidb-binlog/deploy.md @@ -552,7 +552,7 @@ The following part shows how to use Pump and Drainer based on the nodes above. # replicate-do-db = ["~^b.*","s1"] # [syncer.relay] - # It saves the catalog of the relay log. The relay log is not enabled if the value is empty. + # It saves the directory of the relay log. The relay log is not enabled if the value is empty. # The configuration only comes to effect if the downstream is TiDB or MySQL. 
# log-dir = "" # the maximum size of each file diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md index 205432f9c793f..14117eabb1a7b 100644 --- a/reference/tidb-binlog/relay-log.md +++ b/reference/tidb-binlog/relay-log.md @@ -7,7 +7,7 @@ category: reference When replicating binlogs, Drainer splits transactions from the upstream and replicates the split transactions concurrently to the downstream. -In extreme cases where the upstream clusters are not available and Drainer exits abnormally, the downstream clusters (MySQL or TiDB) may be in the intermediate states with inconsistent data. In such cases, Drainer can use the relay log to ensure that the downstream clusters are in a consistent state. +In extreme cases where the upstream clusters are not available and Drainer exits abnormally, the downstream clusters (MySQL or TiDB) might be in the intermediate states with inconsistent data. In such cases, Drainer can use the relay log to ensure that the downstream clusters are in a consistent state. ## Consistent state during Drainer replication @@ -17,8 +17,10 @@ The checkpoint consistency means Drainer checkpoint saves the consistent state o You can query the downstream checkpoint table as follows: -``` -mysql> select * from tidb_binlog.checkpoint; +{{< copyable "sql" >}} + +```sql +select * from tidb_binlog.checkpoint; +---------------------+----------------------------------------------------------------+ | clusterID | checkPoint | +---------------------+----------------------------------------------------------------+ @@ -34,18 +36,17 @@ If the upstream clusters are not available, Drainer can restore the downstream c > **Note:** > -> If the relay log data is lost at the same time, this method does not work, but its incidence is very low. -> Besides, you can use the Network File System to ensure data security of the relay log. +> If the relay log data is lost at the same time, this method does not work, but its incidence is very low. 
In addition, you can use the Network File System to ensure data safety of the relay log. ### Trigger scenarios where Drainer consumes binlogs from the relay log -Where Drainer is started, if it fails to connect to the Placement Drivers (PD) of the upstream clusters, and if it detects that `consistent = false` in checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process sets the checkpoint `consistent` to `true` and then exits. +When Drainer is started, if it fails to connect to the Placement Driver (PD) of the upstream clusters, and it detects that `consistent = false` in the checkpoint, Drainer will try to read the relay log, and restore the downstream clusters to a consistent state. After that, the Drainer process sets the checkpoint `consistent` to `true` and then exits. ### GC mechanism of relay log While Drainer is running, if it confirms that the whole data of a relay log file has been successfully replicated to the downstream, the file is deleted immediately. Therefore, the relay log does not occupy too much space. -If the size of a relay log file reaches 10MB (by default), the file is split, and data is written in the new relay log file. +If the size of a relay log file reaches 10MB (by default), the file is split, and data is written into a new relay log file. ## Configuration @@ -55,7 +56,7 @@ To enable the relay log, add the following configuration in Drainer: ``` [syncer.relay] -# It saves the catalog of the relay log. The relay log is not enabled if the value is empty. +# It saves the directory of the relay log. The relay log is not enabled if the value is empty. # The configuration only comes to effect if the downstream is TiDB or MySQL. 
log-dir = "/dir/to/save/log" ``` From 678bdbbf62a3617c00f39ba8c4775f6ac6827e6c Mon Sep 17 00:00:00 2001 From: Lilian Lee Date: Thu, 12 Mar 2020 11:38:10 +0800 Subject: [PATCH 11/12] tidb-binlog: update sql code block format --- reference/tidb-binlog/relay-log.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md index 14117eabb1a7b..318849202f307 100644 --- a/reference/tidb-binlog/relay-log.md +++ b/reference/tidb-binlog/relay-log.md @@ -21,6 +21,9 @@ You can query the downstream checkpoint table as follows: ```sql select * from tidb_binlog.checkpoint; +``` + +``` +---------------------+----------------------------------------------------------------+ | clusterID | checkPoint | +---------------------+----------------------------------------------------------------+ From 7b3b23e8d82de7b9dabf418f6c8a25954eae70a3 Mon Sep 17 00:00:00 2001 From: Ran Date: Thu, 12 Mar 2020 23:50:29 +0800 Subject: [PATCH 12/12] Add one-sentence summary to relay log --- reference/tidb-binlog/relay-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/reference/tidb-binlog/relay-log.md b/reference/tidb-binlog/relay-log.md index 318849202f307..08cb6e7e2e3d9 100644 --- a/reference/tidb-binlog/relay-log.md +++ b/reference/tidb-binlog/relay-log.md @@ -1,5 +1,6 @@ --- title: TiDB Binlog Relay Log +summary: Learn how to use relay log to maintain data consistency in extreme cases. category: reference ---
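To make the relay-log behavior documented in relay-log.md easier to review, here is a minimal Go sketch of two of its rules. The first is the startup trigger: Drainer replays the relay log only when PD is unreachable and the checkpoint has `consistent = false`. The second is the file-split threshold, which defaults to `max-file-size = 10485760` bytes. This is not Drainer's implementation. `checkpoint`, `shouldReplayRelayLog`, `needsRotation`, and `defaultMaxFileSize` are hypothetical names. Only the `consistent` and `commitTS` JSON keys and the default size come from the documents in this series.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkpoint (hypothetical) holds the fields this sketch reads from the JSON
// stored in the checkPoint column of tidb_binlog.checkpoint. Other keys, such
// as "ts-map", are ignored by encoding/json because no field matches them.
type checkpoint struct {
	Consistent bool  `json:"consistent"`
	CommitTS   int64 `json:"commitTS"`
}

// shouldReplayRelayLog (hypothetical) encodes the documented startup trigger:
// fall back to the relay log only when PD cannot be reached AND the checkpoint
// says the downstream is not in a consistent state.
func shouldReplayRelayLog(pdReachable bool, rawCheckpoint string) (bool, error) {
	var cp checkpoint
	if err := json.Unmarshal([]byte(rawCheckpoint), &cp); err != nil {
		return false, err
	}
	return !pdReachable && !cp.Consistent, nil
}

// defaultMaxFileSize mirrors the max-file-size default in deploy.md (10 MiB).
const defaultMaxFileSize int64 = 10485760

// needsRotation (hypothetical) encodes the documented split rule: once the
// current relay log file reaches maxFileSize bytes, data goes to a new file.
func needsRotation(currentSize, maxFileSize int64) bool {
	return currentSize >= maxFileSize
}

func main() {
	// Checkpoint value copied from the example query output in relay-log.md.
	raw := `{"consistent":false,"commitTS":414529105591271429,"ts-map":{}}`

	replay, err := shouldReplayRelayLog(false, raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("PD unreachable, checkpoint inconsistent -> replay relay log:", replay)

	replay, _ = shouldReplayRelayLog(true, raw)
	fmt.Println("PD reachable -> replay relay log:", replay)

	fmt.Println("rotate at max-file-size:", needsRotation(defaultMaxFileSize, defaultMaxFileSize))
}
```

The trigger is written as a conjunction on purpose. A healthy PD means Drainer can resume normal replication from the upstream. A `consistent = true` checkpoint means the downstream is already at a consistent snapshot. In either case the relay log does not need to be read.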