From 89e1f3b384ab85ea96747e5ac1feb3aa676e05f0 Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Tue, 9 Jun 2020 16:38:17 +0800
Subject: [PATCH 1/6] ticdc: add ticdc troubleshooting document

---
 TOC.md                      |  1 +
 ticdc/ticdc-overview.md     |  2 +-
 ticdc/troubleshoot-ticdc.md | 57 +++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 ticdc/troubleshoot-ticdc.md

diff --git a/TOC.md b/TOC.md
index a73fdf52fd618..1e2b18dda8430 100644
--- a/TOC.md
+++ b/TOC.md
@@ -396,6 +396,7 @@
  - [Deploy and Use TiCDC](/ticdc/deploy-ticdc.md)
  - [Manage TiCDC Cluster and Replication Tasks](/ticdc/manage-ticdc.md)
  - [Configure Sink URI](/ticdc/sink-url.md)
+ - [Troubleshoot TiCDC Issues](/ticdc/troubleshoot-ticdc.md)
  - [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
  - [Column and DDL Type Codes](/ticdc/column-ddl-type-codes.md)
  + sync-diff-inspector
diff --git a/ticdc/ticdc-overview.md b/ticdc/ticdc-overview.md
index a65bd1d0fe54d..f2e8b1a48031b 100644
--- a/ticdc/ticdc-overview.md
+++ b/ticdc/ticdc-overview.md
@@ -8,7 +8,7 @@ aliases: ['/docs/dev/reference/tools/ticdc/overview/']
 # TiCDC Overview
 
 > **Note:**
-> 
+>
 > TiCDC is experimental. It is **not recommended** to use this feature in the production environment.
 
 [TiCDC](https://github.com/pingcap/ticdc) is a tool for replicating the incremental data of TiDB. This tool is implemented by pulling TiKV change logs. It can restore data to a consistent state with any upstream TSO, and provides [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md) to support other systems to subscribe to data changes.
diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md
new file mode 100644
index 0000000000000..1a381e2e3def9
--- /dev/null
+++ b/ticdc/troubleshoot-ticdc.md
@@ -0,0 +1,57 @@
+---
+title: Troubleshoot TiCDC Issues
+summary: Learn how to troubleshoot TiCDC issues.
+category: reference
+---
+
+# Troubleshoot TiCDC Issues
+
+This document introduces the common issues and errors that you might often encountered when using TiCDC and provides the corresponding maintenance and troubleshooting methods.
+
+## How to choose `start-ts` when starting a task
+
+The `start-ts` of a replication task corresponds to a Time Sharing Option (TSO) in the upstream TiDB cluster. From this TSO, TiCDC requests data in a replication task. Therefore, the `start-ts` of the replication task must meet the following requirements:
+
+- The value of `start-ts` is larger than the `tikv_gc_safe_point` value of the current TiDB cluster. Otherwise, you will fail to create a task.
+- Before starting a task, ensure that the downstream has all data before `start-ts`. For scenarios such as replicating data to message queues, if the data consistency between upstream and downstream is not required, you can relax this requirement according to your application need.
+
+If you do not specify `start-ts`, or specify `start-ts` as `0`, when a replication task is started, TiCDC gets a current TSO and starts the task from this TSO.
+
+## Some tables cannot be replicated when starting a task
+
+When you execute `cdc cli changefeed create` to create a replication task, TiCDC checks whether the upstream tables meet the [replication restrictions](/ticdc/ticdc-overview.md#restrictions). If some tables do not meet the restrictions, `some tables are not eligible to replicate` is returned with a list of ineligible tables. You can choose `Y` or `y` to continue creating the task, and all updates on these tables are automatically ignored during the replication. If you choose an input other than `Y` or `y`, the replication task is not created.
+
+## How to handle replication interruption
+
+A replication task might be interrupted in the following known scenarios:
+
+- The downstream continues to be abnormal, and TiCDC still fails after many retries.
+
+    - In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of `gc-ttl`.
+    - Handling method: You can resume the replication task via the HTTP interface after the downstream is back to normal.
+
+- Replication cannot continue due to incompatible SQL statement(s) in the downstream.
+
+    - In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of `gc-ttl`.
+    - Handling procedure:
+        1. Query the status information of the replication task using the `cdc cli changefeed query` command and record the value of `checkpoint-ts`.
+        2. Use the new task configuration file and add the `ignore-txn-commit-ts` parameter to skip the transaction corresponding to the specified `commit-ts`.
+        3. Stop the old replication task via HTTP API. Execute `cdc cli changefeed create` to create a new task and specify the new task configuration file. Specify `checkpoint-ts` recorded in step 1 as the `start-ts` and start a new task to resume the replication.
+
+## `gc-ttl` and file sorting
+
+Since v4.0.0-rc.1, PD supports that external services set the service-level GC safepoint. Any service can register and update its GC safepoint. PD ensures that the key-value data smaller than this GC safepoint is not cleaned by GC. Enabling this feature in TiCDC ensures that the data to be consumed by TiCDC is retained in TiKV without being cleaned by GC when the replication task is unavailable or interrupted.
+
+When starting the TiCDC server, you can specify the Time To Live (TTL) duration of GC safepoint through `gc-ttl`, which means the longest time that data is retained within the GC safepoint. This value is set by TiCDC in PD, which is 86,400 seconds by default.
+
+If the replication task is interrupted for a long time and a large volume of unconsumed data is accumulated, Out of Memory (OOM) might occur when TiCDC is started. In this situation, you can enable the file sorting feature of TiCDC that uses system files for sorting. To enable this feature, pass `--sort-engine=file` and `--sort-dir=/path/to/sort_dir` to the `cdc cli` command when creating a replication task. For example:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235200 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --sort-engine="file" --sort-dir="/data/cdc/sort"
+```
+
+> **Note:**
+>
+> TiCDC (the 4.0 release version) does not support dynamically modifying the file sorting and memory sorting.

From 76ece18fab392063754c1e824f0f05f1da9b3a40 Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Tue, 9 Jun 2020 17:14:07 +0800
Subject: [PATCH 2/6] Apply suggestions from code review

---
 ticdc/troubleshoot-ticdc.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md
index 1a381e2e3def9..da0545e22179d 100644
--- a/ticdc/troubleshoot-ticdc.md
+++ b/ticdc/troubleshoot-ticdc.md
@@ -1,12 +1,12 @@
 ---
-title: Troubleshoot TiCDC Issues
+title: Handle TiCDC Issues
 summary: Learn how to troubleshoot TiCDC issues.
 category: reference
 ---
 
-# Troubleshoot TiCDC Issues
+# Handle TiCDC Issues
 
-This document introduces the common issues and errors that you might often encountered when using TiCDC and provides the corresponding maintenance and troubleshooting methods.
+This document introduces the common issues and errors that you might encounter when using TiCDC usage, and the corresponding maintenance and troubleshooting methods.
 
 ## How to choose `start-ts` when starting a task
 

From 54b40db15ced22c037059609465bc94f60d6b5ce Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Tue, 9 Jun 2020 17:14:22 +0800
Subject: [PATCH 3/6] Update TOC.md

---
 TOC.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/TOC.md b/TOC.md
index 1e2b18dda8430..8270b0ae1f71d 100644
--- a/TOC.md
+++ b/TOC.md
@@ -396,7 +396,7 @@
  - [Deploy and Use TiCDC](/ticdc/deploy-ticdc.md)
  - [Manage TiCDC Cluster and Replication Tasks](/ticdc/manage-ticdc.md)
  - [Configure Sink URI](/ticdc/sink-url.md)
- - [Troubleshoot TiCDC Issues](/ticdc/troubleshoot-ticdc.md)
+ - [Handle TiCDC Issues](/ticdc/troubleshoot-ticdc.md)
  - [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
  - [Column and DDL Type Codes](/ticdc/column-ddl-type-codes.md)
  + sync-diff-inspector

From 38a9e9b733fb65ea1773d3c1ae2924b57b7df832 Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Tue, 9 Jun 2020 17:14:46 +0800
Subject: [PATCH 4/6] Update ticdc/troubleshoot-ticdc.md

---
 ticdc/troubleshoot-ticdc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md
index da0545e22179d..308dfbace5396 100644
--- a/ticdc/troubleshoot-ticdc.md
+++ b/ticdc/troubleshoot-ticdc.md
@@ -1,6 +1,6 @@
 ---
 title: Handle TiCDC Issues
-summary: Learn how to troubleshoot TiCDC issues.
+summary: Learn how to handle TiCDC issues.
 category: reference
 ---
 

From c6ffe403ab0f558ac69bc236ec4dd6f72dd064e2 Mon Sep 17 00:00:00 2001
From: Lilian Lee
Date: Fri, 12 Jun 2020 13:12:30 +0800
Subject: [PATCH 5/6] Apply suggestions from code review

Co-authored-by: amyangfei
---
 ticdc/troubleshoot-ticdc.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md
index 308dfbace5396..17e158508e4c2 100644
--- a/ticdc/troubleshoot-ticdc.md
+++ b/ticdc/troubleshoot-ticdc.md
@@ -6,11 +6,11 @@ category: reference
 
 # Handle TiCDC Issues
 
-This document introduces the common issues and errors that you might encounter when using TiCDC usage, and the corresponding maintenance and troubleshooting methods.
+This document introduces the common issues and errors that you might encounter when using TiCDC, and the corresponding maintenance and troubleshooting methods.
 
 ## How to choose `start-ts` when starting a task
 
-The `start-ts` of a replication task corresponds to a Time Sharing Option (TSO) in the upstream TiDB cluster. From this TSO, TiCDC requests data in a replication task. Therefore, the `start-ts` of the replication task must meet the following requirements:
+The `start-ts` of a replication task corresponds to a Timestamp Oracle (TSO) in the upstream TiDB cluster. TiCDC requests data from this TSO in a replication task. Therefore, the `start-ts` of the replication task must meet the following requirements:
 
 - The value of `start-ts` is larger than the `tikv_gc_safe_point` value of the current TiDB cluster. Otherwise, you will fail to create a task.
 - Before starting a task, ensure that the downstream has all data before `start-ts`. For scenarios such as replicating data to message queues, if the data consistency between upstream and downstream is not required, you can relax this requirement according to your application need.

From 7894159ce9bfe7aafbe8981ef87a55168c52603d Mon Sep 17 00:00:00 2001
From: Lilian Lee
Date: Fri, 12 Jun 2020 13:42:23 +0800
Subject: [PATCH 6/6] Apply suggestions from code review

---
 ticdc/troubleshoot-ticdc.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md
index 17e158508e4c2..1dc3bab63e7c2 100644
--- a/ticdc/troubleshoot-ticdc.md
+++ b/ticdc/troubleshoot-ticdc.md
@@ -12,12 +12,12 @@ This document introduces the common issues and errors that you might encounter w
 
 The `start-ts` of a replication task corresponds to a Timestamp Oracle (TSO) in the upstream TiDB cluster. TiCDC requests data from this TSO in a replication task. Therefore, the `start-ts` of the replication task must meet the following requirements:
 
-- The value of `start-ts` is larger than the `tikv_gc_safe_point` value of the current TiDB cluster. Otherwise, you will fail to create a task.
+- The value of `start-ts` is larger than the `tikv_gc_safe_point` value of the current TiDB cluster. Otherwise, an error occurs when you create a task.
 - Before starting a task, ensure that the downstream has all data before `start-ts`. For scenarios such as replicating data to message queues, if the data consistency between upstream and downstream is not required, you can relax this requirement according to your application need.
 
 If you do not specify `start-ts`, or specify `start-ts` as `0`, when a replication task is started, TiCDC gets a current TSO and starts the task from this TSO.
 
-## Some tables cannot be replicated when starting a task
+## Some tables cannot be replicated when you start a task
 
 When you execute `cdc cli changefeed create` to create a replication task, TiCDC checks whether the upstream tables meet the [replication restrictions](/ticdc/ticdc-overview.md#restrictions). If some tables do not meet the restrictions, `some tables are not eligible to replicate` is returned with a list of ineligible tables. You can choose `Y` or `y` to continue creating the task, and all updates on these tables are automatically ignored during the replication. If you choose an input other than `Y` or `y`, the replication task is not created.
 
@@ -30,17 +30,17 @@ A replication task might be interrupted in the following known scenarios:
     - In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of `gc-ttl`.
     - Handling method: You can resume the replication task via the HTTP interface after the downstream is back to normal.
 
-- Replication cannot continue due to incompatible SQL statement(s) in the downstream.
+- Replication cannot continue because of incompatible SQL statement(s) in the downstream.
 
     - In this scenario, TiCDC saves the task information. Because TiCDC has set the service GC safepoint in PD, the data after the task checkpoint is not cleaned by TiKV GC within the valid period of `gc-ttl`.
-    - Handling procedure:
+    - Handling procedures:
         1. Query the status information of the replication task using the `cdc cli changefeed query` command and record the value of `checkpoint-ts`.
         2. Use the new task configuration file and add the `ignore-txn-commit-ts` parameter to skip the transaction corresponding to the specified `commit-ts`.
         3. Stop the old replication task via HTTP API. Execute `cdc cli changefeed create` to create a new task and specify the new task configuration file. Specify `checkpoint-ts` recorded in step 1 as the `start-ts` and start a new task to resume the replication.
 
 ## `gc-ttl` and file sorting
 
-Since v4.0.0-rc.1, PD supports that external services set the service-level GC safepoint. Any service can register and update its GC safepoint. PD ensures that the key-value data smaller than this GC safepoint is not cleaned by GC. Enabling this feature in TiCDC ensures that the data to be consumed by TiCDC is retained in TiKV without being cleaned by GC when the replication task is unavailable or interrupted.
+Since v4.0.0-rc.1, PD supports external services in setting the service-level GC safepoint. Any service can register and update its GC safepoint. PD ensures that the key-value data smaller than this GC safepoint is not cleaned by GC. Enabling this feature in TiCDC ensures that the data to be consumed by TiCDC is retained in TiKV without being cleaned by GC when the replication task is unavailable or interrupted.
 
 When starting the TiCDC server, you can specify the Time To Live (TTL) duration of GC safepoint through `gc-ttl`, which means the longest time that data is retained within the GC safepoint. This value is set by TiCDC in PD, which is 86,400 seconds by default.
 
@@ -54,4 +54,4 @@ cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235
 
 > **Note:**
 >
-> TiCDC (the 4.0 release version) does not support dynamically modifying the file sorting and memory sorting.
+> TiCDC (the 4.0 version) does not support dynamically modifying the file sorting and memory sorting yet.