From ef1b2fb4e6d92cee2f411a50800f3685e7b856ca Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Mon, 16 Aug 2021 19:57:55 +0800
Subject: [PATCH 1/8] added descriptions about slow node detection

---
 best-practices/pd-scheduling-best-practices.md | 2 ++
 tikv-configuration-file.md | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/best-practices/pd-scheduling-best-practices.md b/best-practices/pd-scheduling-best-practices.md
index 91e0b0586b971..096c45f72efa9 100644
--- a/best-practices/pd-scheduling-best-practices.md
+++ b/best-practices/pd-scheduling-best-practices.md
@@ -280,3 +280,5 @@ For v3.0.4 and v2.1.16 or earlier, the `approximate_keys` of regions are inaccur
 If a TiKV node fails, PD defaults to setting the corresponding node to the **down** state after 30 minutes (customizable by configuration item `max-store-down-time`), and rebalancing replicas for regions involved.
 
 Practically, if a node failure is considered unrecoverable, you can immediately take it offline. This makes PD replenish replicas soon in another node and reduces the risk of data loss. In contrast, if a node is considered recoverable, but the recovery cannot be done in 30 minutes, you can temporarily adjust `max-store-down-time` to a larger value to avoid unnecessary replenishment of the replicas and resources waste after the timeout.
+
+In TiDB v5.2.0, TiKV introduces the mechanism of slow TiKV node detection. By sampling the requests in TiKV, it calculates a score ranging from 1 to 100. A TiKV node with a score greater than or equal to 80 is marked as slow. You can add `evict-slow-store-scheduler` to detect and schedule slow nodes. The current version supports that when only one slow node appears, all leaders in the node will be evicted.
\ No newline at end of file
diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index ad34ce379ffd1..bf495514b9cf8 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -637,6 +637,13 @@ Configuration items related to Raftstore
 + Default value: `1`
 + Minimum value: greater than `0`
 
+### `inspect-interval`
+
++ At a certain interval, TiKV inspects the latency status of the Raftstore thread. This parameter specifies the interval of inspection. If the latency exceeds this value, the Raftstore thread is marked as timeout.
++ Judges whether the TiKV node is slow based on the ratio of inspected latency.
++ Default value: 500ms
++ Minimum value: 1ms
+
 ## Coprocessor
 
 Configuration items related to Coprocessor

From e321515587893821098c3b803ac6a905d9f3fe22 Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Mon, 16 Aug 2021 20:26:40 +0800
Subject: [PATCH 2/8] Update pd-scheduling-best-practices.md

---
 best-practices/pd-scheduling-best-practices.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/best-practices/pd-scheduling-best-practices.md b/best-practices/pd-scheduling-best-practices.md
index 096c45f72efa9..2c786c5df5f5e 100644
--- a/best-practices/pd-scheduling-best-practices.md
+++ b/best-practices/pd-scheduling-best-practices.md
@@ -281,4 +281,4 @@ If a TiKV node fails, PD defaults to setting the corresponding node to the **dow
 
 Practically, if a node failure is considered unrecoverable, you can immediately take it offline. This makes PD replenish replicas soon in another node and reduces the risk of data loss. In contrast, if a node is considered recoverable, but the recovery cannot be done in 30 minutes, you can temporarily adjust `max-store-down-time` to a larger value to avoid unnecessary replenishment of the replicas and resources waste after the timeout.
 
-In TiDB v5.2.0, TiKV introduces the mechanism of slow TiKV node detection. By sampling the requests in TiKV, it calculates a score ranging from 1 to 100. A TiKV node with a score greater than or equal to 80 is marked as slow. You can add `evict-slow-store-scheduler` to detect and schedule slow nodes. The current version supports that when only one slow node appears, all leaders in the node will be evicted.
\ No newline at end of file
+In TiDB v5.2.0, TiKV introduces the mechanism of slow TiKV node detection. By sampling the requests in TiKV, it calculates a score ranging from 1 to 100. A TiKV node with a score greater than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config) to detect and schedule slow nodes. When one and only one slow node appears, and the slow score reaches the upper limit (100 by default), all leaders in the node will be evicted.
\ No newline at end of file

From b74d2b7ec9bf23d489b714c90bab21f9db7ca6b7 Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Mon, 16 Aug 2021 20:41:34 +0800
Subject: [PATCH 3/8] Update pd-control.md

---
 pd-control.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pd-control.md b/pd-control.md
index fe003c7457ba6..50505c5a5e5d1 100644
--- a/pd-control.md
+++ b/pd-control.md
@@ -701,6 +701,7 @@ Usage:
 >> scheduler config evict-leader-scheduler // Display the stores in which the scheduler is located since v4.0.0
 >> scheduler add shuffle-leader-scheduler // Randomly exchange the leader on different stores
 >> scheduler add shuffle-region-scheduler // Randomly scheduling the regions on different stores
+>> scheduler add evict-slow-store-scheduler // When there is one and only one slow score, evict all region leaders of that score
 >> scheduler remove grant-leader-scheduler-1 // Remove the corresponding scheduler, and `-1` corresponds to the store ID
 >> scheduler pause balance-region-scheduler 10 // Pause the balance-region scheduler for 10 seconds
 >> scheduler pause all 10 // Pause all schedulers for 10 seconds

From ea0f2da64dfb7ab1d0508366578b26bc66c64972 Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Fri, 20 Aug 2021 10:05:35 +0800
Subject: [PATCH 4/8] Apply suggestions from code review

Co-authored-by: 5kbpers
---
 tikv-configuration-file.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index bf495514b9cf8..fe55bdc62be42 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -639,8 +639,8 @@ Configuration items related to Raftstore
 
 ### `inspect-interval`
 
-+ At a certain interval, TiKV inspects the latency status of the Raftstore thread. This parameter specifies the interval of inspection. If the latency exceeds this value, the Raftstore thread is marked as timeout.
-+ Judges whether the TiKV node is slow based on the ratio of inspected latency.
++ At a certain interval, TiKV inspects the latency of the Raftstore component. This parameter specifies the interval of inspection. If the latency exceeds this value, the Raftstore thread is marked as timeout.
++ Judges whether the TiKV node is slow based on the ratio of timeout inspection.
 + Default value: 500ms
 + Minimum value: 1ms
 

From 938a6662a8a4a9b8ad94758454b858df95311df3 Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Fri, 20 Aug 2021 14:31:02 +0800
Subject: [PATCH 5/8] Apply suggestions from code review

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
 pd-control.md | 4 ++--
 tikv-configuration-file.md | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/pd-control.md b/pd-control.md
index 50505c5a5e5d1..a9bebcf740aae 100644
--- a/pd-control.md
+++ b/pd-control.md
@@ -700,8 +700,8 @@ Usage:
 >> scheduler add evict-leader-scheduler 1 // Move all the Region leaders on store 1 out
 >> scheduler config evict-leader-scheduler // Display the stores in which the scheduler is located since v4.0.0
 >> scheduler add shuffle-leader-scheduler // Randomly exchange the leader on different stores
->> scheduler add shuffle-region-scheduler // Randomly scheduling the regions on different stores
->> scheduler add evict-slow-store-scheduler // When there is one and only one slow score, evict all region leaders of that score
+>> scheduler add shuffle-region-scheduler // Randomly scheduling the Regions on different stores
+>> scheduler add evict-slow-store-scheduler // When there is one and only one slow score, evict all Region leaders of that score
 >> scheduler remove grant-leader-scheduler-1 // Remove the corresponding scheduler, and `-1` corresponds to the store ID
 >> scheduler pause balance-region-scheduler 10 // Pause the balance-region scheduler for 10 seconds
 >> scheduler pause all 10 // Pause all schedulers for 10 seconds
diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 2ec3aae7ff94b..fbfb2c6272b5c 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -678,10 +678,10 @@ Configuration items related to Raftstore
 
 ### `inspect-interval`
 
-+ At a certain interval, TiKV inspects the latency of the Raftstore component. This parameter specifies the interval of inspection. If the latency exceeds this value, the Raftstore thread is marked as timeout.
++ At a certain interval, TiKV inspects the latency of the Raftstore thread. This parameter specifies the interval of the inspection. If the latency exceeds this value, the Raftstore thread is marked as timeout.
 + Judges whether the TiKV node is slow based on the ratio of timeout inspection.
-+ Default value: 500ms
-+ Minimum value: 1ms
++ Default value: `"500ms"`
++ Minimum value: `"1ms"`
 
 ## Coprocessor
 

From 9a64bed941fb8dea7cbe8788cde39ceece965332 Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Fri, 20 Aug 2021 16:11:36 +0800
Subject: [PATCH 6/8] Update pd-control.md

Co-authored-by: 5kbpers
---
 pd-control.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pd-control.md b/pd-control.md
index a9bebcf740aae..b6e0ac083c029 100644
--- a/pd-control.md
+++ b/pd-control.md
@@ -701,7 +701,7 @@ Usage:
 >> scheduler config evict-leader-scheduler // Display the stores in which the scheduler is located since v4.0.0
 >> scheduler add shuffle-leader-scheduler // Randomly exchange the leader on different stores
 >> scheduler add shuffle-region-scheduler // Randomly scheduling the Regions on different stores
->> scheduler add evict-slow-store-scheduler // When there is one and only one slow score, evict all Region leaders of that score
+>> scheduler add evict-slow-store-scheduler // When there is one and only one slow store, evict all Region leaders of that store
 >> scheduler remove grant-leader-scheduler-1 // Remove the corresponding scheduler, and `-1` corresponds to the store ID
 >> scheduler pause balance-region-scheduler 10 // Pause the balance-region scheduler for 10 seconds
 >> scheduler pause all 10 // Pause all schedulers for 10 seconds

From f608d0494e9f2e40db40735765ed384c92d0fccf Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Mon, 23 Aug 2021 14:17:49 +0800
Subject: [PATCH 7/8] Update tikv-configuration-file.md

Co-authored-by: 5kbpers
---
 tikv-configuration-file.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index fbfb2c6272b5c..cc994ecfe40a9 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -678,7 +678,7 @@ Configuration items related to Raftstore
 
 ### `inspect-interval`
 
-+ At a certain interval, TiKV inspects the latency of the Raftstore thread. This parameter specifies the interval of the inspection. If the latency exceeds this value, the Raftstore thread is marked as timeout.
++ At a certain interval, TiKV inspects the latency of the Raftstore component. This parameter specifies the interval of the inspection. If the latency exceeds this value, that inspection is marked as timeout.
 + Judges whether the TiKV node is slow based on the ratio of timeout inspection.
 + Default value: `"500ms"`
 + Minimum value: `"1ms"`

From fcd9bf8ff40fa3dfaa98da14fd6d17bbb7b2c474 Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Mon, 23 Aug 2021 14:18:53 +0800
Subject: [PATCH 8/8] Update tikv-configuration-file.md

---
 tikv-configuration-file.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index cc994ecfe40a9..5e0e6b238301e 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -678,7 +678,7 @@ Configuration items related to Raftstore
 
 ### `inspect-interval`
 
-+ At a certain interval, TiKV inspects the latency of the Raftstore component. This parameter specifies the interval of the inspection. If the latency exceeds this value, that inspection is marked as timeout.
++ At a certain interval, TiKV inspects the latency of the Raftstore component. This parameter specifies the interval of the inspection. If the latency exceeds this value, this inspection is marked as timeout.
 + Judges whether the TiKV node is slow based on the ratio of timeout inspection.
 + Default value: `"500ms"`
 + Minimum value: `"1ms"`
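
The eviction rule that the `pd-scheduling-best-practices.md` change in this series describes (a slow score from 1 to 100, a node marked slow at 80 or above, and leader eviction only when exactly one node is slow and its score reaches the upper limit of 100) can be sketched roughly as below. This is an illustrative reading of the documentation text only, not PD's actual scheduler code: the function name `store_to_evict`, the constant names, and the dictionary-of-scores input are all hypothetical.

```python
from typing import Dict, Optional

# Thresholds taken from the documentation text in this patch series.
SLOW_THRESHOLD = 80    # a score >= 80 marks a TiKV node as slow
EVICT_THRESHOLD = 100  # upper limit of the slow score (100 by default)


def store_to_evict(slow_scores: Dict[int, int]) -> Optional[int]:
    """Return the ID of the store whose leaders should be evicted, or None.

    Hypothetical helper: `slow_scores` maps store ID -> current slow score.
    Assumes "one and only one slow node" means exactly one store has a
    score at or above SLOW_THRESHOLD.
    """
    slow_stores = [sid for sid, score in slow_scores.items()
                   if score >= SLOW_THRESHOLD]
    # With zero or several slow stores, the documented rule does not apply.
    if len(slow_stores) != 1:
        return None
    candidate = slow_stores[0]
    # Evict only once the single slow store's score reaches the upper limit.
    if slow_scores[candidate] >= EVICT_THRESHOLD:
        return candidate
    return None
```

Under this reading, `store_to_evict({1: 100, 2: 5, 3: 1})` selects store 1, while a lone store at score 85, or two stores that are both slow, yield no eviction.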