From 29e79b08f7d604f19dfe9438bfeaaaee0c35fa27 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Wed, 11 Mar 2020 14:58:53 +0800
Subject: [PATCH 01/10] optimize pd operation and upgrade monitor version

---
 how-to/monitor/monitor-a-cluster.md | 51 +++++++++++++++++++----------
 how-to/scale/with-ansible.md        | 42 +++++++++++++++++-------
 2 files changed, 65 insertions(+), 28 deletions(-)

diff --git a/how-to/monitor/monitor-a-cluster.md b/how-to/monitor/monitor-a-cluster.md
index c3b7f41dbd56b..50374886a4c14 100644
--- a/how-to/monitor/monitor-a-cluster.md
+++ b/how-to/monitor/monitor-a-cluster.md
@@ -96,22 +96,31 @@ Assume that the TiDB cluster topology is as follows:
 
 #### Step 1: Download the binary package
 
+{{< copyable "shell-regular" >}}
+
 ```bash
 # Downloads the package.
-$ wget https://github.com/prometheus/prometheus/releases/download/v2.2.1/prometheus-2.2.1.linux-amd64.tar.gz
-$ wget https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz
-$ wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.6.3.linux-x64.tar.gz
+wget https://download.pingcap.org/prometheus-2.8.1.linux-amd64.tar.gz
+wget https://download.pingcap.org/node_exporter-0.17.0.linux-amd64.tar.gz
+wget https://download.pingcap.org/grafana-6.1.6.linux-amd64.tar.gz
+```
 
 # Extracts the package.
-$ tar -xzf prometheus-2.2.1.linux-amd64.tar.gz
-$ tar -xzf node_exporter-0.15.2.linux-amd64.tar.gz
-$ tar -xzf grafana-4.6.3.linux-x64.tar.gz
+
+{{< copyable "shell-regular" >}}
+
+```bash
+tar -xzf prometheus-2.8.1.linux-amd64.tar.gz
+tar -xzf node_exporter-0.17.0.linux-amd64.tar.gz
+tar -xzf grafana-6.1.6.linux-amd64.tar.gz
 ```
 
 #### Step 2: Start `node_exporter` on Node1, Node2, Node3, and Node4
 
+{{< copyable "shell-regular" >}}
+
 ```bash
-$ cd node_exporter-0.15.2.linux-amd64
+cd node_exporter-0.17.0.linux-amd64
 
 # Starts the node_exporter service.
 $ ./node_exporter --web.listen-address=":9100" \
@@ -122,10 +131,14 @@ $ ./node_exporter --web.listen-address=":9100" \
 
 Edit the Prometheus configuration file:
 
-```yml
-$ cd prometheus-2.2.1.linux-amd64
-$ vi prometheus.yml
+{{< copyable "shell-regular" >}}
 
+```bash
+cd prometheus-2.8.1.linux-amd64 &&
+vi prometheus.yml
+```
+
+```ini
 ...
 
 global:
@@ -191,9 +204,11 @@ $ ./prometheus \
 
 Edit the Grafana configuration file:
 
+{{< copyable "shell-regular" >}}
+
 ```ini
-$ cd grafana-4.6.3
-$ vi conf/grafana.ini
+cd grafana-6.1.6 &&
+vi conf/grafana.ini
 
 ...
 
@@ -256,20 +271,22 @@ This section describes how to configure Grafana.
     - Default account: admin
     - Default password: admin
 
-2. Click the Grafana logo to open the sidebar menu.
+    > **Note:**
+    >
+    > For the **Change Password** step, choose **Skip**.
 
-3. In the sidebar menu, click **Data Source**.
+2. In the Grafana sidebar menu, click **Data Source** within the **Configuration**.
 
-4. Click **Add data source**.
+3. Click **Add data source**.
 
-5. Specify the data source information.
+4. Specify the data source information.
 
     - Specify a **Name** for the data source.
     - For **Type**, select **Prometheus**.
     - For **URL**, specify the Prometheus address.
     - Specify other fields as needed.
 
-6. Click **Add** to save the new data source.
+5. Click **Add** to save the new data source.
 
 #### Step 2: Import a Grafana dashboard
 
diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index a3a8a209fcad0..547baeaae094e 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -200,7 +200,8 @@ For example, if you want to add a PD node (node103) with the IP address `172.16.
         > You cannot add the `#` character at the beginning of the line. Otherwise, the following configuration cannot take effect.
 
     2. Add `--join="http://172.16.10.1:2379" \`. The IP address (`172.16.10.1`) can be any of the existing PD IP address in the cluster.
-    3. Manually start the PD service in the newly added PD node:
+
+    3. Start the PD service in the newly added PD node:
 
         ```
         {deploy_dir}/scripts/start_pd.sh
@@ -220,26 +221,35 @@ For example, if you want to add a PD node (node103) with the IP address `172.16.
     >
     > `pd-ctl` is a command used to check the number of PD nodes.
 
-5. Apply a rolling update to the entire cluster:
+5. Start the monitor service:
 
     ```
-    ansible-playbook rolling_update.yml
+    ansible-playbook start.yml -l 172.16.10.103
     ```
 
-6. Start the monitor service:
+    > **Note:**
+    >
+    > If you use an alias (inventory_name), use `-l` to specify the alias.
+
+6. Update the cluster configuration:
 
     ```
-    ansible-playbook start.yml -l 172.16.10.103
+    ansible-playbook deploy.yml
     ```
 
-7. Update the Prometheus configuration and restart the cluster:
+7. Restart the Prometheus, and enable the monitoring of Pd nodes used for increasing the capacity:
 
     ```
-    ansible-playbook rolling_update_monitor.yml --tags=prometheus
+    ansible-playbook stop.yml --tags=prometheus
+    ansible-playbook start.yml --tags=prometheus
     ```
 
 8. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`.
 
+> **Note:**
+>
+> The PD Client in TiKV caches PD node list. The list is updated only if the PD leader is switched or the TiKV is restarted to load the latest configuration. After operations of increasing or decreasing the capacity of a PD node, there should be two existing nodes as the members of the PD cluster before the operations to avoid the stale PD node list. If this condition is not met, perform the PD transfer leader operation manually to update the PD node list.
+
 ## Decrease the capacity of a TiDB node
 
 For example, if you want to remove a TiDB node (node5) with the IP address `172.16.10.5`, take the following steps:
@@ -430,6 +440,10 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
     ansible-playbook stop.yml -l 172.16.10.2
     ```
 
+> **Note:**
+    >
+    > In this case, you can stop the services on node2 with only PD nodes on the `172.16.10.2` server. If there are any other services, (for instance, `TiDB`), use `-t` to specify the service (such as `-t tidb`).
+
 4. Edit the `inventory.ini` file and remove the node information:
 
     ```ini
@@ -480,16 +494,22 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
     | node8 | 172.16.10.8 | TiKV3 |
     | node9 | 172.16.10.9 | TiKV4 |
 
-5. Perform a rolling update to the entire TiDB cluster:
+5. Update the cluster configuration:
 
     ```
-    ansible-playbook rolling_update.yml
+    ansible-playbook deploy.yml
     ```
 
-6. Update the Prometheus configuration and restart the cluster:
+6. Restart the Prometheus, and disable the monitoring of Pd nodes used for increasing the capacity:
+
 
     ```
-    ansible-playbook rolling_update_monitor.yml --tags=prometheus
+    ansible-playbook stop.yml --tags=prometheus
+    ansible-playbook start.yml --tags=prometheus
     ```
 
 7. To monitor the status of the entire cluster, open a browser to access the monitoring platform: `http://172.16.10.3:3000`.
+
+> **Note:**
+>
+> The PD Client in TiKV caches PD node list. The list is updated only if the PD leader is switched or the TiKV is restarted to load the latest configuration. After operations of increasing or decreasing the capacity of a PD node, there should be two existing nodes as the members of the PD cluster before the operations to avoid the stale PD node list. If this condition is not met, perform the PD transfer leader operation manually to update the PD node list.
From 56fc62f5282a0e3060043fe4b5c1f92bb4142838 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Wed, 11 Mar 2020 15:54:13 +0800
Subject: [PATCH 02/10] small edit

---
 how-to/monitor/monitor-a-cluster.md | 3 +--
 how-to/scale/with-ansible.md        | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/how-to/monitor/monitor-a-cluster.md b/how-to/monitor/monitor-a-cluster.md
index 50374886a4c14..0e62e53fdd26a 100644
--- a/how-to/monitor/monitor-a-cluster.md
+++ b/how-to/monitor/monitor-a-cluster.md
@@ -105,11 +105,10 @@ wget https://download.pingcap.org/node_exporter-0.17.0.linux-amd64.tar.gz
 wget https://download.pingcap.org/grafana-6.1.6.linux-amd64.tar.gz
 ```
 
-# Extracts the package.
-
 {{< copyable "shell-regular" >}}
 
 ```bash
+# Extracts the package.
 tar -xzf prometheus-2.8.1.linux-amd64.tar.gz
 tar -xzf node_exporter-0.17.0.linux-amd64.tar.gz
 tar -xzf grafana-6.1.6.linux-amd64.tar.gz
 ```
diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index 547baeaae094e..651002a302962 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -440,7 +440,7 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
     ansible-playbook stop.yml -l 172.16.10.2
     ```
 
-> **Note:**
+    > **Note:**
     >
     > In this case, you can stop the services on node2 with only PD nodes on the `172.16.10.2` server. If there are any other services, (for instance, `TiDB`), use `-t` to specify the service (such as `-t tidb`).
 
@@ -502,7 +502,6 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
 
 6. Restart the Prometheus, and disable the monitoring of Pd nodes used for increasing the capacity:
 
-
     ```
     ansible-playbook stop.yml --tags=prometheus
     ansible-playbook start.yml --tags=prometheus

From 9fbf680332a7daac98fe9501150d6926c40ae3da Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 16 Mar 2020 20:16:56 +0800
Subject: [PATCH 03/10] Update how-to/monitor/monitor-a-cluster.md

Co-Authored-By: anotherrachel
---
 how-to/monitor/monitor-a-cluster.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/how-to/monitor/monitor-a-cluster.md b/how-to/monitor/monitor-a-cluster.md
index 0e62e53fdd26a..e756c64e6dee8 100644
--- a/how-to/monitor/monitor-a-cluster.md
+++ b/how-to/monitor/monitor-a-cluster.md
@@ -272,7 +272,7 @@ This section describes how to configure Grafana.
 
     > **Note:**
     >
-    > For the **Change Password** step, choose **Skip**.
+    > For the **Change Password** step, you can choose **Skip**.
 
 2. In the Grafana sidebar menu, click **Data Source** within the **Configuration**.
 

From 99dd19e324b133dc940b4bf6fefc8aacdd75946d Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 16 Mar 2020 20:17:06 +0800
Subject: [PATCH 04/10] Update how-to/scale/with-ansible.md

Co-Authored-By: anotherrachel
---
 how-to/scale/with-ansible.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index 651002a302962..67a766a3f0bea 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -229,7 +229,7 @@ For example, if you want to add a PD node (node103) with the IP address `172.16.
 
     > **Note:**
     >
-    > If you use an alias (inventory_name), use `-l` to specify the alias.
+    > If you use an alias (inventory_name), use the `-l` option to specify the alias.
 
 6. Update the cluster configuration:
 

From 9f4bd34544d32527f9cf1e80e9006861877f875b Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 16 Mar 2020 20:17:47 +0800
Subject: [PATCH 05/10] Update how-to/scale/with-ansible.md

Co-Authored-By: anotherrachel
---
 how-to/scale/with-ansible.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index 67a766a3f0bea..4cf1e770f463e 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -221,7 +221,7 @@ For example, if you want to add a PD node (node103) with the IP address `172.16.
     >
     > `pd-ctl` is a command used to check the number of PD nodes.
 
-5. Start the monitor service:
+5. Start the monitoring service:
 
     ```
     ansible-playbook start.yml -l 172.16.10.103

From 77b3c47cbdfd2a3692190371888ede00dda41f3c Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 16 Mar 2020 20:17:59 +0800
Subject: [PATCH 06/10] Update how-to/scale/with-ansible.md

Co-Authored-By: anotherrachel
---
 how-to/scale/with-ansible.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index 4cf1e770f463e..f695bb1118b95 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -500,7 +500,7 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
     ansible-playbook deploy.yml
     ```
 
-6. Restart the Prometheus, and disable the monitoring of Pd nodes used for increasing the capacity:
+6. Restart Prometheus, and disable the monitoring of PD nodes used for increasing the capacity:
 
     ```
     ansible-playbook stop.yml --tags=prometheus

From 5790af626dd40e6b84c38216313af25290928013 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 16 Mar 2020 20:18:55 +0800
Subject: [PATCH 07/10] Update how-to/scale/with-ansible.md

Co-Authored-By: anotherrachel
---
 how-to/scale/with-ansible.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index f695bb1118b95..a956eecc38a42 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -442,7 +442,7 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
 
     > **Note:**
     >
-    > In this case, you can stop the services on node2 with only PD nodes on the `172.16.10.2` server. If there are any other services, (for instance, `TiDB`), use `-t` to specify the service (such as `-t tidb`).
+    > In this example, you can only stop the PD service on node2. If there are any other services deployed with the IP address `172.16.10.2`, use the `-t` option to specify the service (such as `-t tidb`).
 
 4. Edit the `inventory.ini` file and remove the node information:
 

From e6f3d7f72c2afcf23085f297df0fc075eb6317b8 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 16 Mar 2020 20:19:10 +0800
Subject: [PATCH 08/10] Update how-to/scale/with-ansible.md

Co-Authored-By: anotherrachel
---
 how-to/scale/with-ansible.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index a956eecc38a42..36fd56a54748f 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -237,7 +237,7 @@ For example, if you want to add a PD node (node103) with the IP address `172.16.
     ansible-playbook deploy.yml
     ```
 
-7. Restart the Prometheus, and enable the monitoring of Pd nodes used for increasing the capacity:
+7. Restart Prometheus, and enable the monitoring of PD nodes used for increasing the capacity:
 
     ```
     ansible-playbook stop.yml --tags=prometheus

From 8b390c3b717640a88943431840bb11a9fcade3a1 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 16 Mar 2020 20:20:50 +0800
Subject: [PATCH 09/10] Update how-to/scale/with-ansible.md

Co-Authored-By: anotherrachel
---
 how-to/scale/with-ansible.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index 36fd56a54748f..be16ac3ed0ca3 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -248,7 +248,7 @@ For example, if you want to add a PD node (node103) with the IP address `172.16.
 
 > **Note:**
 >
-> The PD Client in TiKV caches PD node list. The list is updated only if the PD leader is switched or the TiKV is restarted to load the latest configuration. After operations of increasing or decreasing the capacity of a PD node, there should be two existing nodes as the members of the PD cluster before the operations to avoid the stale PD node list. If this condition is not met, perform the PD transfer leader operation manually to update the PD node list.
+> The PD Client in TiKV caches the list of PD nodes. Currently, the list is updated only if the PD leader is switched or the TiKV server is restarted to load the latest configuration. To avoid TiKV caching an outdated list, there should be at least two existing PD members in the PD cluster after increasing or decreasing the capacity of a PD node. If this condition is not met, transfer the PD leader manually to update the list of PD nodes.
 
 ## Decrease the capacity of a TiDB node
 

From 5d078eb8681b809a9ae81508621ca347cbb25598 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 16 Mar 2020 20:21:04 +0800
Subject: [PATCH 10/10] Update how-to/scale/with-ansible.md

Co-Authored-By: anotherrachel
---
 how-to/scale/with-ansible.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index be16ac3ed0ca3..47f0ee423a13e 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -511,4 +511,4 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
 
 > **Note:**
 >
-> The PD Client in TiKV caches PD node list. The list is updated only if the PD leader is switched or the TiKV is restarted to load the latest configuration. After operations of increasing or decreasing the capacity of a PD node, there should be two existing nodes as the members of the PD cluster before the operations to avoid the stale PD node list. If this condition is not met, perform the PD transfer leader operation manually to update the PD node list.
+> The PD Client in TiKV caches the list of PD nodes. Currently, the list is updated only if the PD leader is switched or the TiKV server is restarted to load the latest configuration. To avoid TiKV caching an outdated list, there should be at least two existing PD members in the PD cluster after increasing or decreasing the capacity of a PD node. If this condition is not met, transfer the PD leader manually to update the list of PD nodes.
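
The closing note in the series asks for a manual PD leader transfer but does not show the command. A minimal sketch follows, assuming `pd-ctl` sits in the current directory, that the PD endpoint is `http://172.16.10.1:2379` as in the `--join` example, and that your `pd-ctl` version supports the `-d` single-command flag and the `member leader transfer` subcommand as I understand them from the pd-ctl documentation. The member name `pd2` and the helper `transfer_leader_cmd` are hypothetical; the helper only prints the command so that it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Sketch only: the endpoint below is taken from the --join example in the
# patch; replace it with any reachable PD address in your cluster.
PD_ADDR="http://172.16.10.1:2379"

# Prints (does not run) a pd-ctl command that asks PD to move leadership
# to the named member. The -d flag and the "member leader transfer"
# subcommand are assumptions to verify against your pd-ctl version.
transfer_leader_cmd() {
    local member="$1"
    if [ -z "$member" ]; then
        echo "usage: transfer_leader_cmd <pd-member-name>" >&2
        return 1
    fi
    printf './pd-ctl -u "%s" -d member leader transfer %s\n' "$PD_ADDR" "$member"
}
```

For example, `transfer_leader_cmd pd2` prints a command line to paste into a shell on a host that has `pd-ctl`. Checking the member list first with the `pd-ctl` call already shown in the document confirms the target name exists.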