diff --git a/how-to/monitor/monitor-a-cluster.md b/how-to/monitor/monitor-a-cluster.md
index c3b7f41dbd56b..e756c64e6dee8 100644
--- a/how-to/monitor/monitor-a-cluster.md
+++ b/how-to/monitor/monitor-a-cluster.md
@@ -96,22 +96,30 @@ Assume that the TiDB cluster topology is as follows:
 
 #### Step 1: Download the binary package
 
+{{< copyable "shell-regular" >}}
+
 ```bash
 # Downloads the package.
-$ wget https://github.com/prometheus/prometheus/releases/download/v2.2.1/prometheus-2.2.1.linux-amd64.tar.gz
-$ wget https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz
-$ wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.6.3.linux-x64.tar.gz
+wget https://download.pingcap.org/prometheus-2.8.1.linux-amd64.tar.gz
+wget https://download.pingcap.org/node_exporter-0.17.0.linux-amd64.tar.gz
+wget https://download.pingcap.org/grafana-6.1.6.linux-amd64.tar.gz
+```
+
+{{< copyable "shell-regular" >}}
 
+```bash
 # Extracts the package.
-$ tar -xzf prometheus-2.2.1.linux-amd64.tar.gz
-$ tar -xzf node_exporter-0.15.2.linux-amd64.tar.gz
-$ tar -xzf grafana-4.6.3.linux-x64.tar.gz
+tar -xzf prometheus-2.8.1.linux-amd64.tar.gz
+tar -xzf node_exporter-0.17.0.linux-amd64.tar.gz
+tar -xzf grafana-6.1.6.linux-amd64.tar.gz
 ```
 
 #### Step 2: Start `node_exporter` on Node1, Node2, Node3, and Node4
 
+{{< copyable "shell-regular" >}}
+
 ```bash
-$ cd node_exporter-0.15.2.linux-amd64
+cd node_exporter-0.17.0.linux-amd64
 
 # Starts the node_exporter service.
 $ ./node_exporter --web.listen-address=":9100" \
@@ -122,10 +130,14 @@ $ ./node_exporter --web.listen-address=":9100" \
 
 Edit the Prometheus configuration file:
 
-```yml
-$ cd prometheus-2.2.1.linux-amd64
-$ vi prometheus.yml
+{{< copyable "shell-regular" >}}
+
+```bash
+cd prometheus-2.8.1.linux-amd64 &&
+vi prometheus.yml
+```
 
+```ini
 ...
 
 global:
@@ -191,9 +203,11 @@ $ ./prometheus \
 
 Edit the Grafana configuration file:
 
+{{< copyable "shell-regular" >}}
+
 ```ini
-$ cd grafana-4.6.3
-$ vi conf/grafana.ini
+cd grafana-6.1.6 &&
+vi conf/grafana.ini
 
 ...
 
@@ -256,20 +270,22 @@ This section describes how to configure Grafana.
     - Default account: admin
     - Default password: admin
 
-2. Click the Grafana logo to open the sidebar menu.
+    > **Note:**
+    >
+    > For the **Change Password** step, you can choose **Skip**.
 
-3. In the sidebar menu, click **Data Source**.
+2. In the Grafana sidebar menu, click **Data Source** within the **Configuration**.
 
-4. Click **Add data source**.
+3. Click **Add data source**.
 
-5. Specify the data source information.
+4. Specify the data source information.
 
     - Specify a **Name** for the data source.
     - For **Type**, select **Prometheus**.
     - For **URL**, specify the Prometheus address.
     - Specify other fields as needed.
 
-6. Click **Add** to save the new data source.
+5. Click **Add** to save the new data source.
 
 #### Step 2: Import a Grafana dashboard
 
diff --git a/how-to/scale/with-ansible.md b/how-to/scale/with-ansible.md
index a3a8a209fcad0..47f0ee423a13e 100644
--- a/how-to/scale/with-ansible.md
+++ b/how-to/scale/with-ansible.md
@@ -200,7 +200,8 @@ For example, if you want to add a PD node (node103) with the IP address `172.16.
         > You cannot add the `#` character at the beginning of the line. Otherwise, the following configuration cannot take effect.
 
     2. Add `--join="http://172.16.10.1:2379" \`. The IP address (`172.16.10.1`) can be any of the existing PD IP address in the cluster.
-    3. Manually start the PD service in the newly added PD node:
+
+    3. Start the PD service in the newly added PD node:
 
         ```
         {deploy_dir}/scripts/start_pd.sh
@@ -220,26 +221,35 @@ For example, if you want to add a PD node (node103) with the IP address `172.16.
     >
     > `pd-ctl` is a command used to check the number of PD nodes.
 
-5. Apply a rolling update to the entire cluster:
+5. Start the monitoring service:
 
     ```
-    ansible-playbook rolling_update.yml
+    ansible-playbook start.yml -l 172.16.10.103
     ```
 
-6. Start the monitor service:
+    > **Note:**
+    >
+    > If you use an alias (inventory_name), use the `-l` option to specify the alias.
+
+6. Update the cluster configuration:
 
     ```
-    ansible-playbook start.yml -l 172.16.10.103
+    ansible-playbook deploy.yml
     ```
 
-7. Update the Prometheus configuration and restart the cluster:
+7. Restart Prometheus, and enable the monitoring of PD nodes used for increasing the capacity:
 
     ```
-    ansible-playbook rolling_update_monitor.yml --tags=prometheus
+    ansible-playbook stop.yml --tags=prometheus
+    ansible-playbook start.yml --tags=prometheus
     ```
 
 8. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`.
 
+> **Note:**
+>
+> The PD Client in TiKV caches the list of PD nodes. Currently, the list is updated only if the PD leader is switched or the TiKV server is restarted to load the latest configuration. To avoid TiKV caching an outdated list, there should be at least two existing PD members in the PD cluster after increasing or decreasing the capacity of a PD node. If this condition is not met, transfer the PD leader manually to update the list of PD nodes.
+
 ## Decrease the capacity of a TiDB node
 
 For example, if you want to remove a TiDB node (node5) with the IP address `172.16.10.5`, take the following steps:
@@ -430,6 +440,10 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
     ansible-playbook stop.yml -l 172.16.10.2
     ```
 
+    > **Note:**
+    >
+    > In this example, you can only stop the PD service on node2. If there are any other services deployed with the IP address `172.16.10.2`, use the `-t` option to specify the service (such as `-t tidb`).
+
 4. Edit the `inventory.ini` file and remove the node information:
 
     ```ini
@@ -480,16 +494,21 @@ For example, if you want to remove a PD node (node2) with the IP address `172.16
     | node8 | 172.16.10.8 | TiKV3 |
     | node9 | 172.16.10.9 | TiKV4 |
 
-5. Perform a rolling update to the entire TiDB cluster:
+5. Update the cluster configuration:
 
     ```
-    ansible-playbook rolling_update.yml
+    ansible-playbook deploy.yml
     ```
 
-6. Update the Prometheus configuration and restart the cluster:
+6. Restart Prometheus, and disable the monitoring of PD nodes used for increasing the capacity:
 
     ```
-    ansible-playbook rolling_update_monitor.yml --tags=prometheus
+    ansible-playbook stop.yml --tags=prometheus
+    ansible-playbook start.yml --tags=prometheus
     ```
 
 7. To monitor the status of the entire cluster, open a browser to access the monitoring platform: `http://172.16.10.3:3000`.
+
+> **Note:**
+>
+> The PD Client in TiKV caches the list of PD nodes. Currently, the list is updated only if the PD leader is switched or the TiKV server is restarted to load the latest configuration. To avoid TiKV caching an outdated list, there should be at least two existing PD members in the PD cluster after increasing or decreasing the capacity of a PD node. If this condition is not met, transfer the PD leader manually to update the list of PD nodes.
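
Both PD procedures changed in this diff end with the same three playbook runs: `deploy.yml`, then `stop.yml --tags=prometheus`, then `start.yml --tags=prometheus`. The sketch below wraps that shared sequence in a small shell helper so it can be reviewed before it is run. The function name `refresh_cluster_config` and the `DRY_RUN` variable are hypothetical and not part of TiDB Ansible. The sketch assumes it is run from the directory that holds the playbooks, after `inventory.ini` has been edited.

```bash
#!/usr/bin/env bash
# Sketch only: refresh_cluster_config and DRY_RUN are hypothetical names
# introduced for illustration; the playbook commands come from the steps above.

refresh_cluster_config() {
  local cmd
  for cmd in \
    "ansible-playbook deploy.yml" \
    "ansible-playbook stop.yml --tags=prometheus" \
    "ansible-playbook start.yml --tags=prometheus"
  do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default mode: print each command without running it.
      echo "$cmd"
    else
      # Deliberately unquoted so the string splits into command and arguments.
      # Stop at the first failing playbook.
      $cmd || return 1
    fi
  done
}
```

Calling `refresh_cluster_config` with no settings prints the three commands in order. Setting `DRY_RUN=0` for the call runs them and stops at the first failure, so Prometheus is not restarted if `deploy.yml` fails.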