diff --git a/TOC.md b/TOC.md
index 4d3b48547e1d1..a7bbbb3758a58 100644
--- a/TOC.md
+++ b/TOC.md
@@ -307,6 +307,7 @@
     - [Overview](/reference/tiflash/overview.md)
     - [Deploy a TiFlash Cluster](/reference/tiflash/deploy.md)
     - [Use TiFlash](/reference/tiflash/use-tiflash.md)
+    - [Monitor TiFlash](/reference/tiflash/monitor.md)
   + TiDB Binlog
     - [Overview](/reference/tidb-binlog/overview.md)
     - [Deploy](/reference/tidb-binlog/deploy.md)
diff --git a/reference/tiflash/monitor.md b/reference/tiflash/monitor.md
new file mode 100644
index 0000000000000..aa3ad917a4695
--- /dev/null
+++ b/reference/tiflash/monitor.md
@@ -0,0 +1,37 @@
+---
+title: Monitor the TiFlash Cluster
+summary: Learn the monitoring items of TiFlash.
+category: reference
+---
+
+# Monitor the TiFlash Cluster
+
+This document describes the monitoring items of TiFlash.
+
+## Monitor the Coprocessor
+
+| Monitoring items | Description |
+|:---|:-----|
+| `tiflash_coprocessor_request_count` | The number of coprocessor requests received. `batch` is the number of batch requests. `batch_cop` is the number of coprocessor requests contained in batch requests. `cop` is the number of coprocessor requests sent directly via the coprocessor interface. `cop_dag` is the number of DAG requests among all coprocessor requests. |
+| `tiflash_coprocessor_executor_count` | The number of DAG executors of each type. `table_scan` is the table scan executor. `selection` is the selection executor. `aggregation` is the aggregation executor. `top_n` is the `TopN` executor. `limit` is the limit executor. |
+| `tiflash_coprocessor_request_duration_seconds` | The histogram of coprocessor request duration, measured from when a coprocessor request is received to when the response to it is completed. `batch` is the duration of batch requests. `cop` is the duration of coprocessor requests sent directly via the coprocessor interface. |
+| `tiflash_coprocessor_request_error` | The number of coprocessor request errors. `meet_lock` means that the data being read is locked. `region_not_found` means that the Region does not exist. `epoch_not_match` means that the epoch of the Region being read is inconsistent with the local epoch. `kv_client_error` means that communication with TiKV returned an error. `internal_error` is an internal TiFlash system error. `other` covers all other types of errors. |
+| `tiflash_coprocessor_request_handle_seconds` | The histogram of coprocessor request processing time, measured from when execution of a coprocessor request starts to when the execution completes. `batch` is the processing time of batch requests. `cop` is the processing time of coprocessor requests sent directly via the coprocessor interface. |
+| `tiflash_coprocessor_response_bytes` | The total size of coprocessor responses, in bytes. |
+
+## Monitor DDL operations
+
+| Monitoring items | Description |
+|:---|:-----|
+| `tiflash_schema_version` | The version of the schema currently cached in TiFlash. |
+| `tiflash_schema_apply_count` | The number of schema applies, counted for three types: `diff apply`, `full apply`, and `failed apply`. `diff apply` is the normal process, which applies a single schema diff. If a `diff apply` fails, `failed apply` increases by `1` and TiFlash falls back to a `full apply`. |
+| `tiflash_schema_internal_ddl_count` | The number of DDL operations of each specific type executed in TiFlash. |
+| `tiflash_schema_apply_duration_seconds` | The time used by a single `apply schema` operation. |
+
+## Monitor Raft
+
+| Monitoring items | Description |
+|:---|:-----|
+| `tiflash_raft_read_index_count` | The number of times that coprocessor requests trigger a `read_index` request, which equals the number of Regions accessed by coprocessor requests. |
+| `tiflash_raft_read_index_duration_seconds` | The time used by `read_index`. Most of this time is spent interacting with the Region leader and retrying. |
+| `tiflash_raft_wait_index_duration_seconds` | The time used by `wait_index`, that is, the time spent waiting until the local index is greater than or equal to the read index after the result of the `read_index` request is received. |
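Several items in the new page are histograms. Examples are `tiflash_coprocessor_request_duration_seconds` and `tiflash_coprocessor_request_handle_seconds`. Histograms are usually read as quantiles (for example, p99 latency) rather than as raw bucket counts. The Python sketch below shows how such an estimate can be derived from bucket counts. It assumes that these metrics are exported as standard Prometheus histograms, meaning cumulative `_bucket` series with `le` upper bounds. The sketch reimplements, for illustration only, the linear interpolation idea behind Prometheus's `histogram_quantile()`. It is not TiFlash or Prometheus code, and the bucket boundaries and counts in the example are made up.

```python
import math
from typing import List, Tuple

# (upper bound "le", cumulative count); assumes Prometheus-style buckets.
Bucket = Tuple[float, float]


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    Illustrative sketch of linear interpolation as done by Prometheus's
    histogram_quantile(); not TiFlash code. Buckets must be sorted by
    upper bound, use non-negative bounds, and end with a +Inf bucket.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Assume observations are spread uniformly inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


if __name__ == "__main__":
    # Made-up per-window bucket counts for a duration histogram, in seconds.
    example = [(0.005, 10), (0.01, 30), (0.05, 90), (0.1, 100), (math.inf, 100)]
    print(histogram_quantile(0.99, example))  # roughly 0.095
```

Under the same Prometheus-histogram assumption, a PromQL query would typically look like `histogram_quantile(0.99, sum(rate(tiflash_coprocessor_request_duration_seconds_bucket[1m])) by (le))`. Interpolation assumes that observations are spread evenly within each bucket. As a result, the estimate can be no more precise than the bucket boundaries allow.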