From 15a4db085a07d63c2ed796767ba943fe3db51edd Mon Sep 17 00:00:00 2001 From: en-jin19 Date: Mon, 23 Aug 2021 01:00:51 +0200 Subject: [PATCH 01/18] add two HTAP documents --- TOC.md | 2 + _index.md | 4 +- explore-htap.md | 104 +++++++++++++++++++ quick-start-with-htap.md | 216 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 325 insertions(+), 1 deletion(-) create mode 100644 explore-htap.md create mode 100644 quick-start-with-htap.md diff --git a/TOC.md b/TOC.md index c9e5dff2e0a37..22e778f250dc9 100644 --- a/TOC.md +++ b/TOC.md @@ -20,7 +20,9 @@ + [Credits](/credits.md) + Quick Start + [Try Out TiDB](/quick-start-with-tidb.md) + + [Try Out HTAP](/quick-start-with-htap.md) + [Learn TiDB SQL](/basic-sql-operations.md) + + [Learn HTAP](/explore-htap.md) + [Import Example Database](/import-example-data.md) + Deploy + [Software and Hardware Requirements](/hardware-and-software-requirements.md) diff --git a/_index.md b/_index.md index 206090fa55e05..ba6a5499b8d74 100644 --- a/_index.md +++ b/_index.md @@ -26,8 +26,10 @@ Designed for the cloud, TiDB provides flexible scalability, reliability and secu Quick Start -- [Quick Start Guide](/quick-start-with-tidb.md) +- [Quick Start with TiDB](/quick-start-with-tidb.md) +- [Quick Start with HTAP](/quick-start-with-htap.md) - [Explore SQL with TiDB](/basic-sql-operations.md) +- [Explore HTAP](/explore-htap.md) diff --git a/explore-htap.md b/explore-htap.md new file mode 100644 index 0000000000000..210fcdc8d548a --- /dev/null +++ b/explore-htap.md @@ -0,0 +1,104 @@ +--- +title: Explore HTAP +summary: Learn how to explore and use the features of TiDB HTAP. +--- + +# Explore HTAP + +This guide describes how to explore and use the features of TiDB Hybrid Transactional and Analytical Processing (HTAP). + +> **Note:** +> +> If you are new to TiDB HTAP and want to start using it quickly, see [Quick start with HTAP](/quick-start-with-htap.md). 
+ +## Use cases + +TiDB HTAP can meet the needs that have increment massive data, reduce the risk cost of operation, and be used for on-premises big data technology stacks without difficulty, thereby presenting the value of data assets in real time. + +The following are the use cases of HTAP: + +- Hybrid load + + When using TiDB for real-time Online Analytical Processing (OLAP) that is in hybrid load scenarios, you only need to provide an entry point. Then TiDB automatically selects different processing engines based on the specific business. + +- Real-time stream processing + + When using TiDB in real-time stream processing scenarios, TiDB ensures that the continuously flowed data is queried in real time. At the same time, TiDB also can handle highly concurrent workloads and Business Intelligence (BI) queries. + +- Data center + + When using TiDB as a data center, TiDB can meet specific business needs by seamlessly connecting the data for the application and the data warehouse. + +For more information about use cases of TiDB HTAP, see [blogs about HTAP on the PingCAP website](https://pingcap.com/blog-cn/#HTAP). + +## Architecture + +In TiDB, a row-based storage engine [TiKV](/tikv-overview.md) for Online Transactional Processing (OLTP) and a columnar storage engine [TiFlash](/tiflash/tiflash-overview.md) for Online Analytical Processing (OLAP) co-exist, replicate data automatically, and keep strong consistency. + +For more information about the architecture, see [architecture of TiDB HTAP](/tiflash/tiflash-overview.md#architecture). + +## Prerequisites for environment + +Before exploring the features of TiDB HTAP, you need to configure TiDB and the corresponding storage engine according to the data volume. If the data volume is huge (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the primary solution and TiSpark as the supplementary solution. 
+ +- TiFlash + + - If you have deployed a TiDB cluster but not TiFlash nodes, add the TiFlash nodes in the on-premises TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). + - If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster using TiUP](/production-deployment-using-tiup.md). At the same time, you also need to deploy the minimal cluster topology along with [TiFlash](/tiflash-deployment-topology.md). + - When deciding how to choose the number of TiFlash nodes, consider the following scenarios: + + - If you mainly need OLTP that runs small-scale analytical processing, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries. + - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time. + - If the OLTP throughput is relatively high (for example, the rate of write throughput or update throughput is higher than 10 million lines/hours), the hot write regions and hot read regions can be formed. This is because the I/O usage in TiKV and TiFlash becomes the bottleneck due to limited write capacity of network and physical disk in this case. + +- TiSpark + + - If your data needs to be analyzed with Spark, deploy TiSpark (Spark 3.x is not currently supported). For specific process, see [TiSpark User Guide](/tispark-overview.md). + + + +## Prerequisites for data + +After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated to TiFlash. + +- If there is no data in the TiDB Cluster, migrate the data to TiDB first. For detailed information, see [data migration](/migration-overview.md). 
+- If the TiDB cluster already has the replicated data from upstream, after TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated to TiFlash. For detailed information, see [Use TiFlash](/tiflash/use-tiflash.md). + +## Data processing + +With TiDB, you can simply enter SQL statements for queries or write requirements. For the tables to be replicated to TiFlash, TiDB chooses the best execution freely by the front-end optimizer. + +> **Note:** +> +> MPP mode of TiFlash is enabled by default. When an SQL statement is executed, TiDB automatically determines whether to run in MPP mode through the optimizer. +> +> - To disable MPP mode of TiFlash, set the value of [tidb_allow_mpp](/system-variables.md#tidb_allow_mpp-new-in-v50) system variable to `OFF`. +> - To forcibly enable MPP mode of TiFlash for query execution, set the value of [tidb_allow_mpp](/system-variables.md#tidb_allow_mpp-new-in-v50) and [tidb_enforce_mpp](/system-variables.md#tidb_enforce_mpp-new-in-v51) to `ON`. +> - To see whether to use MPP mode when TiDB execute queries, see [Explain Statements in the MPP Mode](/explain-mpp.md#explain-statements-in-the-mpp-mode). If the output of `EXPLAIN` statement includes `ExchangeSender` and `ExchangeReceiver` operator, MPP mode is activated. + +## Performance monitoring + +When using TiDB, you can monitor the running status and check TiDB performance by the following methods: + +- [TiDB Dashboard](/dashboard/dashboard-intro.md): you can see the overall running status of the TiDB cluster, analyse distribution and trends of read and write traffic, and learn the detailed execution information of slow queries. +- [Monitoring system (Prometheus & Grafana)](/grafana-overview-dashboard.md): you can see the monitoring parameters of TiDB cluster-related componants including PD, TiDB, TiKV, TiFlash,TiCDC, and Node_exporter. 
+ +To see the alert rules of TiDB cluster and TiFlash cluster, see [TiDB cluster alert rules](/alert-rules.md) and [TiFlash alert rules](/tiflash/tiflash-alert-rules.md). + +## Troubleshooting + +If you have issues while using TiDB, refer to the following documents: + +- [Analyze slow queries](/analyze-slow-queries.md) +- [Identify expensive queries](/identify-expensive-queries.md) +- [Troubleshoot hotspot issues](/troubleshoot-hot-spot-issues.md) +- [TiDB cluster troubleshooting guide](/troubleshoot-tidb-cluster.md) +- [Troubleshoot a TiFlash Cluster](/tiflash/troubleshoot-tiflash.md) + +You are also welcome to create [Github Issues](https://github.com/pingcap/tiflash/issues) or submit your questions on [AskTUG](https://asktug.com/). + +## What's next + +- To see TiFlash version, critical logs and system tables of TiFlash, see [Maintain a TiFlash cluster](/tiflash/maintain-tiflash.md). +- To remove a specific TiFlash node, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). \ No newline at end of file diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md new file mode 100644 index 0000000000000..f3845b19accf5 --- /dev/null +++ b/quick-start-with-htap.md @@ -0,0 +1,216 @@ +--- +title: Quick start with HTAP +summary: Learn how to quickly get started with the TiDB HTAP. +--- + +# Quick Start Guide for TiDB HTAP + +This guide walks you through the quickest way to get started with TiDB's one-stop Hybrid Transactional and Analytical Processing (HTAP). + +> **Note:** +> +> The steps provided in this guide is ONLY for quick start, NOT for production. To explore more features of HTAP, see [explore HTAP](/explore-htap.md). + +## Basic concepts + +Before using TiDB HTAP, you need to have basic knowledge about [TiKV](/tikv-overview.md), a row-based storage engine for TiDB Online Transactional Processing (OLTP), and [TiFlash](/tiflash/tiflash-overview.md), a columnar storage engine for TiDB Online Analytical Processing (OLAP). 
+ +- Storage engines of HTAP: The row-based storage engine and the columnar storage engine co-exist for HTAP. Both storage engines can replicate data automatically and keep strong consistency. The row-based storage engine optimizes OLTP performance, and the columnar storage engine optimizes OLAP performance. +- Data consistency of HTAP: As a distributed and transactional key-value database, TiKV provides transactional APIs with ACID compliance. With the implementation of the [Raft consensus algorithm](https://raft.github.io/raft.pdf), TiKV guarantees data consistency between multiple replicas and high availability. TiFlash replicates data from TiKV in real time, which ensures that data is strongly consistent between TiKV and TiFlash. +- Data isolation of HTAP: TiKV and TiFlash can be deployed on different machines as needed to solve the problem of HTAP resource isolation. +- MPP computing engine: [MPP](/tiflash/use-tiflash.md#control-whether-to-select-the-mpp-mode) is a distributed computing framework provided by the TiFlash engine since TiDB 5.0, which allows data exchange between nodes and provides high-performance, high-throughput SQL algorithms. If the MPP mode is enabled, the run time of the analytic queries can be significantly reduced. + +## Steps + +This document takes the popular [TPC-H](http://www.tpc.org/tpch/) dataset as an example to show the convenience and high performance of TiDB HTAP through queries. + +### Step 1: Prerequisites for deployment + +Before using TiDB HTAP, follow the steps in the [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md) to deploy a local test environment. + +In [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md): + +- It is recommended that you run `tiup playground` to start a TiDB cluster of the latest version.
When you run the following command, 1 TiDB instance, 1 TiKV instance, 1 PD instance, and 1 TiFlash instance are deployed automatically: + + {{< copyable "shell-regular" >}} + + ```shell + tiup playground + ``` + +- If you want to specify the TiDB version and the number of the instances of each component, run the command that specifies the number of the instances of TiFlash as follows: + + {{< copyable "shell-regular" >}} + + ```shell + tiup playground v5.1.0 --db 2 --pd 3 --kv 3 --tiflash 1 --monitor + ``` + +> **Note:** +> +> The `tiup playground` command is ONLY for quick start, NOT for production. + +### Step 2: Prerequisites for data + +By following these steps, you can create the [TPC-H](http://www.tpc.org/tpch/) dataset for using TiDB HTAP. If you are interested in TPC-H, see [General Implementation Guidelines](http://tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.0.pdf). + +> **Note:** +> +> If you want to use your existing data for analytic queries, you can [migrate your data to TiDB](/migration-overview.md). If you want to design and create your own data, you can create it by executing SQL statements or using related tools. + +1. Install the tool that creates data by running the following command: + + {{< copyable "shell-regular" >}} + + ```shell + tiup install bench + ``` + +2. Create the data by running the following command: + + {{< copyable "shell-regular" >}} + + ```shell + tiup bench tpch --sf=1 prepare + ``` + + If the output of this command shows `Finished`, it indicates that the data is created. + +3.
Execute the following SQL statement to see the created data: + + {{< copyable "sql" >}} + + ```sql + SELECT CONCAT(table_schema,'.',table_name) AS 'Table Name', table_rows AS 'Number of Rows', CONCAT(ROUND(data_length/(1024*1024*1024),4),'G') AS 'Data Size', CONCAT(ROUND(index_length/(1024*1024*1024),4),'G') AS 'Index Size', CONCAT(ROUND((data_length+index_length)/(1024*1024*1024),4),'G') AS'Total'FROM information_schema.TABLES WHERE table_schema LIKE 'test'; + ``` + + As you can see from the output, eight tables are created in total, and the largest table has 6 million rows of data (the actual amount of the created data is in line with the value of the SQL query, because the data is randomly created by the tool). + + ```sql + +---------------+----------------+-----------+------------+---------+ + | Table Name | Number of Rows | Data Size | Index Size | Total | + +---------------+----------------+-----------+------------+---------+ + | test.nation | 25 | 0.0000G | 0.0000G | 0.0000G | + | test.region | 5 | 0.0000G | 0.0000G | 0.0000G | + | test.part | 200000 | 0.0245G | 0.0000G | 0.0245G | + | test.supplier | 10000 | 0.0014G | 0.0000G | 0.0014G | + | test.partsupp | 800000 | 0.1174G | 0.0119G | 0.1293G | + | test.customer | 150000 | 0.0242G | 0.0000G | 0.0242G | + | test.orders | 1514336 | 0.1673G | 0.0000G | 0.1673G | + | test.lineitem | 6001215 | 0.7756G | 0.0894G | 0.8651G | + +---------------+----------------+-----------+------------+---------+ + 8 rows in set (0.06 sec) + ``` + + This is a database of a commercial ordering system. 
In this database, the `test.nation` table indicates the information about countries, the `test.region` table indicates the information about regions, the `test.part` table indicates the information about parts, the `test.supplier` table indicates the information about suppliers, the `test.partsupp` table indicates the information about parts from suppliers, the `test.customer` table indicates the information about customers, the `test.orders` table indicates the information about orders, and the `test.lineitem` table indicates the information about line items. + +### Step 3: Query with the row-based storage engine + +You can see the performance of TiDB when using only the row-based storage engine (for most databases) by executing the following SQL statements: + +{{< copyable "sql" >}} + +```sql +SELECT + l_orderkey, + SUM( + l_extendedprice * (1 - l_discount) + ) AS revenue, + o_orderdate, + o_shippriority +FROM + customer, + orders, + lineitem +WHERE + c_mktsegment = 'BUILDING' +AND c_custkey = o_custkey +AND l_orderkey = o_orderkey +AND o_orderdate < DATE '1996-01-01' +AND l_shipdate > DATE '1996-02-01' +GROUP BY + l_orderkey, + o_orderdate, + o_shippriority +ORDER BY + revenue DESC, + o_orderdate +limit 10; +``` + +This is a shipping priority query that returns the shipping priority and potential revenue of the highest-revenue orders that have not been shipped by a specified date. The potential revenue is defined as the sum of `l_extendedprice * (1-l_discount)`. The orders are listed in descending order of revenue. In this example, this query lists the top 10 unshipped orders by potential revenue. + +### Step 4: Replication with the columnar storage engine + +After TiFlash is deployed, data replication does not automatically begin.
You need to send DDL statements to TiDB through a MySQL client to create TiFlash replicas for specific tables: + +{{< copyable "sql" >}} + +```sql +ALTER TABLE test.customer SET TIFLASH REPLICA 1; +ALTER TABLE test.orders SET TIFLASH REPLICA 1; +ALTER TABLE test.lineitem SET TIFLASH REPLICA 1; +``` + +You can check the status of the TiFlash replicas of a specific table using the following statements: + +{{< copyable "sql" >}} + +```sql +SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'customer'; +SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'orders'; +SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'lineitem'; +``` + +In the results of the above statements: + +- `AVAILABLE` indicates whether the TiFlash replicas of this table are available or not. `1` means available and `0` means unavailable. Once the replicas become available, this status does not change. If you use DDL statements to modify the number of replicas, the replication status will be recalculated. +- `PROGRESS` means the progress of the replication. The value is between 0.0 and 1.0. 1 means at least one replica is replicated. + +### Step 5: Analyze data faster using HTAP + +If you execute the SQL statements in [Step 3](#step-3-query-with-the-row-based-storage-engine) again, you can see the performance of TiDB HTAP. + +For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on the cost estimation. You can use the `desc` or `explain analyze` statement to check whether or not a TiFlash replica is selected.
For example: + +{{< copyable "sql" >}} + +```sql +explain analyze SELECT + l_orderkey, + SUM( + l_extendedprice * (1 - l_discount) + ) AS revenue, + o_orderdate, + o_shippriority +FROM + customer, + orders, + lineitem +WHERE + c_mktsegment = 'BUILDING' +AND c_custkey = o_custkey +AND l_orderkey = o_orderkey +AND o_orderdate < DATE '1996-01-01' +AND l_shipdate > DATE '1996-02-01' +GROUP BY + l_orderkey, + o_orderdate, + o_shippriority +ORDER BY + revenue DESC, + o_orderdate +limit 10; +``` + +If the result of the `EXPLAIN` statement shows `ExchangeSender` and `ExchangeReceiver` operators, it indicates that the MPP mode has taken effect. + +In addition, you can specify that each part of the entire query is computed using only the TiFlash. For detailed inforamtion, see [Use TiDB to read TiFlash replicas](/tiflash/use-tiflash.md#use-TiDB-to-read-TiFlash-replicas). + +You can compare query results and query performance of these two methods. + +## What's next + +- [Architecture of TiDB HTAP](stable/tiflash-overview.md#architecture) +- [Explore HTAP](/explore-htap.md) +- [Use TiFlash](/stable/use-tiflash.md#use-tiflash) \ No newline at end of file From a8bc1f35169d1ae6528cbb3c1a800a5341dd8dbb Mon Sep 17 00:00:00 2001 From: en-jin19 Date: Mon, 23 Aug 2021 01:07:19 +0200 Subject: [PATCH 02/18] fix CI error --- quick-start-with-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index f3845b19accf5..fe99d7461b67b 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -213,4 +213,4 @@ You can compare query results and query performance of these two methods. 
- [Architecture of TiDB HTAP](stable/tiflash-overview.md#architecture) - [Explore HTAP](/explore-htap.md) -- [Use TiFlash](/stable/use-tiflash.md#use-tiflash) \ No newline at end of file +- [Use TiFlash](/use-tiflash.md#use-tiflash) \ No newline at end of file From 23eb99fe1d964850a5f59ca31f4645be0295ae3c Mon Sep 17 00:00:00 2001 From: en-jin19 Date: Mon, 23 Aug 2021 01:10:22 +0200 Subject: [PATCH 03/18] fix CI error --- quick-start-with-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index fe99d7461b67b..4e65fe1528846 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -211,6 +211,6 @@ You can compare query results and query performance of these two methods. ## What's next -- [Architecture of TiDB HTAP](stable/tiflash-overview.md#architecture) +- [Architecture of TiDB HTAP](/tiflash-overview.md#architecture) - [Explore HTAP](/explore-htap.md) - [Use TiFlash](/use-tiflash.md#use-tiflash) \ No newline at end of file From 06dbf5b90ebee08d95c94f005682020c5357c22c Mon Sep 17 00:00:00 2001 From: en-jin19 Date: Mon, 23 Aug 2021 01:14:16 +0200 Subject: [PATCH 04/18] fix CI error --- quick-start-with-htap.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index 4e65fe1528846..95f6641ae8b5f 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -211,6 +211,6 @@ You can compare query results and query performance of these two methods. 
## What's next -- [Architecture of TiDB HTAP](/tiflash-overview.md#architecture) +- [Architecture of TiDB HTAP](/tiflash/tiflash-overview.md#architecture) - [Explore HTAP](/explore-htap.md) -- [Use TiFlash](/use-tiflash.md#use-tiflash) \ No newline at end of file +- [Use TiFlash](/tiflash/use-tiflash.md#use-tiflash) \ No newline at end of file From 064b1d67ef6018e2110975f0f8f3e527fafeaf86 Mon Sep 17 00:00:00 2001 From: en-jin19 Date: Mon, 23 Aug 2021 01:17:29 +0200 Subject: [PATCH 05/18] fix CI errors --- quick-start-with-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index 95f6641ae8b5f..14625cb36df33 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -205,7 +205,7 @@ limit 10; If the result of the `EXPLAIN` statement shows `ExchangeSender` and `ExchangeReceiver` operators, it indicates that the MPP mode has taken effect. -In addition, you can specify that each part of the entire query is computed using only the TiFlash. For detailed inforamtion, see [Use TiDB to read TiFlash replicas](/tiflash/use-tiflash.md#use-TiDB-to-read-TiFlash-replicas). +In addition, you can specify that each part of the entire query is computed using only the TiFlash. For detailed information, see [Use TiDB to read TiFlash replicas](/tiflash/use-tiflash.md#use-tidb-to-read-tiflash-replicas). You can compare query results and query performance of these two methods.
From eb109b779c97b5b5b50eb9ec2b9cc3697154bc2b Mon Sep 17 00:00:00 2001 From: Enwei Date: Mon, 23 Aug 2021 21:58:38 +0200 Subject: [PATCH 06/18] Update explore-htap.md --- explore-htap.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/explore-htap.md b/explore-htap.md index 210fcdc8d548a..059e2b377392a 100644 --- a/explore-htap.md +++ b/explore-htap.md @@ -39,7 +39,7 @@ For more information about the architecture, see [architecture of TiDB HTAP](/ti ## Prerequisites for environment -Before exploring the features of TiDB HTAP, you need to configure TiDB and the corresponding storage engine according to the data volume. If the data volume is huge (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the primary solution and TiSpark as the supplementary solution. +Before exploring the features of TiDB HTAP, you need to configure TiDB and the corresponding storage engine according to the data volume. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the primary solution and TiSpark as the supplementary solution. - TiFlash @@ -101,4 +101,4 @@ You are also welcome to create [Github Issues](https://github.com/pingcap/tiflas ## What's next - To see TiFlash version, critical logs and system tables of TiFlash, see [Maintain a TiFlash cluster](/tiflash/maintain-tiflash.md). -- To remove a specific TiFlash node, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). \ No newline at end of file +- To remove a specific TiFlash node, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). 
From 1180a0e03a1da284fa3c820259b5999b8d357563 Mon Sep 17 00:00:00 2001 From: Enwei Date: Mon, 23 Aug 2021 22:07:07 +0200 Subject: [PATCH 07/18] Update quick-start-with-htap.md --- quick-start-with-htap.md | 39 ++++++++++++++++++++++++--------------- 1 file changed, 24 insertions(+), 15 deletions(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index 14625cb36df33..aca2ef54cd803 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -81,24 +81,33 @@ By following these steps, you can create the [TPC-H](http://www.tpc.org/tpch/) d {{< copyable "sql" >}} ```sql - SELECT CONCAT(table_schema,'.',table_name) AS 'Table Name', table_rows AS 'Number of Rows', CONCAT(ROUND(data_length/(1024*1024*1024),4),'G') AS 'Data Size', CONCAT(ROUND(index_length/(1024*1024*1024),4),'G') AS 'Index Size', CONCAT(ROUND((data_length+index_length)/(1024*1024*1024),4),'G') AS'Total'FROM information_schema.TABLES WHERE table_schema LIKE 'test'; + SELECT + CONCAT(table_schema,'.',table_name) AS 'Table Name', + table_rows AS 'Number of Rows', + FORMAT_BYTES(data_length) AS 'Data Size', + FORMAT_BYTES(index_length) AS 'Index Size', + FORMAT_BYTES(data_length+index_length) AS'Total' + FROM + information_schema.TABLES + WHERE + table_schema='test'; ``` - As you can see from the output, eight tables are created in total, and the largest table has 6 million rows of data (the actual amount of the created data is in line with the value of the SQL query, because the data is randomly created by the tool). + As you can see from the output, eight tables are created in total, and the largest table has 6.5 million rows of data (the actual amount of the created data is in line with the value of the SQL query, because the data is randomly created by the tool). 
```sql - +---------------+----------------+-----------+------------+---------+ - | Table Name | Number of Rows | Data Size | Index Size | Total | - +---------------+----------------+-----------+------------+---------+ - | test.nation | 25 | 0.0000G | 0.0000G | 0.0000G | - | test.region | 5 | 0.0000G | 0.0000G | 0.0000G | - | test.part | 200000 | 0.0245G | 0.0000G | 0.0245G | - | test.supplier | 10000 | 0.0014G | 0.0000G | 0.0014G | - | test.partsupp | 800000 | 0.1174G | 0.0119G | 0.1293G | - | test.customer | 150000 | 0.0242G | 0.0000G | 0.0242G | - | test.orders | 1514336 | 0.1673G | 0.0000G | 0.1673G | - | test.lineitem | 6001215 | 0.7756G | 0.0894G | 0.8651G | - +---------------+----------------+-----------+------------+---------+ + +---------------+----------------+-----------+------------+-----------+ + | Table Name | Number of Rows | Data Size | Index Size | Total | + +---------------+----------------+-----------+------------+-----------+ + | test.nation | 25 | 2.44 KiB | 0 bytes | 2.44 KiB | + | test.region | 5 | 416 bytes | 0 bytes | 416 bytes | + | test.part | 200000 | 25.07 MiB | 0 bytes | 25.07 MiB | + | test.supplier | 10000 | 1.45 MiB | 0 bytes | 1.45 MiB | + | test.partsupp | 800000 | 120.17 MiB| 12.21 MiB | 132.38 MiB| + | test.customer | 150000 | 24.77 MiB | 0 bytes | 24.77 MiB | + | test.orders | 1527648 | 174.40 MiB| 0 bytes | 174.40 MiB| + | test.lineitem | 6491711 | 849.07 MiB| 99.06 MiB | 948.13 MiB| + +---------------+----------------+-----------+------------+-----------+ 8 rows in set (0.06 sec) ``` @@ -213,4 +222,4 @@ You can compare query results and query performance of these two methods. 
- [Architecture of TiDB HTAP](/tiflash/tiflash-overview.md#architecture) - [Explore HTAP](/explore-htap.md) -- [Use TiFlash](/tiflash/use-tiflash.md#use-tiflash) \ No newline at end of file +- [Use TiFlash](/tiflash/use-tiflash.md#use-tiflash) From c266e605ff04e85bfdd5879533791a96e07bb0f4 Mon Sep 17 00:00:00 2001 From: Enwei Date: Tue, 24 Aug 2021 13:18:15 +0200 Subject: [PATCH 08/18] Update quick-start-with-htap.md --- quick-start-with-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index aca2ef54cd803..dc8c8def64653 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -9,7 +9,7 @@ This guide walks you through the quickest way to get started with TiDB's one-sto > **Note:** > -> The steps provided in this guide is ONLY for quick start, NOT for production. To explore more features of HTAP, see [explore HTAP](/explore-htap.md). +> The steps provided in this guide are ONLY for quick start. For production environments, [explore HTAP](/explore-htap.md) is recommended. ## Basic concepts From f1900e57148e997fda5cfcc6550a9dbd6da8116f Mon Sep 17 00:00:00 2001 From: Enwei Date: Tue, 24 Aug 2021 13:20:47 +0200 Subject: [PATCH 09/18] Update quick-start-with-htap.md --- quick-start-with-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index dc8c8def64653..3e8a8e2151569 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -9,7 +9,7 @@ This guide walks you through the quickest way to get started with TiDB's one-sto > **Note:** > -> The steps provided in this guide are ONLY for quick start. For production environments, [explore HTAP](/explore-htap.md) is recommended. +> The steps provided in this guide is ONLY for quick start in the test environment. For production environments, [explore HTAP](/explore-htap.md) is recommended. 
## Basic concepts From 0aa4719139ffff8723fd5ed79b954eced8650aa0 Mon Sep 17 00:00:00 2001 From: Enwei Date: Thu, 26 Aug 2021 13:20:47 +0200 Subject: [PATCH 10/18] Apply suggestions from code review Co-authored-by: Grace Cai --- explore-htap.md | 36 +++++++++++++-------------- quick-start-with-htap.md | 54 ++++++++++++++++++++-------------------- 2 files changed, 45 insertions(+), 45 deletions(-) diff --git a/explore-htap.md b/explore-htap.md index 059e2b377392a..7d387248d405f 100644 --- a/explore-htap.md +++ b/explore-htap.md @@ -13,13 +13,13 @@ This guide describes how to explore and use the features of TiDB Hybrid Transact ## Use cases -TiDB HTAP can meet the needs that have increment massive data, reduce the risk cost of operation, and be used for on-premises big data technology stacks without difficulty, thereby presenting the value of data assets in real time. +TiDB HTAP can handle the massive data that increases rapidly, reduce the cost of dev-ops, and be deployed as either on-premises or cloud easily, which brings the value of data assets in real time. -The following are the use cases of HTAP: +The following are the typical use cases of HTAP: -- Hybrid load +- Hybrid workload - When using TiDB for real-time Online Analytical Processing (OLAP) that is in hybrid load scenarios, you only need to provide an entry point. Then TiDB automatically selects different processing engines based on the specific business. + When using TiDB for real-time Online Analytical Processing (OLAP) that is in hybrid load scenarios, you only need to provide an entry point. TiDB automatically selects different processing engines based on the specific business. - Real-time stream processing @@ -37,14 +37,14 @@ In TiDB, a row-based storage engine [TiKV](/tikv-overview.md) for Online Transac For more information about the architecture, see [architecture of TiDB HTAP](/tiflash/tiflash-overview.md#architecture). 
-## Prerequisites for environment +## Environment preparation -Before exploring the features of TiDB HTAP, you need to configure TiDB and the corresponding storage engine according to the data volume. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the primary solution and TiSpark as the supplementary solution. +Before exploring the features of TiDB HTAP, you need to deploy TiDB and the corresponding storage engines according to the data volume. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the primary solution and TiSpark as the supplementary solution. - TiFlash - - If you have deployed a TiDB cluster but not TiFlash nodes, add the TiFlash nodes in the on-premises TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). - - If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster using TiUP](/production-deployment-using-tiup.md). At the same time, you also need to deploy the minimal cluster topology along with [TiFlash](/tiflash-deployment-topology.md). + - If you have deployed a TiDB cluster with no TiFlash node, add the TiFlash nodes in the current TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). + - If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster using TiUP](/production-deployment-using-tiup.md). Based on the minimal TiDB topology, you also need to deploy the [topology of TiFlash](/tiflash-deployment-topology.md). - When deciding how to choose the number of TiFlash nodes, consider the following scenarios: - If you mainly need OLTP that runs small-scale analytical processing, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries. 
@@ -58,28 +58,28 @@ Before exploring the features of TiDB HTAP, you need to configure TiDB and the c -## Prerequisites for data +## Data preparation -After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated to TiFlash. +After TiFlash is deployed, TiKV does not replicate data to TiFlash automatically. You need to manually specify which tables need to be replicated to TiFlash. After that, TiDB creates the corresponding TiFlash replicas. - If there is no data in the TiDB Cluster, migrate the data to TiDB first. For detailed information, see [data migration](/migration-overview.md). - If the TiDB cluster already has the replicated data from upstream, after TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated to TiFlash. For detailed information, see [Use TiFlash](/tiflash/use-tiflash.md). ## Data processing -With TiDB, you can simply enter SQL statements for queries or write requirements. For the tables to be replicated to TiFlash, TiDB chooses the best execution freely by the front-end optimizer. +With TiDB, you can simply enter SQL statements for query or write requests. For the tables with TiFlash replicas, TiDB uses the front-end optimizer to automatically choose the optimal execution plan. > **Note:** > -> MPP mode of TiFlash is enabled by default. When an SQL statement is executed, TiDB automatically determines whether to run in MPP mode through the optimizer. +> The MPP mode of TiFlash is enabled by default. When an SQL statement is executed, TiDB automatically determines whether to run in the MPP mode through the optimizer. > -> - To disable MPP mode of TiFlash, set the value of [tidb_allow_mpp](/system-variables.md#tidb_allow_mpp-new-in-v50) system variable to `OFF`. 
-> - To forcibly enable MPP mode of TiFlash for query execution, set the value of [tidb_allow_mpp](/system-variables.md#tidb_allow_mpp-new-in-v50) and [tidb_enforce_mpp](/system-variables.md#tidb_enforce_mpp-new-in-v51) to `ON`. -> - To see whether to use MPP mode when TiDB execute queries, see [Explain Statements in the MPP Mode](/explain-mpp.md#explain-statements-in-the-mpp-mode). If the output of `EXPLAIN` statement includes `ExchangeSender` and `ExchangeReceiver` operator, MPP mode is activated. +> - To disable the MPP mode of TiFlash, set the value of the [tidb_allow_mpp](/system-variables.md#tidb_allow_mpp-new-in-v50) system variable to `OFF`. +> - To forcibly enable MPP mode of TiFlash for query execution, set the values of [tidb_allow_mpp](/system-variables.md#tidb_allow_mpp-new-in-v50) and [tidb_enforce_mpp](/system-variables.md#tidb_enforce_mpp-new-in-v51) to `ON`. +> - To check whether TiDB chooses the MPP mode to execute a specific query, see [Explain Statements in the MPP Mode](/explain-mpp.md#explain-statements-in-the-mpp-mode). If the output of `EXPLAIN` statement includes the `ExchangeSender` and `ExchangeReceiver` operators, the MPP mode is in use. ## Performance monitoring -When using TiDB, you can monitor the running status and check TiDB performance by the following methods: +When using TiDB, you can monitor the TiDB cluster status and performance metrics in either of the following ways: - [TiDB Dashboard](/dashboard/dashboard-intro.md): you can see the overall running status of the TiDB cluster, analyse distribution and trends of read and write traffic, and learn the detailed execution information of slow queries. - [Monitoring system (Prometheus & Grafana)](/grafana-overview-dashboard.md): you can see the monitoring parameters of TiDB cluster-related componants including PD, TiDB, TiKV, TiFlash,TiCDC, and Node_exporter. 
@@ -88,7 +88,7 @@ To see the alert rules of TiDB cluster and TiFlash cluster, see [TiDB cluster al ## Troubleshooting -If you have issues while using TiDB, refer to the following documents: +If any issue occurs during using TiDB, refer to the following documents: - [Analyze slow queries](/analyze-slow-queries.md) - [Identify expensive queries](/identify-expensive-queries.md) @@ -100,5 +100,5 @@ You are also welcome to create [Github Issues](https://github.com/pingcap/tiflas ## What's next -- To see TiFlash version, critical logs and system tables of TiFlash, see [Maintain a TiFlash cluster](/tiflash/maintain-tiflash.md). +- To check the TiFlash version, critical logs and system tables, see [Maintain a TiFlash cluster](/tiflash/maintain-tiflash.md). - To remove a specific TiFlash node, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index 3e8a8e2151569..49de13096ff03 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -5,7 +5,7 @@ summary: Learn how to quickly get started with the TiDB HTAP. # Quick Start Guide for TiDB HTAP -This guide walks you through the quickest way to get started with TiDB's one-stop Hybrid Transactional and Analytical Processing (HTAP). +This guide walks you through the quickest way to get started with TiDB's one-stop solution of Hybrid Transactional and Analytical Processing (HTAP). > **Note:** > @@ -13,24 +13,24 @@ This guide walks you through the quickest way to get started with TiDB's one-sto ## Basic concepts -Before using TiDB HTAP, you need to have basic knowledge about [TiKV](/tikv-overview.md), a row-based storage engine for TiDB Online Transactional Processing (OLTP), and [TiFlash](/tiflash/tiflash-overview.md), a columnar storage engine for TiDB Online Analytical Processing (OLAP). 
+Before using TiDB HTAP, you need to have some basic knowledge about [TiKV](/tikv-overview.md), a row-based storage engine for TiDB Online Transactional Processing (OLTP), and [TiFlash](/tiflash/tiflash-overview.md), a columnar storage engine for TiDB Online Analytical Processing (OLAP). - Storage engines of HTAP: The row-based storage engine and the columnar storage engine co-exist for HTAP. Both storage engines can replicate data automatically and keep strong consistency. The row-based storage engine optimizes OLTP performance, and the columnar storage engine optimizes OLAP performance. -- Data consistency of HTAP: As a distributed and transactional key-value database, TiKV provides transactional APIs with ACID compliance. With the implementation of the [Raft onsensus algorithm](https://raft.github.io/raft.pdf), TiKV guarantees data consistency between multiple replicas and high availability. TiFlash replicates data from TiKV in real time, which ensures that data is strongly consistent between TiKV. +- Data consistency of HTAP: As a distributed and transactional key-value database, TiKV provides transactional interfaces with ACID compliance, and guarantees data consistency between multiple replicas and high availability with the implementation of the [Raft consensus algorithm](https://raft.github.io/raft.pdf). As a columnar storage extension of TiKV, TiFlash replicates data from TiKV in real time according to the Raft Learner consensus algorithm, which ensures that data is strongly consistent between TiKV and TiFlash. - Data isolation of HTAP: TiKV and TiFlash can be deployed on different machines as needed to solve the problem of HTAP resource isolation. -- MPP computing engine: [MPP](/tiflash/use-tiflash.md#control-whether-to-select-the-mpp-mode) is a distributed computing framework provided by the TiFlash engine since TiDB 5.0, which allows data exchange between nodes and provides high-performance, high-throughput SQL algorithms. 
If the MPP mode is enabled, the run time of the analytic queries can be significantly reduced. +- MPP computing engine: [MPP](/tiflash/use-tiflash.md#control-whether-to-select-the-mpp-mode) is a distributed computing framework provided by the TiFlash engine since TiDB 5.0, which allows data exchange between nodes and provides high-performance, high-throughput SQL algorithms. In the MPP mode, the run time of the analytic queries can be significantly reduced. ## Steps -This document take a popular [TPC-H](http://www.tpc.org/tpch/) dataset as an example to guide you to experience the convenience and high performance of TiDB HTAP by trying to query. +In this document, you can experience the convenience and high performance of TiDB HTAP by querying an example table in a popular [TPC-H](http://www.tpc.org/tpch/) dataset. -### Step 1: Prerequisites for deployment +### Step 1. Deploy a local test environment Before using TiDB HTAP, follow the steps in the [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md) to deploy a local test environment. In [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md): -- You are recommended to run `tiup playground` to start a TiDB cluster of the latest version. When you run the following command, 1 TiDB instance, 1 TiKV instance, 1 PD instance, and 1 TiFlash instance are deloyed automatically: +- You are recommended to run `tiup playground` to deploy a TiDB cluster of the latest version. 
When you run the following command, 1 TiDB instance, 1 TiKV instance, 1 PD instance, and 1 TiFlash instance are deployed automatically: {{< copyable "shell-regular" >}} @@ -38,7 +38,7 @@ In [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md) tiup playground ``` -- If you want to specify the TiDB version and the number of the instances of each component, run the command that specifies the number of the instances of TiFlash as following: +- If you want to specify the TiDB version and the number of the instances of each component, you need to also specify the number of the TiFlash instances as in the following example command: {{< copyable "shell-regular" >}} @@ -50,15 +50,15 @@ In [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md) > > `tiup playground` command is ONLY for quick start, NOT for production. -### Step 2: Prerequisites for data +### Step 2. Prepare test data -By following these steps, you can create the [TPC-H](http://www.tpc.org/tpch/) dataset for using TiDB HTAP. If you are interested in TPC-H, see [General Implementation Guidelines](http://tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.0.pdf). +In the following steps, you can create a [TPC-H](http://www.tpc.org/tpch/) dataset as the test data to use TiDB HTAP. If you are interested in TPC-H, see [General Implementation Guidelines](http://tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.0.pdf). > **Note:** > -> If you want to use your existing data for analytic queries, you can [migrate your data to TiDB](/migration-overview.md). If you want to design and create your own data, you can create it by executing SQL statements or using related tools. +> If you want to use your existing data for analytic queries, you can [migrate your data to TiDB](/migration-overview.md). If you want to design and create your own test data, you can create it by executing SQL statements or using related tools. -1. 
Install the tool that creates data by running the following command: +1. Install the test data generation tool by running the following command: {{< copyable "shell-regular" >}} @@ -66,7 +66,7 @@ By following these steps, you can create the [TPC-H](http://www.tpc.org/tpch/) d tiup install bench ``` -2. Create the data by by running the following command: +2. Generate the test data by running the following command: {{< copyable "shell-regular" >}} @@ -76,7 +76,7 @@ By following these steps, you can create the [TPC-H](http://www.tpc.org/tpch/) d If the output of this command shows `Finished`, it indicates that the data is created. -3. Execute the following SQL statement to see the created data: +3. Execute the following SQL statement to view the generated data: {{< copyable "sql" >}} @@ -93,7 +93,7 @@ By following these steps, you can create the [TPC-H](http://www.tpc.org/tpch/) d table_schema='test'; ``` - As you can see from the output, eight tables are created in total, and the largest table has 6.5 million rows of data (the actual amount of the created data is in line with the value of the SQL query, because the data is randomly created by the tool). + As you can see from the output, eight tables are created in total, and the largest table has 6.5 million rows (the exact number of rows in your own query result might differ, because the tool generates the data randomly). ```sql +---------------+----------------+-----------+------------+-----------+ @@ -111,11 +111,11 @@ By following these steps, you can create the [TPC-H](http://www.tpc.org/tpch/) d 8 rows in set (0.06 sec) ``` - This is a database of a commercial ordering system. 
In which, the `test.nation` table indicates the information about countries, the `test.region` table indicates the information about regions, the `test.part` table indicates the information about parts, the `test.supplier` table indicates the information about suppliers, the `test.partsupp` table indicates the information about parts from suppliers, the `test.customer` table indicates the information about customers, the `test.customer` table indicates the information about orders, and the `test.lineitem` table indicates the information about online items. + This is a database of a commercial ordering system. In this database, the `test.nation` table indicates the information about countries, the `test.region` table indicates the information about regions, the `test.part` table indicates the information about parts, the `test.supplier` table indicates the information about suppliers, the `test.partsupp` table indicates the information about parts of suppliers, the `test.customer` table indicates the information about customers, the `test.orders` table indicates the information about orders, and the `test.lineitem` table indicates the information about line items. -### Step 3: Query with the row-based storage engine +### Step 3. Query data with the row-based storage engine -You can see the performance of TiDB when using only the row-based storage engine (for most databases) by executing the following SQL statements: +To check the performance of TiDB with only the row-based storage engine, execute the following SQL statements: {{< copyable "sql" >}} @@ -149,9 +149,9 @@ limit 10; This is a shipping priority query that gives priority and potential revenue to the highest-revenue order that has not been shipped by a specified date. The potential income is defined as the sum of `l_extendedprice * (1-l_discount)`. The orders are listed in descending order of revenue. In this example, this query lists the unshipped orders with potential query revenue in the top 10. 
-### Step 4: Replication with the columnar storage engine +### Step 4. Replicate the test data to the columnar storage engine -After TiFlash is deployed, data replication does not automatically begin. You need to send a DDL statement to TiDB through a MySQL client to create a TiFlash replica for a specific table: +After TiFlash is deployed, TiKV does not replicate data to TiFlash immediately. You need to execute the following DDL statements in a MySQL client of TiDB to specify which tables need to be replicated. After that, TiDB will create the specified replicas in TiFlash accordingly. {{< copyable "sql" >}} @@ -161,7 +161,7 @@ ALTER TABLE test.orders SET TIFLASH REPLICA 1; ALTER TABLE test.lineitem SET TIFLASH REPLICA 1; ``` -You can check the status of the TiFlash replicas of a specific table using the following statement: +To check the replication status of the specific tables, execute the following statements: {{< copyable "sql" >}} @@ -171,16 +171,16 @@ SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'lineitem'; ``` -In the result of above statement: +In the result of the above statements: -- `AVAILABLE` indicates whether the TiFlash replicas of this table is available or not. `1` means available and `0` means unavailable. Once the replicas become available, this status does not change. If you use DDL statements to modify the number of replicas, the replication status will be recalculated. +- `AVAILABLE` indicates whether the TiFlash replica of a specific table is available or not. `1` means available and `0` means unavailable. Once a replica becomes available, this status does not change any more. If you use DDL statements to modify the number of replicas, the replication status will be recalculated. - `PROGRESS` means the progress of the replication. The value is between 0.0 and 1.0. 1 means at least one replica is replicated. 
-### Step 5: analyze data faster using HTAP +### Step 5. Analyze data faster using HTAP -If you execute the SQL statements in [Step 3](#step-3-query-with-the-row-based-storage-engine), you can see the performance of TiDB HTAP. +Execute the SQL statements in [Step 3](#step-3-query-with-the-row-based-storage-engine) again, and you can see the performance of TiDB HTAP. -For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on the cost estimation. You can use the `desc` or `explain analyze` statement to check whether or not a TiFlash replica is selected. For example: +For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on the cost estimation. To check whether or not a TiFlash replica is selected, you can use the `desc` or `explain analyze` statement. For example: {{< copyable "sql" >}} @@ -214,7 +214,7 @@ limit 10; If the result of the `EXPLAIN` statement shows `ExchangeSender` and `ExchangeReceiver` operators, it indicates that the MPP mode has taken effect. -In addition, you can specify that each part of the entire query is computed using only the TiFlash. For detailed inforamtion, see [Use TiDB to read TiFlash replicas](/tiflash/use-tiflash.md#use-tidb-to-read-tiflash-replicas). +In addition, you can specify that each part of the entire query is computed using only the TiFlash engine. For detailed information, see [Use TiDB to read TiFlash replicas](/tiflash/use-tiflash.md#use-tidb-to-read-tiflash-replicas). You can compare query results and query performance of these two methods. 
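For readers following Step 5 of `quick-start-with-htap.md` as changed in this patch, the MPP check described there can be sketched as follows. This is a minimal sketch under stated assumptions: the TPC-H tables from Step 2 exist in the `test` schema, the TiFlash replicas created in Step 4 report `AVAILABLE = 1`, and the cluster version supports `tidb_enforce_mpp` (the document links it as new in v5.1). The aggregate query is a simplified illustration, not the shipping priority query from the document.

```sql
-- Assumption: run in a MySQL client connected to the playground cluster.
-- Allow the optimizer to consider the MPP mode (the document states this is the default).
SET @@session.tidb_allow_mpp = ON;
-- Ask the optimizer to prefer the MPP mode for this session.
SET @@session.tidb_enforce_mpp = ON;

-- Illustrative query: potential revenue per order, using the same
-- revenue expression as the document, l_extendedprice * (1 - l_discount).
EXPLAIN
SELECT
    l_orderkey,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue
FROM
    test.lineitem
GROUP BY
    l_orderkey
ORDER BY
    revenue DESC
LIMIT 10;

-- Return to purely cost-based selection afterwards.
SET @@session.tidb_enforce_mpp = OFF;
```

If the `EXPLAIN` output contains the `ExchangeSender` and `ExchangeReceiver` operators, the query is planned in the MPP mode, as the document describes. If it does not, the replication status in `information_schema.tiflash_replica` is a reasonable first thing to check.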
From 22eb9ec7a0699e9b9c403ec501f1d361ee70e3cd Mon Sep 17 00:00:00 2001 From: Enwei Date: Thu, 26 Aug 2021 14:05:47 +0200 Subject: [PATCH 11/18] Apply suggestions from code review Co-authored-by: Grace Cai --- explore-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/explore-htap.md b/explore-htap.md index 7d387248d405f..f146969570fcc 100644 --- a/explore-htap.md +++ b/explore-htap.md @@ -23,7 +23,7 @@ The following are the typical use cases of HTAP: - Real-time stream processing - When using TiDB in real-time stream processing scenarios, TiDB ensures that the continuously flowed data is queried in real time. At the same time, TiDB also can handle highly concurrent workloads and Business Intelligence (BI) queries. + When using TiDB in real-time stream processing scenarios, TiDB ensures that all the data flowed in constantly can be queried in real time. At the same time, TiDB also can handle highly concurrent data workloads and Business Intelligence (BI) queries. - Data center From fa0843761669713bcdbb202364e88e84ae6a8392 Mon Sep 17 00:00:00 2001 From: Enwei Date: Thu, 26 Aug 2021 14:14:53 +0200 Subject: [PATCH 12/18] fix CI error --- quick-start-with-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index 49de13096ff03..3ca79c8885be9 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -178,7 +178,7 @@ In the result of the above statements: ### Step 5. Analyze data faster using HTAP -Execute the SQL statements in [Step 3](#step-3-query-with-the-row-based-storage-engine) again, and you can see the performance of TiDB HTAP. +Execute the SQL statements in [Step 3](#step-3-query-data-with-the-row-based-storage-engine) again, and you can see the performance of TiDB HTAP. For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on the cost estimation. 
To check whether or not a TiFlash replica is selected, you can use the `desc` or `explain analyze` statement. For example: From 097205964d2e659db349c7ea6b41989ac7e9b730 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Thu, 26 Aug 2021 22:18:21 +0800 Subject: [PATCH 13/18] Update explore-htap.md Co-authored-by: Enwei --- explore-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/explore-htap.md b/explore-htap.md index f146969570fcc..e16f2e76002b0 100644 --- a/explore-htap.md +++ b/explore-htap.md @@ -49,7 +49,7 @@ Before exploring the features of TiDB HTAP, you need to deploy TiDB and the corr - If you mainly need OLTP that runs small-scale analytical processing, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries. - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time. - - If the OLTP throughput is relatively high (for example, the rate of write throughput or update throughput is higher than 10 million lines/hours), the hot write regions and hot read regions can be formed. This is because the I/O usage in TiKV and TiFlash becomes the bottleneck due to limited write capacity of network and physical disk in this case. + - If the OLTP throughput is relatively high (for example, the rate of write throughput or update throughput is higher than 10 million lines/hours), the hot write regions and hot read regions can be formed. This is because the I/O usage in TiKV and TiFlash becomes the bottleneck due to the limited write capacity of network and physical disk in this case. 
At this point, the number of TiFlash nodes has a complex non-linear relationship with the quantity of analytical processing, so you need to tune the number of TiFlash nodes based on the specific status of the system. - TiSpark From b1ff5a9fcde2d90ce9116775b42b97b822587d97 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Thu, 26 Aug 2021 22:32:48 +0800 Subject: [PATCH 14/18] Apply suggestions from code review Co-authored-by: Enwei --- explore-htap.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/explore-htap.md b/explore-htap.md index e16f2e76002b0..7cceee951187d 100644 --- a/explore-htap.md +++ b/explore-htap.md @@ -25,9 +25,9 @@ The following are the typical use cases of HTAP: When using TiDB in real-time stream processing scenarios, TiDB ensures that all the data flowed in constantly can be queried in real time. At the same time, TiDB also can handle highly concurrent data workloads and Business Intelligence (BI) queries. -- Data center +- Data hub - When using TiDB as a data center, TiDB can meet specific business needs by seamlessly connecting the data for the application and the data warehouse. + When using TiDB as a data hub, TiDB can meet specific business needs by seamlessly connecting the data for the application and the data warehouse. For more information about use cases of TiDB HTAP, see [blogs about HTAP on the PingCAP website](https://pingcap.com/blog-cn/#HTAP). 
From 97b5d1d15befd2f3888cd53bdcde344240aec198 Mon Sep 17 00:00:00 2001 From: Enwei Date: Fri, 27 Aug 2021 07:31:27 +0200 Subject: [PATCH 15/18] Apply suggestions from code review Co-authored-by: Grace Cai --- explore-htap.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/explore-htap.md b/explore-htap.md index 7cceee951187d..75e1a1139bd53 100644 --- a/explore-htap.md +++ b/explore-htap.md @@ -13,13 +13,13 @@ This guide describes how to explore and use the features of TiDB Hybrid Transact ## Use cases -TiDB HTAP can handle the massive data that increases rapidly, reduce the cost of dev-ops, and be deployed as either on-premises or cloud easily, which brings the value of data assets in real time. +TiDB HTAP can handle the massive data that increases rapidly, reduce the cost of DevOps, and be deployed in either on-premises or cloud environments easily, which brings the value of data assets in real time. The following are the typical use cases of HTAP: - Hybrid workload - When using TiDB for real-time Online Analytical Processing (OLAP) that is in hybrid load scenarios, you only need to provide an entry point. TiDB automatically selects different processing engines based on the specific business. + When using TiDB for real-time Online Analytical Processing (OLAP) in hybrid load scenarios, you only need to provide an entry point of TiDB to your data. TiDB automatically selects different processing engines based on the specific business. - Real-time stream processing @@ -29,7 +29,7 @@ The following are the typical use cases of HTAP: When using TiDB as a data hub, TiDB can meet specific business needs by seamlessly connecting the data for the application and the data warehouse. -For more information about use cases of TiDB HTAP, see [blogs about HTAP on the PingCAP website](https://pingcap.com/blog-cn/#HTAP). 
+For more information about use cases of TiDB HTAP, see [blogs about HTAP on the PingCAP website](https://en.pingcap.com/blog/tag/HTAP). ## Architecture @@ -49,7 +49,7 @@ Before exploring the features of TiDB HTAP, you need to deploy TiDB and the corr - If you mainly need OLTP that runs small-scale analytical processing, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries. - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time. - - If the OLTP throughput is relatively high (for example, the rate of write throughput or update throughput is higher than 10 million lines/hours), the hot write regions and hot read regions can be formed. This is because the I/O usage in TiKV and TiFlash becomes the bottleneck due to the limited write capacity of network and physical disk in this case. At this point, the number of TiFlash nodes has a complex non-linear relationship with the quantity of analytical processing, so you need to tune the number of TiFlash nodes based on the specific status of the system. + - If the OLTP throughput is relatively high (for example, the write or update throughput is higher than 10 million lines/hours), due to the limited write capacity of network and physical disks, the I/O between TiKV and TiFlash becomes a bottleneck and is also prone to read and write hotspots. In this case, the number of TiFlash nodes has a complex non-linear relationship with the computation volume of analytical processing, so you need to tune the number of TiFlash nodes based on the actual status of the system. 
 - TiSpark @@ -100,5 +100,5 @@ You are also welcome to create [Github Issues](https://github.com/pingcap/tiflas ## What's next -- To check the TiFlash version, critical logs and system tables, see [Maintain a TiFlash cluster](/tiflash/maintain-tiflash.md). +- To check the TiFlash version, critical logs, and system tables, see [Maintain a TiFlash cluster](/tiflash/maintain-tiflash.md). - To remove a specific TiFlash node, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). From 86a365483f7ff18b99befbdb28bac86a3d1e1a5d Mon Sep 17 00:00:00 2001 From: Enwei Date: Fri, 27 Aug 2021 10:25:53 +0200 Subject: [PATCH 16/18] Apply suggestions from code review Co-authored-by: Grace Cai --- quick-start-with-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index 3ca79c8885be9..3f6f1794d2810 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -147,7 +147,7 @@ ORDER BY limit 10; ``` -This is a shipping priority query that gives priority and potential revenue to the highest-revenue order that has not been shipped by a specified date. The potential income is defined as the sum of `l_extendedprice * (1-l_discount)`. The orders are listed in descending order of revenue. In this example, this query lists the unshipped orders with potential query revenue in the top 10. +This is a shipping priority query, which provides the priority and potential revenue of the highest-revenue orders that have not been shipped before a specified date. The potential revenue is defined as the sum of `l_extendedprice * (1-l_discount)`. The orders are listed in descending order of revenue. In this example, the query lists the 10 unshipped orders with the highest potential revenue. ### Step 4. 
Replicate the test data to the columnar storage engine From c82a7302311bb353465df9d7b030b67152ea5493 Mon Sep 17 00:00:00 2001 From: qiancai Date: Fri, 27 Aug 2021 18:52:25 +0800 Subject: [PATCH 17/18] Update quick-start-with-htap.md --- quick-start-with-htap.md | 22 +++++----------------- 1 file changed, 5 insertions(+), 17 deletions(-) diff --git a/quick-start-with-htap.md b/quick-start-with-htap.md index 3f6f1794d2810..e5b7d3e8366b9 100644 --- a/quick-start-with-htap.md +++ b/quick-start-with-htap.md @@ -26,25 +26,13 @@ In this document, you can experience the convenience and high performance of TiD ### Step 1. Deploy a local test environment -Before using TiDB HTAP, follow the steps in the [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md) to deploy a local test environment. +Before using TiDB HTAP, follow the steps in the [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md) to prepare a local test environment, and run the following command to deploy a TiDB cluster: -In [Quick Start Guide for the TiDB Database Platform](/quick-start-with-tidb.md): +{{< copyable "shell-regular" >}} -- You are recommended to run `tiup playground` to deploy a TiDB cluster of the latest version. 
When you run the following command, 1 TiDB instance, 1 TiKV instance, 1 PD instance, and 1 TiFlash instance are deployed automatically: - - {{< copyable "shell-regular" >}} - - ```shell - tiup playground - ``` - -- If you want to specify the TiDB version and the number of the instances of each component, you need to also specify the number of the TiFlash instances as in the following example command: - - {{< copyable "shell-regular" >}} - - ```shell - tiup playground v5.1.0 --db 2 --pd 3 --kv 3 --tiflash 1 --monitor - ``` +```shell +tiup playground +``` > **Note:** > From fbd524322757fd3db3338d2eacfc140367a2b0ae Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Fri, 27 Aug 2021 18:54:22 +0800 Subject: [PATCH 18/18] Update explore-htap.md Co-authored-by: Enwei --- explore-htap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/explore-htap.md b/explore-htap.md index 75e1a1139bd53..cd434d9558217 100644 --- a/explore-htap.md +++ b/explore-htap.md @@ -47,7 +47,7 @@ Before exploring the features of TiDB HTAP, you need to deploy TiDB and the corr - If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster using TiUP](/production-deployment-using-tiup.md). Based on the minimal TiDB topology, you also need to deploy the [topology of TiFlash](/tiflash-deployment-topology.md). - When deciding how to choose the number of TiFlash nodes, consider the following scenarios: - - If you mainly need OLTP that runs small-scale analytical processing, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries. + - If your use case requires OLTP with small-scale analytical processing and Ad-Hoc queries, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries. - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. 
The number of TiFlash nodes should be tuned based on expected performance and response time. - If the OLTP throughput is relatively high (for example, the write or update throughput is higher than 10 million lines/hours), due to the limited write capacity of network and physical disks, the I/O between TiKV and TiFlash becomes a bottleneck and is also prone to read and write hotspots. In this case, the number of TiFlash nodes has a complex non-linear relationship with the computation volume of analytical processing, so you need to tune the number of TiFlash nodes based on the actual status of the system.