From 12df19f98b2935782b2db9e80bd0ff749f1d04d1 Mon Sep 17 00:00:00 2001
From: lilin90
Date: Wed, 29 May 2024 15:46:58 +0800
Subject: [PATCH 1/6] Add warning and update wording for extended statistics

---
 extended-statistics.md | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/extended-statistics.md b/extended-statistics.md
index b6a2fae9456c7..5e973d4a7c739 100644
--- a/extended-statistics.md
+++ b/extended-statistics.md
@@ -5,20 +5,18 @@ summary: Learn how to use extended statistics to guide the optimizer.
 
 # Introduction to Extended Statistics
 
-TiDB can collect the following two types of statistics:
+TiDB can collect the following two types of statistics. This documents describes how to use extended statistics to guide the optimizer. Before reading this document, it is recommended that you read [Introduction to Statistics](/statistics.md) first.
 
 - Basic statistics: statistics such as histograms and Count-Min Sketch. See [Introduction to Statistics](/statistics.md) for details.
 - Extended statistics: statistics filtered by tables and columns.
 
-> **Tip:**
->
-> Before reading this document, it is recommended that you read [Introduction to Statistics](/statistics.md) first.
-
 When the `ANALYZE` statement is executed manually or automatically, TiDB by default only collects the basic statistics and does not collect the extended statistics. This is because the extended statistics are only used for optimizer estimates in specific scenarios, and collecting them requires additional overhead.
 
-Extended statistics are disabled by default. To collect extended statistics, you need to first enable the extended statistics, and then register each individual extended statistics object.
+Extended statistics are disabled by default. To collect extended statistics, you need to first enable the extended statistics, and then create every single extended statistics object. After the objects have been created, the next time the `ANALYZE` statement is executed, TiDB collects both the basic statistics and the corresponding extended statistics.
 
-After the registration, the next time the `ANALYZE` statement is executed, TiDB collects both the basic statistics and the registered extended statistics.
+> **Warning:**
+>
+> This feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.
 
 ## Limitations
 
@@ -40,17 +38,17 @@ SET GLOBAL tidb_enable_extended_stats = ON;
 
 The default value of this variable is `OFF`. The setting of this system variable applies to all extended statistics objects.
 
-### Register extended statistics
+### Create extended statistics objects
 
 The registration for extended statistics is not a one-time task, and you need repeat the registration for each extended statistics object.
 
-To register extended statistics, use the SQL statement `ALTER TABLE ADD STATS_EXTENDED`. The syntax is as follows:
+To create extended statistics objects, use the SQL statement `ALTER TABLE ADD STATS_EXTENDED`. The syntax is as follows:
 
 ```sql
 ALTER TABLE table_name ADD STATS_EXTENDED IF NOT EXISTS stats_name stats_type(column_name, column_name...);
 ```
 
-In the syntax, you can specify the table name, statistics name, statistics type, and column name of the extended statistics to be collected.
+In the syntax, you can specify the table name, statistics name, statistics type, and column name of the extended statistics object to be collected.
 
 - `table_name` specifies the name of the table from which the extended statistics are collected.
 - `stats_name` specifies the name of the statistics object, which must be unique for each table.
@@ -60,7 +58,7 @@ In the syntax, you can specify the table name, statistics name, statistics type,
 <details>
 <summary>How it works</summary>
 
-To improve access performance, each TiDB node maintains a cache in the system table `mysql.stats_extended` for extended statistics. After you register the extended statistics, the next time the `ANALYZE` statement is executed, TiDB will collect the extended statistics if the system table `mysql.stats_extended` has the corresponding objects.
+To improve access performance, each TiDB node maintains a cache in the system table `mysql.stats_extended` for extended statistics. After you create the extended statistics objects, the next time the `ANALYZE` statement is executed, TiDB will collect the extended statistics if the system table `mysql.stats_extended` has the corresponding objects.
 
 Each row in the `mysql.stats_extended` table has a `version` column. Once a row is updated, the value of `version` is increased. In this way, TiDB loads the table into memory incrementally, instead of fully.
 
@@ -78,7 +76,7 @@ TiDB loads `mysql.stats_extended` periodically to ensure that the cache is kept
 
 </details>
 
-### Delete extended statistics
+### Delete extended statistics objects
 
 To delete an extended statistics object, use the following statement:
 
@@ -140,13 +138,13 @@ Without extended statistics, the TiDB optimizer only supposes that `col1` and `c
 
 ### Step 3. Enable extended statistics
 
-Set `tidb_enable_extended_stats` to `ON`, and register the extended statistics object for `col1` and `col2`:
+Set `tidb_enable_extended_stats` to `ON`, and create the extended statistics object for `col1` and `col2`:
 
 ```sql
 ALTER TABLE t ADD STATS_EXTENDED s1 correlation(col1, col2);
 ```
 
-When you execute `ANALYZE` after the registration, TiDB calculates the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) of `col` and `col2` of table `t`, and writes the object into the `mysql.stats_extended` table.
+When you execute `ANALYZE` after the registration, TiDB calculates the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) of `col1` and `col2` of table `t`, and writes the object into the `mysql.stats_extended` table.
 
 ### Step 4. See how extended statistics make a difference
 
From 9705e20eb915b62df1a9b9b406a8dff592dbd632 Mon Sep 17 00:00:00 2001
From: Lilian Lee
Date: Wed, 29 May 2024 16:23:34 +0800
Subject: [PATCH 2/6] Update wording

---
 extended-statistics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extended-statistics.md b/extended-statistics.md
index 5e973d4a7c739..c2723fdf03c55 100644
--- a/extended-statistics.md
+++ b/extended-statistics.md
@@ -12,7 +12,7 @@ TiDB can collect the following two types of statistics. This documents describes
 
 When the `ANALYZE` statement is executed manually or automatically, TiDB by default only collects the basic statistics and does not collect the extended statistics. This is because the extended statistics are only used for optimizer estimates in specific scenarios, and collecting them requires additional overhead.
 
-Extended statistics are disabled by default. To collect extended statistics, you need to first enable the extended statistics, and then create every single extended statistics object. After the objects have been created, the next time the `ANALYZE` statement is executed, TiDB collects both the basic statistics and the corresponding extended statistics.
+Extended statistics are disabled by default. To collect extended statistics, you need to first enable the extended statistics, and then create every single extended statistics object. After the objects have been created, the next time the `ANALYZE` statement is executed, TiDB collects both the basic statistics and the corresponding extended statistics of the created objects.
 
 > **Warning:**
 >
From e482e7d3f8d20387332fe61cb614dc12d4e2b6c8 Mon Sep 17 00:00:00 2001
From: lilin90
Date: Wed, 29 May 2024 16:36:37 +0800
Subject: [PATCH 3/6] Update wording

---
 extended-statistics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extended-statistics.md b/extended-statistics.md
index c2723fdf03c55..93f6127a7e173 100644
--- a/extended-statistics.md
+++ b/extended-statistics.md
@@ -40,7 +40,7 @@ The default value of this variable is `OFF`. The setting of this system variable
 
 ### Create extended statistics objects
 
-The registration for extended statistics is not a one-time task, and you need repeat the registration for each extended statistics object.
+The creation of extended statistics objects is not a one-time task. You need to repeat the creation for each extended statistics object.
 
 To create extended statistics objects, use the SQL statement `ALTER TABLE ADD STATS_EXTENDED`. The syntax is as follows:
 
@@ -144,7 +144,7 @@ Set `tidb_enable_extended_stats` to `ON`, and create the extended statistics obj
 ALTER TABLE t ADD STATS_EXTENDED s1 correlation(col1, col2);
 ```
 
-When you execute `ANALYZE` after the registration, TiDB calculates the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) of `col1` and `col2` of table `t`, and writes the object into the `mysql.stats_extended` table.
+When you execute `ANALYZE` after the object creation, TiDB calculates the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) of `col1` and `col2` of table `t`, and writes the object into the `mysql.stats_extended` table.
 
 ### Step 4. See how extended statistics make a difference
 
From 249aef791d802b8ea6b82bf7ef1f220af2c67e5a Mon Sep 17 00:00:00 2001
From: Lilian Lee
Date: Wed, 29 May 2024 18:15:18 +0800
Subject: [PATCH 4/6] Refine wording

Co-authored-by: Grace Cai
---
 extended-statistics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extended-statistics.md b/extended-statistics.md
index 93f6127a7e173..62ff363071a04 100644
--- a/extended-statistics.md
+++ b/extended-statistics.md
@@ -12,7 +12,7 @@ TiDB can collect the following two types of statistics. This documents describes
 
 When the `ANALYZE` statement is executed manually or automatically, TiDB by default only collects the basic statistics and does not collect the extended statistics. This is because the extended statistics are only used for optimizer estimates in specific scenarios, and collecting them requires additional overhead.
 
-Extended statistics are disabled by default. To collect extended statistics, you need to first enable the extended statistics, and then create every single extended statistics object. After the objects have been created, the next time the `ANALYZE` statement is executed, TiDB collects both the basic statistics and the corresponding extended statistics of the created objects.
+Extended statistics are disabled by default. To collect extended statistics, you need to first enable extended statistics, and then create your desired extended statistics objects one by one. After the object creation, the next time the `ANALYZE` statement is executed, TiDB collects both the basic statistics and the corresponding extended statistics of the created objects.
 
 > **Warning:**
 >
@@ -42,7 +42,7 @@ The default value of this variable is `OFF`. The setting of this system variable
 
 The creation of extended statistics objects is not a one-time task. You need to repeat the creation for each extended statistics object.
 
-To create extended statistics objects, use the SQL statement `ALTER TABLE ADD STATS_EXTENDED`. The syntax is as follows:
+To create an extended statistics object, use the SQL statement `ALTER TABLE ADD STATS_EXTENDED`. The syntax is as follows:
 
 ```sql
 ALTER TABLE table_name ADD STATS_EXTENDED IF NOT EXISTS stats_name stats_type(column_name, column_name...);
From 4d2077a14a1e26e10cb6644f2c2efc685329ac0c Mon Sep 17 00:00:00 2001
From: Lilian Lee
Date: Thu, 27 Jun 2024 15:07:21 +0800
Subject: [PATCH 5/6] Apply suggestions from code review

Co-authored-by: Grace Cai
---
 extended-statistics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extended-statistics.md b/extended-statistics.md
index 62ff363071a04..2a757e2b8fe1e 100644
--- a/extended-statistics.md
+++ b/extended-statistics.md
@@ -7,8 +7,8 @@ summary: Learn how to use extended statistics to guide the optimizer.
 
 TiDB can collect the following two types of statistics. This documents describes how to use extended statistics to guide the optimizer. Before reading this document, it is recommended that you read [Introduction to Statistics](/statistics.md) first.
 
-- Basic statistics: statistics such as histograms and Count-Min Sketch. See [Introduction to Statistics](/statistics.md) for details.
-- Extended statistics: statistics filtered by tables and columns.
+- Basic statistics: statistics such as histograms and Count-Min Sketch, which primarily focus on individual columns. They are essential for the optimizer to estimate the query cost. See [Introduction to Statistics](/statistics.md) for details.
+- Extended statistics: statistics that focus on data correlations between specified columns, which guide the optimizer to estimate the query cost more precisely when the queried columns are correlated.
 
 When the `ANALYZE` statement is executed manually or automatically, TiDB by default only collects the basic statistics and does not collect the extended statistics. This is because the extended statistics are only used for optimizer estimates in specific scenarios, and collecting them requires additional overhead.
 
From 73bc70cc7dccef8875d8081965ae3a8c28502208 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Thu, 27 Jun 2024 15:22:40 +0800
Subject: [PATCH 6/6] fix typo

---
 extended-statistics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extended-statistics.md b/extended-statistics.md
index 2a757e2b8fe1e..93501eead3836 100644
--- a/extended-statistics.md
+++ b/extended-statistics.md
@@ -5,7 +5,7 @@ summary: Learn how to use extended statistics to guide the optimizer.
 
 # Introduction to Extended Statistics
 
-TiDB can collect the following two types of statistics. This documents describes how to use extended statistics to guide the optimizer. Before reading this document, it is recommended that you read [Introduction to Statistics](/statistics.md) first.
+TiDB can collect the following two types of statistics. This document describes how to use extended statistics to guide the optimizer. Before reading this document, it is recommended that you read [Introduction to Statistics](/statistics.md) first.
 
 - Basic statistics: statistics such as histograms and Count-Min Sketch, which primarily focus on individual columns. They are essential for the optimizer to estimate the query cost. See [Introduction to Statistics](/statistics.md) for details.
 - Extended statistics: statistics that focus on data correlations between specified columns, which guide the optimizer to estimate the query cost more precisely when the queried columns are correlated.
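The patches describe a `correlation(col1, col2)` extended statistics object. When `ANALYZE` runs, TiDB computes a Pearson correlation coefficient for it. The sketch below illustrates that quantity only: it computes a plain Pearson coefficient over two equal-length numeric columns. It is not TiDB's implementation. The function name `pearson_correlation` and the sample values are hypothetical. TiDB may compute the value differently internally, for example over sampled rows rather than the whole table.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length columns.

    Hypothetical helper for illustration; not part of TiDB.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two rows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    # The 1/n normalization factors cancel in the final ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical rows of table t in which col2 grows linearly with col1.
    col1 = [1, 2, 3, 4, 5]
    col2 = [2, 4, 6, 8, 10]
    print(pearson_correlation(col1, col2))  # prints 1.0
```

A coefficient near 1 or -1 means the two columns are strongly linearly correlated. A value near 0 means there is little linear correlation between them.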