From f8c77d7618c2c66d5ea99845181daf40079e39b5 Mon Sep 17 00:00:00 2001
From: Zhang Jian
Date: Tue, 30 Jun 2020 12:29:07 +0800
Subject: [PATCH 1/5] add distinct optimization

---
 TOC.md                       |  1 +
 agg-distinct-optimization.md | 58 ++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
 create mode 100644 agg-distinct-optimization.md

diff --git a/TOC.md b/TOC.md
index d06064b0dbab1..ace34c266983e 100644
--- a/TOC.md
+++ b/TOC.md
@@ -102,6 +102,7 @@
     + [Join Reorder](/join-reorder.md)
   + Physical Optimization
     + [Statistics](/statistics.md)
+    + [Distinct Optimization](/agg-distinct-optimization.md)
   + Control Execution Plan
     + [Optimizer Hints](/optimizer-hints.md)
     + [SQL Plan Management](/sql-plan-management.md)
diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
new file mode 100644
index 0000000000000..682608f69f42a
--- /dev/null
+++ b/agg-distinct-optimization.md
@@ -0,0 +1,58 @@
+---
+title: Distinct Optimization
+category: performance
+---
+
+# Distinct Optimization
+
+This document introduces the `distinct` optimization in the TiDB query optimizer. Including `SELECT DISTINCT` and `DISTINCT` in the aggregate functions.
+
+## `DISTINCT` modifier in `SELECT` statements
+
+The `DISTINCT` modifier specifies removal of duplicate rows from the result set. `SELECT DISTINCT` is transformed to `GROUP BY`, for example:
+
+```sql
+mysql> explain SELECT DISTINCT a from t;
++--------------------------+---------+-----------+---------------+-------------------------------------------------------+
+| id                       | estRows | task      | access object | operator info                                         |
++--------------------------+---------+-----------+---------------+-------------------------------------------------------+
+| HashAgg_6                | 2.40    | root      |               | group by:test.t.a, funcs:firstrow(test.t.a)->test.t.a |
+| └─TableReader_11         | 3.00    | root      |               | data:TableFullScan_10                                 |
+|   └─TableFullScan_10     | 3.00    | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
++--------------------------+---------+-----------+---------------+-------------------------------------------------------+
+3 rows in set (0.00 sec)
+```
+
+## `DISTINCT` option in aggregate function
+
+Usually, aggregate function with `DISTINCT` option is executed in the TiDB layer in a single threded execution model.
+
+The system variable [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) or the [`distinct-agg-push-down`](/tidb-configuration-file.md#distinct-agg-push-down) config item in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV/TiFlash Coprocessor.
+
+Take the following queries as an example of this optimization. `tidb_opt_distinct_agg_push_down` is disabled by default, which means the aggregate functions is executed in the TiDB layer. After enableing this optimization by setting its value to `1`, `count(distinct a)` is pushed to TiKV/TiFlash Coprocessor: there is a `HashAgg_5` in the TiKV Coprocessor to dedpulicate column `a` in the TiKV Coprocessor. It may helps to reduce the compution overhead of `HashAgg_8` in the TiDB layer.
+
+```sql
+mysql> desc select count(distinct a) from test.t;
++-------------------------+----------+-----------+---------------+------------------------------------------+
+| id                      | estRows  | task      | access object | operator info                            |
++-------------------------+----------+-----------+---------------+------------------------------------------+
+| StreamAgg_6             | 1.00     | root      |               | funcs:count(distinct test.t.a)->Column#4 |
+| └─TableReader_10        | 10000.00 | root      |               | data:TableFullScan_9                     |
+|   └─TableFullScan_9     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo           |
++-------------------------+----------+-----------+---------------+------------------------------------------+
+3 rows in set (0.01 sec)
+
+mysql> set session tidb_opt_distinct_agg_push_down = 1;
+Query OK, 0 rows affected (0.00 sec)
+
+mysql> desc select count(distinct a) from test.t;
++---------------------------+----------+-----------+---------------+------------------------------------------+
+| id                        | estRows  | task      | access object | operator info                            |
++---------------------------+----------+-----------+---------------+------------------------------------------+
+| HashAgg_8                 | 1.00     | root      |               | funcs:count(distinct test.t.a)->Column#3 |
+| └─TableReader_9           | 1.00     | root      |               | data:HashAgg_5                           |
+|   └─HashAgg_5             | 1.00     | cop[tikv] |               | group by:test.t.a,                       |
+|     └─TableFullScan_7     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo           |
++---------------------------+----------+-----------+---------------+------------------------------------------+
+4 rows in set (0.00 sec)
+```

From 94912f95a562c95bfabbf9d346d362782f56aeb1 Mon Sep 17 00:00:00 2001
From: Zhang Jian
Date: Mon, 6 Jul 2020 23:08:40 +0800
Subject: [PATCH 2/5] Apply suggestions from code review

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
Co-authored-by: Feng Liyuan
---
 agg-distinct-optimization.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index 682608f69f42a..f5588df186469 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -5,7 +5,7 @@ category: performance
 
 # Distinct Optimization
 
-This document introduces the `distinct` optimization in the TiDB query optimizer. Including `SELECT DISTINCT` and `DISTINCT` in the aggregate functions.
+This document introduces the `distinct` optimization in the TiDB query optimizer, including `SELECT DISTINCT` and `DISTINCT` in the aggregate functions.
 
 ## `DISTINCT` modifier in `SELECT` statements
 
@@ -23,11 +23,11 @@ mysql> explain SELECT DISTINCT a from t;
 3 rows in set (0.00 sec)
 ```
 
-## `DISTINCT` option in aggregate function
+## `DISTINCT` option in aggregate functions
 
-Usually, aggregate function with `DISTINCT` option is executed in the TiDB layer in a single threded execution model.
+Usually, aggregate functions with the `DISTINCT` option is executed in the TiDB layer in a single-threaded execution model.
 
-The system variable [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) or the [`distinct-agg-push-down`](/tidb-configuration-file.md#distinct-agg-push-down) config item in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV/TiFlash Coprocessor.
+The [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) system variable or the [`distinct-agg-push-down`](/tidb-configuration-file.md#distinct-agg-push-down) configuration item in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV/TiFlash Coprocessor.
 
 Take the following queries as an example of this optimization. `tidb_opt_distinct_agg_push_down` is disabled by default, which means the aggregate functions is executed in the TiDB layer. After enableing this optimization by setting its value to `1`, `count(distinct a)` is pushed to TiKV/TiFlash Coprocessor: there is a `HashAgg_5` in the TiKV Coprocessor to dedpulicate column `a` in the TiKV Coprocessor. It may helps to reduce the compution overhead of `HashAgg_8` in the TiDB layer.
 

From 07e77919b51d00f6e2c46443d08a774a2af8b916 Mon Sep 17 00:00:00 2001
From: Zhang Jian
Date: Sat, 11 Jul 2020 10:43:20 +0800
Subject: [PATCH 3/5] Update agg-distinct-optimization.md

Co-authored-by: Feng Liyuan
---
 agg-distinct-optimization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index f5588df186469..d5838b9fb38c5 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -29,7 +29,7 @@ Usually, aggregate functions with the `DISTINCT` option is executed in the TiDB
 
 The [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) system variable or the [`distinct-agg-push-down`](/tidb-configuration-file.md#distinct-agg-push-down) configuration item in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV/TiFlash Coprocessor.
 
-Take the following queries as an example of this optimization. `tidb_opt_distinct_agg_push_down` is disabled by default, which means the aggregate functions is executed in the TiDB layer. After enableing this optimization by setting its value to `1`, `count(distinct a)` is pushed to TiKV/TiFlash Coprocessor: there is a `HashAgg_5` in the TiKV Coprocessor to dedpulicate column `a` in the TiKV Coprocessor. It may helps to reduce the compution overhead of `HashAgg_8` in the TiDB layer.
+Take the following queries as an example of this optimization. `tidb_opt_distinct_agg_push_down` is disabled by default, which means the aggregate functions are executed in the TiDB layer. After enabling this optimization by setting its value to `1`, the `distinct a` part of `count(distinct a)` is pushed to TiKV/TiFlash Coprocessor: there is a HashAgg_5 to remove the duplicated values on column a in the TiKV Coprocessor. It might reduce the computation overhead of `HashAgg_8` in the TiDB layer.
 
 ```sql
 mysql> desc select count(distinct a) from test.t;

From 92e10679bc27e9893c1e70b6229ed76bf5f1a459 Mon Sep 17 00:00:00 2001
From: Keke Yi <40977455+yikeke@users.noreply.github.com>
Date: Mon, 13 Jul 2020 17:30:11 +0800
Subject: [PATCH 4/5] category meta is not needed any longer

---
 agg-distinct-optimization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index d5838b9fb38c5..607a1e3fdce65 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -1,6 +1,6 @@
 ---
 title: Distinct Optimization
-category: performance
+summary: Introduce the `distinct` optimization in the TiDB query optimizer.
 ---
 
 # Distinct Optimization

From 1d5061e9ca41279ef9454bdbe4cb771919666897 Mon Sep 17 00:00:00 2001
From: yikeke
Date: Thu, 16 Jul 2020 14:40:24 +0800
Subject: [PATCH 5/5] fix a link

---
 agg-distinct-optimization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index 607a1e3fdce65..3c137ed3869b9 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -27,7 +27,7 @@ mysql> explain SELECT DISTINCT a from t;
 
 Usually, aggregate functions with the `DISTINCT` option is executed in the TiDB layer in a single-threaded execution model.
 
-The [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) system variable or the [`distinct-agg-push-down`](/tidb-configuration-file.md#distinct-agg-push-down) configuration item in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV/TiFlash Coprocessor.
+The [`tidb_opt_distinct_agg_push_down`](/system-variables.md#tidb_opt_distinct_agg_push_down) system variable or the [`distinct-agg-push-down`](/tidb-configuration-file.md#distinct-agg-push-down) configuration item in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV/TiFlash Coprocessor.
 
 Take the following queries as an example of this optimization. `tidb_opt_distinct_agg_push_down` is disabled by default, which means the aggregate functions are executed in the TiDB layer. After enabling this optimization by setting its value to `1`, the `distinct a` part of `count(distinct a)` is pushed to TiKV/TiFlash Coprocessor: there is a HashAgg_5 to remove the duplicated values on column a in the TiKV Coprocessor. It might reduce the computation overhead of `HashAgg_8` in the TiDB layer.
 
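
Note on the two rewrites documented in this series: the sketch below is not part of the patch and does not run against TiDB. It uses Python's built-in `sqlite3` module as a stand-in engine, with a hypothetical table `t` and made-up rows, only to check the logical equivalences the document relies on: `SELECT DISTINCT a` returns the same rows as `GROUP BY a`, and `count(distinct a)` can be computed in two stages by first removing duplicates (the role `HashAgg_5` plays in the plan above) and then counting what is left (the role of `HashAgg_8`). It says nothing about how TiDB or TiKV actually execute the plan.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: not TiDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT)")
# Hypothetical sample rows, including duplicates and a NULL.
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(1,), (1,), (2,), (None,), (3,), (3,)],
)

# Rewrite 1: SELECT DISTINCT expressed as GROUP BY.
distinct_rows = set(conn.execute("SELECT DISTINCT a FROM t").fetchall())
group_by_rows = set(conn.execute("SELECT a FROM t GROUP BY a").fetchall())

# Rewrite 2: count(distinct a) split into two stages.
# Stage 1 (inner query) removes duplicate values of a.
# Stage 2 (outer query) counts the non-NULL values that remain.
single_stage = conn.execute("SELECT COUNT(DISTINCT a) FROM t").fetchone()[0]
two_stage = conn.execute(
    "SELECT COUNT(a) FROM (SELECT a FROM t GROUP BY a) AS dedup"
).fetchone()[0]

print("DISTINCT matches GROUP BY:", distinct_rows == group_by_rows)
print("single-stage count:", single_stage, "two-stage count:", two_stage)

conn.close()
```

The outer query uses `COUNT(a)` rather than `COUNT(*)` because `GROUP BY a` keeps a group for `NULL`, which `count(distinct a)` ignores.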