From 72a37d3a75184e6d3e23379149a3f79e64da98fd Mon Sep 17 00:00:00 2001
From: Liuxiaozhen12 <82579298+Liuxiaozhen12@users.noreply.github.com>
Date: Thu, 26 Aug 2021 15:04:06 +0800
Subject: [PATCH 1/4] This is an automated cherry-pick of #6188

Signed-off-by: ti-chi-bot
---
 tiflash/use-tiflash.md | 54 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/tiflash/use-tiflash.md b/tiflash/use-tiflash.md
index 2474bd665c622..5be511d7a11f4 100644
--- a/tiflash/use-tiflash.md
+++ b/tiflash/use-tiflash.md
@@ -217,9 +217,63 @@ You can configure this parameter in one of the following ways:
 
 ## Supported push-down calculations
 
+<<<<<<< HEAD
 > **Note:**
 >
 > Before v4.0.2, TiDB does not support the new framework for collations, so in those previous versions, if you enable the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations), none of the expressions can be pushed down. This restriction is removed in v4.0.2 and later versions.
+=======
+TiFlash supports the push-down of the following operators:
+
+* TableScan: Reads data from tables.
+* Selection: Filters data.
+* HashAgg: Performs data aggregation based on the [Hash Aggregation](/explain-aggregation.md#hash-aggregation) algorithm.
+* StreamAgg: Performs data aggregation based on the [Stream Aggregation](/explain-aggregation.md#stream-aggregation) algorithm. SteamAgg only supports the aggregation without the `GROUP BY` condition.
+* TopN: Performs the TopN calculation.
+* Limit: Performs the limit calculation.
+* Project: Performs the projection calculation.
+* HashJoin (Equi Join): Performs the join calculation based on the [Hash Join](/explain-joins.md#hash-join) algorithm, but with the following conditions:
+    * The operator can be pushed down only in the [MPP mode](#use-the-mpp-mode).
+    * The push-down of `Full Outer Join` is not supported.
+* HashJoin (Non-Equi Join): Performs the Cartesian Join algorithm, but with the following conditions:
+    * The operator can be pushed down only in the [MPP mode](#use-the-mpp-mode).
+    * Cartesian Join is supported only in Broadcast Join.
+
+In TiDB, operators are organized in a tree structure. For an operator to be pushed down to TiFlash, all of the following prerequisites must be met:
+
++ All of its child operators can be pushed down to TiFlash.
++ If an operator contains expressions (most of the operators contain expressions), all expressions of the operator can be pushed down to TiFlash.
+
+Currently, TiFlash supports the following push-down expressions:
+
+* Mathematical functions: `+, -, /, *, %, >=, <=, =, !=, <, >, round(int), round(double), round(decimal), abs, floor(int), ceil(int), ceiling(int), sqrt, log, log2, log10, ln, exp, pow, sign, radians, degrees, conv, crc32`
+* Logical functions: `and, or, not, case when, if, ifnull, isnull, in, like, coalesce`
+* Bitwise operations: `bitand, bitor, bigneg, bitxor`
+* String functions: `substr, char_length, replace, concat, concat_ws, left, right, ascii, length, trim, position`
+* Date functions: `date_format, timestampdiff, from_unixtime, unix_timestamp(int), unix_timestamp(decimal), str_to_date(date), str_to_date(datetime), datediff, year, month, day, extract(datetime), date`
+* JSON function: `json_length`
+* Conversion functions: `cast(int as double), cast(int as decimal), cast(int as string), cast(int as time), cast(double as int), cast(double as decimal), cast(double as string), cast(double as time), cast(string as int), cast(string as double), cast(string as decimal), cast(string as time), cast(decimal as int), cast(decimal as string), cast(decimal as time), cast(time as int), cast(time as decimal), cast(time as string)`
+* Aggregate functions: `min, max, sum, count, avg, approx_count_distinct`
+* Miscellaneous functions: `inetntoa, inetaton, inet6ntoa, inet6aton`
+
+In addition, expressions that contain the Time/Bit/Set/Enum/Geometry type cannot be pushed down to TiFlash.
+
+If a query encounters unsupported push-down calculations, TiDB needs to complete the remaining calculations, which might greatly affect the TiFlash acceleration effect. The currently unsupported operators and expressions might be supported in future versions.
+
+## Use the MPP mode
+
+TiFlash supports using the MPP mode to execute queries, which introduces cross-node data exchange (data shuffle process) into the computation. TiDB automatically determines whether to select the MPP mode using the optimizer's cost estimation. You can change the selection strategy by modifying the values of [`tidb_allow_mpp`](/system-variables.md#tidb_allow_mpp-new-in-v50) and [`tidb_enforce_mpp`](/system-variables.md#tidb_enforce_mpp-new-in-v51).
+
+### Control whether to select the MPP mode
+
+The `tidb_allow_mpp` variable controls whether TiDB can select the MPP mode to execute queries. The `tidb_enforce_mpp` variable controls whether the optimizer's cost estimation is ignored and the MPP mode of TiFlash is forcibly used to execute queries.
+
+The results corresponding to all values of these two variables are as follows:
+
+| | tidb_allow_mpp=off | tidb_allow_mpp=on (by default) |
+| ---------------------- | -------------------- | -------------------------------- |
+| tidb_enforce_mpp=off (by default) | The MPP mode is not used. | The optimizer selects the MPP mode based on cost estimation. (by default)|
+| tidb_enforce_mpp=on | The MPP mode is not used. | TiDB ignores the cost estimation and selects the MPP mode. |
+>>>>>>> 36ef0f5a0 (tiflash: Remove the description about expr_blacklist in use_tiflash.md (#6188))
 
 TiFlash supports predicate, aggregate push-down calculations, and table joins. Push-down calculations can help TiDB perform distributed acceleration. Currently, `Full Outer Join` and `DISTINCT COUNT` are not the supported calculation types, which will be optimized in later versions.
 

From 48a6fde6df00332f4bf27c64e76d66d20f33a56d Mon Sep 17 00:00:00 2001
From: Liuxiaozhen12 <82579298+Liuxiaozhen12@users.noreply.github.com>
Date: Mon, 30 Aug 2021 11:47:55 +0800
Subject: [PATCH 2/4] Update tiflash/use-tiflash.md

---
 tiflash/use-tiflash.md | 53 ------------------------------------------
 1 file changed, 53 deletions(-)

diff --git a/tiflash/use-tiflash.md b/tiflash/use-tiflash.md
index 5be511d7a11f4..0192c9afa4b7a 100644
--- a/tiflash/use-tiflash.md
+++ b/tiflash/use-tiflash.md
@@ -221,59 +221,6 @@ You can configure this parameter in one of the following ways:
 > **Note:**
 >
 > Before v4.0.2, TiDB does not support the new framework for collations, so in those previous versions, if you enable the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations), none of the expressions can be pushed down. This restriction is removed in v4.0.2 and later versions.
-=======
-TiFlash supports the push-down of the following operators:
-
-* TableScan: Reads data from tables.
-* Selection: Filters data.
-* HashAgg: Performs data aggregation based on the [Hash Aggregation](/explain-aggregation.md#hash-aggregation) algorithm.
-* StreamAgg: Performs data aggregation based on the [Stream Aggregation](/explain-aggregation.md#stream-aggregation) algorithm. SteamAgg only supports the aggregation without the `GROUP BY` condition.
-* TopN: Performs the TopN calculation.
-* Limit: Performs the limit calculation.
-* Project: Performs the projection calculation.
-* HashJoin (Equi Join): Performs the join calculation based on the [Hash Join](/explain-joins.md#hash-join) algorithm, but with the following conditions:
-    * The operator can be pushed down only in the [MPP mode](#use-the-mpp-mode).
-    * The push-down of `Full Outer Join` is not supported.
-* HashJoin (Non-Equi Join): Performs the Cartesian Join algorithm, but with the following conditions:
-    * The operator can be pushed down only in the [MPP mode](#use-the-mpp-mode).
-    * Cartesian Join is supported only in Broadcast Join.
-
-In TiDB, operators are organized in a tree structure. For an operator to be pushed down to TiFlash, all of the following prerequisites must be met:
-
-+ All of its child operators can be pushed down to TiFlash.
-+ If an operator contains expressions (most of the operators contain expressions), all expressions of the operator can be pushed down to TiFlash.
-
-Currently, TiFlash supports the following push-down expressions:
-
-* Mathematical functions: `+, -, /, *, %, >=, <=, =, !=, <, >, round(int), round(double), round(decimal), abs, floor(int), ceil(int), ceiling(int), sqrt, log, log2, log10, ln, exp, pow, sign, radians, degrees, conv, crc32`
-* Logical functions: `and, or, not, case when, if, ifnull, isnull, in, like, coalesce`
-* Bitwise operations: `bitand, bitor, bigneg, bitxor`
-* String functions: `substr, char_length, replace, concat, concat_ws, left, right, ascii, length, trim, position`
-* Date functions: `date_format, timestampdiff, from_unixtime, unix_timestamp(int), unix_timestamp(decimal), str_to_date(date), str_to_date(datetime), datediff, year, month, day, extract(datetime), date`
-* JSON function: `json_length`
-* Conversion functions: `cast(int as double), cast(int as decimal), cast(int as string), cast(int as time), cast(double as int), cast(double as decimal), cast(double as string), cast(double as time), cast(string as int), cast(string as double), cast(string as decimal), cast(string as time), cast(decimal as int), cast(decimal as string), cast(decimal as time), cast(time as int), cast(time as decimal), cast(time as string)`
-* Aggregate functions: `min, max, sum, count, avg, approx_count_distinct`
-* Miscellaneous functions: `inetntoa, inetaton, inet6ntoa, inet6aton`
-
-In addition, expressions that contain the Time/Bit/Set/Enum/Geometry type cannot be pushed down to TiFlash.
-
-If a query encounters unsupported push-down calculations, TiDB needs to complete the remaining calculations, which might greatly affect the TiFlash acceleration effect. The currently unsupported operators and expressions might be supported in future versions.
-
-## Use the MPP mode
-
-TiFlash supports using the MPP mode to execute queries, which introduces cross-node data exchange (data shuffle process) into the computation. TiDB automatically determines whether to select the MPP mode using the optimizer's cost estimation. You can change the selection strategy by modifying the values of [`tidb_allow_mpp`](/system-variables.md#tidb_allow_mpp-new-in-v50) and [`tidb_enforce_mpp`](/system-variables.md#tidb_enforce_mpp-new-in-v51).
-
-### Control whether to select the MPP mode
-
-The `tidb_allow_mpp` variable controls whether TiDB can select the MPP mode to execute queries. The `tidb_enforce_mpp` variable controls whether the optimizer's cost estimation is ignored and the MPP mode of TiFlash is forcibly used to execute queries.
-
-The results corresponding to all values of these two variables are as follows:
-
-| | tidb_allow_mpp=off | tidb_allow_mpp=on (by default) |
-| ---------------------- | -------------------- | -------------------------------- |
-| tidb_enforce_mpp=off (by default) | The MPP mode is not used. | The optimizer selects the MPP mode based on cost estimation. (by default)|
-| tidb_enforce_mpp=on | The MPP mode is not used. | TiDB ignores the cost estimation and selects the MPP mode. |
->>>>>>> 36ef0f5a0 (tiflash: Remove the description about expr_blacklist in use_tiflash.md (#6188))
 
 TiFlash supports predicate, aggregate push-down calculations, and table joins. Push-down calculations can help TiDB perform distributed acceleration. Currently, `Full Outer Join` and `DISTINCT COUNT` are not the supported calculation types, which will be optimized in later versions.
 

From ba07b82ce9aa175b0b93ce15d2e94801437de98b Mon Sep 17 00:00:00 2001
From: Liuxiaozhen12 <82579298+Liuxiaozhen12@users.noreply.github.com>
Date: Mon, 30 Aug 2021 11:48:12 +0800
Subject: [PATCH 3/4] Update tiflash/use-tiflash.md

---
 tiflash/use-tiflash.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tiflash/use-tiflash.md b/tiflash/use-tiflash.md
index 0192c9afa4b7a..2474bd665c622 100644
--- a/tiflash/use-tiflash.md
+++ b/tiflash/use-tiflash.md
@@ -217,7 +217,6 @@ You can configure this parameter in one of the following ways:
 
 ## Supported push-down calculations
 
-<<<<<<< HEAD
 > **Note:**
 >
 > Before v4.0.2, TiDB does not support the new framework for collations, so in those previous versions, if you enable the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations), none of the expressions can be pushed down. This restriction is removed in v4.0.2 and later versions.

From 6db7f8c8080543702fbac6462f3a531e0459e30e Mon Sep 17 00:00:00 2001
From: Liuxiaozhen12 <82579298+Liuxiaozhen12@users.noreply.github.com>
Date: Mon, 30 Aug 2021 11:50:07 +0800
Subject: [PATCH 4/4] Update use-tiflash.md

---
 tiflash/use-tiflash.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tiflash/use-tiflash.md b/tiflash/use-tiflash.md
index 2474bd665c622..962d9d8793f7c 100644
--- a/tiflash/use-tiflash.md
+++ b/tiflash/use-tiflash.md
@@ -232,12 +232,9 @@ set @@session.tidb_opt_broadcast_join=1
 Currently, TiFlash supports pushing down a limited number of expressions, including:
 
 ```
-+, -, /, *, >=, <=, =, !=, <, >, ifnull, isnull, bitor, in, bitand, or, and, like, not, case when, month, substr, timestampdiff, date_format, from_unixtime, json_length, if, bitneg, bitxor,
-round without fraction, cast(int as decimal), date_add(datetime, int), date_add(datetime, string), min, max, sum, count, avg, approx_count_distinct
++, -, /, *, >=, <=, =, !=, <, >, ifnull, isnull, bitor, in, bitand, or, and, like, not, case when, month, substr, timestampdiff, date_format, from_unixtime, json_length, if, bitneg, bitxor, round without fraction, cast(int as decimal), min, max, sum, count, avg, approx_count_distinct
 ```
 
-Among them, the push-down of `cast` and `date_add` is not enabled by default. To enable it, refer to [Blocklist of Optimization Rules and Expression Pushdown](/blocklist-control-plan.md).
-
 TiFlash does not support push-down calculations in the following situations:
 
 - Expressions that contain the `Time` type cannot be pushed down.