From c597e9cee0922cd86627c24da7e28b374a47fcc4 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Thu, 25 Mar 2021 14:29:03 +0800
Subject: [PATCH 01/13] contents from docs-cn #5771
---
clustered-indexes.md | 293 ++++++++++++++++++++++++-------------------
1 file changed, 161 insertions(+), 132 deletions(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index 9ee8bedc50980..0fb8a8895f700 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -1,188 +1,217 @@
---
title: Clustered Indexes
-summary: Learn how clustered indexes apply to TiDB.
+summary: Learn the concept, user scenario, usages, limitations, and compatibility of clustered indexes.
---
# Clustered Indexes
-The clustered index is an experimental feature introduced in TiDB 5.0.0-rc. This document provides multiple examples to explain how this feature makes a difference to the query performance of TiDB. To enable this feature and see the detailed operation guide, see [tidb_enable_clustered_index](/system-variables.md#tidb_enable_clustered_index-new-in-v500-rc).
+TiDB has supported the clustered index feature since v5.0. This feature controls how data is stored in tables containing primary keys. It gives TiDB the ability to organize tables in a way that can improve the performance of certain queries.
-Clustered indexes provide TiDB the ability to organize tables in a way that can improve the performance of certain queries. The term _clustered_ in this context refers to the _organization of how data is stored_ and not _a group of database servers working together_. Some database management systems refer to clustered indexes as _index-organized tables_ (IOT).
+The term _clustered_ in this context refers to the _organization of how data is stored_ and not _a group of database servers working together_. Some database management systems refer to clustered indexes as _index-organized tables_ (IOT).
-TiDB supports clustering only by a table's `PRIMARY KEY`. With clustered indexes enabled, the terms _the_ `PRIMARY KEY` and _the clustered index_ might be used interchangeably. `PRIMARY KEY` refers to the constraint (a logical property), and clustered index describes the physical implementation of how the data is stored.
+Currently, tables containing primary keys in TiDB are divided into the following two categories:
-## Limited support before TiDB v5.0
+- `NONCLUSTERD`: The primary key of the table is non-clustered index. In tables with non-clustered indexes, the keys for row data are consists of internal `_tidb_rowid` implicitly assigned by TiDB. Because primary keys are essentially unique indexes, tables with non-clustered indexes need at least two key-value pairs to store a row, which are:
+ - `_tidb_rowid` (key) - row data (value)
+ - Primary key data (key) - `_tidb_rowid` (value)
+- `CLUSTERED`: The primary key of the table is clustered index. In tables with clustered indexes, the keys for row data are consists of primary keys consists of primary key data given by the user. There is no need to simulate unique indexes, so tables with clustered indexes need only one key-value pair to store a row, which is:
+ - Primary key data (key) - row data (value)
-Before v5.0, TiDB has only limited support for clustered indexes, provided the following criteria are true:
+> **Note:**
+>
+> TiDB supports clustering only by a table's `PRIMARY KEY`. With clustered indexes enabled, the terms _the_ `PRIMARY KEY` and _the clustered index_ might be used interchangeably. `PRIMARY KEY` refers to the constraint (a logical property), and clustered index describes the physical implementation of how the data is stored.
-- The table contains a `PRIMARY KEY`
-- The `PRIMARY KEY` is an `INTEGER` or `BIGINT`
-- The `PRIMARY KEY` consists of only one column
+## User scenario
-When any of these criteria are not met, TiDB will create a hidden 64-bit `handle` value to organize the table. Querying table rows by a clustered index is more efficient than by a non-clustered index because the query can be completed in a single step. In the following `EXPLAIN` outputs, a table that supports clustered indexes is compared with one that does not:
+Compared to tables with non-clustered indexes, tables with clustered indexes offer greater performance and throughput advantages in following scenarios:
+
++ When data is inserted, the clustered index reduces one write of the index data from the network.
++ When a query with an equivalent condition only involves the primary key, the clustered index reduces one read of index data from the network.
++ When a query with a range condition only involves the primary key, the clustered index reduces multiple reads of index data from the network.
++ When a query with an equivalent or range condition involves the primary key prefix, the clustered index reduces multiple reads of index data from the network.
+
+On the other hand, tables with clustered indexes have certain disadvantages. See the following:
+
+- There might be write hotspot issues when inserting a large number of primary keys with close values.
+- The table itself takes up more storage space if the data type of the primary key is larger than 64 bits, especially when there are multiple secondary indexes.
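+
+The storage cost described in the second point can be illustrated with a minimal sketch. The table and index names below are hypothetical. In a table with a clustered index, each secondary index entry refers to its row by the primary key value (here a `CHAR(32)` value) instead of a 64-bit `_tidb_rowid`, so every additional secondary index repeats the wider key:
+
+```sql
+-- Hypothetical table: each secondary index entry carries the 32-character primary key.
+CREATE TABLE sessions_clustered (
+    guid CHAR(32) NOT NULL,
+    user_id BIGINT,
+    created_at DATETIME,
+    PRIMARY KEY (guid) CLUSTERED,
+    INDEX idx_user (user_id),
+    INDEX idx_created (created_at)
+);
+
+-- Same schema, but each secondary index entry points to a 64-bit _tidb_rowid.
+CREATE TABLE sessions_nonclustered (
+    guid CHAR(32) NOT NULL,
+    user_id BIGINT,
+    created_at DATETIME,
+    PRIMARY KEY (guid) NONCLUSTERED,
+    INDEX idx_user (user_id),
+    INDEX idx_created (created_at)
+);
+```
+
+Which layout uses less space in practice depends on the number of secondary indexes and on compression, so treat this as a starting point for measurement rather than a rule.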
+
+## Usages
+
+### Create a table with clustered indexes
+
+Since TiDB v5.0, you can add non-reserved keywords `CLUSTERED` or `NONCLUSTERED` after `PRIMARY KEY` in a `CREATE TABLE` statement to specify whether the table's primary key is a clustered index. For example:
```sql
-CREATE TABLE always_clusters_in_all_versions (
- id BIGINT NOT NULL PRIMARY KEY auto_increment,
- b CHAR(100),
- INDEX(b)
-);
-
-CREATE TABLE does_not_cluster_by_default (
- guid CHAR(32) NOT NULL PRIMARY KEY,
- b CHAR(100),
- INDEX(b)
-);
-
-INSERT INTO always_clusters_in_all_versions VALUES (1, 'aaa'), (2, 'bbb');
-INSERT INTO does_not_cluster_by_default VALUES ('02dd050a978756da0aff6b1d1d7c8aef', 'aaa'), ('35bfbc09cb3c93d8ef032642521ac042', 'bbb');
-
-EXPLAIN SELECT * FROM always_clusters_in_all_versions WHERE id = 1;
-EXPLAIN SELECT * FROM does_not_cluster_by_default WHERE guid = '02dd050a978756da0aff6b1d1d7c8aef';
+CREATE TABLE t (a BIGINT PRIMARY KEY CLUSTERED, b VARCHAR(255));
+CREATE TABLE t (a BIGINT PRIMARY KEY NONCLUSTERED, b VARCHAR(255));
+CREATE TABLE t (a BIGINT KEY CLUSTERED, b VARCHAR(255));
+CREATE TABLE t (a BIGINT KEY NONCLUSTERED, b VARCHAR(255));
+CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) CLUSTERED);
+CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) NONCLUSTERED);
```
-```sql
-Query OK, 0 rows affected (0.09 sec)
+Note that keywords `KEY` and `PRIMARY KEY` have the same meaning in the column definition.
-Query OK, 0 rows affected (0.10 sec)
+You can also use [supported comment syntax](/comment-syntax.md) in TiDB to specify the type of the primary key. For example:
-Records: 2 Duplicates: 0 Warnings: 0
+```sql
+CREATE TABLE t (a BIGINT PRIMARY KEY /*T![clustered_index] CLUSTERED */, b VARCHAR(255));
+CREATE TABLE t (a BIGINT PRIMARY KEY /*T![clustered_index] NONCLUSTERED */, b VARCHAR(255));
+CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index] CLUSTERED */);
+CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index] NONCLUSTERED */);
+```
-Records: 2 Duplicates: 0 Warnings: 0
+### Add or drop clustered indexes
-+-------------+---------+------+---------------------------------------+---------------+
-| id | estRows | task | access object | operator info |
-+-------------+---------+------+---------------------------------------+---------------+
-| Point_Get_1 | 1.00 | root | table:always_clusters_in_all_versions | handle:1 |
-+-------------+---------+------+---------------------------------------+---------------+
-1 row in set (0.00 sec)
+TiDB does not support adding or dropping clustered indexes after tables are created. Nor does it support the mutual conversion of clustered indexes and non-clustered indexes. For example:
-+-------------+---------+------+--------------------------------------------------------+---------------+
-| id | estRows | task | access object | operator info |
-+-------------+---------+------+--------------------------------------------------------+---------------+
-| Point_Get_1 | 1.00 | root | table:does_not_cluster_by_default, index:PRIMARY(guid) | |
-+-------------+---------+------+--------------------------------------------------------+---------------+
-1 row in set (0.00 sec)
+```sql
+ALTER TABLE t ADD PRIMARY KEY(b, a) CLUSTERED; -- Currently not supported.
+ALTER TABLE t DROP PRIMARY KEY; -- If the primary key is a clustered index, then not supported.
+ALTER TABLE t DROP INDEX `PRIMARY`; -- If the primary key is a clustered index, then not supported.
```
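+
+Because in-place conversion is not supported, one possible workaround for a small table is to rebuild it under a new name. The following is only a sketch under stated assumptions: `t_new` and `t_old` are hypothetical names, column `a` of `t` is assumed to hold unique, non-NULL values, and the table is assumed to be small enough to copy in a single transaction. The two renames are not atomic, so writes to `t` should be paused while the statements run.
+
+```sql
+-- Hypothetical rebuild: create the clustered copy, move the data, then swap names.
+CREATE TABLE t_new (a BIGINT, b VARCHAR(255), PRIMARY KEY (a) CLUSTERED);
+INSERT INTO t_new SELECT a, b FROM t;
+RENAME TABLE t TO t_old;
+RENAME TABLE t_new TO t;
+```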
-The two `EXPLAIN` results above look similar, but in the second example, TiDB must first read the `PRIMARY KEY` index on the `guid` column in order to find the `handle` value. This is more obvious in the following example where the `PRIMARY KEY` value is not in the index on `does_not_cluster_by_default.b`. TiDB must perform an extra lookup on the table rows (`└─TableFullScan_5`) to convert the `handle` value to the `PRIMARY KEY` value of `guid`:
+### Add or drop non-clustered indexes
+
+TiDB supports adding or dropping non-clustered indexes after tables are created. You can explicitly specify the keyword `NONCLUSTERED` or omit it. For example:
```sql
-EXPLAIN SELECT id FROM always_clusters_in_all_versions WHERE b = 'aaaa';
-EXPLAIN SELECT guid FROM does_not_cluster_by_default WHERE b = 'aaaa';
+ALTER TABLE t ADD PRIMARY KEY(b, a) NONCLUSTERED;
+ALTER TABLE t ADD PRIMARY KEY(b, a); -- If you omit the keyword, the primary key is a non-clustered index by default.
+ALTER TABLE t DROP PRIMARY KEY;
+ALTER TABLE t DROP INDEX `PRIMARY`;
```
+### Check whether the primary key is a clustered index
+
+You can check whether the primary key of a table is a clustered index using any of the following methods:
+
+- Execute the command `SHOW CREATE TABLE`.
+- Execute the command `SHOW INDEX FROM`.
+- Query information in `information_schema.tables`.
+
+By running the command `SHOW CREATE TABLE`, you can see whether the attribute of `PRIMARY KEY` is `CLUSTERED` or `NONCLUSTERED`. For example:
+
```sql
-+--------------------------+---------+-----------+---------------------------------------------------+-------------------------------------------------------+
-| id | estRows | task | access object | operator info |
-+--------------------------+---------+-----------+---------------------------------------------------+-------------------------------------------------------+
-| Projection_4 | 0.00 | root | | test.always_clusters_in_all_versions.id |
-| └─IndexReader_6 | 0.00 | root | | index:IndexRangeScan_5 |
-| └─IndexRangeScan_5 | 0.00 | cop[tikv] | table:always_clusters_in_all_versions, index:b(b) | range:["aaaa","aaaa"], keep order:false, stats:pseudo |
-+--------------------------+---------+-----------+---------------------------------------------------+-------------------------------------------------------+
-3 rows in set (0.01 sec)
-
-+---------------------------+---------+-----------+-----------------------------------+------------------------------------------------+
-| id | estRows | task | access object | operator info |
-+---------------------------+---------+-----------+-----------------------------------+------------------------------------------------+
-| Projection_4 | 0.00 | root | | test.does_not_cluster_by_default.guid |
-| └─TableReader_7 | 0.00 | root | | data:Selection_6 |
-| └─Selection_6 | 0.00 | cop[tikv] | | eq(test.does_not_cluster_by_default.b, "aaaa") |
-| └─TableFullScan_5 | 2.00 | cop[tikv] | table:does_not_cluster_by_default | keep order:false, stats:pseudo |
-+---------------------------+---------+-----------+-----------------------------------+------------------------------------------------+
-4 rows in set (0.00 sec)
+mysql> SHOW CREATE TABLE t;
++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Table | Create Table |
++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| t | CREATE TABLE `t` (
+ `a` bigint(20) NOT NULL,
+ `b` varchar(255) DEFAULT NULL,
+ PRIMARY KEY (`a`) /*T![clustered_index] CLUSTERED */
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+1 row in set (0.01 sec)
```
-## Full support since TiDB v5.0
-
-Since v5.0, TiDB provides full support for clustered indexes by any `PRIMARY KEY`. The following `EXPLAIN` output shows the previous example with clustered indexes enabled:
+By running the command `SHOW INDEX FROM`, you can check whether the result in the column `Clustered` shows `YES` or `NO`. For example:
```sql
-SET tidb_enable_clustered_index = 1;
-CREATE TABLE will_now_cluster (
- guid CHAR(32) NOT NULL PRIMARY KEY,
- b CHAR(100),
- INDEX(b)
-);
-
-INSERT INTO will_now_cluster VALUES (1, 'aaa'), (2, 'bbb');
-INSERT INTO will_now_cluster VALUES ('02dd050a978756da0aff6b1d1d7c8aef', 'aaa'), ('35bfbc09cb3c93d8ef032642521ac042', 'bbb');
-
-EXPLAIN SELECT * FROM will_now_cluster WHERE guid = '02dd050a978756da0aff6b1d1d7c8aef';
-EXPLAIN SELECT guid FROM will_now_cluster WHERE b = 'aaaa';
+mysql> SHOW INDEX FROM t;
++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+-----------+
+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression | Clustered |
++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+-----------+
+| t | 0 | PRIMARY | 1 | a | A | 0 | NULL | NULL | | BTREE | | | YES | NULL | YES |
++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+-----------+
+1 row in set (0.01 sec)
```
+You can also query the column `TIDB_PK_TYPE` in the system table `information_schema.tables` to see whether the result is `CLUSTERED` or `NONCLUSTERED`. For example:
+
```sql
-Query OK, 0 rows affected (0.00 sec)
+mysql> SELECT TIDB_PK_TYPE FROM information_schema.tables WHERE table_schema = 'test' AND table_name = 't';
++--------------+
+| TIDB_PK_TYPE |
++--------------+
+| CLUSTERED |
++--------------+
+1 row in set (0.03 sec)
+```
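+
+The same system table can be used to review several tables at once. The following sketch assumes the tables live in the `test` schema used in the examples above:
+
+```sql
+-- List the primary key type of every table in the test schema.
+SELECT table_name, tidb_pk_type
+FROM information_schema.tables
+WHERE table_schema = 'test'
+ORDER BY table_name;
+```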
-Query OK, 0 rows affected (0.11 sec)
+## Limitations
-Query OK, 2 rows affected (0.02 sec)
-Records: 2 Duplicates: 0 Warnings: 0
+Currently, there are two types of limitations for the clustered indexes feature. See the following:
-Query OK, 2 rows affected (0.01 sec)
-Records: 2 Duplicates: 0 Warnings: 0
+- Situations that are not supported and not in the support plan:
+ - It is not supported to use the clustered indexes feature together with TiDB Binlog. After TiDB Binlog is started, TiDB does not allow to create a single integer primary key as a clustered index. TiDB Binlog does not replicate data changes of existing tables with clustered indexes to the downstream. If you need to replicate tables with clustered indexes, use [TiCDC](/ticdc/ticdc-overview.md) instead.
+ - Using clustered indexes together with the attribute [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) is not supported. Also, the attribute [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) does not take effect on tables with clustered indexes.
+ - Downgrading tables with clustered indexes is not supported. If you need to do so, use logical backup tools to migrate data instead.
+- Situations that are not supported yet but are in the support plan:
+ - Adding, dropping, and altering clustered indexes using `ALTER TABLE` statements is not supported.
-+-------------+---------+------+-------------------------------------------------------+---------------+
-| id | estRows | task | access object | operator info |
-+-------------+---------+------+-------------------------------------------------------+---------------+
-| Point_Get_1 | 1.00 | root | table:will_now_cluster, clustered index:PRIMARY(guid) | |
-+-------------+---------+------+-------------------------------------------------------+---------------+
-1 row in set (0.00 sec)
+After TiDB Binlog is enabled, if the clustered index you create is not a single integer primary key, TiDB returns the following error:
-+--------------------------+---------+-----------+------------------------------------+-------------------------------------------------------+
-| id | estRows | task | access object | operator info |
-+--------------------------+---------+-----------+------------------------------------+-------------------------------------------------------+
-| Projection_4 | 10.00 | root | | test.will_now_cluster.guid |
-| └─IndexReader_6 | 10.00 | root | | index:IndexRangeScan_5 |
-| └─IndexRangeScan_5 | 10.00 | cop[tikv] | table:will_now_cluster, index:b(b) | range:["aaaa","aaaa"], keep order:false, stats:pseudo |
-+--------------------------+---------+-----------+------------------------------------+-------------------------------------------------------+
-3 rows in set (0.00 sec)
+```sql
+mysql> CREATE TABLE t (a VARCHAR(255) PRIMARY KEY CLUSTERED);
+ERROR 8200 (HY000): Cannot create clustered index table when the binlog is ON
```
-Clustering by a composite `PRIMARY KEY` is also supported:
+If you use clustered indexes together with the attribute `SHARD_ROW_ID_BITS`, TiDB reports the following error:
```sql
-SET tidb_enable_clustered_index = 1;
-CREATE TABLE composite_primary_key (
- key_a INT NOT NULL,
- key_b INT NOT NULL,
- b CHAR(100),
- PRIMARY KEY (key_a, key_b)
-);
-
-INSERT INTO composite_primary_key VALUES (1, 1, 'aaa'), (2, 2, 'bbb');
-EXPLAIN SELECT * FROM composite_primary_key WHERE key_a = 1 AND key_b = 2;
+mysql> CREATE TABLE t (a VARCHAR(255) PRIMARY KEY CLUSTERED) SHARD_ROW_ID_BITS = 3;
+ERROR 8200 (HY000): Unsupported shard_row_id_bits for table with primary key as row id
```
-```sql
-Query OK, 0 rows affected (0.00 sec)
+## Compatibility
-Query OK, 0 rows affected (0.09 sec)
+### Compatibility with higher and lower TiDB versions
-Query OK, 2 rows affected (0.02 sec)
-Records: 2 Duplicates: 0 Warnings: 0
+TiDB supports upgrading tables with clustered indexes but not downgrading them, meaning that data in tables with clustered indexes is available on a higher TiDB version, but not on a lower one.
-+-------------+---------+------+--------------------------------------------------------------------+---------------+
-| id | estRows | task | access object | operator info |
-+-------------+---------+------+--------------------------------------------------------------------+---------------+
-| Point_Get_1 | 1.00 | root | table:composite_primary_key, clustered index:PRIMARY(key_a, key_b) | |
-+-------------+---------+------+--------------------------------------------------------------------+---------------+
-1 row in set (0.00 sec)
-```
+The clustered indexes feature is partially supported in TiDB v3.0 and v4.0. It is enabled by default when the following requirements are fully met:
+
+- The table contains a `PRIMARY KEY`.
+- The `PRIMARY KEY` consists of only one column.
+- The `PRIMARY KEY` is an `INTEGER`.
+
+However, since v5.0, TiDB creates all types of primary keys as non-clustered indexes by default. Because this behavior change may cause TiDB in the default configuration to perform worse in some scenarios, you can consider explicitly specifying clustered indexes.
-This behavior is consistent with MySQL, where the InnoDB storage engine will by default cluster by any `PRIMARY KEY`.
+### Compatibility with MySQL
-## Storage considerations
+TiDB-specific comment syntax supports wrapping the keywords `CLUSTERED` and `NONCLUSTERED` in a comment. The result of `SHOW CREATE TABLE` also contains TiDB-specific SQL comments. MySQL databases and TiDB databases of a lower version can still execute DDL statements that contain these comments, because they treat the wrapped keywords as ordinary comments.
-Because the `PRIMARY KEY` replaces a 64-bit `handle` value as the internal pointer to table rows, using clustered indexes might increase storage requirements. This is particularly impactful on tables that contain many secondary indexes. Consider the following example:
+### Compatibility with TiDB ecosystem tools
+
+The clustered indexes feature is only compatible with the following ecosystem tools in v5.0 and later versions:
+
+- Backup and recovery tools: BR, Dumpling, and TiDB Lightning.
+- Data migration and replication tools: DM and TiCDC.
+
+However, you cannot convert a table with non-clustered indexes to a table with clustered indexes by backing up and restoring the table with BR v5.0, and vice versa.
+
+### Compatibility with other TiDB features
+
+For a table with a composite primary key or a single non-integer primary key, if you change the primary key from a non-clustered index to a clustered index, the keys of its row data change as well. Therefore, `SPLIT TABLE BY/BETWEEN` statements that are executable in TiDB versions earlier than v5.0 no longer work in v5.0 and later versions. If you want to split a table with clustered indexes using `SPLIT TABLE BY/BETWEEN`, you need to provide the values of the primary key columns instead of an integer value. See the following example:
```sql
-CREATE TABLE t1 (
- guid CHAR(32) NOT NULL PRIMARY KEY,
- b BIGINT,
- INDEX(b)
-);
+mysql> create table t (a int, b varchar(255), primary key(a, b) clustered);
+Query OK, 0 rows affected (0.01 sec)
+mysql> split table t between (0) and (1000000) regions 5;
+ERROR 1105 (HY000): Split table region lower value count should be 2
+mysql> split table t by (0), (50000), (100000);
+ERROR 1136 (21S01): Column count doesn't match value count at row 0
+mysql> split table t between (0, 'aaa') and (1000000, 'zzz') regions 5;
++--------------------+----------------------+
+| TOTAL_SPLIT_REGION | SCATTER_FINISH_RATIO |
++--------------------+----------------------+
+| 4 | 1 |
++--------------------+----------------------+
+1 row in set (0.00 sec)
+mysql> split table t by (0, ''), (50000, ''), (100000, '');
++--------------------+----------------------+
+| TOTAL_SPLIT_REGION | SCATTER_FINISH_RATIO |
++--------------------+----------------------+
+| 3 | 1 |
++--------------------+----------------------+
+1 row in set (0.01 sec)
```
-Because the pointer to the `guid` is a `char(32)`, each index value for `b` will now require approximately `8 + 32 = 40 bytes` (a `BIGINT` value requires 8 bytes for storage). This compares to `8 + 8 = 16 bytes` for non-clustered tables. The exact storage requirements will differ after compression has been applied.
+The attribute [`AUTO_RANDOM`](/auto-random.md) can only be used in tables with clustered indexes. Otherwise, TiDB returns the following error:
+
+```sql
+mysql> create table t (a bigint primary key nonclustered auto_random);
+ERROR 8216 (HY000): Invalid auto random: column a is not the integer primary key, or the primary key is nonclustered
+```
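+
+For contrast, the following sketch shows the form that is expected to be accepted: the same column definition with `CLUSTERED` in place of `NONCLUSTERED`. The table name `t_random` is hypothetical, and the generated values of `a` are not predictable, so no output is shown.
+
+```sql
+-- AUTO_RANDOM on a clustered BIGINT primary key; TiDB assigns the values of a.
+CREATE TABLE t_random (a BIGINT PRIMARY KEY CLUSTERED AUTO_RANDOM, b VARCHAR(255));
+INSERT INTO t_random (b) VALUES ('x'), ('y');
+SELECT a, b FROM t_random;
+```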
\ No newline at end of file
From f041c26b158d632debae50b76e6eab49fa7f6189 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Fri, 26 Mar 2021 14:59:44 +0800
Subject: [PATCH 02/13] contents from docs-cn #5834
---
clustered-indexes.md | 8 ++++++++
system-variables.md | 20 ++++++--------------
2 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index 0fb8a8895f700..11f0d3c31a5cd 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -61,6 +61,14 @@ CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index
CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index] NONCLUSTERED */);
```
+For statements that do not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`, the default behavior is affected by the the global variable `@@global.tidb_enable_clustered_index`. Supported values for this variable are as follows:
+
+- `OFF` indicates that primary keys are created as non-clustered indexes by default.
+ - `ON` indicates that primary keys are created as clustered indexes by default.
+ - `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, single integer primary keys are created as clustered indexes and others are created as non-clustered indexes.
+
+The default value of `@@global.tidb_enable_clustered_index` is `INT_ONLY`.
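+
+The effect of the variable can be checked with a short experiment. This is a minimal sketch that assumes a test cluster where changing a global variable is acceptable; the table name `t_default` is hypothetical.
+
+```sql
+-- Inspect the current default behavior.
+SELECT @@global.tidb_enable_clustered_index;
+
+-- Create primary keys as clustered indexes when no keyword is given.
+SET GLOBAL tidb_enable_clustered_index = 'ON';
+
+-- No CLUSTERED/NONCLUSTERED keyword here, so the variable decides the type.
+CREATE TABLE t_default (a VARCHAR(255) PRIMARY KEY, b INT);
+SHOW CREATE TABLE t_default;
+```
+
+If the setting has taken effect, the `SHOW CREATE TABLE` output is expected to mark the primary key as `CLUSTERED`.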
+
### Add or drop clustered indexes
TiDB does not support adding or dropping clustered indexes after tables are created. Nor does it support the mutual conversion of clustered indexes and non-clustered indexes. For example:
diff --git a/system-variables.md b/system-variables.md
index 30e58f2b436d1..b44d08ab76837 100644
--- a/system-variables.md
+++ b/system-variables.md
@@ -388,20 +388,12 @@ Constraint checking is always performed in place for pessimistic transactions (d
### tidb_enable_clustered_index New in v5.0.0-rc
-- Scope: SESSION | GLOBAL
-- Default value: OFF
-- This variable is used to control whether to enable the [clustered index](/clustered-indexes.md) feature.
- - This feature is only applicable to newly created tables and does not affect the existing old tables.
- - This feature is only applicable to tables whose primary key is the single-column non-integer type or the multi-column type. It does not affect the tables without a primary key or tables with the primary key of the single-column integer type.
- - You can execute `select tidb_pk_type from information_schema.tables where table_name ='{table_name}'` to check whether the clustered index feature has been enabled on a table.
-- After you enable this feature, rows are stored directly on the primary key instead of on the internally allocated `rows_id` to which the extra primary key index is created to point.
-
- This feature impacts performance in the following aspects:
-
- - For each `INSERT` operation, there is one less index key written into each row.
- - When you make a query using the primary key as the equivalent condition, one read request can be saved.
- - When you make a query using the primary key as the range condition, multiple read requests can be saved.
- - When you make a query using the prefix of the multi-column primary key as the equivalent condition or range condition, multiple read requests can be saved.
+- Scope: GLOBAL
+- Default value: INT_ONLY
+- This variable is used to control whether to create the primary key as a [clustered index]((/clustered-indexes.md))by default, which is when the statement does not explicitly specifies the keyword `CLUSTERED`/`NONCLUSTERED`. Supported values are `OFF`, `ON`, and `INT_ONLY`:
+ - `OFF` indicates that primary keys are created as non-clustered indexes by default.
+ - `ON` indicates that primary keys are created as clustered indexes by default.
+ - `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, single integer primary keys are created as clustered indexes and others are created as non-clustered indexes.
### tidb_enable_collect_execution_info
From ec637f9421492dbea8f440a4b491b8d8c3ec99ac Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Fri, 26 Mar 2021 15:09:54 +0800
Subject: [PATCH 03/13] fix dead link
---
system-variables.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/system-variables.md b/system-variables.md
index b44d08ab76837..71d13f3416617 100644
--- a/system-variables.md
+++ b/system-variables.md
@@ -390,7 +390,7 @@ Constraint checking is always performed in place for pessimistic transactions (d
- Scope: GLOBAL
- Default value: INT_ONLY
-- This variable is used to control whether to create the primary key as a [clustered index]((/clustered-indexes.md))by default, which is when the statement does not explicitly specifies the keyword `CLUSTERED`/`NONCLUSTERED`. Supported values are `OFF`, `ON`, and `INT_ONLY`:
+- This variable controls whether primary keys are created as [clustered indexes](/clustered-indexes.md) by default, that is, when the statement does not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`. Supported values are `OFF`, `ON`, and `INT_ONLY`:
- `OFF` indicates that primary keys are created as non-clustered indexes by default.
- `ON` indicates that primary keys are created as clustered indexes by default.
- `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, single integer primary keys are created as clustered indexes and others are created as non-clustered indexes.
From ba5a45d6351d10dc372f348a63443a6d9cf12d31 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Fri, 26 Mar 2021 18:34:10 +0800
Subject: [PATCH 04/13] fix grammar
---
clustered-indexes.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index 11f0d3c31a5cd..3d1461f11839f 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -11,10 +11,10 @@ The term _clustered_ in this context refers to the _organization of how data is
Currently, tables containing primary keys in TiDB are divided into the following two categories:
-- `NONCLUSTERD`: The primary key of the table is non-clustered index. In tables with non-clustered indexes, the keys for row data are consists of internal `_tidb_rowid` implicitly assigned by TiDB. Because primary keys are essentially unique indexes, tables with non-clustered indexes need at least two key-value pairs to store a row, which are:
+- `NONCLUSTERED`: The primary key of the table is a non-clustered index. In tables with non-clustered indexes, the keys for row data consist of the internal `_tidb_rowid` implicitly assigned by TiDB. Because primary keys are essentially unique indexes, tables with non-clustered indexes need at least two key-value pairs to store a row, which are:
- `_tidb_rowid` (key) - row data (value)
- Primary key data (key) - `_tidb_rowid` (value)
-- `CLUSTERED`: The primary key of the table is clustered index. In tables with clustered indexes, the keys for row data are consists of primary keys consists of primary key data given by the user. There is no need to simulate unique indexes, so tables with clustered indexes need only one key-value pair to store a row, which is:
+- `CLUSTERED`: The primary key of the table is a clustered index. In tables with clustered indexes, the keys for row data consist of the primary key data given by the user. There is no need to simulate unique indexes, so tables with clustered indexes need only one key-value pair to store a row, which is:
- Primary key data (key) - row data (value)
> **Note:**
@@ -23,7 +23,7 @@ Currently, tables containing primary keys in TiDB are divided into the following
## User scenario
-Compared to tables with non-clustered indexes, tables with clustered indexes offer greater performance and throughput advantages in following scenarios:
+Compared to tables with non-clustered indexes, tables with clustered indexes offer greater performance and throughput advantages in the following scenarios:
+ When data is inserted, the clustered index reduces one write of the index data from the network.
+ When a query with an equivalent condition only involves the primary key, the clustered index reduces one read of index data from the network.
@@ -61,7 +61,7 @@ CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index
CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index] NONCLUSTERED */);
```
-For statements that do not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`, the default behavior is affected by the the global variable `@@global.tidb_enable_clustered_index`. Supported values for this variable are as follows:
+For statements that do not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`, the default behavior is affected by the global variable `@@global.tidb_enable_clustered_index`. Supported values for this variable are as follows:
- `OFF` indicates that primary keys are created as non-clustered indexes by default.
- `ON` indicates that primary keys are created as clustered indexes by default.
@@ -143,7 +143,7 @@ mysql> SELECT TIDB_PK_TYPE FROM information_schema.tables WHERE table_schema = '
Currently, there are two types of limitations for the clustered indexes feature. See the following:
- Situations that are not supported and not in the support plan:
- - It is not supported to use the clustered indexes feature together with TiDB Binlog. After TiDB Binlog is started, TiDB does not allow to create a single integer primary key as a clustered index. TiDB Binlog does not replicate data changes of existing tables with clustered indexes to the downstream. If you need to replicate tables with clustered indexes, use [TiCDC](/ticdc/ticdc-overview.md) instead.
+ - It is not supported using the clustered indexes feature together with TiDB Binlog. After TiDB Binlog is started, TiDB does not allow to create a single integer primary key as a clustered index. TiDB Binlog does not replicate data changes of existing tables with clustered indexes to the downstream. If you need to replicate tables with clustered indexes, use [TiCDC](/ticdc/ticdc-overview.md) instead.
- It is not supported to use clustered indexes together with the attribute [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md). Also, the attribute [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) does not take effect on tables with clustered indexes.
- It is not supported to degrade tables with clustered indexes. If you need to do so, use logical backup tools to migrate data instead.
- Situations that are not supported yet but in the support plan:
From 1197314b75fccf47d212b3e1a66a7502e9a846a9 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 10:26:40 +0800
Subject: [PATCH 05/13] Apply suggestions from code review
Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
Co-authored-by: tangenta
---
clustered-indexes.md | 36 ++++++++++++++++++------------------
system-variables.md | 4 ++--
2 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index 3d1461f11839f..4d14775d0a650 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -5,7 +5,7 @@ summary: Learn the concept, user scenario, usages, limitations, and compatibilit
# Clustered Indexes
-TiDB supports the clustered indexes feature since v5.0. This feature controls how data is stored in tables containing primary keys. It provides TiDB the ability to organize tables in a way that can improve the performance of certain queries.
+TiDB supports the clustered index feature since v5.0. This feature controls how data is stored in tables containing primary keys. It provides TiDB the ability to organize tables in a way that can improve the performance of certain queries.
The term _clustered_ in this context refers to the _organization of how data is stored_ and not _a group of database servers working together_. Some database management systems refer to clustered indexes as _index-organized tables_ (IOT).
@@ -21,14 +21,14 @@ Currently, tables containing primary keys in TiDB are divided into the following
>
> TiDB supports clustering only by a table's `PRIMARY KEY`. With clustered indexes enabled, the terms _the_ `PRIMARY KEY` and _the clustered index_ might be used interchangeably. `PRIMARY KEY` refers to the constraint (a logical property), and clustered index describes the physical implementation of how the data is stored.
-## User scenario
+## User scenarios
Compared to tables with non-clustered indexes, tables with clustered indexes offer greater performance and throughput advantages in the following scenarios:
+ When data is inserted, the clustered index reduces one write of the index data from the network.
+ When a query with an equivalent condition only involves the primary key, the clustered index reduces one read of index data from the network.
+ When a query with a range condition only involves the primary key, the clustered index reduces multiple reads of index data from the network.
-+ When a query with an equivalent or range condition involves the primary key prefix, the clustered index reduces multiple reads of index data from the network.
++ When a query with an equivalent or range condition only involves the primary key prefix, the clustered index reduces multiple reads of index data from the network.
On the other hand, tables with clustered indexes have certain disadvantages. See the following:
@@ -52,7 +52,7 @@ CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) NONCLUSTERED);
Note that keywords `KEY` and `PRIMARY KEY` have the same meaning in the column definition.
-You can also use [supported comment syntax](/comment-syntax.md) in TiDB to specify the type of the primary key. For example:
+You can also use the [comment syntax](/comment-syntax.md) in TiDB to specify the type of the primary key. For example:
```sql
CREATE TABLE t (a BIGINT PRIMARY KEY /*T![clustered_index] CLUSTERED */, b VARCHAR(255));
@@ -61,17 +61,17 @@ CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index
CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index] NONCLUSTERED */);
```
-For statements that do not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`, the default behavior is affected by the global variable `@@global.tidb_enable_clustered_index`. Supported values for this variable are as follows:
+For statements that do not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`, the default behavior is controlled by the global variable `@@global.tidb_enable_clustered_index`. Supported values for this variable are as follows:
- `OFF` indicates that primary keys are created as non-clustered indexes by default.
- `ON` indicates that primary keys are created as clustered indexes by default.
- - `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, single integer primary keys are created as clustered indexes and others are created as non-clustered indexes.
+ - `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, only the primary keys consist of an integer column are created as clustered indexes.
The default value of `@@global.tidb_enable_clustered_index` is `INT_ONLY`.
### Add or drop clustered indexes
-TiDB does not support adding or dropping clustered indexes after tables are created. Nor does it support the mutual conversion of clustered indexes and non-clustered indexes. For example:
+TiDB does not support adding or dropping clustered indexes after tables are created. Nor does it support the mutual conversion between clustered indexes and non-clustered indexes. For example:
```sql
ALTER TABLE t ADD PRIMARY KEY(b, a) CLUSTERED; -- Currently not supported.
@@ -92,7 +92,7 @@ ALTER TABLE t DROP INDEX `PRIMARY`;
### Check whether the primary key is a clustered index
-You can check whether the primary key of a table is a clustered index using any of the following methods:
+You can check whether the primary key of a table is a clustered index using one of the following methods:
- Execute the command `SHOW CREATE TABLE`.
- Execute the command `SHOW INDEX FROM`.
@@ -140,14 +140,14 @@ mysql> SELECT TIDB_PK_TYPE FROM information_schema.tables WHERE table_schema = '
## Limitations
-Currently, there are two types of limitations for the clustered indexes feature. See the following:
+Currently, there are two types of limitations for the clustered index feature. See the following:
- Situations that are not supported and not in the support plan:
- It is not supported using the clustered indexes feature together with TiDB Binlog. After TiDB Binlog is started, TiDB does not allow to create a single integer primary key as a clustered index. TiDB Binlog does not replicate data changes of existing tables with clustered indexes to the downstream. If you need to replicate tables with clustered indexes, use [TiCDC](/ticdc/ticdc-overview.md) instead.
- - It is not supported to use clustered indexes together with the attribute [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md). Also, the attribute [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) does not take effect on tables with clustered indexes.
- - It is not supported to degrade tables with clustered indexes. If you need to do so, use logical backup tools to migrate data instead.
+ - Using clustered indexes together with the attribute [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) is not supported. Also, the attribute [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) does not take effect on tables with clustered indexes.
+ - Downgrading tables with clustered indexes is not supported. If you need to downgrade such tables, use logical backup tools to migrate data instead.
- Situations that are not supported yet but in the support plan:
- - It is not supported to add, drop, and alter clustered indexes using `ALTER TABLE` statements.
+ - Adding, dropping, and altering clustered indexes using `ALTER TABLE` statements are not supported.
After TiDB Binlog is enabled, if you create a single integer primary key as a clustered index, TiDB returns the following error:
@@ -169,7 +169,7 @@ ERROR 8200 (HY000): Unsupported shard_row_id_bits for table with primary key as
TiDB supports upgrading tables with clustered indexes but not degrading them, meaning that data in tables with clustered indexes is available on a higher TiDB version, but not on a lower one.
-The clustered indexes feature is partially supported in TiDB v3.0 and v4.0. It is enabled by default when the following requirements are fully met:
+The clustered index feature is partially supported in TiDB v3.0 and v4.0. It is enabled by default when the following requirements are fully met:
- The table contains a `PRIMARY KEY`.
- The `PRIMARY KEY` consists of only one column.
@@ -183,16 +183,16 @@ TiDB specific comment syntax supports wrapping the keywords `CLUSTERED` and `NON
### Compatibility with TiDB ecosystem tools
-The clustered indexes feature is only compatible with the following ecosystem tools in v5.0 and later versions:
+The clustered index feature is only compatible with the following ecosystem tools in v5.0 and later versions:
-- Backup and recovery tools: BR, Dumpling, and TiDB Lightning.
+- Backup and restore tools: BR, Dumpling, and TiDB Lightning.
- Data migration and replication tools: DM and TiCDC.
However, you cannot convert a table with non-clustered indexes to a table with clustered indexes by using the v5.0 BR to backup and recover tables, and vice versa.
### Compatibility with other TiDB features
-For a table with a combined primary key or a single non-integer primary key, if you change the primary key from a non-clustered index to a clustered index, the keys of its row data change as well. Therefore, `SPLIT TABLE BY/BETWEEN` statements that are executable on TiDB v5.0 and lower versions, are no longer workable in v5.0 and higher versions of TiDB. If you want to split a table with clustered indexes using `SPLIT TABLE BY/BETWEEN`, you need to provide the value of the primary key column, instead of specifying an integer value. See the following example:
+For a table with a combined primary key or a single non-integer primary key, if you change the primary key from a non-clustered index to a clustered index, the keys of its row data change as well. Therefore, `SPLIT TABLE BY/BETWEEN` statements that are executable in TiDB versions earlier than v5.0 no longer work in v5.0 and later versions. To split a table with clustered indexes using `SPLIT TABLE BY/BETWEEN`, provide the values of the primary key columns instead of an integer value. See the following example:
```sql
mysql> create table t (a int, b varchar(255), primary key(a, b) clustered);
@@ -217,9 +217,9 @@ mysql> split table t by (0, ''), (50000, ''), (100000, '');
1 row in set (0.01 sec)
```
-The attribute [`AUTO_RANDOM`](/auto-random.md) can only be used in tables with clustered indexes. Otherwise, TiDB returns the following error:
+The attribute [`AUTO_RANDOM`](/auto-random.md) can only be used on clustered indexes. Otherwise, TiDB returns the following error:
```sql
mysql> create table t (a bigint primary key nonclustered auto_random);
ERROR 8216 (HY000): Invalid auto random: column a is not the integer primary key, or the primary key is nonclustered
-```
\ No newline at end of file
+```
diff --git a/system-variables.md b/system-variables.md
index 71d13f3416617..07f2271ac02b4 100644
--- a/system-variables.md
+++ b/system-variables.md
@@ -390,10 +390,10 @@ Constraint checking is always performed in place for pessimistic transactions (d
- Scope: GLOBAL
- Default value: INT_ONLY
-- This variable is used to control whether to create the primary key as a [clustered index](/clustered-indexes.md))by default, which is when the statement does not explicitly specifies the keyword `CLUSTERED`/`NONCLUSTERED`. Supported values are `OFF`, `ON`, and `INT_ONLY`:
+- This variable is used to control whether to create the primary key as a [clustered index](/clustered-indexes.md) by default. "By default" here means that the statement does not explicitly specifies the keyword `CLUSTERED`/`NONCLUSTERED`. Supported values are `OFF`, `ON`, and `INT_ONLY`:
- `OFF` indicates that primary keys are created as non-clustered indexes by default.
- `ON` indicates that primary keys are created as clustered indexes by default.
- - `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, single integer primary keys are created as clustered indexes and others are created as non-clustered indexes.
+ - `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, all primary keys are created as non-clustered indexes by default. If it is set to `false`, only the primary keys consist of an integer column are created as clustered indexes.
### tidb_enable_collect_execution_info
From 70ded370e4d95dd4fd8c304d2db623edc2122327 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 11:13:25 +0800
Subject: [PATCH 06/13] Apply suggestions from code review
Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
clustered-indexes.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index 4d14775d0a650..2fbed3851d3d2 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -1,6 +1,6 @@
---
title: Clustered Indexes
-summary: Learn the concept, user scenario, usages, limitations, and compatibility of clustered indexes.
+summary: Learn the concept, user scenarios, usages, limitations, and compatibility of clustered indexes.
---
# Clustered Indexes
@@ -143,13 +143,13 @@ mysql> SELECT TIDB_PK_TYPE FROM information_schema.tables WHERE table_schema = '
Currently, there are two types of limitations for the clustered index feature. See the following:
- Situations that are not supported and not in the support plan:
- - It is not supported using the clustered indexes feature together with TiDB Binlog. After TiDB Binlog is started, TiDB does not allow to create a single integer primary key as a clustered index. TiDB Binlog does not replicate data changes of existing tables with clustered indexes to the downstream. If you need to replicate tables with clustered indexes, use [TiCDC](/ticdc/ticdc-overview.md) instead.
+ - Using the clustered index feature together with TiDB Binlog is not supported. After TiDB Binlog is enabled, TiDB only allows creating a single integer primary key as a clustered index. TiDB Binlog does not replicate data changes of existing tables with clustered indexes to the downstream. If you need to replicate tables with clustered indexes, use [TiCDC](/ticdc/ticdc-overview.md) instead.
- Using clustered indexes together with the attribute [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) is not supported. Also, the attribute [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) does not take effect on tables with clustered indexes.
- Downgrading tables with clustered indexes is not supported. If you need to downgrade such tables, use logical backup tools to migrate data instead.
- Situations that are not supported yet but in the support plan:
- Adding, dropping, and altering clustered indexes using `ALTER TABLE` statements are not supported.
-After TiDB Binlog is enabled, if you create a single integer primary key as a clustered index, TiDB returns the following error:
+After TiDB Binlog is enabled, if the primary key you create as a clustered index does not consist of only one integer column, TiDB returns the following error:
```sql
mysql> CREATE TABLE t (a VARCHAR(255) PRIMARY KEY CLUSTERED);
@@ -167,7 +167,7 @@ ERROR 8200 (HY000): Unsupported shard_row_id_bits for table with primary key as
### Compatibility with higher and lower TiDB versions
-TiDB supports upgrading tables with clustered indexes but not degrading them, meaning that data in tables with clustered indexes is available on a higher TiDB version, but not on a lower one.
+TiDB supports upgrading tables with clustered indexes but not downgrading such tables, which means that data in tables with clustered indexes on a later TiDB version is not available on an earlier one.
The clustered index feature is partially supported in TiDB v3.0 and v4.0. It is enabled by default when the following requirements are fully met:
@@ -175,7 +175,7 @@ The clustered index feature is partially supported in TiDB v3.0 and v4.0. It is
- The `PRIMARY KEY` consists of only one column.
- The `PRIMARY KEY` is an `INTEGER`.
-However, since v5.0, TiDB creates all types of primary keys as non-clustered indexes by default. Because this behavior change may cause TiDB in the default configuration to perform worse in some scenarios, you can consider explicitly specifying clustered indexes.
+However, since v5.0, TiDB creates all types of primary keys as non-clustered indexes by default. This behavior change might cause TiDB in the default configuration to perform worse in some scenarios. You can consider explicitly specifying a primary key as a clustered index.
### Compatibility with MySQL
@@ -188,7 +188,7 @@ The clustered index feature is only compatible with the following ecosystem tool
- Backup and restore tools: BR, Dumpling, and TiDB Lightning.
- Data migration and replication tools: DM and TiCDC.
-However, you cannot convert a table with non-clustered indexes to a table with clustered indexes by using the v5.0 BR to backup and recover tables, and vice versa.
+However, you cannot convert a table with non-clustered indexes to a table with clustered indexes by backing up and restoring the table using the v5.0 BR tool, and vice versa.
### Compatibility with other TiDB features
From 1b58914e821a1157c1d9a27866b7815b42e6d5cf Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 11:16:31 +0800
Subject: [PATCH 07/13] Apply suggestions from code review
---
clustered-indexes.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index 2fbed3851d3d2..b5cb0eb3f1638 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -14,7 +14,7 @@ Currently, tables containing primary keys in TiDB are divided into the following
- `NONCLUSTERD`: The primary key of the table is non-clustered index. In tables with non-clustered indexes, the keys for row data are consist of internal `_tidb_rowid` implicitly assigned by TiDB. Because primary keys are essentially unique indexes, tables with non-clustered indexes need at least two key-value pairs to store a row, which are:
- `_tidb_rowid` (key) - row data (value)
- Primary key data (key) - `_tidb_rowid` (value)
-- `CLUSTERED`: The primary key of the table is clustered index. In tables with clustered indexes, the keys for row data are consist of primary keys consists of primary key data given by the user. There is no need to simulate unique indexes, so tables with clustered indexes need only one key-value pair to store a row, which is:
+- `CLUSTERED`: The primary key of the table is clustered index. In tables with clustered indexes, the keys for row data consist of primary key data given by the user. There is no need to simulate unique indexes, so tables with clustered indexes need only one key-value pair to store a row, which is:
- Primary key data (key) - row data (value)
> **Note:**
From 63990771e65a59f356f87780bbcbd3e0141fb090 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 11:34:02 +0800
Subject: [PATCH 08/13] Update clustered-indexes.md
Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
clustered-indexes.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index b5cb0eb3f1638..d07a33b7cd963 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -11,7 +11,7 @@ The term _clustered_ in this context refers to the _organization of how data is
Currently, tables containing primary keys in TiDB are divided into the following two categories:
-- `NONCLUSTERD`: The primary key of the table is non-clustered index. In tables with non-clustered indexes, the keys for row data are consist of internal `_tidb_rowid` implicitly assigned by TiDB. Because primary keys are essentially unique indexes, tables with non-clustered indexes need at least two key-value pairs to store a row, which are:
+- `NONCLUSTERD`: The primary key of the table is non-clustered index. In tables with non-clustered indexes, the keys for row data consist of internal `_tidb_rowid` implicitly assigned by TiDB. Because primary keys are essentially unique indexes, tables with non-clustered indexes need at least two key-value pairs to store a row, which are:
- `_tidb_rowid` (key) - row data (value)
- Primary key data (key) - `_tidb_rowid` (value)
- `CLUSTERED`: The primary key of the table is clustered index. In tables with clustered indexes, the keys for row data consist of primary key data given by the user. There is no need to simulate unique indexes, so tables with clustered indexes need only one key-value pair to store a row, which is:
From daaa6725bc9f7695616485b8bd6108e698a5c4be Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 13:39:50 +0800
Subject: [PATCH 09/13] Apply suggestions from code review
Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
clustered-indexes.md | 2 +-
system-variables.md | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index d07a33b7cd963..7472057e1392b 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -165,7 +165,7 @@ ERROR 8200 (HY000): Unsupported shard_row_id_bits for table with primary key as
## Compatibility
-### Compatibility with higher and lower TiDB versions
+### Compatibility with earlier and later TiDB versions
TiDB supports upgrading tables with clustered indexes but not downgrading such tables, which means that data in tables with clustered indexes on a later TiDB version is not available on an earlier one.
diff --git a/system-variables.md b/system-variables.md
index 07f2271ac02b4..f8a93bd9499a4 100644
--- a/system-variables.md
+++ b/system-variables.md
@@ -390,7 +390,7 @@ Constraint checking is always performed in place for pessimistic transactions (d
- Scope: GLOBAL
- Default value: INT_ONLY
-- This variable is used to control whether to create the primary key as a [clustered index](/clustered-indexes.md) by default. "By default" here means that the statement does not explicitly specifies the keyword `CLUSTERED`/`NONCLUSTERED`. Supported values are `OFF`, `ON`, and `INT_ONLY`:
+- This variable is used to control whether to create the primary key as a [clustered index](/clustered-indexes.md) by default. "By default" here means that the statement does not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`. Supported values are `OFF`, `ON`, and `INT_ONLY`:
- `OFF` indicates that primary keys are created as non-clustered indexes by default.
- `ON` indicates that primary keys are created as clustered indexes by default.
- `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, all primary keys are created as non-clustered indexes by default. If it is set to `false`, only the primary keys consist of an integer column are created as clustered indexes.
From ed20aba4a6cb62f026dab36ed4eb43e5cd8c5e3f Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 16:59:07 +0800
Subject: [PATCH 10/13] contents from docs-cn #5864
---
clustered-indexes.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index 7472057e1392b..dfdbc9709c353 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -14,7 +14,7 @@ Currently, tables containing primary keys in TiDB are divided into the following
- `NONCLUSTERD`: The primary key of the table is non-clustered index. In tables with non-clustered indexes, the keys for row data consist of internal `_tidb_rowid` implicitly assigned by TiDB. Because primary keys are essentially unique indexes, tables with non-clustered indexes need at least two key-value pairs to store a row, which are:
- `_tidb_rowid` (key) - row data (value)
- Primary key data (key) - `_tidb_rowid` (value)
-- `CLUSTERED`: The primary key of the table is clustered index. In tables with clustered indexes, the keys for row data consist of primary key data given by the user. There is no need to simulate unique indexes, so tables with clustered indexes need only one key-value pair to store a row, which is:
+- `CLUSTERED`: The primary key of the table is clustered index. In tables with clustered indexes, the keys for row data consist of primary key data given by the user. Therefore, tables with clustered indexes need only one key-value pair to store a row, which is:
- Primary key data (key) - row data (value)
> **Note:**
@@ -33,7 +33,7 @@ Compared to tables with non-clustered indexes, tables with clustered indexes off
On the other hand, tables with clustered indexes have certain disadvantages. See the following:
- There might be write hotspot issues when inserting a large number of primary keys with close values.
-- The table itself takes up more storage space if the data type of the primary key is larger than 64 bits, especially when there are multiple secondary indexes.
+- The table data takes up more storage space if the data type of the primary key is larger than 64 bits, especially when there are multiple secondary indexes.
## Usages
@@ -96,7 +96,7 @@ You can check whether the primary key of a table is a clustered index using one
- Execute the command `SHOW CREATE TABLE`.
- Execute the command `SHOW INDEX FROM`.
-- Query information in `information_schema.tables`.
+- Query the `TIDB_PK_TYPE` column in the system table `information_schema.tables`.
By running the command `SHOW CREATE TABLE`, you can see whether the attribute of `PRIMARY KEY` is `CLUSTERED` or `NONCLUSTERED`. For example:
@@ -149,7 +149,7 @@ Currently, there are two types of limitations for the clustered index feature. S
- Situations that are not supported yet but in the support plan:
- Adding, dropping, and altering clustered indexes using `ALTER TABLE` statements are not supported.
-After TiDB Binlog is enabled, if the primary key you create as a clustered index does not consist of only one integer column, TiDB returns the following error:
+After TiDB Binlog is enabled, if the clustered index you create is not a single integer primary key, TiDB returns the following error:
```sql
mysql> CREATE TABLE t (a VARCHAR(255) PRIMARY KEY CLUSTERED);
@@ -175,11 +175,11 @@ The clustered index feature is partially supported in TiDB v3.0 and v4.0. It is
- The `PRIMARY KEY` consists of only one column.
- The `PRIMARY KEY` is an `INTEGER`.
-However, since v5.0, TiDB creates all types of primary keys as non-clustered indexes by default. This behavior change might cause TiDB in the default configuration to perform worse in some scenarios. You can consider explicitly specifying a primary key as a clustered index.
+Since TiDB v5.0, the clustered index feature is fully supported for all types of primary keys, but the default behavior is consistent with TiDB v3.0 and v4.0. To change the default behavior, you can configure the system variable `@@tidb_enable_clustered_index` to `ON` or `OFF`. For more details, see [Create a table with clustered indexes](#create-a-table-with-clustered-indexes) .
### Compatibility with MySQL
-TiDB specific comment syntax supports wrapping the keywords `CLUSTERED` and `NONCLUSTERED` in a comment. The result of `SHOW CREATE TABLE` also contains TiDB specific SQL comments. MySQL databases and TiDB databases of a lower version can identify and execute DDL statements with these comments.
+TiDB-specific comment syntax supports wrapping the keywords `CLUSTERED` and `NONCLUSTERED` in a comment. The result of `SHOW CREATE TABLE` also contains TiDB-specific SQL comments. MySQL databases and earlier versions of TiDB ignore these comments.
### Compatibility with TiDB ecosystem tools
From 6d8ac5b4a7448c02a74176f75dde76f7bb2e6fae Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 17:17:38 +0800
Subject: [PATCH 11/13] Update clustered-indexes.md
Co-authored-by: tangenta
---
clustered-indexes.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index dfdbc9709c353..a208026f23e81 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -175,7 +175,7 @@ The clustered index feature is partially supported in TiDB v3.0 and v4.0. It is
- The `PRIMARY KEY` consists of only one column.
- The `PRIMARY KEY` is an `INTEGER`.
-Since TiDB v5.0, the clustered index feature is fully supported for all types of primary keys, but the default behavior is consistent with TiDB v3.0 and v4.0. To change the default behavior, you can configure the system variable `@@tidb_enable_clustered_index` to `ON` or `OFF`. For more details, see [Create a table with clustered indexes](#create-a-table-with-clustered-indexes) .
+Since TiDB v5.0, the clustered index feature is fully supported for all types of primary keys, but the default behavior is consistent with TiDB v3.0 and v4.0. To change the default behavior, you can configure the system variable `@@tidb_enable_clustered_index` to `ON` or `OFF`. For more details, see [Create a table with clustered indexes](#create-a-table-with-clustered-indexes).
### Compatibility with MySQL
From 6de802f2c663c3045ddd34d7cf4578c6557b1fc5 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 18:07:59 +0800
Subject: [PATCH 12/13] Update clustered-indexes.md
---
clustered-indexes.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index a208026f23e81..faa0c2330439e 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -63,7 +63,7 @@ CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index
For statements that do not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`, the default behavior is controlled by the global variable `@@global.tidb_enable_clustered_index`. Supported values for this variable are as follows:
-- `OFF` indicates that primary keys are created as non-clustered indexes by default.
+ - `OFF` indicates that primary keys are created as non-clustered indexes by default.
- `ON` indicates that primary keys are created as clustered indexes by default.
- `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, only the primary keys consist of an integer column are created as clustered indexes.
From 402511ef3459f0352683cc1e2e6a2bffcd5e27a4 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Mon, 29 Mar 2021 18:10:08 +0800
Subject: [PATCH 13/13] fix list format
---
clustered-indexes.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/clustered-indexes.md b/clustered-indexes.md
index faa0c2330439e..0040c461d2052 100644
--- a/clustered-indexes.md
+++ b/clustered-indexes.md
@@ -63,9 +63,9 @@ CREATE TABLE t (a BIGINT, b VARCHAR(255), PRIMARY KEY(a, b) /*T![clustered_index
For statements that do not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`, the default behavior is controlled by the global variable `@@global.tidb_enable_clustered_index`. Supported values for this variable are as follows:
- - `OFF` indicates that primary keys are created as non-clustered indexes by default.
- - `ON` indicates that primary keys are created as clustered indexes by default.
- - `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, only the primary keys consist of an integer column are created as clustered indexes.
+- `OFF` indicates that primary keys are created as non-clustered indexes by default.
+- `ON` indicates that primary keys are created as clustered indexes by default.
+- `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, only primary keys that consist of an integer column are created as clustered indexes.
The default value of `@@global.tidb_enable_clustered_index` is `INT_ONLY`.