reference: update documents for new collation #2350
Conversation
Sorry, I'll review this PR later today.
| The `alter_specification` option specifies the `CHARACTER SET` and `COLLATE` of a specified database. Currently, TiDB only supports some character sets and collations. See [Character Set Support](/reference/sql/character-set.md) for details. | ||
| The `alter_specification` option specifies the `CHARACTER SET` and `COLLATE` of a specified database. Currently, TiDB only supports some character sets and collations. See [Character Set Support](/reference/sql/characterset-and-collation.md) for details. |
| The `alter_specification` option specifies the `CHARACTER SET` and `COLLATE` of a specified database. Currently, TiDB only supports some character sets and collations. See [Character Set Support](/reference/sql/characterset-and-collation.md) for details. | |
| The `alter_specification` option specifies the `CHARACTER SET` and `COLLATE` of a specified database. Currently, TiDB only supports some character sets and collations. See [Character Set and Collation Support](/reference/sql/characterset-and-collation.md) for details. |
| If you create a database that already exists and do not specify `IF NOT EXISTS`, an error is displayed. | ||
| The `create_specification` option is used to specify the specific `CHARACTER SET` and `COLLATE` in the database. Currently, TiDB only supports some of the character sets and collations. For details, see [Character Set Support](/reference/sql/character-set.md). | ||
| The `create_specification` option is used to specify the specific `CHARACTER SET` and `COLLATE` in the database. Currently, TiDB only supports some of the character sets and collations. For details, see [Character Set and Collation Supports](/reference/sql/characterset-and-collation.md). |
| The `create_specification` option is used to specify the specific `CHARACTER SET` and `COLLATE` in the database. Currently, TiDB only supports some of the character sets and collations. For details, see [Character Set and Collation Supports](/reference/sql/characterset-and-collation.md). | |
| The `create_specification` option is used to specify the specific `CHARACTER SET` and `COLLATE` in the database. Currently, TiDB only supports some of the character sets and collations. For details, see [Character Set and Collation Support](/reference/sql/characterset-and-collation.md). |
| * [SHOW \[GLOBAL|SESSION\] VARIABLES](/reference/sql/statements/show-variables.md) | ||
| * [SET <variable>](/reference/sql/statements/set-variable.md) | ||
| * [Character Set Support](/reference/sql/character-set.md) | ||
| * [Character Set and Collation Supports](/reference/sql/characterset-and-collation.md) |
| * [Character Set and Collation Supports](/reference/sql/characterset-and-collation.md) | |
| * [Character Set and Collation Support](/reference/sql/characterset-and-collation.md) |
| 6. Check the Character Set. | ||
| TiDB differs from MySQL in [Character Set](/reference/sql/character-set.md). | ||
| TiDB differs from MySQL in [Character Set](/reference/sql/characterset-and-collation.md). |
| TiDB differs from MySQL in [Character Set](/reference/sql/characterset-and-collation.md). | |
| TiDB differs from MySQL in [character sets](/reference/sql/characterset-and-collation.md). |
| @@ -0,0 +1,374 @@ | |||
| --- | |||
| title: Character Set and Collation | |||
| summary: Learn about the supported character sets in TiDB. | |||
| summary: Learn about the supported character sets in TiDB. | |
| summary: Learn about the supported character sets and collations in TiDB. |
| > | ||
| > Each character set might correspond to multiple collations, but by default each character set corresponds to only one collation. | ||
| You can use the following statement to view the collation (under the [new framework for collations](#new-framework-for-collations)) that corresponds to the character set. |
| You can use the following statement to view the collation (under the [new framework for collations](#new-framework-for-collations)) that corresponds to the character set. | |
| You can use the following statement to view the collations (under the [new framework for collations](#new-framework-for-collations)) that correspond to the character set. |
| Query OK, 0 rows affected (0.08 sec) | ||
| ``` | ||
| The database character set and collation are used as the default values for table definitions if the table character set and collation are not specified in individual column definitions. |
| The database character set and collation are used as the default values for table definitions if the table character set and collation are not specified in individual column definitions. | |
| If the table character set and collation are not specified, the database character set and collation are used as their default values. |
| ## Column character set and collation | ||
| See the following table for the character set and collation syntax for columns: |
| See the following table for the character set and collation syntax for columns: | |
| You can use the following statement to specify the character set and collation for columns: |
| [COLLATE collation_name] | ||
| ``` | ||
| The table character set and collation are used as the default values for column definitions if the column character set and collation are not specified in individual column definitions. |
| The table character set and collation are used as the default values for column definitions if the column character set and collation are not specified in individual column definitions. | |
| If the column character set and collation are not specified, the table character set and collation are used as their default values. |
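The default chain described in these suggestions works as follows. A table without its own character set and collation inherits them from the database, and a column without its own values inherits them from the table. The Python sketch below models only that fallback. The function `inherit_default` and the sample values are hypothetical and exist only for illustration. Partially specified definitions are deliberately left out, because they fall under the CHARACTER SET / COLLATE selection rules.

```python
from typing import Optional, Tuple

CharsetCollation = Tuple[str, str]


def inherit_default(
    charset: Optional[str],
    collation: Optional[str],
    parent: CharsetCollation,
) -> CharsetCollation:
    """Return the effective (charset, collation) for a table or column.

    Hypothetical sketch: when neither value is given, the parent's pair
    (the database for a table, the table for a column) is the default.
    """
    if charset is None and collation is None:
        return parent
    if charset is None or collation is None:
        # Partially specified definitions follow the selection rules for
        # CHARACTER SET / COLLATE and are not modeled in this sketch.
        raise NotImplementedError("partial specification not modeled here")
    return charset, collation


# Hypothetical values chosen only for illustration.
database = ("utf8mb4", "utf8mb4_bin")
table = inherit_default(None, None, database)            # inherits from the database
column = inherit_default("latin1", "latin1_bin", table)  # explicit values are used as-is
```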
| ## String character sets and collation | ||
| Each character literal in a string has a character set and a collation. When you use a string, this option is available: |
| Each character literal in a string has a character set and a collation. When you use a string, this option is available: | |
| Each string corresponds to a character set and a collation. When you use a string, this option is available: |
| Rules: | ||
| + Rule 1: If you specify `CHARACTER SET charset_name` and `COLLATE collation_name`, then `CHARACTER SET charset_name` and `COLLATE collation_name` are used directly. |
| + Rule 1: If you specify `CHARACTER SET charset_name` and `COLLATE collation_name`, then `CHARACTER SET charset_name` and `COLLATE collation_name` are used directly. | |
| + Rule 1: If you specify `CHARACTER SET charset_name` and `COLLATE collation_name`, then `charset_name` and `collation_name` are used directly. |
I changed the zh doc in pingcap/docs-cn#3056
| + Rule 2: If you specify `CHARACTER SET charset_name` but do not specify `COLLATE collation_name`, `CHARACTER SET charset_name` and the default collation of `CHARACTER SET charset_name` are used. |
| + Rule 2: If you specify `CHARACTER SET charset_name` but do not specify `COLLATE collation_name`, `CHARACTER SET charset_name` and the default collation of `CHARACTER SET charset_name` are used. | |
| + Rule 2: If you specify `CHARACTER SET charset_name` but do not specify `COLLATE collation_name`, `charset_name` and the default collation of `charset_name` are used. |
| ## General rules on selecting character sets and collation | ||
| + Rule 1: If you specify `CHARACTER SET charset_name` and `COLLATE collation_name`, then `CHARACTER SET charset_name` and `COLLATE collation_name` are used directly. |
| + Rule 1: If you specify `CHARACTER SET charset_name` and `COLLATE collation_name`, then `CHARACTER SET charset_name` and `COLLATE collation_name` are used directly. | |
| + Rule 1: If you specify `CHARACTER SET charset_name` and `COLLATE collation_name`, then `charset_name` and `collation_name` are used directly. |
| + Rule 2: If you specify `CHARACTER SET charset_name` and do not specify `COLLATE collation_name`, then `CHARACTER SET charset_name` and the default comparison collation of `CHARACTER SET charset_name` are used. |
| + Rule 2: If you specify `CHARACTER SET charset_name` and do not specify `COLLATE collation_name`, then `CHARACTER SET charset_name` and the default comparison collation of `CHARACTER SET charset_name` are used. | |
| + Rule 2: If you specify `CHARACTER SET charset_name` and do not specify `COLLATE collation_name`, then `charset_name` and the default collation of `charset_name` are used. |
| + Rule 3: If you specify neither `CHARACTER SET charset_name` nor `COLLATE collation_name`, the character set and collation given by the system variables `character_set_connection` and `collation_connection` are used. | ||
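The three selection rules read naturally as a small decision procedure. The Python sketch below is a hedged illustration of that procedure, not TiDB's implementation. The names `resolve`, `DEFAULT_COLLATION`, and `session` are hypothetical. The default-collation table is an assumption for illustration; the real defaults should be checked with `SHOW CHARACTER SET` on the target TiDB version. A bare `COLLATE` clause without `CHARACTER SET` is not covered by Rules 1 to 3, so the sketch raises instead of guessing.

```python
from typing import Dict, Optional, Tuple

# Assumed default collation per character set, for illustration only.
# Check the real defaults on your cluster with SHOW CHARACTER SET.
DEFAULT_COLLATION: Dict[str, str] = {
    "utf8mb4": "utf8mb4_bin",
    "utf8": "utf8_bin",
    "latin1": "latin1_bin",
    "ascii": "ascii_bin",
    "binary": "binary",
}


def resolve(
    charset: Optional[str],
    collation: Optional[str],
    session_vars: Dict[str, str],
) -> Tuple[str, str]:
    """Pick (charset, collation) for a string according to Rules 1-3."""
    if charset is not None and collation is not None:
        # Rule 1: both are specified, so they are used directly.
        return charset, collation
    if charset is not None:
        # Rule 2: only CHARACTER SET is specified, so use its default collation.
        return charset, DEFAULT_COLLATION[charset]
    if collation is None:
        # Rule 3: neither is specified, so fall back to the connection variables.
        return (session_vars["character_set_connection"],
                session_vars["collation_connection"])
    # COLLATE without CHARACTER SET is not covered by the three rules.
    raise NotImplementedError("COLLATE alone is not modeled here")


session = {
    "character_set_connection": "utf8mb4",
    "collation_connection": "utf8mb4_general_ci",
}
print(resolve("latin1", None, session))  # ('latin1', 'latin1_bin')
```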
| ## Connection character sets and collations |
| ## Connection character sets and collations | |
| ## Client connection character set and collation |
| + The server character set and collation are the values of the `character_set_server` and `collation_server` system variables. | ||
| + The character set and collation of the default database are the values of the `character_set_database` and `collation_database` system variables. You can use `character_set_connection` and `collation_connection` to specify the character set and collation for each connection. The `character_set_client` variable is to set the client character set. Before returning the result, the `character_set_results` system variable indicates the character set in which the server returns query results to the client, including the metadata of the result. |
| + The character set and collation of the default database are the values of the `character_set_database` and `collation_database` system variables. You can use `character_set_connection` and `collation_connection` to specify the character set and collation for each connection. The `character_set_client` variable is to set the client character set. Before returning the result, the `character_set_results` system variable indicates the character set in which the server returns query results to the client, including the metadata of the result. | |
| + The character set and collation of the default database are the values of the `character_set_database` and `collation_database` system variables. | |
| You can use `character_set_connection` and `collation_connection` to specify the character set and collation for each client connection. | |
| The `character_set_client` variable sets the client character set. Before returning the result, the server converts the query result, including its metadata, to the character set specified by the `character_set_results` variable. |
| You can use the following statement to specify a particular collation that is related to the client: |
| You can use the following statement to specify a particular collation that is related to the client: | |
| You can use the following statement to set the character set and collation that are related to the client: |
| ## Validity check of characters | ||
| For the specified `utf8` or `utf8mb4` character set, TiDB only supports the valid `utf8` character, and reports the `incorrect utf8 value` error when the character is invalid. This validity check of characters in TiDB is compatible with MySQL 8.0 but incompatible with MySQL 5.7 or earlier versions. |
| For the specified `utf8` or `utf8mb4` character set, TiDB only supports the valid `utf8` character, and reports the `incorrect utf8 value` error when the character is invalid. This validity check of characters in TiDB is compatible with MySQL 8.0 but incompatible with MySQL 5.7 or earlier versions. | |
| If the specified character set is `utf8` or `utf8mb4`, TiDB only supports the valid `utf8` characters. For invalid characters, TiDB reports the `incorrect utf8 value` error. This validity check of characters in TiDB is compatible with MySQL 8.0 but incompatible with MySQL 5.7 or earlier versions. |
| For more information, see [Connection Character Sets and Collations in MySQL](https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html). | ||
| The syntax support and semantic support for the collation are influenced by the [`new_collations_enabled_on_first_bootstrap`](/reference/configuration/tidb-server/configuration-file.md#new_collations_enabled_on_first_bootstrap) configuration item. The syntax support and semantic support are different. The former indicates that TiDB can parse and set collations. The latter indicates that TiDB can correctly use collations when comparing strings. | ||
| Before v4.0, TiDB only supports syntactically parsing most of the MySQL collations but semantically takes all collations as binary collations, which is the [old framework for collations](#old-framework-for-collations). |
| Before v4.0, TiDB only supports syntactically parsing most of the MySQL collations but semantically takes all collations as binary collations, which is the [old framework for collations](#old-framework-for-collations). | |
| Before v4.0, TiDB provides only the [old framework for collations](#old-framework-for-collations). In this framework, TiDB supports syntactically parsing most of the MySQL collations but semantically takes all collations as binary collations. |
| Since v4.0, TiDB supports semantically parsing different collations and strictly following the collations when comparing strings, which is the [new framework for collations](#new-framework-for-collations). |
| Since v4.0, TiDB supports semantically parsing different collations and strictly following the collations when comparing strings, which is the [new framework for collations](#new-framework-for-collations). | |
| Since v4.0, TiDB supports a [new framework for collations](#new-framework-for-collations). In this framework, TiDB semantically parses different collations and strictly follows the collations when comparing strings. |
@yikeke All comments are addressed, PTAL again, thanks!
yikeke
left a comment
To align new commits in pingcap/docs-cn#3056
Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
/merge
/run-all-tests
cherry pick to release-4.0 in PR #2541
What is changed, added or deleted? (Required)
Update documents for new collation.
Which TiDB version(s) do your changes apply to? (Required)
If you select two or more versions from above, to trigger the bot to cherry-pick this PR to your desired release version branch(es), you must add corresponding labels such as needs-cherry-pick-4.0, needs-cherry-pick-3.1, needs-cherry-pick-3.0, and needs-cherry-pick-2.1.
What is the related PR or file link(s)?