From c6f918140e8f7f1af1369605b2d1adf36aa5d4fb Mon Sep 17 00:00:00 2001 From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com> Date: Fri, 18 Jun 2021 15:43:55 +0800 Subject: [PATCH 1/7] add stale read docs --- TOC.md | 7 +- as-of-timestamp.md | 261 ++++++++++++++++++ best-practices/three-dc-local-read.md | 29 ++ pd-control.md | 2 +- read-historical-data.md | 16 +- sql-statements/sql-statement-select.md | 8 +- .../sql-statement-set-transaction.md | 21 +- .../sql-statement-start-transaction.md | 7 +- stale-read.md | 25 ++ 9 files changed, 357 insertions(+), 19 deletions(-) create mode 100644 as-of-timestamp.md create mode 100644 best-practices/three-dc-local-read.md create mode 100644 stale-read.md diff --git a/TOC.md b/TOC.md index ad8375f603096..57376c2da63f8 100644 --- a/TOC.md +++ b/TOC.md @@ -64,7 +64,11 @@ + [BR Use Cases](/br/backup-and-restore-use-cases.md) + [External Storages](/br/backup-and-restore-storages.md) + [BR FAQ](/br/backup-and-restore-faq.md) - + [Read Historical Data](/read-historical-data.md) + + Read Historical Data + + Read Historical Data Using Stale Read (Recommended) + + [Usage Scenarios of Stale Read](/stale-read.md) + + [Read Historical Data Using SQL Statments](/as-of-timestamp.md) + + [Read Historical Data Using System Variables](/read-historical-data.md) + [Configure Time Zone](/configure-time-zone.md) + [Daily Checklist](/daily-check.md) + [Maintain TiFlash](/tiflash/maintain-tiflash.md) @@ -150,6 +154,7 @@ + [PD Scheduling](/best-practices/pd-scheduling-best-practices.md) + [TiKV Performance Tuning with Massive Regions](/best-practices/massive-regions-best-practices.md) + [Three-node Hybrid Deployment](/best-practices/three-nodes-hybrid-deployment.md) + + [Local Read Under Three Data Centers Deployment](/best-practices/three-dc-local-read.md) + [Use Placement Rules](/configure-placement-rules.md) + [Use Load Base Split](/configure-load-base-split.md) + [Use Store Limit](/configure-store-limit.md) diff --git 
a/as-of-timestamp.md b/as-of-timestamp.md new file mode 100644 index 0000000000000..619cdf8259baf --- /dev/null +++ b/as-of-timestamp.md @@ -0,0 +1,261 @@ +--- +title: Read Historical Data Using the `AS OF TIMESTAMP` Clause +summary: Learn how to read historical data using the `AS OF TIMESTAMP` statement clause. +--- + +# Read Historical Data Using the `AS OF TIMESTAMP` Clause + +This document describes how to use the [Stale Read](/stale-read.md) feature with the `AS OF TIMESTAMP` clause to read historical data in TiDB, including specific usage examples and strategies for saving historical data. + +TiDB supports reading historical data through a standard SQL interface, which is the `AS OF TIMESTAMP` SQL clause, without the need for special clients or drivers. After data is updated or deleted, you can read the historical data before the update or deletion using this SQL interface. + +> **Note:** +> +> When reading historical data, TiDB returns the data with the old table structure even if the current table structure is different. + +## Syntax + +You can use the `AS OF TIMESTAMP` clause in the following three ways: + +- [`SELECT ... FROM ... AS OF TIMESTAMP`](/sql-statements/sql-statement-select.md) +- [`START TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-start-transaction.md) +- [`SET TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-set-transaction.md) + +If you want to specify an exact point of time, you can set a datetime value or use a time function in the `AS OF TIMESTAMP` clause. The format of datetime is like "2016-10-08 16:45:26.999", with millisecond as the minimum time unit, but most of the time, the second unit is enough for specifying a datetime, such as "2016-10-08 16:45:26". You can also get the current time to the millisecond with the `NOW(3)` function. If you want to read data from several seconds ago, it is **recommended** to use an expression such as `NOW() - INTERVAL 10 SECOND`. 
+ +If you want to specify a time range, you can use the `TIDB_BOUNDED_STALENESS()` function in the clause. When this function is used, TiDB selects a suitable timestamp within the specified time range. "Suitable" means there are no transactions that start before this timestamp and have not been committed on the accessed replica, that is, TiDB can perform read operations on the accessed replica and the read operations are not blocked. You need to use `TIDB_BOUNDED_STALENESS(t1, t2)` to call this function. `t1` and `t2` are the two ends of the time range, which can be specified using either datetime values or time functions. + +Here are some examples of the `AS OF TIMESTAMP` clause: + +- `AS OF TIMESTAMP '2016-10-08 16:45:26'`: Tells TiDB to read the latest data stored at 16:45:26 on October 8, 2016. +- `AS OF TIMESTAMP NOW() - INTERVAL 10 SECOND`: Tells TiDB to read the latest data stored 10 seconds ago. +- `AS OF TIMESTAMP TIDB_BOUNDED_STALENESS('2016-10-08 16:45:26', '2016-10-08 16:45:29')`: Tells TiDB to read the data as new as possible within the time range of 16:45:26 to 16:45:29 on October 8, 2016. +- `AS OF TIMESTAMP TIDB_BOUNDED_STALENESS(NOW() - INTERVAL 20 SECOND, NOW())`: Tells TiDB to read the data as new as possible within the time range of 20 seconds ago to the present. + +Note that in addition to specifying a timestamp, the most common use of the `AS OF TIMESTAMP` clause is to read data that is several seconds old. If this approach is used, it is recommended to read historical data older than 5 seconds. + +## Usage examples + +This section describes the different ways to use the `AS OF TIMESTAMP` clause with several examples. It first introduces how to prepare the data for recovery, and then shows how to use `AS OF TIMESTAMP` with `SELECT`, `START TRANSACTION READ ONLY AS OF TIMESTAMP`, `SET TRANSACTION READ ONLY AS OF TIMESTAMP`, and `SELECT` statements respectively. 
+ +### Prepare data sample + +To prepare data for recovery, create a table first and insert several rows of data: + +```sql +create table t (c int); +``` + +``` +Query OK, 0 rows affected (0.01 sec) +``` + +```sql +insert into t values (1), (2), (3); +``` + +``` +Query OK, 3 rows affected (0.00 sec) +``` + +View the data in the table: + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +View current time: + +```sql +select now(); +``` + +``` ++---------------------+ +| now() | ++---------------------+ +| 2021-05-26 16:45:26 | ++---------------------+ +1 row in set (0.00 sec) +``` + +Update the data in a row: + +```sql +update t set c=22 where c=2; +``` + +``` +Query OK, 1 row affected (0.00 sec) +``` + +Confirm that the data of the row is updated: + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 22 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +### Read historical data using the `SELECT` statement + +You can use the [`SELECT ... FROM ... AS OF TIMESTAMP`](/sql-statements/sql-statement-select.md) statement to read data from a time point in the past. + +```sql +select * from t as of timestamp '2021-05-26 16:45:26'; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +> **Note:** +> +> When reading multiple tables with one `SELECT` statement, you need to make sure that the format of TIMESTAMP EXPRESSIONs is consistent. For example, `select * from t as of timestamp NOW() - INTERVAL 2 SECOND, c as of timestamp NOW() - INTERVAL 2 SECOND;`. In addition, you must specify the as of information for the relevant table in the `SELECT` statement; otherwise, the `SELECT` statement reads the latest data by default. 
+ +### Read historical data using the `START TRANSACTION READ ONLY AS OF TIMESTAMP` statement + +You can use the [`START TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-start-transaction.md) statement to start a read-only transaction based on a time point in the past. The transaction reads historical data from the given history time. + +```sql +start transaction read only as of timestamp '2021-05-26 16:45:26'; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +```sql +commit; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +After the transaction is committed, you can read the historical data. + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 22 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +> **Note:** +> +> If you start a transaction with the statement `START TRANSACTION READ ONLY AS OF TIMESTAMP`, it is a read-only transaction. Write operations are rejected in this transaction. + +### Read historical data using the `SET TRANSACTION READ ONLY AS OF TIMESTAMP` statement + +You can use the [`SET TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-set-transaction.md) statement to set the next transaction as a read-only transaction based on a specified time point in the past. The transaction reads historical data from the given history time. + +```sql +set transaction read only as of timestamp '2021-05-26 16:45:26'; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +```sql +begin; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +```sql +commit; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +After the transaction is committed, you can read the historical data. 
+ +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 22 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +> **Note:** +> +> If you start a transaction with the statement `SET TRANSACTION READ ONLY AS OF TIMESTAMP`, it is a read-only transaction. Write operations are rejected in this transaction. \ No newline at end of file diff --git a/best-practices/three-dc-local-read.md b/best-practices/three-dc-local-read.md new file mode 100644 index 0000000000000..bb64a2711b125 --- /dev/null +++ b/best-practices/three-dc-local-read.md @@ -0,0 +1,29 @@ +--- +title: Local Read under Three Data Centers Deployment +summary: Learn how to use the Stale Read feature to read local data under three DCs deployment and thus reduce cross-center requests. +--- + +# Local Read under Three Data Centers Deployment + +In the model of three data centers, a Region has three replicas which are isolated in each data center. However, due to the requirement of strongly consistent read, TiDB must access the Leader replica of the corresponding data for every query. If the query is generated in a data center different from that of the Leader replica, TiDB needs to read data from another data center, thus causing the access latency to increase. + +This document describes how to use the [Stale Read](/stale-read.md) feature to avoid cross-center access and reduce the access latency at the expense of real-time data availability. + +## Deploy a TiDB cluster of three data centers + +For how to deploy three data centers, refer to [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md) + +Note that if both the TiKV and TiDB nodes have the configuration item `labels` configured, the TiKV and TiDB nodes in the same data center must have the same value for the `zone` label. 
For example, if a TiKV node and a TiDB node are both in the data center `dc-1`, then the two nodes need to be configured with the following label: + +``` +[labels] +zone=dc-1 +``` + +## Perform local read using stale reads + +[Stale Read](/stale-read.md) is a mechanism that TiDB provides for the users to read historical data. With this mechanism, you can read the corresponding historical data of a specific point in time or within a specified time range, and thus save the latency brought by data replication between storage nodes. When using stale reads in some scenarios of geo-distributed deployment, TiDB sacrifices some real-time performance but accesses the replica in the current data center to read the corresponding data, which avoids network latency brought by cross-center connection and reduces the access latency for the entire query process. + +When TiDB receives a stale read query, if the `zone` label of that TiDB node is configured, then TiDB sends the request to the TiKV node with the same `zone` label where the corresponding data replica resides. + +For how to perform stale reads, see [Read historical data using the `AS OF TIMESTAMP` clause](/as-of-timestamp.md). diff --git a/pd-control.md b/pd-control.md index afb4358087ce8..1a3999f0b556b 100644 --- a/pd-control.md +++ b/pd-control.md @@ -326,7 +326,7 @@ Usage: - `enable-debug-metrics` is used to enable the metrics for debugging. When you set it to `true`, PD enables some metrics such as `balance-tolerant-size`. -- `enable-placement-rules` is used to enable placement rules. +- `enable-placement-rules` is used to enable placement rules, which is enabled by default in v5.0 and later versions. - `store-limit-mode` is used to control the mode of limiting the store speed. The optional modes are `auto` and `manual`. In `auto` mode, the stores are automatically balanced according to the load (experimental). 
diff --git a/read-historical-data.md b/read-historical-data.md index 45a9837a5526f..919f7272afe39 100644 --- a/read-historical-data.md +++ b/read-historical-data.md @@ -4,16 +4,22 @@ summary: Learn about how TiDB reads data from history versions. aliases: ['/docs/dev/read-historical-data/','/docs/dev/how-to/get-started/read-historical-data/'] --- -# Read Historical Data +# Read Historical Data Using the System Variable `tidb_snapshot` -This document describes how TiDB reads data from the history versions, how TiDB manages the data versions, as well as an example to show how to use the feature. +This document describes how to read data from the history versions using the system variable `tidb_snapshot`, including specific usage examples and strategies for saving historical data. + +> **Note:** +> +> You can also use the [Stale Read](/stale-read.md) feature to read historical data, which is more recommended. ## Feature description -TiDB implements a feature to read history data using the standard SQL interface directly without special clients or drivers. By using this feature: +TiDB implements a feature to read history data using the standard SQL interface directly without special clients or drivers. -- Even when data is updated or removed, its history versions can be read using the SQL interface. -- Even if the table structure changes after the data is updated, TiDB can use the old structure to read the history data. +> **Note:** +> +> - Even when data is updated or removed, its history versions can be read using the SQL interface. +> - When reading historical data, TiDB returns the data with the old table structure even if the current table structure is different. 
## How TiDB reads data from history versions diff --git a/sql-statements/sql-statement-select.md b/sql-statements/sql-statement-select.md index 40a387400fe77..eda534259c846 100644 --- a/sql-statements/sql-statement-select.md +++ b/sql-statements/sql-statement-select.md @@ -30,9 +30,13 @@ The `SELECT` statement is used to read data from TiDB. ![SelectStmtFieldList](/media/sqlgram/SelectStmtFieldList.png) -**TableRefsClause:** +```ebnf+diagram +TableRefsClause ::= + TableRef AsOfClause? ( ',' TableRef AsOfClause? )* -![TableRefsClause](/media/sqlgram/TableRefsClause.png) +AsOfClause ::= + 'AS' 'OF' 'TIMESTAMP' Expression +``` **WhereClauseOptional:** diff --git a/sql-statements/sql-statement-set-transaction.md b/sql-statements/sql-statement-set-transaction.md index b01acd98fe029..56c6354d533bb 100644 --- a/sql-statements/sql-statement-set-transaction.md +++ b/sql-statements/sql-statement-set-transaction.md @@ -10,17 +10,22 @@ The `SET TRANSACTION` statement can be used to change the current isolation leve ## Synopsis -**SetStmt:** +```ebnf+diagram +SetStmt ::= + 'SET' ( VariableAssignmentList | + 'PASSWORD' ('FOR' Username)? '=' PasswordOpt | + ( 'GLOBAL'| 'SESSION' )? 'TRANSACTION' TransactionChars | + 'CONFIG' ( Identifier | stringLit) ConfigItemName EqOrAssignmentEq SetExpr ) -![SetStmt](/media/sqlgram/SetStmt.png) +TransactionChars ::= + ( 'ISOLATION' 'LEVEL' IsolationLevel | 'READ' 'WRITE' | 'READ' 'ONLY' AsOfClause? 
) -**TransactionChar:** +IsolationLevel ::= + ( 'REPEATABLE' 'READ' | 'READ' ( 'COMMITTED' | 'UNCOMMITTED' ) | 'SERIALIZABLE' ) -![TransactionChar](/media/sqlgram/TransactionChar.png) - -**IsolationLevel:** - -![IsolationLevel](/media/sqlgram/IsolationLevel.png) +AsOfClause ::= + ( 'AS' 'OF' 'TIMESTAMP' Expression) +``` ## Examples diff --git a/sql-statements/sql-statement-start-transaction.md b/sql-statements/sql-statement-start-transaction.md index 69eaad3afc241..b943d8f2c69dd 100644 --- a/sql-statements/sql-statement-start-transaction.md +++ b/sql-statements/sql-statement-start-transaction.md @@ -15,9 +15,12 @@ In the absence of a `START TRANSACTION` statement, every statement will by defau **BeginTransactionStmt:** ```ebnf+diagram -BeginTransactionStmt ::= +BeginTransactionStmt ::= 'BEGIN' ( 'PESSIMISTIC' | 'OPTIMISTIC' )? -| 'START' 'TRANSACTION' ( 'READ' ( 'WRITE' | 'ONLY' ( 'WITH' 'TIMESTAMP' 'BOUND' TimestampBound )? ) | 'WITH' 'CONSISTENT' 'SNAPSHOT' | 'WITH' 'CAUSAL' 'CONSISTENCY' 'ONLY' )? +| 'START' 'TRANSACTION' ( 'READ' ( 'WRITE' | 'ONLY' ( ( 'WITH' 'TIMESTAMP' 'BOUND' TimestampBound )? | AsOfClause ) ) | 'WITH' 'CONSISTENT' 'SNAPSHOT' | 'WITH' 'CAUSAL' 'CONSISTENCY' 'ONLY' )? + +AsOfClause ::= + ( 'AS' 'OF' 'TIMESTAMP' Expression) ``` ## Examples diff --git a/stale-read.md b/stale-read.md new file mode 100644 index 0000000000000..85da3dd1b2c8e --- /dev/null +++ b/stale-read.md @@ -0,0 +1,25 @@ +--- +title: Usage Scenarios of Stale Read +summary: Learn about Stale Read and its usage scenarios. +--- + +# Usage Scenarios of Stale Read + +This document describes the usage scenarios of stale read. Stale read is a mechanism that TiDB applies to read historical versions of data stored in TiDB. With this mechanism, you can read the corresponding historical data of a specific point in time or within a specified time range, and thus save the latency brought by data replication between storage nodes. 
+ +Internally, stale read allows TiDB to read from any replica the data of the specified point in time or the data as new as possible within the specified time range, and to always ensure the data consistency constraint during the reading process. + +## Scenario examples + ++ Scenario one: If a transaction only involves read operations and is tolerant of data staleness to some extent, you can use stale reads to get historical data. With stale read, by sacrificing some real-time performance, TiDB sends the query requests to any replica of the corresponding data and thus increases the throughput of query executions. + ++ Scenario two: In some scenarios where small tables are queried, if strongly consistent reads are used, data may be concentrated on a certain storage node, causing the query pressure to be concentrated on that node as well. Therefore, the node may become a bottleneck for the whole query. With stale read, TiDB distributes the query requests to each replica of the corresponding data, which can improve the overall query throughput and significantly improve the query performance. + ++ Scenario three: In some scenarios of geo-distributed deployment, if strongly consistent follower reads are used, in order to make sure that the data read from the Follower is consistent with that stored in the Leader, TiDB requests `Readindex` from different data centers for verification, which increases the access latency for the whole query process. With stale read, by sacrificing some real-time performance, TiDB accesses the replica in the current data center to read the corresponding data, which avoids network latency brought by cross-center connection and reduces the access latency for the entire query. For more information, see [Local Read under Three Data Centers Deployment](/best-practices/three-dc-local-read.md). 
+ +## Usages + +In TiDB, you can specify either an exact point in time or a time range when performing stale reads: + +- Specifying an exact point in time (recommended): If you need TiDB to read data that follows the global transaction consistency from a specific point in time without damaging the isolation level, you can specify the corresponding timestamp of that point in time in the query statement. For detailed usage, see [`AS OF TIMESTAMP` Clause](/as-of-timestamp.md#syntax). +- Specifying a time range: If you need TiDB to read data as new as possible within a time range without damaging the isolation level, you can specify the time range in the query statement. Then, TiDB selects a suitable timestamp within the specified time range to read the corresponding data. "Suitable" means there are no transactions that start before this timestamp and have not been committed on the accessed replica, that is, TiDB can perform read operations on the accessed replica and the read operations are not blocked. For detailed usage, refer to the introduction of the [`AS OF TIMESTAMP` clause](/as-of-timestamp.md#syntax) and the [`TIDB_BOUNDED_STALENESS` function](/as-of-timestamp.md#syntax). 
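Reviewer note on PATCH 1/7: the two modes described in `stale-read.md` (an exact point in time and a time range) and the read-only transaction form from `as-of-timestamp.md` can be put side by side as below. This is a minimal sketch, not part of the patch. It assumes the table `t` from the "Prepare data sample" section exists, and the 10-second and 20-second offsets are illustrative values chosen to stay above the 5-second staleness the doc recommends.

```sql
-- Exact point in time: read table t as it was 10 seconds ago.
SELECT * FROM t AS OF TIMESTAMP NOW() - INTERVAL 10 SECOND;

-- Time range: TiDB picks a suitable timestamp within the last 20 seconds,
-- returning data as new as possible without blocking on the accessed replica.
SELECT * FROM t AS OF TIMESTAMP TIDB_BOUNDED_STALENESS(NOW() - INTERVAL 20 SECOND, NOW());

-- Read-only transaction pinned to a past point in time.
START TRANSACTION READ ONLY AS OF TIMESTAMP NOW() - INTERVAL 10 SECOND;
SELECT * FROM t;
COMMIT;
```

Relative expressions such as `NOW() - INTERVAL 10 SECOND` avoid hard-coding a datetime literal, which is the form the new doc recommends for reading data that is several seconds old.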
From c2722ce1e41eae91a5e6b55328295de1a2f1b066 Mon Sep 17 00:00:00 2001 From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com> Date: Tue, 22 Jun 2021 11:41:39 +0800 Subject: [PATCH 2/7] Apply suggestions from code review Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com> --- TOC.md | 6 +++--- as-of-timestamp.md | 16 ++++++++-------- best-practices/three-dc-local-read.md | 2 +- sql-statements/sql-statement-select.md | 2 ++ 4 files changed, 14 insertions(+), 12 deletions(-) diff --git a/TOC.md b/TOC.md index 57376c2da63f8..326d7dafce380 100644 --- a/TOC.md +++ b/TOC.md @@ -65,10 +65,10 @@ + [External Storages](/br/backup-and-restore-storages.md) + [BR FAQ](/br/backup-and-restore-faq.md) + Read Historical Data - + Read Historical Data Using Stale Read (Recommended) + + Use Stale Read (Recommended) + [Usage Scenarios of Stale Read](/stale-read.md) - + [Read Historical Data Using SQL Statments](/as-of-timestamp.md) - + [Read Historical Data Using System Variables](/read-historical-data.md) + + [Perform Stale Read Using `As OF TIMESTAMP`](/as-of-timestamp.md) + + [Use the `tidb_snapshot` System Variable](/read-historical-data.md) + [Configure Time Zone](/configure-time-zone.md) + [Daily Checklist](/daily-check.md) + [Maintain TiFlash](/tiflash/maintain-tiflash.md) diff --git a/as-of-timestamp.md b/as-of-timestamp.md index 619cdf8259baf..edd083199affa 100644 --- a/as-of-timestamp.md +++ b/as-of-timestamp.md @@ -5,7 +5,7 @@ summary: Learn how to read historical data using the `AS OF TIMESTAMP` statement # Read Historical Data Using the `AS OF TIMESTAMP` Clause -This document describes how to use the [Stale Read](/stale-read.md) feature with the `AS OF TIMESTAMP` clause to read historical data in TiDB, including specific usage examples and strategies for saving historical data. 
+This document describes how to perform the [Stale Read](/stale-read.md) feature using the `AS OF TIMESTAMP` clause to read historical data in TiDB, including specific usage examples and strategies for saving historical data. TiDB supports reading historical data through a standard SQL interface, which is the `AS OF TIMESTAMP` SQL clause, without the need for special clients or drivers. After data is updated or deleted, you can read the historical data before the update or deletion using this SQL interface. @@ -21,7 +21,7 @@ You can use the `AS OF TIMESTAMP` clause in the following three ways: - [`START TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-start-transaction.md) - [`SET TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-set-transaction.md) -If you want to specify an exact point of time, you can set a datetime value or use a time function in the `AS OF TIMESTAMP` clause. The format of datetime is like "2016-10-08 16:45:26.999", with millisecond as the minimum time unit, but most of the time, the second unit is enough for specifying a datetime, such as "2016-10-08 16:45:26". You can also get the current time to the millisecond with the `NOW(3)` function. If you want to read data from several seconds ago, it is **recommended** to use an expression such as `NOW() - INTERVAL 10 SECOND`. +If you want to specify an exact point of time, you can set a datetime value or use a time function in the `AS OF TIMESTAMP` clause. The format of datetime is like "2016-10-08 16:45:26.999", with millisecond as the minimum time unit, but for most of the time, the time unit of second is enough for specifying a datetime, such as "2016-10-08 16:45:26". You can also get the current time to the millisecond using the `NOW(3)` function. If you want to read the data of several seconds ago, it is **recommended** to use an expression such as `NOW() - INTERVAL 10 SECOND`. 
If you want to specify a time range, you can use the `TIDB_BOUNDED_STALENESS()` function in the clause. When this function is used, TiDB selects a suitable timestamp within the specified time range. "Suitable" means there are no transactions that start before this timestamp and have not been committed on the accessed replica, that is, TiDB can perform read operations on the accessed replica and the read operations are not blocked. You need to use `TIDB_BOUNDED_STALENESS(t1, t2)` to call this function. `t1` and `t2` are the two ends of the time range, which can be specified using either datetime values or time functions. @@ -36,7 +36,7 @@ Note that in addition to specifying a timestamp, the most common use of the `AS ## Usage examples -This section describes the different ways to use the `AS OF TIMESTAMP` clause with several examples. It first introduces how to prepare the data for recovery, and then shows how to use `AS OF TIMESTAMP` with `SELECT`, `START TRANSACTION READ ONLY AS OF TIMESTAMP`, `SET TRANSACTION READ ONLY AS OF TIMESTAMP`, and `SELECT` statements respectively. +This section describes different ways to use the `AS OF TIMESTAMP` clause with several examples. It first introduces how to prepare the data for recovery, and then shows how to use `AS OF TIMESTAMP` in `SELECT`, `START TRANSACTION READ ONLY AS OF TIMESTAMP`, and `SET TRANSACTION READ ONLY AS OF TIMESTAMP` respectively. ### Prepare data sample @@ -75,7 +75,7 @@ select * from t; 3 rows in set (0.00 sec) ``` -View current time: +View the current time: ```sql select now(); @@ -138,11 +138,11 @@ select * from t as of timestamp '2021-05-26 16:45:26'; > **Note:** > -> When reading multiple tables with one `SELECT` statement, you need to make sure that the format of TIMESTAMP EXPRESSIONs is consistent. For example, `select * from t as of timestamp NOW() - INTERVAL 2 SECOND, c as of timestamp NOW() - INTERVAL 2 SECOND;`. 
In addition, you must specify the as of information for the relevant table in the `SELECT` statement; otherwise, the `SELECT` statement reads the latest data by default. +> When reading multiple tables using one `SELECT` statement, you need to make sure that the format of TIMESTAMP EXPRESSIONs is consistent. For example, `select * from t as of timestamp NOW() - INTERVAL 2 SECOND, c as of timestamp NOW() - INTERVAL 2 SECOND;`. In addition, you must specify the `AS OF` information for the relevant table in the `SELECT` statement; otherwise, the `SELECT` statement reads the latest data by default. ### Read historical data using the `START TRANSACTION READ ONLY AS OF TIMESTAMP` statement -You can use the [`START TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-start-transaction.md) statement to start a read-only transaction based on a time point in the past. The transaction reads historical data from the given history time. +You can use the [`START TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-start-transaction.md) statement to start a read-only transaction based on a time point in the past. The transaction reads historical data of the given time. ```sql start transaction read only as of timestamp '2021-05-26 16:45:26'; @@ -198,7 +198,7 @@ select * from t; ### Read historical data using the `SET TRANSACTION READ ONLY AS OF TIMESTAMP` statement -You can use the [`SET TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-set-transaction.md) statement to set the next transaction as a read-only transaction based on a specified time point in the past. The transaction reads historical data from the given history time. +You can use the [`SET TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-set-transaction.md) statement to set the next transaction as a read-only transaction based on a specified time point in the past. The transaction reads historical data of the given time. 
 ```sql
 set transaction read only as of timestamp '2021-05-26 16:45:26';
@@ -258,4 +258,4 @@ select * from t;

 > **Note:**
 >
-> If you start a transaction with the statement `SET TRANSACTION READ ONLY AS OF TIMESTAMP`, it is a read-only transaction. Write operations are rejected in this transaction.
\ No newline at end of file
+> If you start a transaction with the statement `SET TRANSACTION READ ONLY AS OF TIMESTAMP`, it is a read-only transaction. Write operations are rejected in this transaction.
diff --git a/best-practices/three-dc-local-read.md b/best-practices/three-dc-local-read.md
index bb64a2711b125..6bd25f511acd7 100644
--- a/best-practices/three-dc-local-read.md
+++ b/best-practices/three-dc-local-read.md
@@ -11,7 +11,7 @@ This document describes how to use the [Stale Read](/stale-read.md) feature to a

 ## Deploy a TiDB cluster of three data centers

-For how to deploy three data centers, refer to [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md)
+For the three-data-center deployment method, refer to [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md).

 Note that if both the TiKV and TiDB nodes have the configuration item `labels` configured, the TiKV and TiDB nodes in the same data center must have the same value for the `zone` label. For example, if a TiKV node and a TiDB node are both in the data center `dc-1`, then the two nodes need to be configured with the following label:

diff --git a/sql-statements/sql-statement-select.md b/sql-statements/sql-statement-select.md
index eda534259c846..fc2811a18b604 100644
--- a/sql-statements/sql-statement-select.md
+++ b/sql-statements/sql-statement-select.md
@@ -30,6 +30,8 @@ The `SELECT` statement is used to read data from TiDB.

 ![SelectStmtFieldList](/media/sqlgram/SelectStmtFieldList.png)

+**TableRefsClause:**
+
 ```ebnf+diagram
 TableRefsClause ::=
 TableRef AsOfClause? ( ',' TableRef AsOfClause? )*

From ffc2f0f1edf5cbec5f0e6b843dfb3cdbd9d67f97 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Tue, 22 Jun 2021 11:50:37 +0800
Subject: [PATCH 3/7] Apply suggestions from code review

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
 best-practices/three-dc-local-read.md | 8 ++++----
 stale-read.md | 10 +++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/best-practices/three-dc-local-read.md b/best-practices/three-dc-local-read.md
index 6bd25f511acd7..b404757df4ed6 100644
--- a/best-practices/three-dc-local-read.md
+++ b/best-practices/three-dc-local-read.md
@@ -20,10 +20,10 @@ Note that if both the TiKV and TiDB nodes have the configuration item `labels` c
 zone=dc-1
 ```

-## Perform local read using stale reads
+## Perform local read using Stale Read

-[Stale Read](/stale-read.md) is a mechanism that TiDB provides for the users to read historical data. With this mechanism, you can read the corresponding historical data of a specific point in time or within a specified time range, and thus save the latency brought by data replication between storage nodes. When using stale reads in some scenarios of geo-distributed deployment, TiDB sacrifices some real-time performance but accesses the replica in the current data center to read the corresponding data, which avoids network latency brought by cross-center connection and reduces the access latency for the entire query process.
+[Stale Read](/stale-read.md) is a mechanism that TiDB provides for the users to read historical data. Using this mechanism, you can read the corresponding historical data of a specific point in time or within a specified time range, and thus save the latency brought by data replication between storage nodes. When using Stale Read in some scenarios of geo-distributed deployment, TiDB accesses the replica in the current data center to read the corresponding data at the expense of some real-time performance, which avoids network latency brought by cross-center connection and reduces the access latency for the entire query process.

-When TiDB receives a stale read query, if the `zone` label of that TiDB node is configured, then TiDB sends the request to the TiKV node with the same `zone` label where the corresponding data replica resides.
+When TiDB receives a Stale Read query, if the `zone` label of that TiDB node is configured, then TiDB sends the request to the TiKV node with the same `zone` label where the corresponding data replica resides.

-For how to perform stale reads, see [Read historical data using the `AS OF TIMESTAMP` clause](/as-of-timestamp.md).
+For how to perform Stale Read, see [Perform Stale Read using the `AS OF TIMESTAMP` clause](/as-of-timestamp.md).
diff --git a/stale-read.md b/stale-read.md
index 85da3dd1b2c8e..00ee16e10e35c 100644
--- a/stale-read.md
+++ b/stale-read.md
@@ -5,17 +5,17 @@ summary: Learn about Stale Read and its usage scenarios.

 # Usage Scenarios of Stale Read

-This document describes the usage scenarios of stale read. Stale read is a mechanism that TiDB applies to read historical versions of data stored in TiDB. With this mechanism, you can read the corresponding historical data of a specific point in time or within a specified time range, and thus save the latency brought by data replication between storage nodes.
+This document describes the usage scenarios of Stale Read. Stale Read is a mechanism that TiDB applies to read historical versions of data stored in TiDB. Using this mechanism, you can read the corresponding historical data of a specific point in time or within a specified time range, and thus save the latency brought by data replication between storage nodes.
-Internally, stale read allows TiDB to read from any replica the data of the specified point in time or the data as new as possible within the specified time range, and to always ensure the data consistency constraint during the reading process.
+In terms of the internal implementation, Stale Read allows TiDB to read from any replica the data of the specified point in time or the data as new as possible within the specified time range, and to always ensure the data consistency constraint during the reading process.

 ## Scenario examples

-+ Scenario one: If a transaction only involves read operations and is tolerant of data staleness to some extent, you can use stale reads to get historical data. With stale read, by sacrificing some real-time performance, TiDB sends the query requests to any replica of the corresponding data and thus increases the throughput of query executions.
++ Scenario one: If a transaction only involves read operations and is tolerant of data staleness to some extent, you can use Stale Read to get historical data. Using Stale Read, TiDB sends the query requests to any replica of the corresponding data at the expense of some real-time performance, and thus increases the throughput of query executions.

-+ Scenario two: In some scenarios where small tables are queried, if strongly consistent reads are used, data may be concentrated on a certain storage node, causing the query pressure to be concentrated on that node as well. Therefore, the node may become a bottleneck for the whole query. With stale read, TiDB distributes the query requests to each replica of the corresponding data, which can improve the overall query throughput and significantly improve the query performance.
++ Scenario two: In some scenarios where small tables are queried, if strongly consistent reads are used, data might be concentrated on a certain storage node, causing the query pressure to be concentrated on that node as well. Therefore, that node might become a bottleneck for the whole query. With Stale Read, TiDB distributes the query requests to each replica of the corresponding data, which can improve the overall query throughput and significantly improve the query performance.

-+ Scenario three: In some scenarios of geo-distributed deployment, if strongly consistent follower reads are used, in order to make sure that the data read from the Follower is consistent with that stored in the Leader, TiDB requests `Readindex` from different data centers for verification, which increases the access latency for the whole query process. With stale read, by sacrificing some real-time performance, TiDB accesses the replica in the current data center to read the corresponding data, which avoids network latency brought by cross-center connection and reduces the access latency for the entire query. For more information, see [Local Read under Three Data Centers Deployment](/best-practices/three-dc-local-read.md).
++ Scenario three: In some scenarios of geo-distributed deployment, if strongly consistent follower reads are used, to make sure that the data read from the Followers is consistent with that stored in the Leader, TiDB requests `Readindex` from different data centers for verification, which increases the access latency for the whole query process. With Stale Read, TiDB accesses the replica in the current data center to read the corresponding data at the expense of some real-time performance, which avoids network latency brought by cross-center connection and reduces the access latency for the entire query. For more information, see [Local Read under Three Data Centers Deployment](/best-practices/three-dc-local-read.md).
 ## Usages

From 167871e59deee647ff99e2b3a8b87bd4b65f7f0e Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Tue, 22 Jun 2021 11:52:15 +0800
Subject: [PATCH 4/7] Update read-historical-data.md

---
 read-historical-data.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/read-historical-data.md b/read-historical-data.md
index 919f7272afe39..8ed2e7b70a138 100644
--- a/read-historical-data.md
+++ b/read-historical-data.md
@@ -1,6 +1,6 @@
 ---
-title: Read Historical Data
-summary: Learn about how TiDB reads data from history versions.
+title: Read Historical Data Using the System Variable `tidb_snapshot`
+summary: Learn about how TiDB reads data from history versions using the system varialbe `tidb_snapshot`.
 aliases: ['/docs/dev/read-historical-data/','/docs/dev/how-to/get-started/read-historical-data/']
 ---

From ccd3b05fb50237960c338ec2acacd8ce8371c566 Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Tue, 22 Jun 2021 11:53:08 +0800
Subject: [PATCH 5/7] fix typo

---
 read-historical-data.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-historical-data.md b/read-historical-data.md
index 8ed2e7b70a138..340deaf2b28a1 100644
--- a/read-historical-data.md
+++ b/read-historical-data.md
@@ -1,6 +1,6 @@
 ---
 title: Read Historical Data Using the System Variable `tidb_snapshot`
-summary: Learn about how TiDB reads data from history versions using the system varialbe `tidb_snapshot`.
+summary: Learn about how TiDB reads data from history versions using the system variable `tidb_snapshot`.
 aliases: ['/docs/dev/read-historical-data/','/docs/dev/how-to/get-started/read-historical-data/']
 ---

From 67975544a0f4b48ef0b7ab33d654e6c4d6c8bc2a Mon Sep 17 00:00:00 2001
From: Charlotte Liu <37295236+CharLotteiu@users.noreply.github.com>
Date: Tue, 22 Jun 2021 14:22:53 +0800
Subject: [PATCH 6/7] Apply suggestions from code review

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
 as-of-timestamp.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/as-of-timestamp.md b/as-of-timestamp.md
index edd083199affa..58d851d0cbd89 100644
--- a/as-of-timestamp.md
+++ b/as-of-timestamp.md
@@ -175,7 +175,7 @@ commit;
 Query OK, 0 rows affected (0.00 sec)
 ```

-After the transaction is committed, you can read the historical data.
+After the transaction is committed, you can read the latest data.

 ```sql
 select * from t;
@@ -239,7 +239,7 @@ commit;
 Query OK, 0 rows affected (0.00 sec)
 ```

-After the transaction is committed, you can read the historical data.
+After the transaction is committed, you can read the latest data.
 ```sql
 select * from t;

From e9eb8c41e6d8af9b7759bd2789e315a7c09d706e Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Tue, 22 Jun 2021 16:04:11 +0800
Subject: [PATCH 7/7] Update TOC.md

---
 TOC.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/TOC.md b/TOC.md
index 326d7dafce380..01e73d5057a16 100644
--- a/TOC.md
+++ b/TOC.md
@@ -64,11 +64,6 @@
 + [BR Use Cases](/br/backup-and-restore-use-cases.md)
 + [External Storages](/br/backup-and-restore-storages.md)
 + [BR FAQ](/br/backup-and-restore-faq.md)
- + Read Historical Data
- + Use Stale Read (Recommended)
- + [Usage Scenarios of Stale Read](/stale-read.md)
- + [Perform Stale Read Using `As OF TIMESTAMP`](/as-of-timestamp.md)
- + [Use the `tidb_snapshot` System Variable](/read-historical-data.md)
 + [Configure Time Zone](/configure-time-zone.md)
 + [Daily Checklist](/daily-check.md)
 + [Maintain TiFlash](/tiflash/maintain-tiflash.md)
@@ -145,6 +140,11 @@
 + Tutorials
 + [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md)
 + [Three Data Centers in Two Cities Deployment](/three-data-centers-in-two-cities-deployment.md)
+ + Read Historical Data
+ + Use Stale Read (Recommended)
+ + [Usage Scenarios of Stale Read](/stale-read.md)
+ + [Perform Stale Read Using `As OF TIMESTAMP`](/as-of-timestamp.md)
+ + [Use the `tidb_snapshot` System Variable](/read-historical-data.md)
 + Best Practices
 + [Use TiDB](/best-practices/tidb-best-practices.md)
 + [Java Application Development](/best-practices/java-app-best-practices.md)