diff --git a/TOC.md b/TOC.md index 8268439165f6c..9e14fa83323bb 100644 --- a/TOC.md +++ b/TOC.md @@ -64,7 +64,6 @@ + [BR Use Cases](/br/backup-and-restore-use-cases.md) + [External Storages](/br/backup-and-restore-storages.md) + [BR FAQ](/br/backup-and-restore-faq.md) - + [Read Historical Data](/read-historical-data.md) + [Configure Time Zone](/configure-time-zone.md) + [Daily Checklist](/daily-check.md) + [Maintain TiFlash](/tiflash/maintain-tiflash.md) @@ -141,6 +140,11 @@ + Tutorials + [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md) + [Three Data Centers in Two Cities Deployment](/three-data-centers-in-two-cities-deployment.md) + + Read Historical Data + + Use Stale Read (Recommended) + + [Usage Scenarios of Stale Read](/stale-read.md) + + [Perform Stale Read Using `AS OF TIMESTAMP`](/as-of-timestamp.md) + + [Use the `tidb_snapshot` System Variable](/read-historical-data.md) + Best Practices + [Use TiDB](/best-practices/tidb-best-practices.md) + [Java Application Development](/best-practices/java-app-best-practices.md) @@ -150,6 +154,7 @@ + [PD Scheduling](/best-practices/pd-scheduling-best-practices.md) + [TiKV Performance Tuning with Massive Regions](/best-practices/massive-regions-best-practices.md) + [Three-node Hybrid Deployment](/best-practices/three-nodes-hybrid-deployment.md) + + [Local Read Under Three Data Centers Deployment](/best-practices/three-dc-local-read.md) + [Use Placement Rules](/configure-placement-rules.md) + [Use Load Base Split](/configure-load-base-split.md) + [Use Store Limit](/configure-store-limit.md) diff --git a/as-of-timestamp.md b/as-of-timestamp.md new file mode 100644 index 0000000000000..58d851d0cbd89 --- /dev/null +++ b/as-of-timestamp.md @@ -0,0 +1,261 @@ +--- +title: Read Historical Data Using the `AS OF TIMESTAMP` Clause +summary: Learn how to read historical data using the `AS OF TIMESTAMP` clause. 
+--- + +# Read Historical Data Using the `AS OF TIMESTAMP` Clause + +This document describes how to use the [Stale Read](/stale-read.md) feature with the `AS OF TIMESTAMP` clause to read historical data in TiDB, including specific usage examples and strategies for saving historical data. + +TiDB supports reading historical data through a standard SQL interface, which is the `AS OF TIMESTAMP` SQL clause, without the need for special clients or drivers. After data is updated or deleted, you can read the historical data before the update or deletion using this SQL interface. + +> **Note:** +> +> When reading historical data, TiDB returns the data with the old table structure even if the current table structure is different. + +## Syntax + +You can use the `AS OF TIMESTAMP` clause in the following three ways: + +- [`SELECT ... FROM ... AS OF TIMESTAMP`](/sql-statements/sql-statement-select.md) +- [`START TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-start-transaction.md) +- [`SET TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-set-transaction.md) + +If you want to specify an exact point in time, you can set a datetime value or use a time function in the `AS OF TIMESTAMP` clause. The datetime format is "2016-10-08 16:45:26.999", with millisecond as the minimum time unit. In most cases, second precision is enough, such as "2016-10-08 16:45:26". You can also get the current time to the millisecond using the `NOW(3)` function. If you want to read the data of several seconds ago, it is **recommended** to use an expression such as `NOW() - INTERVAL 10 SECOND`. + +If you want to specify a time range, you can use the `TIDB_BOUNDED_STALENESS()` function in the clause. When this function is used, TiDB selects a suitable timestamp within the specified time range. 
"Suitable" means there are no transactions that start before this timestamp and have not been committed on the accessed replica, that is, TiDB can perform read operations on the accessed replica and the read operations are not blocked. You need to use `TIDB_BOUNDED_STALENESS(t1, t2)` to call this function. `t1` and `t2` are the two ends of the time range, which can be specified using either datetime values or time functions. + +Here are some examples of the `AS OF TIMESTAMP` clause: + +- `AS OF TIMESTAMP '2016-10-08 16:45:26'`: Tells TiDB to read the latest data stored at 16:45:26 on October 8, 2016. +- `AS OF TIMESTAMP NOW() - INTERVAL 10 SECOND`: Tells TiDB to read the latest data stored 10 seconds ago. +- `AS OF TIMESTAMP TIDB_BOUNDED_STALENESS('2016-10-08 16:45:26', '2016-10-08 16:45:29')`: Tells TiDB to read the data as new as possible within the time range of 16:45:26 to 16:45:29 on October 8, 2016. +- `AS OF TIMESTAMP TIDB_BOUNDED_STALENESS(NOW() - INTERVAL 20 SECOND, NOW())`: Tells TiDB to read the data as new as possible within the time range of 20 seconds ago to the present. + +Note that in addition to specifying a timestamp, the most common use of the `AS OF TIMESTAMP` clause is to read data that is several seconds old. If this approach is used, it is recommended to read historical data older than 5 seconds. + +## Usage examples + +This section describes different ways to use the `AS OF TIMESTAMP` clause with several examples. It first introduces how to prepare the data for recovery, and then shows how to use `AS OF TIMESTAMP` in `SELECT`, `START TRANSACTION READ ONLY AS OF TIMESTAMP`, and `SET TRANSACTION READ ONLY AS OF TIMESTAMP` respectively. 
+ +### Prepare data sample + +To prepare data for recovery, create a table first and insert several rows of data: + +```sql +create table t (c int); +``` + +``` +Query OK, 0 rows affected (0.01 sec) +``` + +```sql +insert into t values (1), (2), (3); +``` + +``` +Query OK, 3 rows affected (0.00 sec) +``` + +View the data in the table: + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +View the current time: + +```sql +select now(); +``` + +``` ++---------------------+ +| now() | ++---------------------+ +| 2021-05-26 16:45:26 | ++---------------------+ +1 row in set (0.00 sec) +``` + +Update the data in a row: + +```sql +update t set c=22 where c=2; +``` + +``` +Query OK, 1 row affected (0.00 sec) +``` + +Confirm that the data of the row is updated: + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 22 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +### Read historical data using the `SELECT` statement + +You can use the [`SELECT ... FROM ... AS OF TIMESTAMP`](/sql-statements/sql-statement-select.md) statement to read data from a time point in the past. + +```sql +select * from t as of timestamp '2021-05-26 16:45:26'; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +> **Note:** +> +> When reading multiple tables using one `SELECT` statement, you need to make sure that the format of TIMESTAMP EXPRESSIONs is consistent. For example, `select * from t as of timestamp NOW() - INTERVAL 2 SECOND, c as of timestamp NOW() - INTERVAL 2 SECOND;`. In addition, you must specify the `AS OF` information for the relevant table in the `SELECT` statement; otherwise, the `SELECT` statement reads the latest data by default. 
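+
+The `SELECT` form also accepts the `TIDB_BOUNDED_STALENESS()` function described in the Syntax section. The following is a minimal sketch, assuming the table `t` from the data sample above still exists. The rows returned depend on which timestamp TiDB selects within the range, so no output is shown here.
+
+```sql
+-- Let TiDB pick a suitable timestamp between 20 seconds ago and now.
+select * from t as of timestamp tidb_bounded_staleness(now() - interval 20 second, now());
+```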
+ +### Read historical data using the `START TRANSACTION READ ONLY AS OF TIMESTAMP` statement + +You can use the [`START TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-start-transaction.md) statement to start a read-only transaction based on a time point in the past. The transaction reads historical data of the given time. + +```sql +start transaction read only as of timestamp '2021-05-26 16:45:26'; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +```sql +commit; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +After the transaction is committed, you can read the latest data. + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 22 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +> **Note:** +> +> If you start a transaction with the statement `START TRANSACTION READ ONLY AS OF TIMESTAMP`, it is a read-only transaction. Write operations are rejected in this transaction. + +### Read historical data using the `SET TRANSACTION READ ONLY AS OF TIMESTAMP` statement + +You can use the [`SET TRANSACTION READ ONLY AS OF TIMESTAMP`](/sql-statements/sql-statement-set-transaction.md) statement to set the next transaction as a read-only transaction based on a specified time point in the past. The transaction reads historical data of the given time. + +```sql +set transaction read only as of timestamp '2021-05-26 16:45:26'; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +```sql +begin; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +```sql +commit; +``` + +``` +Query OK, 0 rows affected (0.00 sec) +``` + +After the transaction is committed, you can read the latest data. 
+ +```sql +select * from t; +``` + +``` ++------+ +| c | ++------+ +| 1 | +| 22 | +| 3 | ++------+ +3 rows in set (0.00 sec) +``` + +> **Note:** +> +> If you start a transaction with the statement `SET TRANSACTION READ ONLY AS OF TIMESTAMP`, it is a read-only transaction. Write operations are rejected in this transaction. diff --git a/best-practices/three-dc-local-read.md b/best-practices/three-dc-local-read.md new file mode 100644 index 0000000000000..b404757df4ed6 --- /dev/null +++ b/best-practices/three-dc-local-read.md @@ -0,0 +1,29 @@ +--- +title: Local Read under Three Data Centers Deployment +summary: Learn how to use the Stale Read feature to read local data under three DCs deployment and thus reduce cross-center requests. +--- + +# Local Read under Three Data Centers Deployment + +In the model of three data centers, a Region has three replicas which are isolated in each data center. However, due to the requirement of strongly consistent read, TiDB must access the Leader replica of the corresponding data for every query. If the query is generated in a data center different from that of the Leader replica, TiDB needs to read data from another data center, thus causing the access latency to increase. + +This document describes how to use the [Stale Read](/stale-read.md) feature to avoid cross-center access and reduce the access latency at the expense of real-time data availability. + +## Deploy a TiDB cluster of three data centers + +For the three-data-center deployment method, refer to [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md). + +Note that if both the TiKV and TiDB nodes have the configuration item `labels` configured, the TiKV and TiDB nodes in the same data center must have the same value for the `zone` label. 
For example, if a TiKV node and a TiDB node are both in the data center `dc-1`, then the two nodes need to be configured with the following label: + +``` +[labels] +zone=dc-1 +``` + +## Perform local read using Stale Read + +[Stale Read](/stale-read.md) is a mechanism that TiDB provides for the users to read historical data. Using this mechanism, you can read the corresponding historical data of a specific point in time or within a specified time range, and thus save the latency brought by data replication between storage nodes. When using Stale Read in some scenarios of geo-distributed deployment, TiDB accesses the replica in the current data center to read the corresponding data at the expense of some real-time performance, which avoids network latency brought by cross-center connection and reduces the access latency for the entire query process. + +When TiDB receives a Stale Read query, if the `zone` label of that TiDB node is configured, then TiDB sends the request to the TiKV node with the same `zone` label where the corresponding data replica resides. + +For how to perform Stale Read, see [Perform Stale Read using the `AS OF TIMESTAMP` clause](/as-of-timestamp.md). diff --git a/read-historical-data.md b/read-historical-data.md index 5d1a6f8651299..823955239f294 100644 --- a/read-historical-data.md +++ b/read-historical-data.md @@ -1,18 +1,24 @@ --- -title: Read Historical Data -summary: Learn about how TiDB reads data from history versions. +title: Read Historical Data Using the System Variable `tidb_snapshot` +summary: Learn about how TiDB reads data from history versions using the system variable `tidb_snapshot`. --- -# Read Historical Data +# Read Historical Data Using the System Variable `tidb_snapshot` -This document describes how TiDB reads data from the history versions, how TiDB manages the data versions, as well as an example to show how to use the feature. 
+This document describes how to read data from the history versions using the system variable `tidb_snapshot`, including specific usage examples and strategies for saving historical data. + +> **Note:** +> +> You can also use the [Stale Read](/stale-read.md) feature to read historical data, which is the recommended approach. ## Feature description -TiDB implements a feature to read history data using the standard SQL interface directly without special clients or drivers. By using this feature: +TiDB implements a feature to read history data using the standard SQL interface directly without special clients or drivers. -- Even when data is updated or removed, its history versions can be read using the SQL interface. -- Even if the table structure changes after the data is updated, TiDB can use the old structure to read the history data. +> **Note:** +> +> - Even when data is updated or removed, its history versions can be read using the SQL interface. +> - When reading historical data, TiDB returns the data with the old table structure even if the current table structure is different. ## How TiDB reads data from history versions diff --git a/sql-statements/sql-statement-select.md b/sql-statements/sql-statement-select.md index 37492d3c4330b..395ddc78bb245 100644 --- a/sql-statements/sql-statement-select.md +++ b/sql-statements/sql-statement-select.md @@ -31,7 +31,13 @@ The `SELECT` statement is used to read data from TiDB. **TableRefsClause:** -![TableRefsClause](/media/sqlgram/TableRefsClause.png) +```ebnf+diagram +TableRefsClause ::= + TableRef AsOfClause? ( ',' TableRef AsOfClause? 
)* + +AsOfClause ::= + 'AS' 'OF' 'TIMESTAMP' Expression +``` **WhereClauseOptional:** diff --git a/sql-statements/sql-statement-set-transaction.md b/sql-statements/sql-statement-set-transaction.md index e3801abfb0fac..acb08f235404a 100644 --- a/sql-statements/sql-statement-set-transaction.md +++ b/sql-statements/sql-statement-set-transaction.md @@ -9,17 +9,22 @@ The `SET TRANSACTION` statement can be used to change the current isolation leve ## Synopsis -**SetStmt:** +```ebnf+diagram +SetStmt ::= + 'SET' ( VariableAssignmentList | + 'PASSWORD' ('FOR' Username)? '=' PasswordOpt | + ( 'GLOBAL'| 'SESSION' )? 'TRANSACTION' TransactionChars | + 'CONFIG' ( Identifier | stringLit) ConfigItemName EqOrAssignmentEq SetExpr ) -![SetStmt](/media/sqlgram/SetStmt.png) +TransactionChars ::= + ( 'ISOLATION' 'LEVEL' IsolationLevel | 'READ' 'WRITE' | 'READ' 'ONLY' AsOfClause? ) -**TransactionChar:** +IsolationLevel ::= + ( 'REPEATABLE' 'READ' | 'READ' ( 'COMMITTED' | 'UNCOMMITTED' ) | 'SERIALIZABLE' ) -![TransactionChar](/media/sqlgram/TransactionChar.png) - -**IsolationLevel:** - -![IsolationLevel](/media/sqlgram/IsolationLevel.png) +AsOfClause ::= + ( 'AS' 'OF' 'TIMESTAMP' Expression) +``` ## Examples diff --git a/sql-statements/sql-statement-start-transaction.md b/sql-statements/sql-statement-start-transaction.md index b1d2260e2d1ed..a885dc24912b2 100644 --- a/sql-statements/sql-statement-start-transaction.md +++ b/sql-statements/sql-statement-start-transaction.md @@ -14,9 +14,12 @@ In the absence of a `START TRANSACTION` statement, every statement will by defau **BeginTransactionStmt:** ```ebnf+diagram -BeginTransactionStmt ::= +BeginTransactionStmt ::= 'BEGIN' ( 'PESSIMISTIC' | 'OPTIMISTIC' )? -| 'START' 'TRANSACTION' ( 'READ' ( 'WRITE' | 'ONLY' ( 'WITH' 'TIMESTAMP' 'BOUND' TimestampBound )? ) | 'WITH' 'CONSISTENT' 'SNAPSHOT' | 'WITH' 'CAUSAL' 'CONSISTENCY' 'ONLY' )? +| 'START' 'TRANSACTION' ( 'READ' ( 'WRITE' | 'ONLY' ( ( 'WITH' 'TIMESTAMP' 'BOUND' TimestampBound )? 
| AsOfClause ) ) | 'WITH' 'CONSISTENT' 'SNAPSHOT' | 'WITH' 'CAUSAL' 'CONSISTENCY' 'ONLY' )? + +AsOfClause ::= + ( 'AS' 'OF' 'TIMESTAMP' Expression) ``` ## Examples diff --git a/stale-read.md b/stale-read.md new file mode 100644 index 0000000000000..00ee16e10e35c --- /dev/null +++ b/stale-read.md @@ -0,0 +1,25 @@ +--- +title: Usage Scenarios of Stale Read +summary: Learn about Stale Read and its usage scenarios. +--- + +# Usage Scenarios of Stale Read + +This document describes the usage scenarios of Stale Read. Stale Read is a mechanism that TiDB applies to read historical versions of data stored in TiDB. Using this mechanism, you can read the corresponding historical data of a specific point in time or within a specified time range, and thus save the latency brought by data replication between storage nodes. + +In terms of the internal implementation, Stale Read allows TiDB to read from any replica the data of the specified point in time or the data as new as possible within the specified time range, and to always ensure the data consistency constraint during the reading process. + +## Scenario examples + ++ Scenario one: If a transaction only involves read operations and is tolerant of data staleness to some extent, you can use Stale Read to get historical data. Using Stale Read, TiDB sends the query requests to any replica of the corresponding data at the expense of some real-time performance, and thus increases the throughput of query executions. + ++ Scenario two: In some scenarios where small tables are queried, if strongly consistent reads are used, data might be concentrated on a certain storage node, causing the query pressure to be concentrated on that node as well. Therefore, that node might become a bottleneck for the whole query. With Stale Read, TiDB distributes the query requests to each replica of the corresponding data, which can improve the overall query throughput and significantly improve the query performance. 
+ ++ Scenario three: In some scenarios of geo-distributed deployment, if strongly consistent follower reads are used, to make sure that the data read from the Followers is consistent with that stored in the Leader, TiDB requests `ReadIndex` from different data centers for verification, which increases the access latency for the whole query process. With Stale Read, TiDB accesses the replica in the current data center to read the corresponding data at the expense of some real-time performance, which avoids network latency brought by cross-center connection and reduces the access latency for the entire query. For more information, see [Local Read under Three Data Centers Deployment](/best-practices/three-dc-local-read.md). + +## Usages + +In TiDB, you can specify either an exact point in time or a time range when performing stale reads: + +- Specifying an exact point in time (recommended): If you need TiDB to read data that follows the global transaction consistency from a specific point in time without damaging the isolation level, you can specify the corresponding timestamp of that point in time in the query statement. For detailed usage, see [`AS OF TIMESTAMP` Clause](/as-of-timestamp.md#syntax). +- Specifying a time range: If you need TiDB to read data as new as possible within a time range without damaging the isolation level, you can specify the time range in the query statement. Then, TiDB selects a suitable timestamp within the specified time range to read the corresponding data. "Suitable" means there are no transactions that start before this timestamp and have not been committed on the accessed replica, that is, TiDB can perform read operations on the accessed replica and the read operations are not blocked. For detailed usage, refer to the introduction of the [`AS OF TIMESTAMP` clause](/as-of-timestamp.md#syntax) and the [`TIDB_BOUNDED_STALENESS` function](/as-of-timestamp.md#syntax).
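+
+The two options above can be sketched as follows. This is only an illustration: the table `t` is a hypothetical placeholder, and the syntax follows the `AS OF TIMESTAMP` clause document linked above.
+
+```sql
+-- Exact point in time: read the data as it was 10 seconds ago.
+select * from t as of timestamp now() - interval 10 second;
+
+-- Time range: TiDB selects a suitable timestamp within the given bounds
+-- and returns data as new as possible for that range.
+select * from t as of timestamp tidb_bounded_staleness('2016-10-08 16:45:26', '2016-10-08 16:45:29');
+```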