diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md
index cdaba3200f270..a6b712cbd0869 100644
--- a/TOC-tidb-cloud.md
+++ b/TOC-tidb-cloud.md
@@ -34,6 +34,7 @@
     - [Insert Data](/develop/dev-guide-insert-data.md)
     - [Update Data](/develop/dev-guide-update-data.md)
     - [Delete Data](/develop/dev-guide-delete-data.md)
+    - [Periodically Delete Expired Data Using TTL (Time to Live)](/time-to-live.md)
     - [Prepared Statements](/develop/dev-guide-prepared-statement.md)
   - Read Data
     - [Query Data from a Single Table](/develop/dev-guide-get-data-from-single-table.md)
diff --git a/TOC.md b/TOC.md
index 85151f07ee3fe..1299221a09bcc 100644
--- a/TOC.md
+++ b/TOC.md
@@ -38,6 +38,7 @@
     - [Insert Data](/develop/dev-guide-insert-data.md)
     - [Update Data](/develop/dev-guide-update-data.md)
     - [Delete Data](/develop/dev-guide-delete-data.md)
+    - [Periodically Delete Expired Data Using TTL (Time to Live)](/time-to-live.md)
     - [Prepared Statements](/develop/dev-guide-prepared-statement.md)
   - Read Data
     - [Query Data from a Single Table](/develop/dev-guide-get-data-from-single-table.md)
diff --git a/develop/dev-guide-delete-data.md b/develop/dev-guide-delete-data.md
index de61788c49341..f75817b9af95a 100644
--- a/develop/dev-guide-delete-data.md
+++ b/develop/dev-guide-delete-data.md
@@ -5,7 +5,7 @@ summary: Learn about the SQL syntax, best practices, and examples for deleting d
 
 # Delete Data
 
-This document describes how to use the [DELETE](/sql-statements/sql-statement-delete.md) SQL statement to delete the data in TiDB.
+This document describes how to use the [DELETE](/sql-statements/sql-statement-delete.md) SQL statement to delete the data in TiDB. If you need to periodically delete expired data, use the [time to live](/time-to-live.md) feature.
 
 ## Before you start
 
diff --git a/experimental-features.md b/experimental-features.md
index b250344d1d7e8..443083ba2ea7c 100644
--- a/experimental-features.md
+++ b/experimental-features.md
@@ -33,6 +33,7 @@ Elastic scheduling feature.
It enables the TiDB cluster to dynamically scale out
 + [Cascades Planner](/system-variables.md#tidb_enable_cascades_planner): a cascades framework-based top-down query optimizer (Introduced in v3.0)
 + [Table Lock](/tidb-configuration-file.md#enable-table-lock-new-in-v400) (Introduced in v4.0.0)
 + [Range INTERVAL partitioning](/partitioned-table.md#range-interval-partitioning) (Introduced in v6.3.0)
++ [Time to live](/time-to-live.md) (Introduced in v6.5.0)
 + [TiFlash Query Result Materialization](/tiflash/tiflash-results-materialization.md) (Introduced in v6.5.0)
 + [Create a binding according to a historical execution plan](/sql-plan-management.md#create-a-binding-according-to-a-historical-execution-plan) (Introduced in v6.5.0)
 
diff --git a/glossary.md b/glossary.md
index 06e1c9e8a40e7..6021ef181eb53 100644
--- a/glossary.md
+++ b/glossary.md
@@ -154,3 +154,7 @@ Top SQL helps locate SQL queries that contribute to a high load of a TiDB or TiK
 ### TSO
 
 Because TiKV is a distributed storage system, it requires a global timing service, Timestamp Oracle (TSO), to assign a monotonically increasing timestamp. In TiKV, such a feature is provided by PD, and in Google [Spanner](http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf), this feature is provided by multiple atomic clocks and GPS.
+
+### TTL
+
+[Time to live (TTL)](/time-to-live.md) is a feature that allows you to manage TiDB data lifetime at the row level. For a table with the TTL attribute, TiDB automatically checks data lifetime and deletes expired data at the row level.
diff --git a/grafana-tidb-dashboard.md b/grafana-tidb-dashboard.md
index 6d8694eedd6dc..77721fa241cd2 100644
--- a/grafana-tidb-dashboard.md
+++ b/grafana-tidb-dashboard.md
@@ -169,3 +169,10 @@ To understand the key metrics displayed on the TiDB dashboard, check the followi
     - Pending Request Count by TiKV: the number of Batch messages that are pending processing
     - Batch Client Unavailable Duration 95: the unavailable time of the Batch client
     - No Available Connection Counter: the number of times the Batch client cannot find an available link
+
+- TTL
+    - TTL QPS By Type: the QPS information of different types of statements generated by TTL jobs.
+    - TTL Processed Rows Per Second: the number of expired rows processed by TTL jobs per second.
+    - TTL Scan/Delete Query Duration: the execution time of TTL scan/delete statements.
+    - TTL Scan/Delete Worker Time By Phase: the time consumed by different phases of TTL internal worker threads.
+    - TTL Job Count By Status: the number of TTL jobs currently being executed.
diff --git a/sql-statements/sql-statement-alter-table.md b/sql-statements/sql-statement-alter-table.md
index faa271fab81d4..dbca16518dc45 100644
--- a/sql-statements/sql-statement-alter-table.md
+++ b/sql-statements/sql-statement-alter-table.md
@@ -48,6 +48,11 @@ AlterTableSpec ::=
 |   'SECONDARY_UNLOAD'
 |   ( 'AUTO_INCREMENT' | 'AUTO_ID_CACHE' | 'AUTO_RANDOM_BASE' | 'SHARD_ROW_ID_BITS' ) EqOpt LengthNum
 |   ( 'CACHE' | 'NOCACHE' )
+|   (
+        'TTL' EqOpt TimeColumnName '+' 'INTERVAL' Expression TimeUnit (TTLEnable EqOpt ( 'ON' | 'OFF' ))?
+        | 'REMOVE' 'TTL'
+        | TTLEnable EqOpt ( 'ON' | 'OFF' )
+    )
 |   PlacementPolicyOption
 
 PlacementPolicyOption ::=
diff --git a/sql-statements/sql-statement-create-table.md b/sql-statements/sql-statement-create-table.md
index 2236acce815ac..e9ccebe0efd65 100644
--- a/sql-statements/sql-statement-create-table.md
+++ b/sql-statements/sql-statement-create-table.md
@@ -82,6 +82,7 @@ TableOption ::=
 |   'SECONDARY_ENGINE' EqOpt ( 'NULL' | StringName )
 |   'UNION' EqOpt '(' TableNameListOpt ')'
 |   'ENCRYPTION' EqOpt EncryptionOpt
+|   'TTL' EqOpt TimeColumnName '+' 'INTERVAL' Expression TimeUnit (TTLEnable EqOpt ( 'ON' | 'OFF' ))?
 |   PlacementPolicyOption
 
 OnCommitOpt ::=
diff --git a/system-variables.md b/system-variables.md
index f487323ae2736..3f6e7b9c42d8a 100644
--- a/system-variables.md
+++ b/system-variables.md
@@ -3998,6 +3998,114 @@ For details, see [Identify Slow Queries](/identify-slow-queries.md).
 >
 > Suppose that the TSO RPC latency increases for reasons other than a CPU usage bottleneck of the PD leader (such as network issues). In this case, increasing the value of `tidb_tso_client_batch_max_wait_time` might increase the execution latency in TiDB and affect the QPS performance of the cluster.
 
+### tidb_ttl_delete_rate_limit New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Persists to cluster: Yes
+- Default value: `0`
+- Range: `[0, 9223372036854775807]`
+- This variable is used to limit the rate of `DELETE` statements in TTL jobs on each TiDB node. The value represents the maximum number of `DELETE` statements allowed per second on a single node in a TTL job. When this variable is set to `0`, no limit is applied.
+
+### tidb_ttl_delete_batch_size New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Persists to cluster: Yes
+- Default value: `100`
+- Range: `[1, 10240]`
+- This variable is used to set the maximum number of rows that can be deleted in a single `DELETE` transaction in a TTL job.
+
+### tidb_ttl_delete_worker_count New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Persists to cluster: Yes
+- Default value: `4`
+- Range: `[1, 256]`
+- This variable is used to set the maximum concurrency of TTL jobs on each TiDB node.
+
+### tidb_ttl_job_enable New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Persists to cluster: Yes
+- Default value: `ON`
+- Type: Boolean
+- This variable is used to control whether TTL jobs are enabled. If it is set to `OFF`, all tables with TTL attributes automatically stop cleaning up expired data.
+
+### tidb_ttl_scan_batch_size New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Persists to cluster: Yes
+- Default value: `500`
+- Range: `[1, 10240]`
+- This variable is used to set the `LIMIT` value of each `SELECT` statement used to scan expired data in a TTL job.
+
+### tidb_ttl_scan_worker_count New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Persists to cluster: Yes
+- Default value: `4`
+- Range: `[1, 256]`
+- This variable is used to set the maximum concurrency of TTL scan jobs on each TiDB node.
+
+### tidb_ttl_job_run_interval New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Persists to cluster: Yes
+- Default value: `1h0m0s`
+- Range: `[10m0s, 8760h0m0s]`
+- This variable is used to control the scheduling interval of TTL jobs in the background. For example, if the current value is set to `1h0m0s`, each table with TTL attributes cleans up expired data once every hour.
+
+### tidb_ttl_job_schedule_window_start_time New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Type: Time
+- Persists to cluster: Yes
+- Default value: `00:00 +0000`
+- This variable is used to control the start time of the scheduling window of TTL jobs in the background. When you modify the value of this variable, be cautious that a small window might cause the cleanup of expired data to fail.
+
+### tidb_ttl_job_schedule_window_end_time New in v6.5.0
+
+> **Warning:**
+>
+> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.
+
+- Scope: GLOBAL
+- Type: Time
+- Persists to cluster: Yes
+- Default value: `23:59 +0000`
+- This variable is used to control the end time of the scheduling window of TTL jobs in the background. When you modify the value of this variable, be cautious that a small window might cause the cleanup of expired data to fail.
+
 ### tidb_txn_assertion_level New in v6.0.0
 
 - Scope: SESSION | GLOBAL
diff --git a/time-to-live.md b/time-to-live.md
new file mode 100644
index 0000000000000..c58d941ffce60
--- /dev/null
+++ b/time-to-live.md
@@ -0,0 +1,186 @@
+---
+title: Periodically Delete Data Using Time to Live
+summary: Use Time to Live to automatically expire and delete old data.
+---
+
+# Periodically Delete Expired Data Using TTL (Time to Live)
+
+Time to live (TTL) is a feature that allows you to manage TiDB data lifetime at the row level.
For a table with the TTL attribute, TiDB automatically checks data lifetime and deletes expired data at the row level. This feature can effectively save storage space and enhance performance in some scenarios.
+
+The following are some common scenarios for TTL:
+
+* Regularly delete verification codes and short URLs.
+* Regularly delete unnecessary historical orders.
+* Automatically delete intermediate results of calculations.
+
+TTL is designed to help users clean up unnecessary data periodically and in a timely manner without affecting the online read and write workloads. TTL dispatches jobs to different TiDB nodes concurrently and deletes data in parallel, with one job per table. TTL does not guarantee that all expired data is deleted immediately. Even after data expires, the client might still read it for some time, until the background TTL job deletes it.
+
+> **Warning:**
+>
+> This is an experimental feature. It is not recommended that you use it in a production environment.
+> TTL is not available for [TiDB Cloud Serverless Tier](https://docs.pingcap.com/tidbcloud/select-cluster-tier#serverless-tier-beta).
+
+## Syntax
+
+You can configure the TTL attribute of a table using the [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) or [`ALTER TABLE`](/sql-statements/sql-statement-alter-table.md) statement.
+
+### Create a table with a TTL attribute
+
+- Create a table with a TTL attribute:
+
+    ```sql
+    CREATE TABLE t1 (
+        id int PRIMARY KEY,
+        created_at TIMESTAMP
+    ) TTL = `created_at` + INTERVAL 3 MONTH;
+    ```
+
+    The preceding example creates a table `t1` and specifies `created_at` as the TTL timestamp column, which indicates the creation time of the data. The example also sets the longest time that a row is allowed to live in the table to 3 months through `INTERVAL 3 MONTH`. Data that lives longer than this value will be deleted later.
+
+- Set the `TTL_ENABLE` attribute to enable or disable the feature of cleaning up expired data:
+
+    ```sql
+    CREATE TABLE t1 (
+        id int PRIMARY KEY,
+        created_at TIMESTAMP
+    ) TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF';
+    ```
+
+    If `TTL_ENABLE` is set to `OFF`, even if other TTL options are set, TiDB does not automatically clean up expired data in this table. For a table with the TTL attribute, `TTL_ENABLE` is `ON` by default.
+
+- To be compatible with MySQL, you can set a TTL attribute using a comment:
+
+    ```sql
+    CREATE TABLE t1 (
+        id int PRIMARY KEY,
+        created_at TIMESTAMP
+    ) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF'*/;
+    ```
+
+    In TiDB, using the table TTL attribute or using comments to configure TTL is equivalent. In MySQL, the comment is ignored and an ordinary table is created.
+
+### Modify the TTL attribute of a table
+
+- Modify the TTL attribute of a table:
+
+    ```sql
+    ALTER TABLE t1 TTL = `created_at` + INTERVAL 1 MONTH;
+    ```
+
+    You can use the preceding statement to modify a table with an existing TTL attribute or to add a TTL attribute to a table without a TTL attribute.
+
+- Modify the value of `TTL_ENABLE` for a table with the TTL attribute:
+
+    ```sql
+    ALTER TABLE t1 TTL_ENABLE = 'OFF';
+    ```
+
+- To remove all TTL attributes of a table:
+
+    ```sql
+    ALTER TABLE t1 REMOVE TTL;
+    ```
+
+### TTL and the default values of data types
+
+You can use TTL together with [default values of the data types](/data-type-default-values.md). The following are two common usage examples:
+
+* Use `DEFAULT CURRENT_TIMESTAMP` to specify the default value of a column as the current creation time and use this column as the TTL timestamp column.
Records that were created more than 3 months ago are expired:
+
+    ```sql
+    CREATE TABLE t1 (
+        id int PRIMARY KEY,
+        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+    ) TTL = `created_at` + INTERVAL 3 MONTH;
+    ```
+
+* Specify the default value of a column as the creation time or the latest update time and use this column as the TTL timestamp column. Records that have not been updated for more than 3 months are expired:
+
+    ```sql
+    CREATE TABLE t1 (
+        id int PRIMARY KEY,
+        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
+    ) TTL = `created_at` + INTERVAL 3 MONTH;
+    ```
+
+### TTL and generated columns
+
+You can use TTL together with [generated columns](/generated-columns.md) (experimental feature) to configure complex expiration rules. For example:
+
+```sql
+CREATE TABLE message (
+    id int PRIMARY KEY,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    image bool,
+    expire_at TIMESTAMP AS (IF(image,
+        created_at + INTERVAL 5 DAY,
+        created_at + INTERVAL 30 DAY
+    ))
+) TTL = `expire_at` + INTERVAL 0 DAY;
+```
+
+The preceding statement uses the `expire_at` column as the TTL timestamp column and sets the expiration time according to the message type. If the message is an image, it expires in 5 days. Otherwise, it expires in 30 days.
+
+You can use TTL together with the [JSON type](/data-type-json.md). For example:
+
+```sql
+CREATE TABLE orders (
+    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
+    order_info JSON,
+    created_at DATE AS (JSON_EXTRACT(order_info, '$.created_at')) VIRTUAL
+) TTL = `created_at` + INTERVAL 3 MONTH;
+```
+
+## TTL job
+
+For each table with a TTL attribute, TiDB internally schedules a background job to clean up expired data. You can customize the execution period of these jobs by setting the [`tidb_ttl_job_run_interval`](/system-variables.md#tidb_ttl_job_run_interval-new-in-v650) global variable.
The following example sets the background cleanup jobs to run once every 24 hours:
+
+```sql
+SET @@global.tidb_ttl_job_run_interval = '24h';
+```
+
+To disable the execution of TTL jobs, in addition to setting the `TTL_ENABLE='OFF'` table option, you can also disable the execution of TTL jobs in the entire cluster by setting the [`tidb_ttl_job_enable`](/system-variables.md#tidb_ttl_job_enable-new-in-v650) global variable:
+
+```sql
+SET @@global.tidb_ttl_job_enable = OFF;
+```
+
+In some scenarios, you might want to allow TTL jobs to run only in a certain time window. In this case, you can set the [`tidb_ttl_job_schedule_window_start_time`](/system-variables.md#tidb_ttl_job_schedule_window_start_time-new-in-v650) and [`tidb_ttl_job_schedule_window_end_time`](/system-variables.md#tidb_ttl_job_schedule_window_end_time-new-in-v650) global variables to specify the time window. For example:
+
+```sql
+SET @@global.tidb_ttl_job_schedule_window_start_time = '01:00 +0000';
+SET @@global.tidb_ttl_job_schedule_window_end_time = '05:00 +0000';
+```
+
+The preceding statements allow TTL jobs to be scheduled only between 1:00 and 5:00 UTC. By default, the time window is set to `00:00 +0000` to `23:59 +0000`, which allows the jobs to be scheduled at any time.
+
+## Monitoring metrics and charts
+
+
+
+> **Note:**
+>
+> This section is only applicable to on-premises TiDB. Currently, TiDB Cloud does not provide TTL metrics.
+
+
+
+TiDB collects runtime information about TTL periodically and provides visualized charts of these metrics in Grafana. You can see these metrics in the TiDB -> TTL panel in Grafana.
+
+
+
+For details of the metrics, see the TTL section in [TiDB Monitoring Metrics](/grafana-tidb-dashboard.md).
+
+
+
+## Compatibility with TiDB tools
+
+As an experimental feature, the TTL feature is not compatible with data import and export tools, including BR, TiDB Lightning, and TiCDC.
+
+## Limitations
+
+Currently, the TTL feature has the following limitations:
+
+* The TTL attribute cannot be set on temporary tables, including local temporary tables and global temporary tables.
+* A table with the TTL attribute cannot be referenced by other tables as the primary table in a foreign key constraint.
+* It is not guaranteed that all expired data is deleted immediately. The time when expired data is deleted depends on the scheduling interval and scheduling window of the background cleanup job.
+* Currently, a cleanup job for a single table can run on only one TiDB node at a given time. This might cause performance bottlenecks in some scenarios (for example, when the table is extremely large). This limitation is expected to be addressed in future releases.
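As a rough way to preview what a TTL rule would treat as expired, the expiration condition described in `time-to-live.md` can be written as an ordinary query. This is only a sketch: it assumes the `t1` table from the examples in the patch (TTL column `created_at`, interval 3 months), and it approximates the rule rather than reproducing the statements that TTL jobs run internally.

```sql
-- Sketch: count rows in the example table t1 whose lifetime has passed.
-- Assumes TTL = `created_at` + INTERVAL 3 MONTH, as in the examples in the patch.
SELECT COUNT(*) AS expired_rows
FROM t1
WHERE created_at + INTERVAL 3 MONTH < NOW();
```

Because TTL jobs run in the background on a schedule, a nonzero count between job runs is expected and does not by itself indicate a problem.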