Merged
1 change: 1 addition & 0 deletions TOC-tidb-cloud.md
@@ -34,6 +34,7 @@
- [Insert Data](/develop/dev-guide-insert-data.md)
- [Update Data](/develop/dev-guide-update-data.md)
- [Delete Data](/develop/dev-guide-delete-data.md)
- [Periodically Delete Expired Data Using TTL (Time to Live)](/time-to-live.md)
- [Prepared Statements](/develop/dev-guide-prepared-statement.md)
- Read Data
- [Query Data from a Single Table](/develop/dev-guide-get-data-from-single-table.md)
1 change: 1 addition & 0 deletions TOC.md
@@ -38,6 +38,7 @@
- [Insert Data](/develop/dev-guide-insert-data.md)
- [Update Data](/develop/dev-guide-update-data.md)
- [Delete Data](/develop/dev-guide-delete-data.md)
- [Periodically Delete Data Using Time to Live](/time-to-live.md)
- [Prepared Statements](/develop/dev-guide-prepared-statement.md)
- Read Data
- [Query Data from a Single Table](/develop/dev-guide-get-data-from-single-table.md)
2 changes: 1 addition & 1 deletion develop/dev-guide-delete-data.md
@@ -5,7 +5,7 @@ summary: Learn about the SQL syntax, best practices, and examples for deleting d

# Delete Data

This document describes how to use the [DELETE](/sql-statements/sql-statement-delete.md) SQL statement to delete the data in TiDB.
This document describes how to use the [DELETE](/sql-statements/sql-statement-delete.md) SQL statement to delete the data in TiDB. If you need to periodically delete expired data, use the [time to live](/time-to-live.md) feature.

## Before you start

1 change: 1 addition & 0 deletions experimental-features.md
@@ -33,6 +33,7 @@ Elastic scheduling feature. It enables the TiDB cluster to dynamically scale out
+ [Cascades Planner](/system-variables.md#tidb_enable_cascades_planner): a cascades framework-based top-down query optimizer (Introduced in v3.0)
+ [Table Lock](/tidb-configuration-file.md#enable-table-lock-new-in-v400) (Introduced in v4.0.0)
+ [Range INTERVAL partitioning](/partitioned-table.md#range-interval-partitioning) (Introduced in v6.3.0)
+ [Time to live](/time-to-live.md) (Introduced in v6.5.0)
+ [TiFlash Query Result Materialization](/tiflash/tiflash-results-materialization.md) (Introduced in v6.5.0)
+ [Create a binding according to a historical execution plan](/sql-plan-management.md#create-a-binding-according-to-a-historical-execution-plan) (Introduced in v6.5.0)

4 changes: 4 additions & 0 deletions glossary.md
@@ -154,3 +154,7 @@ Top SQL helps locate SQL queries that contribute to a high load of a TiDB or TiK
### TSO

Because TiKV is a distributed storage system, it requires a global timing service, Timestamp Oracle (TSO), to assign a monotonically increasing timestamp. In TiKV, such a feature is provided by PD, and in Google [Spanner](http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf), this feature is provided by multiple atomic clocks and GPS.

### TTL

[Time to live (TTL)](/time-to-live.md) is a feature that allows you to manage TiDB data lifetime at the row level. For a table with the TTL attribute, TiDB automatically checks data lifetime and deletes expired data at the row level.
7 changes: 7 additions & 0 deletions grafana-tidb-dashboard.md
@@ -169,3 +169,10 @@ To understand the key metrics displayed on the TiDB dashboard, check the followi
- Pending Request Count by TiKV: the number of Batch messages that are pending processing
- Batch Client Unavailable Duration 95: the unavailable time of the Batch client
- No Available Connection Counter: the number of times the Batch client cannot find an available link

- TTL
- TTL QPS By Type: the QPS information of different types of statements generated by TTL jobs.
- TTL Processed Rows Per Second: the number of expired rows processed by TTL jobs per second.
- TTL Scan/Delete Query Duration: the execution time of TTL scan/delete statements.
- TTL Scan/Delete Worker Time By Phase: the time consumed by different phases of TTL internal worker threads.
- TTL Job Count By Status: the number of TTL jobs currently being executed.
5 changes: 5 additions & 0 deletions sql-statements/sql-statement-alter-table.md
@@ -48,6 +48,11 @@ AlterTableSpec ::=
| 'SECONDARY_UNLOAD'
| ( 'AUTO_INCREMENT' | 'AUTO_ID_CACHE' | 'AUTO_RANDOM_BASE' | 'SHARD_ROW_ID_BITS' ) EqOpt LengthNum
| ( 'CACHE' | 'NOCACHE' )
| (
'TTL' EqOpt TimeColumnName '+' 'INTERVAL' Expression TimeUnit (TTLEnable EqOpt ( 'ON' | 'OFF' ))?
| 'REMOVE' 'TTL'
| TTLEnable EqOpt ( 'ON' | 'OFF' )
)
| PlacementPolicyOption

PlacementPolicyOption ::=
1 change: 1 addition & 0 deletions sql-statements/sql-statement-create-table.md
@@ -82,6 +82,7 @@ TableOption ::=
| 'SECONDARY_ENGINE' EqOpt ( 'NULL' | StringName )
| 'UNION' EqOpt '(' TableNameListOpt ')'
| 'ENCRYPTION' EqOpt EncryptionOpt
| 'TTL' EqOpt TimeColumnName '+' 'INTERVAL' Expression TimeUnit (TTLEnable EqOpt ( 'ON' | 'OFF' ))?
| PlacementPolicyOption

OnCommitOpt ::=
108 changes: 108 additions & 0 deletions system-variables.md
@@ -3998,6 +3998,114 @@ For details, see [Identify Slow Queries](/identify-slow-queries.md).
>
> Suppose that the TSO RPC latency increases for reasons other than a CPU usage bottleneck of the PD leader (such as network issues). In this case, increasing the value of `tidb_tso_client_batch_max_wait_time` might increase the execution latency in TiDB and affect the QPS performance of the cluster.

### tidb_ttl_delete_rate_limit <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Persists to cluster: Yes
- Default value: `0`
- Range: `[0, 9223372036854775807]`
- This variable is used to limit the rate of `DELETE` statements in TTL jobs on each TiDB node. The value represents the maximum number of `DELETE` statements allowed per second in a single node in a TTL job. When this variable is set to `0`, no limit is applied.

### tidb_ttl_delete_batch_size <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Persists to cluster: Yes
- Default value: `100`
- Range: `[1, 10240]`
- This variable is used to set the maximum number of rows that can be deleted in a single `DELETE` transaction in a TTL job.

### tidb_ttl_delete_worker_count <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Persists to cluster: Yes
- Default value: `4`
- Range: `[1, 256]`
- This variable is used to set the maximum concurrency of delete tasks in TTL jobs on each TiDB node.

### tidb_ttl_job_enable <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Persists to cluster: Yes
- Default value: `ON`
- Type: Boolean
- This variable is used to control whether TTL jobs are enabled. If it is set to `OFF`, all tables with TTL attributes automatically stop cleaning up expired data.

### tidb_ttl_scan_batch_size <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Persists to cluster: Yes
- Default value: `500`
- Range: `[1, 10240]`
- This variable is used to set the `LIMIT` value of each `SELECT` statement used to scan expired data in a TTL job.

### tidb_ttl_scan_worker_count <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Persists to cluster: Yes
- Default value: `4`
- Range: `[1, 256]`
- This variable is used to set the maximum concurrency of TTL scan jobs on each TiDB node.

### tidb_ttl_job_run_interval <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Persists to cluster: Yes
- Default value: `1h0m0s`
- Range: `[10m0s, 8760h0m0s]`
- This variable is used to control the scheduling interval of TTL jobs in the background. For example, if the current value is set to `1h0m0s`, each table with TTL attributes cleans up expired data once every hour.

### tidb_ttl_job_schedule_window_start_time <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Type: Time
- Persists to cluster: Yes
- Default value: `00:00 +0000`
- This variable is used to control the start time of the scheduling window of TTL jobs in the background. When you modify the value of this variable, be cautious that a small window might cause the cleanup of expired data to fail.

### tidb_ttl_job_schedule_window_end_time <span class="version-mark">New in v6.5.0</span>

> **Warning:**
>
> [TTL](/time-to-live.md) is an experimental feature. This system variable might be changed or removed in future releases.

- Scope: GLOBAL
- Type: Time
- Persists to cluster: Yes
- Default value: `23:59 +0000`
- This variable is used to control the end time of the scheduling window of TTL jobs in the background. When you modify the value of this variable, be cautious that a small window might cause the cleanup of expired data to fail.

### tidb_txn_assertion_level <span class="version-mark">New in v6.0.0</span>

- Scope: SESSION | GLOBAL
186 changes: 186 additions & 0 deletions time-to-live.md
@@ -0,0 +1,186 @@
---
title: Periodically Delete Data Using Time to Live
summary: Use Time to Live to automatically expire and delete old data.
---

# Periodically Delete Expired Data Using TTL (Time to Live)

Time to live (TTL) is a feature that allows you to manage TiDB data lifetime at the row level. For a table with the TTL attribute, TiDB automatically checks data lifetime and deletes expired data at the row level. This feature can effectively save storage space and enhance performance in some scenarios.

The following are some common scenarios for TTL:

* Regularly delete verification codes and short URLs.
* Regularly delete unnecessary historical orders.
* Automatically delete intermediate results of calculations.

TTL is designed to help users clean up unnecessary data periodically and in a timely manner without affecting the online read and write workloads. TTL dispatches jobs to different TiDB nodes concurrently and deletes data in parallel, with the table as the unit of scheduling. TTL does not guarantee that all expired data is deleted immediately. Even after a row expires, a client might still read it until the background TTL job deletes it.

> **Warning:**
>
> This is an experimental feature. It is not recommended that you use it in a production environment.
> TTL is not available for [TiDB Cloud Serverless Tier](https://docs.pingcap.com/tidbcloud/select-cluster-tier#serverless-tier-beta).

## Syntax

You can configure the TTL attribute of a table using the [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) or [`ALTER TABLE`](/sql-statements/sql-statement-alter-table.md) statement.

### Create a table with a TTL attribute

- Create a table with a TTL attribute:

```sql
CREATE TABLE t1 (
id int PRIMARY KEY,
created_at TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH;
```

The preceding example creates a table `t1` and specifies `created_at` as the TTL timestamp column, which indicates the creation time of the data. The example also sets the longest time that a row is allowed to live in the table to 3 months through `INTERVAL 3 MONTH`. Data that lives longer than this value will be deleted later.

- Set the `TTL_ENABLE` attribute to enable or disable the feature of cleaning up expired data:

```sql
CREATE TABLE t1 (
id int PRIMARY KEY,
created_at TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF';
```

If `TTL_ENABLE` is set to `OFF`, even if other TTL options are set, TiDB does not automatically clean up expired data in this table. For a table with the TTL attribute, `TTL_ENABLE` is `ON` by default.

- To be compatible with MySQL, you can set a TTL attribute using a comment:

```sql
CREATE TABLE t1 (
id int PRIMARY KEY,
created_at TIMESTAMP
) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF'*/;
```

In TiDB, configuring TTL through the table attribute and configuring it through a comment are equivalent. In MySQL, the comment is ignored and an ordinary table is created.
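
To confirm how TiDB recorded the TTL options, one option is to inspect the table definition. The following is a minimal sketch that assumes the table `t1` from the preceding examples exists. The exact formatting of the output can vary between TiDB versions, so it is not reproduced here.

```sql
-- Show the full definition of t1, including any TTL options set on it.
SHOW CREATE TABLE t1;
```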

### Modify the TTL attribute of a table

- Modify the TTL attribute of a table:

```sql
ALTER TABLE t1 TTL = `created_at` + INTERVAL 1 MONTH;
```

You can use the preceding statement to modify a table with an existing TTL attribute or to add a TTL attribute to a table without a TTL attribute.

- Modify the value of `TTL_ENABLE` for a table with the TTL attribute:

```sql
ALTER TABLE t1 TTL_ENABLE = 'OFF';
```

- To remove all TTL attributes of a table:

```sql
ALTER TABLE t1 REMOVE TTL;
```
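
As a sketch of how the preceding statements might be combined, the following sequence pauses cleanup on `t1` during a maintenance task and then resumes it. It assumes that `t1` already has a TTL attribute; the maintenance scenario itself is hypothetical.

```sql
-- Pause automatic cleanup for t1 only; the TTL column and interval are kept.
ALTER TABLE t1 TTL_ENABLE = 'OFF';

-- ... run the maintenance work that should not overlap with TTL deletes ...

-- Resume automatic cleanup for t1.
ALTER TABLE t1 TTL_ENABLE = 'ON';
```

Unlike `REMOVE TTL`, toggling `TTL_ENABLE` leaves the TTL definition in place, so it does not need to be specified again when cleanup is resumed.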

### TTL and the default values of data types

You can use TTL together with [default values of the data types](/data-type-default-values.md). The following are two common usage examples:

* Use `DEFAULT CURRENT_TIMESTAMP` to set the default value of a column to the row's creation time and use this column as the TTL timestamp column. Records created more than 3 months ago are treated as expired:

```sql
CREATE TABLE t1 (
id int PRIMARY KEY,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH;
```

* Specify the default value of a column as the creation time or the latest update time and use this column as the TTL timestamp column. Records that have not been updated for 3 months are expired:

```sql
CREATE TABLE t1 (
id int PRIMARY KEY,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH;
```

### TTL and generated columns

You can use TTL together with [generated columns](/generated-columns.md) (experimental feature) to configure complex expiration rules. For example:

```sql
CREATE TABLE message (
id int PRIMARY KEY,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
image bool,
expire_at TIMESTAMP AS (IF(image,
created_at + INTERVAL 5 DAY,
created_at + INTERVAL 30 DAY
))
) TTL = `expire_at` + INTERVAL 0 DAY;
```

The preceding statement uses the `expire_at` column as the TTL timestamp column and sets the expiration time according to the message type. If the message is an image, it expires in 5 days. Otherwise, it expires in 30 days.
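
To see how the generated column drives expiration, the following sketch inserts one image message and one non-image message into the `message` table defined above and reads back the computed `expire_at` values. The row values are illustrative only.

```sql
-- id 1 is an image message (expires 5 days after creation);
-- id 2 is not an image (expires 30 days after creation).
INSERT INTO message (id, image) VALUES (1, TRUE), (2, FALSE);

-- expire_at is computed from created_at and image, so it is never inserted directly.
SELECT id, image, created_at, expire_at FROM message ORDER BY id;
```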

You can use TTL together with the [JSON type](/data-type-json.md). For example:

```sql
CREATE TABLE orders (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
order_info JSON,
created_at DATE AS (JSON_EXTRACT(order_info, '$.created_at')) VIRTUAL
) TTL = `created_at` + INTERVAL 3 MONTH;
```

## TTL job

For each table with a TTL attribute, TiDB internally schedules a background job to clean up expired data. You can customize the execution period of these jobs by setting the [`tidb_ttl_job_run_interval`](/system-variables.md#tidb_ttl_job_run_interval-new-in-v650) global variable. The following example sets the background cleanup jobs to run once every 24 hours:

```sql
SET @@global.tidb_ttl_job_run_interval = '24h';
```

To disable the execution of TTL jobs, in addition to setting the `TTL_ENABLE='OFF'` table option, you can also disable the execution of TTL jobs in the entire cluster by setting the [`tidb_ttl_job_enable`](/system-variables.md#tidb_ttl_job_enable-new-in-v650) global variable:

```sql
SET @@global.tidb_ttl_job_enable = OFF;
```

In some scenarios, you might want to allow TTL jobs to run only in a certain time window. In this case, you can set the [`tidb_ttl_job_schedule_window_start_time`](/system-variables.md#tidb_ttl_job_schedule_window_start_time-new-in-v650) and [`tidb_ttl_job_schedule_window_end_time`](/system-variables.md#tidb_ttl_job_schedule_window_end_time-new-in-v650) global variables to specify the time window. For example:

```sql
SET @@global.tidb_ttl_job_schedule_window_start_time = '01:00 +0000';
SET @@global.tidb_ttl_job_schedule_window_end_time = '05:00 +0000';
```

The preceding statements allow TTL jobs to be scheduled only between 1:00 and 5:00 UTC. By default, the time window is set to `00:00 +0000` to `23:59 +0000`, which allows the jobs to be scheduled at any time.
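
The other TTL system variables can be adjusted in the same way. The following is a minimal sketch that throttles deletes and then lists the current TTL settings. The specific values are arbitrary examples chosen from within the documented ranges, not tuning recommendations.

```sql
-- Allow at most 200 DELETE statements per second on each TiDB node (0 means no limit).
SET @@global.tidb_ttl_delete_rate_limit = 200;

-- Delete at most 50 rows in each DELETE transaction (valid range: 1 to 10240).
SET @@global.tidb_ttl_delete_batch_size = 50;

-- List the TTL-related global variables and their current values.
SHOW GLOBAL VARIABLES LIKE 'tidb_ttl%';
```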

## Monitoring metrics and charts

<CustomContent platform="tidb-cloud">

> **Note:**
>
> This section is only applicable to on-premises TiDB. Currently, TiDB Cloud does not provide TTL metrics.

</CustomContent>

TiDB collects runtime information about TTL periodically and provides visualized charts of these metrics in Grafana. You can see these metrics in the TiDB -> TTL panel in Grafana.

<CustomContent platform="tidb">

For details of the metrics, see the TTL section in [TiDB Monitoring Metrics](/grafana-tidb-dashboard.md).

</CustomContent>

## Compatibility with TiDB tools

Because TTL is still an experimental feature, it is not compatible with data import and export tools, including BR, TiDB Lightning, and TiCDC.

## Limitations

Currently, the TTL feature has the following limitations:

* The TTL attribute cannot be set on temporary tables, including local temporary tables and global temporary tables.
* A table with the TTL attribute cannot be referenced by other tables as the parent table in a foreign key constraint.
* It is not guaranteed that all expired data is deleted immediately. The time when expired data is deleted depends on the scheduling interval and scheduling window of the background cleanup job.
* Currently, a cleanup job for a single table can run on only one TiDB node at a given time. This might cause performance bottlenecks in some scenarios (for example, when the table is extremely large). This limitation is expected to be addressed in future releases.
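
Because expired rows can remain readable until a cleanup job removes them, an application that must not see expired data can filter on the TTL column itself. The following sketch assumes the `t1` table from the earlier examples, with `created_at` as the TTL timestamp column and a 3-month interval.

```sql
-- Return only rows created within the last 3 months,
-- regardless of whether the background TTL job has already run.
SELECT id, created_at
FROM t1
WHERE created_at > NOW() - INTERVAL 3 MONTH;
```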