Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions import-example-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ CREATE TABLE trips (
You can import files individually using the example `LOAD DATA` command here, or import all files using the bash loop below:

```sql
SET tidb_dml_batch_size = 20000;
LOAD DATA LOCAL INFILE '2017Q1-capitalbikeshare-tripdata.csv' INTO TABLE trips
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
Expand All @@ -61,8 +62,8 @@ end_station_number, end_station, bike_number, member_type);
To import all `*.csv` files into TiDB in a bash loop:

```bash
for FILE in `ls *.csv`; do
for FILE in *.csv; do
echo "== $FILE =="
mysql bikeshare --local-infile=1 -e "LOAD DATA LOCAL INFILE '${FILE}' INTO TABLE trips FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);"
mysql bikeshare --local-infile=1 -e "SET tidb_dml_batch_size = 20000; LOAD DATA LOCAL INFILE '${FILE}' INTO TABLE trips FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);"
done;
```
7 changes: 5 additions & 2 deletions sql-statements/sql-statement-load-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,13 +104,16 @@ In the above example, `x'2c'` is the hexadecimal representation of the `,` chara

## MySQL compatibility

* TiDB will by default commit every 20 000 rows. This behavior is similar to MySQL NDB Cluster, but not the default configuration with the InnoDB storage engine.
This statement is understood to be fully compatible with MySQL. Any compatibility differences should be [reported via an issue](https://github.com/pingcap/tidb/issues/new/choose) on GitHub.

> **Note:**
>
> Committing through splitting a transaction is at the expense of breaking the atomicity and isolation of the transaction. When performing this operation, you must ensure that there are **no other** ongoing operations on the table. When an error occurs, **manual intervention is required to check the consistency and integrity of the data**. Therefore, it is not recommended to use `LOAD DATA` on any tables which are actively being read from or written to.
> In earlier releases of TiDB, `LOAD DATA` committed every 20000 rows. By default, TiDB now commits all rows in one transaction. This can result in the error `ERROR 8004 (HY000) at line 1: Transaction is too large, size: 100000058` after upgrading from TiDB 4.0 or earlier versions.
>
> The recommended way to resolve this error is to increase the `txn-total-size-limit` value in your `tidb.toml` file. If you are unable to increase this limit, you can also restore the previous behavior by setting [`tidb_dml_batch_size`](/system-variables.md#tidb_dml_batch_size) to `20000`.

## See also

* [INSERT](/sql-statements/sql-statement-insert.md)
* [Import Example Database](/import-example-data.md)
* [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md)
10 changes: 9 additions & 1 deletion system-variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,14 @@ Constraint checking is always performed in place for pessimistic transactions (d
- Use a bigger value in OLAP scenarios, and a smaller value in OLTP scenarios.
- For OLAP scenarios, the maximum value cannot exceed the number of CPU cores of all the TiKV nodes.

### tidb_dml_batch_size

- Scope: SESSION
- Default value: 0
- Example value: 20000
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nullnotnil Could you explain what example value is? I find it confusing when reviewing the translation of this PR. From the context, 20000 does not appear in this document but in sql-statement-load-data.md . Did you mean the example in sql-statement-load-data.md ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked 20000 as an example value because this is the prior batch size for LOAD DATA before it was changed to not be batched in master.

I think it is potentially useful for other sysvars, because the min/max values may be a very wide range.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we say In earlier releases of TiDB, the default batch size for LOAD DATA was 20000? Here the example seems a little confusing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just remove it. I don't think people will know to look up this value when they get an error, they will look up the reference for LOAD DATA, which mentions the previous default.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in #3977.

- When this value is greater than `0`, TiDB will batch commit statements such as `INSERT` or `LOAD DATA` into smaller transactions. This reduces memory usage and helps ensure that the `txn-total-size-limit` is not reached by bulk modifications.
- Only the value `0` provides ACID compliance. Setting this to any other value will break the atomicity and isolation guarantees of TiDB.

### tidb_enable_cascades_planner

- Scope: SESSION | GLOBAL
Expand Down Expand Up @@ -819,7 +827,7 @@ SET tidb_slow_log_threshold = 200;

- Scope: SESSION | GLOBAL
- Default value: SYSTEM
- This variable sets the sytem time zone. Values can be specified as either an offset such as '-8:00' or a named zone 'America/Los_Angeles'.
- This variable sets the system time zone. Values can be specified as either an offset such as '-8:00' or a named zone 'America/Los_Angeles'.

### transaction_isolation

Expand Down