br: backup checkpoint #11459
Merged

9 commits:
- f1b196b br: backup checkpoint (shichun-0415)
- 21a5e79 refine (shichun-0415)
- f86ff19 fix gc-safepoint (shichun-0415)
- fc0dbfb fix file name (shichun-0415)
- 1935c1f address comment (shichun-0415)
- b8fac10 Apply suggestions from code review (shichun-0415)
- aca640b Update br-checkpoint.md (shichun-0415)
- 78e2f83 wording (shichun-0415)
- f5fdd54 Apply suggestions from code review (shichun-0415)

---
title: Checkpoint Backup
summary: Learn about the checkpoint backup feature, including its application scenarios, usage, and implementation details.
---

# Checkpoint Backup

Snapshot backup might be interrupted by recoverable errors, such as disk exhaustion or a node crash. Before TiDB v6.5.0, the data backed up before an interruption is invalidated even after the error is resolved, so you have to start the backup again from scratch. For large clusters, this incurs considerable extra cost.

In TiDB v6.5.0, Backup & Restore (BR) introduces the checkpoint backup feature, which allows an interrupted backup to continue. This feature is enabled by default. With it enabled, most of the data backed up before the interruption can be retained.

## Application scenarios

If your TiDB cluster is large and you cannot afford to run the whole backup again after a failure, you can use the checkpoint backup feature. The br command-line tool (hereinafter referred to as `br`) periodically records the shards that have been backed up, so the next backup retry can resume from a point close to where the previous run exited abnormally.

## Usage limitations

Checkpoint backup relies on the GC mechanism, and a retry cannot always reuse all of the data that has already been backed up. The following sections provide the details.

### Backup retry must be prior to GC

During the backup, `br` periodically updates the `gc-safepoint` of the backup snapshot in PD to prevent the data from being garbage collected. After `br` exits, it can no longer update the `gc-safepoint`, so the data might be garbage collected before the next backup retry.

To avoid this situation, if `gcttl` is not specified, `br` retains the `gc-safepoint` for about one hour by default. You can set the `gcttl` parameter to extend the retention period if needed.

The following example sets `gcttl` to 15 hours (54000 seconds) to extend the retention period of the `gc-safepoint`:

```shell
# Retain the gc-safepoint for 54000 seconds (15 hours) so that a retry can still resume the backup
br backup full \
    --storage local:///br_data/ --pd "${PD_IP}:2379" \
    --gcttl 54000
```

> **Note:**
>
> The `gc-safepoint` created before the backup is automatically deleted after the snapshot backup completes. You do not need to delete it manually.

### Some data needs to be backed up again

When `br` retries a backup, some data that has already been backed up might need to be backed up again, namely the data that was being backed up when the interruption happened and the data not yet recorded by the checkpoint.

- If the interruption is caused by an error, `br` persists the meta information of the data backed up before it exits. In this case, only the data that was being backed up at the time of the interruption needs to be backed up again in the next retry.

- If the `br` process is killed by the system, `br` cannot persist the meta information of the backed-up data to the external storage. Because `br` persists the meta information every 30 seconds, the data backed up during the last (up to) 30 seconds before the interruption is not persisted and needs to be backed up again in the next retry.

## Implementation details

During a snapshot backup, `br` encodes the tables into their corresponding key ranges, generates backup RPC requests, and sends them to TiKV nodes. After receiving a backup request, a TiKV node backs up the data within the requested range. Each time a TiKV node finishes backing up the data of a Region, it returns the backup information of that range to `br`.

`br` records the information returned by TiKV nodes to track which key ranges have been backed up. The checkpoint backup feature periodically uploads the new backup information to external storage so that the backed-up key ranges are persisted.

When `br` retries the backup, it reads the backed-up key ranges from external storage and compares them with the key ranges of the backup task. The difference between the two tells `br` which key ranges still need to be backed up.
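
The comparison step is essentially interval subtraction: remove every backed-up range from a task range and keep whatever is left. The following Go sketch shows one way to do that for half-open byte ranges. `KeyRange` and `remaining` are hypothetical, and the sketch assumes each range has an explicit end key (in TiKV, an empty end key usually means "to the end of the key space", which this sketch does not handle).

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// KeyRange is a half-open interval [Start, End) over raw keys.
type KeyRange struct {
	Start, End []byte
}

// remaining returns the parts of task not covered by any range in done.
// done may be unsorted and may contain overlapping ranges.
func remaining(task KeyRange, done []KeyRange) []KeyRange {
	// Sort a copy by start key so the caller's slice is left untouched.
	sorted := append([]KeyRange(nil), done...)
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(sorted[i].Start, sorted[j].Start) < 0
	})

	var gaps []KeyRange
	cursor := task.Start // keys before cursor are already classified
	for _, d := range sorted {
		if bytes.Compare(cursor, task.End) >= 0 {
			break // the whole task range is accounted for
		}
		if bytes.Compare(d.End, cursor) <= 0 {
			continue // this backed-up range ends before the cursor
		}
		if bytes.Compare(d.Start, cursor) > 0 {
			// Uncovered gap before d, clipped to the task's end key.
			end := d.Start
			if bytes.Compare(end, task.End) > 0 {
				end = task.End
			}
			gaps = append(gaps, KeyRange{Start: cursor, End: end})
		}
		cursor = d.End
	}
	if bytes.Compare(cursor, task.End) < 0 {
		gaps = append(gaps, KeyRange{Start: cursor, End: task.End})
	}
	return gaps
}

func main() {
	task := KeyRange{[]byte("a"), []byte("z")}
	done := []KeyRange{
		{[]byte("c"), []byte("f")},
		{[]byte("b"), []byte("d")},
		{[]byte("m"), []byte("p")},
	}
	for _, g := range remaining(task, done) {
		fmt.Printf("[%s, %s)\n", g.Start, g.End)
	}
	// Output:
	// [a, b)
	// [f, m)
	// [p, z)
}
```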