[Feature] Support for cleaning the trash actively #6323
Merged
9 commits
29b72cf
parent c8c571af37193ee10b5437fd9b47b30c4b917d60
BiteTheDDDDt ce1ea45
add lock at start_trash_sweep for make it can re-entrant
BiteTheDDDDt d920b0a
Fix the bug generated when resolving conflicts.
BiteTheDDDDt 126f821
Change lock_guard to return directly when locked.
BiteTheDDDDt 9174151
use unique_lock
BiteTheDDDDt 50907b7
Merge branch 'master' into dev_clear
BiteTheDDDDt 7ea044c
Revert "use unique_lock"
BiteTheDDDDt 7508f1c
Merge branch 'dev_clear' of http://github.com/BiteTheDDDDt/incubator-…
BiteTheDDDDt 2d0cfa4
use unique_lock to avoid deadlock
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,169 @@ | ||
| --- | ||
| { | ||
| "title": "Disk Capacity Management", | ||
| "language": "en" | ||
| } | ||
| --- | ||
|
|
||
| <!-- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
|
|
||
| # Disk Capacity Management | ||
|
|
||
| This document mainly introduces system parameters and processing strategies related to disk storage capacity. | ||
|
|
||
| If the capacity of Doris' data disks is not controlled, the process will hang once a disk is full. Therefore, Doris monitors disk usage and remaining capacity, and uses several warning levels to restrict operations in the system so that disks do not fill up. | ||
|
|
||
| ## Glossary | ||
|
|
||
| * FE: Doris Frontend Node. Responsible for metadata management and request access. | ||
| * BE: Doris Backend Node. Responsible for query execution and data storage. | ||
| * Data Dir: Data directory, each data directory specified in the `storage_root_path` of the BE configuration file `be.conf`. Usually a data directory corresponds to a disk, so the following **disk** also refers to a data directory. | ||
|
|
||
| ## Basic Principles | ||
|
|
||
| BE will report disk usage to FE on a regular basis (every minute). FE records these statistical values and restricts various operation requests based on these statistical values. | ||
|
|
||
| Two thresholds, **High Watermark** and **Flood Stage**, are set in FE. Flood Stage is higher than High Watermark. When the disk usage is higher than High Watermark, Doris restricts the execution of certain operations (such as replica balancing). When it is higher than Flood Stage, certain operations (such as data loading) are prohibited. | ||
|
|
||
| At the same time, a **Flood Stage** is also set on the BE. FE cannot always detect the disk usage on a BE in time, and it cannot control certain BE operations (such as Compaction). Therefore, the Flood Stage on the BE lets the BE actively refuse or stop certain operations in order to protect itself. | ||
|
|
||
| ## FE Parameter | ||
|
|
||
| **High Watermark:** | ||
|
|
||
| ``` | ||
| storage_high_watermark_usage_percent: default value is 85 (85%). | ||
| storage_min_left_capacity_bytes: default value is 2GB. | ||
| ``` | ||
|
|
||
| When the disk usage is **more than** `storage_high_watermark_usage_percent`, **or** the free disk capacity is **less than** `storage_min_left_capacity_bytes`, the disk will no longer be used as the destination path for the following operations: | ||
|
|
||
| * Tablet Balance | ||
| * Colocation Relocation | ||
| * Decommission | ||
|
|
||
| **Flood Stage:** | ||
|
|
||
| ``` | ||
| storage_flood_stage_usage_percent: default value is 95 (95%). | ||
| storage_flood_stage_left_capacity_bytes: default value is 1GB. | ||
| ``` | ||
|
|
||
| When the disk usage is **more than** `storage_flood_stage_usage_percent`, **or** the free disk capacity is **less than** `storage_flood_stage_left_capacity_bytes`, the disk will no longer be used as the destination path for the following operations: | ||
|
|
||
| * Tablet Balance | ||
| * Colocation Relocation | ||
| * Replica make up | ||
| * Restore | ||
| * Load/Insert | ||
|
|
||
| ## BE Parameter | ||
|
|
||
| **Flood Stage:** | ||
|
|
||
| ``` | ||
| capacity_used_percent_flood_stage: default value is 95 (95%). | ||
| capacity_min_left_bytes_flood_stage: default value is 1GB. | ||
| ``` | ||
|
|
||
| When the disk usage is **more than** `capacity_used_percent_flood_stage`, **and** the free disk capacity is **less than** `capacity_min_left_bytes_flood_stage`, the following operations on this disk will be prohibited: | ||
|
|
||
| * Base/Cumulative Compaction | ||
| * Data load | ||
| * Clone Task (Usually occurs when the replica is repaired or balanced.) | ||
| * Push Task (Occurs during the Loading phase of a Hadoop import, when the file is downloaded.) | ||
| * Alter Task (Schema Change or Rollup Task.) | ||
| * Download Task (The Downloading phase of the recovery operation.) | ||
|
|
||
| ## Disk Capacity Release | ||
|
|
||
| When the disk capacity is higher than High Watermark or even Flood Stage, many operations will be prohibited. At this time, you can try to reduce the disk usage and restore the system in the following ways. | ||
|
|
||
| * Delete table or partition | ||
|
|
||
| By deleting tables or partitions, you can quickly reduce the disk space usage and restore the cluster. | ||
| **Note: Only the `DROP` operation can achieve the purpose of quickly reducing the disk space usage, the `DELETE` operation cannot.** | ||
|
|
||
| ``` | ||
| DROP TABLE tbl; | ||
| ALTER TABLE tbl DROP PARTITION p1; | ||
| ``` | ||
|
|
||
| * BE expansion | ||
|
|
||
| After backend expansion, data tablets will be automatically balanced to BE nodes with lower disk usage. The expansion operation will make the cluster reach a balanced state in a few hours or days depending on the amount of data and the number of nodes. | ||
|
|
||
| * Modify replica of a table or partition | ||
|
|
||
| You can reduce the number of replicas of a table or partition. For example, the default 3 replicas can be reduced to 2. Although this method reduces the reliability of the data, it can quickly reduce the disk usage and restore the cluster to normal. | ||
| This method is usually used for emergency recovery. After the cluster has recovered and disk usage has been reduced by expanding the cluster or deleting data, please restore the number of replicas to 3. | ||
| The replica modification takes effect immediately, and the backends will automatically and asynchronously delete the redundant replicas. | ||
|
|
||
| ``` | ||
| ALTER TABLE tbl MODIFY PARTITION p1 SET("replication_num" = "2"); | ||
| ``` | ||
|
|
||
| * Delete unnecessary files | ||
|
|
||
| When the BE has crashed because the disk is full and cannot be started (this may happen when FE or BE did not detect the disk usage in time), you need to delete some temporary files in the data directory so that the BE process can start. | ||
| Files in the following directories can be deleted directly: | ||
|
|
||
| * log/: Log files in the log directory. | ||
| * snapshot/: Snapshot files in the snapshot directory. | ||
| * trash/: Trash files in the trash directory. | ||
|
|
||
| **This operation will affect [Restore data from BE Recycle Bin](./tablet-restore-tool.md).** | ||
|
|
||
| If the BE can still be started, you can use `ADMIN CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All trash files** and expired snapshot files will be cleaned up. **This will affect the operation of restoring data from the trash bin**. | ||
|
|
||
|
|
||
| If you do not manually execute `ADMIN CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes. There are two cases: | ||
| * If the disk usage has not reached 90% of the **Flood Stage**, only expired trash files and expired snapshot files will be cleaned up. Recent files are retained, so data recovery is not affected. | ||
| * If the disk usage has reached 90% of the **Flood Stage**, **all trash files** and expired snapshot files will be cleaned up. **This will affect the operation of restoring data from the trash bin**. | ||
|
|
||
| The time interval for automatic execution can be changed with the configuration items `max_garbage_sweep_interval` and `min_garbage_sweep_interval`. | ||
|
|
||
| When the recovery fails due to lack of trash files, the following results may be returned: | ||
|
|
||
| ``` | ||
| {"status": "Fail","msg": "can find tablet path in trash"} | ||
| ``` | ||
|
|
||
| * Delete data file (dangerous!!!) | ||
|
|
||
| When none of the above operations can free up capacity, you need to delete data files to free up space. The data file is in the `data/` directory of the specified data directory. To delete a tablet, you must first ensure that at least one replica of the tablet is normal, otherwise **deleting the only replica will result in data loss**. | ||
|
|
||
| Suppose we want to delete the tablet with id 12345: | ||
|
|
||
| * Find the directory corresponding to the tablet, usually under `data/shard_id/tablet_id/`. For example: | ||
|
|
||
| ```data/0/12345/``` | ||
|
|
||
| * Record the tablet id and schema hash. The schema hash is the name of the next-level directory under the directory from the previous step. In the following example it is 352781111: | ||
|
|
||
| ```data/0/12345/352781111``` | ||
|
|
||
| * Delete the data directory: | ||
|
|
||
| ```rm -rf data/0/12345/``` | ||
|
|
||
| * Delete tablet metadata (refer to [Tablet metadata management tool](./tablet-meta-tool.md)) | ||
|
|
||
| ```./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash=352781111``` |
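The threshold rules described in the Disk Capacity Management document above can be sketched in a few lines. This is a minimal illustration, not code from Doris: the `DiskStat` class and the function names are hypothetical, the default values are the ones quoted in the document, and reading "90% of the Flood Stage" as 0.9 times the flood-stage usage percent is an assumption.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class DiskStat:
    """Hypothetical container for one data directory's capacity figures."""
    total_bytes: int
    used_bytes: int

    @property
    def used_percent(self) -> float:
        return 100.0 * self.used_bytes / self.total_bytes

    @property
    def left_bytes(self) -> int:
        return self.total_bytes - self.used_bytes


def fe_exceeds(disk: DiskStat, usage_percent: float, min_left_bytes: int) -> bool:
    """FE rule: usage above the percent threshold OR free space below the byte threshold."""
    return disk.used_percent > usage_percent or disk.left_bytes < min_left_bytes


def be_reaches_flood_stage(disk: DiskStat, usage_percent: float = 95,
                           min_left_bytes: int = 1 * GB) -> bool:
    """BE rule: both conditions must hold (AND) before operations are refused."""
    return disk.used_percent > usage_percent and disk.left_bytes < min_left_bytes


def sweep_mode(disk: DiskStat, flood_stage_percent: float = 95) -> str:
    """Automatic sweep: at 90% of Flood Stage (assumed 0.9 * percent), all trash is removed."""
    if disk.used_percent >= 0.9 * flood_stage_percent:
        return "all_trash"
    return "expired_only"
```

The difference between OR and AND matters on large disks. With these definitions, a 10000 GB disk that is 96% full still has 400 GB free, so `fe_exceeds(disk, 95, 1 * GB)` is true (FE stops using it as a destination) while `be_reaches_flood_stage(disk)` is false (the BE does not yet refuse operations).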
47 changes: 47 additions & 0 deletions
docs/en/sql-reference/sql-statements/Administration/ADMIN CLEAN TRASH.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| --- | ||
| { | ||
| "title": "ADMIN CLEAN TRASH", | ||
| "language": "en" | ||
| } | ||
| --- | ||
|
|
||
| <!-- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
|
|
||
| # ADMIN CLEAN TRASH | ||
| ## description | ||
| This statement is used to clean up the trash data in the backend. | ||
| Syntax: | ||
| ADMIN CLEAN TRASH [ON ("BackendHost1:BackendHeartBeatPort1", "BackendHost2:BackendHeartBeatPort2", ...)]; | ||
|
|
||
| Notes: | ||
| Use BackendHost:BackendHeartBeatPort to specify the backends to clean up. If the ON clause is omitted, the trash on all backends is cleaned up. | ||
|
|
||
| ## example | ||
|
|
||
| 1. Clean up the trash data of all BE nodes. | ||
|
|
||
| ADMIN CLEAN TRASH; | ||
|
|
||
| 2. Clean up the trash data of '192.168.0.1:9050' and '192.168.0.2:9050'. | ||
|
|
||
| ADMIN CLEAN TRASH ON ("192.168.0.1:9050","192.168.0.2:9050"); | ||
|
|
||
| ## keyword | ||
| ADMIN, CLEAN, TRASH |
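For operators who script this statement, the syntax above can be assembled from a list of backends. The helper below is a hypothetical sketch (the name `build_clean_trash_sql` is not part of Doris); it only builds the statement text shown in the examples and does a basic `host:port` shape check before quoting.

```python
def build_clean_trash_sql(backends=None):
    """Return an ADMIN CLEAN TRASH statement.

    `backends` is an optional list of "host:heartbeat_port" strings.
    With no backends, the statement targets every backend.
    """
    if not backends:
        return "ADMIN CLEAN TRASH;"
    for backend in backends:
        host, sep, port = backend.rpartition(":")
        # Reject entries that are not of the form host:port.
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {backend!r}")
    quoted = ", ".join(f'"{backend}"' for backend in backends)
    return f"ADMIN CLEAN TRASH ON ({quoted});"
```

The resulting string would then be sent through whatever MySQL-protocol client is already used to talk to the FE.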
It may take a very long time to clean the trash, so I suggest using an async call.
I think this is already async, because I use `oneway` to define the function in the thrift file.
`gensrc/thrift/BackendService.thrift`: `oneway void clean_trash();`
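The two ideas in this thread, a call that returns without waiting and a sweep that returns directly when another sweep already holds the lock, can be illustrated with a small Python analogy. This is only a sketch: the PR itself is C++ and Thrift, the class and function names here are hypothetical, and treating the "return directly when locked" commit as a non-blocking try-lock is an assumption about its intent.

```python
import threading


class TrashSweeper:
    """Hypothetical stand-in for the component that owns the trash sweep."""

    def __init__(self):
        self._lock = threading.Lock()
        self.runs = 0

    def start_trash_sweep(self) -> bool:
        """Run one sweep; return False immediately if a sweep is already in progress."""
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self.runs += 1  # placeholder for the real cleanup work
            return True
        finally:
            self._lock.release()


def clean_trash_oneway(sweeper: TrashSweeper) -> threading.Thread:
    """Fire-and-forget: start the sweep on a background thread and do not wait for it."""
    worker = threading.Thread(target=sweeper.start_trash_sweep, daemon=True)
    worker.start()
    return worker
```

Because the caller never blocks on the lock, a manual clean request that arrives during a periodic sweep is simply skipped instead of queuing behind it.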