[Feature] Support for cleaning the trash actively #6323
yangzhg merged 9 commits into apache:master
Please enrich your commit msg.
| # CLEAN TRASH | ||
| ## description | ||
| This statement is used to clean up garbage data in the backend. |
Will this statement clean up both trash and snapshot?
Will cleaning up the snapshot involve the snapshot being restored?
Yes, this statement will clean up both trash and snapshot.
This statement will call StorageEngine::start_trash_sweep.
It only cleans up expired data (as defined by config::snapshot_expire_time_sec and config::trash_file_expire_time_sec).
This function is also called automatically at regular intervals, so I think the cleanup is harmless.
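To illustrate what "expired" means in the reply above, here is a minimal sketch of an age check. The helper name `is_expired` and its parameters are hypothetical and are not taken from the Doris code base; the real sweep may compare timestamps differently (for example with a strict comparison).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Doris API): returns true when a file that was
// last modified at `mtime_sec` has outlived `expire_sec` as of `now_sec`.
// All values are seconds; the two timestamps are seconds since the epoch.
inline bool is_expired(int64_t now_sec, int64_t mtime_sec, int64_t expire_sec) {
    assert(expire_sec >= 0);  // a negative retention period makes no sense
    return now_sec - mtime_sec >= expire_sec;
}
```

With a retention period of 0, every file counts as expired, which is how a "clean everything" mode could be expressed with the same check.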
docs/.vuepress/sidebar/en.js
Outdated
| "SHOW MIGRATIONS", | ||
| "SHOW PLUGINS", | ||
| "SHOW TABLE STATUS", | ||
| "CLEAN TRASH", |
You need to add both the guide and the SQL reference.
c31dd3e
I added a description to /administrator-guide/operation/disk-capacity.md.
At the same time, I found that there is no corresponding English version of this document.
| TNetworkAddress address = null; | ||
| boolean ok = false; | ||
| try { | ||
| long start = System.currentTimeMillis(); |
The variable start is not used later.
fixed
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
morningman left a comment
How do we force a trash sweep?
I made some changes at 08df4ce.
be/src/service/backend_service.cpp
Outdated
| void BackendService::clean_trash() { | ||
| StorageEngine::instance()->start_trash_sweep(nullptr); // do not update usage | ||
| StorageEngine::instance()->start_trash_sweep(nullptr, true); // do not update usage, ignore guard_space |
| StorageEngine::instance()->start_trash_sweep(nullptr, true); // do not update usage, ignore guard_space | |
| StorageEngine::instance()->start_trash_sweep(nullptr, true); // update usage, ignore guard_space |
| **This operation will affect [Restore data from BE Recycle Bin](./tablet-restore-tool.md).** | ||
| If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two situations as follows: | ||
| If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**. |
| If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**. | |
| If the BE can still be started, you can use `ADMIN CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**. |
| If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**. | ||
| If you do not manually execute `CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes.There are two situations as follows: |
| If you do not manually execute `CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes.There are two situations as follows: | |
| If you do not manually execute `ADMIN CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes.There are two situations as follows: |
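| If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two situations as follows: | ||
| * If the disk usage has not reached 90% of the **Flood Stage**, expired trash files and expired snapshot files are cleaned up. Some recent files are retained, so data recovery is not affected. | ||
| * If the disk usage has reached 90% of the **Flood Stage**, **all** trash files and expired snapshot files are cleaned up, **which also affects restoring data from the recycle bin**. | ||
| If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All** trash files and expired snapshot files will be cleaned up, **which will affect restoring data from the recycle bin**. |
| If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All** trash files and expired snapshot files will be cleaned up, **which will affect restoring data from the recycle bin**. | ||
| If you do not manually execute `CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes. | ||
| If you do not manually execute `CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes. There are two situations as follows: |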
same as above
I fixed these problems at 3c899f6.
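The documentation change in this thread describes two cases for the automatic sweep (below or at 90% of the flood stage), while the manual statement ignores that guard and removes all trash files. A minimal sketch of that decision follows. The function name, its parameters, and the example threshold of 95 percent are assumptions for illustration only; this is not the actual `start_trash_sweep` implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not the Doris implementation): choose the retention
// period to apply to trash files during a sweep.
//   usage               fraction of the disk in use, 0.0 to 1.0
//   flood_stage_percent flood-stage threshold in percent (assumed example: 95)
//   ignore_guard        true for a manual clean, which skips the guard check
//   trash_expire_sec    configured retention period for trash files
inline int64_t effective_trash_expire_sec(double usage, int flood_stage_percent,
                                          bool ignore_guard,
                                          int64_t trash_expire_sec) {
    assert(usage >= 0.0 && usage <= 1.0);
    // Guard space sits at 90% of the flood stage, as the documentation states.
    const double guard = flood_stage_percent / 100.0 * 0.9;
    if (ignore_guard || usage >= guard) {
        return 0;  // retention of 0: every trash file is treated as expired
    }
    return trash_expire_sec;  // normal case: keep recent trash for recovery
}
```

Snapshot files are not covered by this sketch, since the documentation says only expired snapshots are removed in every case.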
| @Override | ||
| public RedirectStatus getRedirectStatus() { | ||
| return RedirectStatus.FORWARD_WITH_SYNC; |
Do we need to forward this stmt to master?
This does not seem to modify the metadata, so I changed it to NO_FORWARD at 9765d59.
| } | ||
| void BackendService::clean_trash() { | ||
| StorageEngine::instance()->start_trash_sweep(nullptr, true); |
It may take a very long time to clean the trash, so I suggest using an async call.
I think this is already async, because I declared the function as oneway in the thrift file.
gensrc/thrift/BackendService.thrift
oneway void clean_trash();
| TStreamLoadRecordResult get_stream_load_record(1: i64 last_stream_record_time); | ||
| oneway void clean_trash(); |
Is it safe to call this multiple times using oneway?
And is the method start_trash_sweep() thread safe?
author BiteTheDDDDt <952130278@qq.com> 1626945340 +0800
committer BiteTheDDDDt <952130278@qq.com> 1628491167 +0800
support for clean trash used on backends && add document of clean trash
fix wrong format on CleanTrashStmt toSql
Update fe/fe-core/src/main/java/org/apache/doris/qe/DdlExecutor.java
fix format
Co-authored-by: EmmyMiao87 <522274284@qq.com>
add description about 'clean trash' at disk-capacity.md
Translate document /administrator-guide/operation/disk-capacity.md to english
add more document description about clean trash && remove unused variable
fix blank in markdown
1. Ignore guard space when clean trash. 2. Change query format from 'clean trash' to 'admin clean trash'. 3. Update document about clean trash.
1. improve comments. 2. remove useless function (AdminCleanTrashStmt.toSql()). 3. fix document.
change 'FORWARD_WITH_SYNC' to 'NO_FORWARD' at AdminCleanTrashStmt
be/src/olap/storage_engine.cpp
Outdated
| // clean unused rowset metas in OlapMeta | ||
| _clean_unused_rowset_metas(); | ||
| _trash_sweep_lock.unlock(); |
The method may return before you unlock this lock.
You can use src//util/mutex.h to unlock automatically on destruction.
I use unique_lock to fix it.
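For readers following the locking discussion, here is a minimal sketch of the pattern. It assumes the sweep should be skipped when another sweep is already running, which is why it uses `std::try_to_lock`; the thread only confirms that `unique_lock` was used, so the try-lock behaviour is an assumption. `TrashSweeper` is a hypothetical stand-in, not the real `StorageEngine`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the storage engine; only the locking pattern matters.
class TrashSweeper {
public:
    // Returns false if another sweep holds the lock, true after a completed sweep.
    bool sweep() {
        // Try to take the lock without blocking. The guard releases it in its
        // destructor, so every return path below unlocks automatically.
        std::unique_lock<std::mutex> guard(_lock, std::try_to_lock);
        if (!guard.owns_lock()) {
            return false;  // a sweep is already running; skip this request
        }
        assert(guard.owns_lock());
        ++_sweeps;  // placeholder for the real cleanup work
        return true;
    }

    int sweeps() const { return _sweeps; }

private:
    std::mutex _lock;
    int _sweeps = 0;  // only modified while _lock is held
};
```

Because the guard owns the unlock, an early return cannot leave the mutex locked, which addresses the review comment. It also means that repeated oneway calls arriving during a sweep would return immediately instead of queuing behind the lock.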
PR approved by at least one committer and no changes requested.
Proposed changes
Support for cleaning the trash actively.
Users can use 'CLEAN TRASH' to clean the trash.
Types of changes
What types of changes does your code introduce to Doris?
Put an x in the boxes that apply
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.