[Feature] Support for cleaning the trash actively#6323

Merged
yangzhg merged 9 commits into apache:master from BiteTheDDDDt:dev_clear on Aug 12, 2021

@BiteTheDDDDt
Contributor

Proposed changes

Add support for actively cleaning the trash.
Users can run 'CLEAN TRASH' to clean the trash on demand.

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Code refactor (Modify the code structure, format the code, etc...)
  • Optimization. Including functional usability improvements and performance improvements.
  • Dependency. Such as changes related to third-party components.
  • Other.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue on (Fix [Feature] Support for cleaning the trash actively #6322) and described the bug/feature there in detail
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If these changes need document changes, I have updated the document
  • Any dependent changes have been merged

@EmmyMiao87
Contributor

Please enrich your commit message.


# CLEAN TRASH
## description
This statement is used to clean up garbage data in the backend.
Contributor

Will this statement clean up both trash and snapshot?
Will cleaning up the snapshot involve the snapshot being restored?

Contributor Author

> Will this statement clean up both trash and snapshot?
> Will cleaning up the snapshot involve the snapshot being restored?

Yes, this statement will clean up both trash and snapshot.

This statement calls StorageEngine::start_trash_sweep.
It only cleans up expired data (the expiry limits are defined by config::snapshot_expire_time_sec and config::trash_file_expire_time_sec).
This function is also called automatically on a periodic basis, so I think the cleanup is harmless.
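
As a rough illustration of the "only expired data" rule, the sketch below compares a file's age against an expiry limit. The helper name `is_expired`, its parameters, and the `>=` comparison are assumptions made for illustration; they are not taken from the Doris source, where the limits come from `config::snapshot_expire_time_sec` and `config::trash_file_expire_time_sec`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the actual Doris implementation.
// A file counts as expired once its age reaches the configured limit.
bool is_expired(int64_t now_sec, int64_t created_sec, int64_t expire_sec) {
    assert(expire_sec >= 0);  // a negative limit would make no sense here
    return now_sec - created_sec >= expire_sec;
}
```

Under this assumption, a file that is younger than the limit is left in place, which is why a periodic sweep of this kind does not touch recently deleted data.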

@EmmyMiao87 EmmyMiao87 added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 26, 2021
@EmmyMiao87 EmmyMiao87 self-assigned this Jul 26, 2021
"SHOW MIGRATIONS",
"SHOW PLUGINS",
"SHOW TABLE STATUS",
"CLEAN TRASH",
Contributor

You need to add both guide and sql reference

Contributor Author

> You need to add both guide and sql reference

In c31dd3e I added a description to /administrator-guide/operation/disk-capacity.md.
I also found that this document has no corresponding English version.

TNetworkAddress address = null;
boolean ok = false;
try {
long start = System.currentTimeMillis();
Contributor

The variable start is not used later.

Contributor Author

> The variable start is not used later.

fixed

EmmyMiao87 previously approved these changes Jul 27, 2021
Contributor

@EmmyMiao87 EmmyMiao87 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 27, 2021
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

nimuyuhan previously approved these changes Jul 27, 2021
Contributor

@nimuyuhan nimuyuhan left a comment

LGTM

Contributor

@morningman morningman left a comment

How to force sweep trash?

@BiteTheDDDDt BiteTheDDDDt dismissed stale reviews from nimuyuhan and EmmyMiao87 via 08df4ce July 28, 2021 06:17
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jul 28, 2021
@BiteTheDDDDt
Contributor Author

> How to force sweep trash?

I made some changes in 08df4ce.


void BackendService::clean_trash() {
- StorageEngine::instance()->start_trash_sweep(nullptr); // do not update usage
+ StorageEngine::instance()->start_trash_sweep(nullptr, true); // do not update usage, ignore guard_space
Contributor

Suggested change
- StorageEngine::instance()->start_trash_sweep(nullptr, true); // do not update usage, ignore guard_space
+ StorageEngine::instance()->start_trash_sweep(nullptr, true); // update usage, ignore guard_space

**This operation will affect [Restore data from BE Recycle Bin](./tablet-restore-tool.md).**

If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two situations as follows:
If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**.
Contributor

Suggested change
- If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**.
+ If the BE can still be started, you can use `ADMIN CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**.

If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**.


If you do not manually execute `CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes.There are two situations as follows:
Contributor

Suggested change
- If you do not manually execute `CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes.There are two situations as follows:
+ If you do not manually execute `ADMIN CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes.There are two situations as follows:

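If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two situations:
* If disk usage has not reached 90% of the **Flood Stage**, expired trash files and expired snapshot files are cleaned up. Some recent files are retained, so data recovery is not affected.
* If disk usage has reached 90% of the **Flood Stage**, **all** trash files and expired snapshot files are cleaned up, **which also affects restoring data from the recycle bin**.
If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All** trash files and expired snapshot files are cleaned up, **which will affect restoring data from the recycle bin**.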
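
The two situations in the documentation text above, plus the forced mode added in this PR, can be sketched as a single decision about which expiry to apply to trash files. This is only an illustration under stated assumptions: the function name, parameter names, and the convention that an expiry of 0 means "sweep all trash" are hypothetical and are not confirmed by the snippets shown in this conversation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the documented policy; names are illustrative only.
// Returns the expiry (in seconds) to apply to trash files.
// A return value of 0 is assumed to mean "sweep all trash files".
int64_t effective_trash_expire_sec(double disk_usage, double flood_stage_usage,
                                   bool ignore_guard, int64_t trash_expire_sec) {
    assert(trash_expire_sec >= 0);
    // The guard threshold is 90% of the flood stage. Ignoring the guard
    // lowers it to 0, so any disk usage counts as having reached it.
    const double guard = ignore_guard ? 0.0 : flood_stage_usage * 0.9;
    return disk_usage >= guard ? 0 : trash_expire_sec;
}
```

With `ignore_guard` set, the result is always 0, which matches the documented behavior that the statement cleans up all trash files regardless of disk usage.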
Contributor

same as above

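If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All** trash files and expired snapshot files are cleaned up, **which will affect restoring data from the recycle bin**.

If you do not manually execute `CLEAN TRASH`, the system will still perform the cleanup automatically within a few minutes to tens of minutes.
If you do not manually execute `CLEAN TRASH`, the system will still perform the cleanup automatically within a few minutes to tens of minutes. There are two situations: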
Contributor

same as above

Contributor Author

> same as above

I fixed these problems in 3c899f6.


@Override
public RedirectStatus getRedirectStatus() {
return RedirectStatus.FORWARD_WITH_SYNC;
Contributor

Do we need to forward this stmt to master?

Contributor Author

> Do we need to forward this stmt to master?

This does not seem to modify the metadata, so I changed it to NO_FORWARD in 9765d59.

}

void BackendService::clean_trash() {
StorageEngine::instance()->start_trash_sweep(nullptr, true);
Contributor

It may take a very long time to clean the trash, so I suggest using an async call.

Contributor Author

@BiteTheDDDDt BiteTheDDDDt Aug 5, 2021

> It may take a very long time to clean the trash, so I suggest using an async call.

I think this is already async, because I declared the function as oneway in the Thrift file:
gensrc/thrift/BackendService.thrift
oneway void clean_trash();


TStreamLoadRecordResult get_stream_load_record(1: i64 last_stream_record_time);

oneway void clean_trash();
Contributor

Is it safe to call this multiple times using oneway?
And is the method start_trash_sweep() thread-safe?

author BiteTheDDDDt <952130278@qq.com> 1626945340 +0800
committer BiteTheDDDDt <952130278@qq.com> 1628491167 +0800

support for clean trash used on backends && add document of clean trash

fix wrong format on CleanTrashStmt toSql

Update fe/fe-core/src/main/java/org/apache/doris/qe/DdlExecutor.java

fix format

Co-authored-by: EmmyMiao87 <522274284@qq.com>

add description about 'clean trash' at disk-capacity.md

Translate document /administrator-guide/operation/disk-capacity.md to english

add more document description about clean trash && remove unused variable

fix blank in markdown

1. Ignore guard space when clean trash.
2. Change query format from 'clean trash' to 'admin clean trash'.
3. Update document about clean trash.

1. improve comments.
2. remove useless function (AdminCleanTrashStmt.toSql()).
3. fix document.

change 'FORWARD_WITH_SYNC' to 'NO_FORWARD' at AdminCleanTrashStmt

// clean unused rowset metas in OlapMeta
_clean_unused_rowset_metas();

_trash_sweep_lock.unlock();
Contributor

The method may return before you unlock this lock.
You can use src//util/mutex.h to unlock automatically on destruction.

Contributor Author

> The method may return before you unlock this lock.
> You can use src//util/mutex.h to unlock automatically on destruction.

I used unique_lock to fix it.
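
For readers unfamiliar with the pattern, here is a minimal sketch of why a scoped lock addresses the early-return concern. The class `TrashSweeper`, its `sweep` method, and the `fail_early` flag are hypothetical stand-ins and not the Doris code; only `std::unique_lock` and `std::mutex` are real standard-library types. Whether the merged change blocks on the lock or skips when a sweep is already running is not shown in this conversation, so the blocking form below is an assumption.

```cpp
#include <mutex>

// Hypothetical stand-in for the sweep routine; names are illustrative only.
class TrashSweeper {
public:
    // Returns true if the sweep ran to completion, false on an early exit.
    bool sweep(bool fail_early) {
        // unique_lock acquires the mutex here and releases it in its
        // destructor, so every return path below unlocks automatically.
        std::unique_lock<std::mutex> guard(_trash_sweep_lock);
        if (fail_early) {
            return false;  // early return: no manual unlock() needed
        }
        ++_completed;
        return true;
    }

    int completed() const { return _completed; }

private:
    std::mutex _trash_sweep_lock;
    int _completed = 0;
};
```

With a manual `unlock()` at the end of the function, the early return would leave the mutex held and the next call would block forever. With the scoped guard, a second call after an early exit proceeds normally.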

Contributor

@morningman morningman left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 11, 2021
@yangzhg yangzhg merged commit 8a267f1 into apache:master Aug 12, 2021
@morningman morningman mentioned this pull request Oct 10, 2021
@BiteTheDDDDt BiteTheDDDDt deleted the dev_clear branch January 20, 2025 06:45

Labels

approved Indicates a PR has been approved by one committer. kind/feature Categorizes issue or PR as related to a new feature. reviewed

Development

Successfully merging this pull request may close these issues.

[Feature] Support for cleaning the trash actively

5 participants