Conversation

@hangc0276
Contributor

Motivation

When we use `TransactionalEntryLogCompactor` to compact entry log files, it generates a lot of small entry log files whose usage is usually greater than 90%, so they cannot be compacted unless their usage drops.

![image](https://user-images.githubusercontent.com/5436568/201135615-4d6072f5-e353-483d-9afb-48fad8134044.png)

Changes

We introduce an entry log file size check during compaction, controlled by `gcEntryLogSizeRatio`.
If the total entry log file size is less than `gcEntryLogSizeRatio * logSizeLimit`, the entry log file is compacted even though its usage is greater than 90%. This feature is disabled by default: `gcEntryLogSizeRatio` defaults to `0.0`.
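To make the rule concrete, here is a minimal, hedged sketch of the decision described above; the class, method, and parameter names are illustrative and do not reproduce the actual `GarbageCollectorThread` code:

```java
// Hedged sketch: not the real GarbageCollectorThread logic, just the rule
// "compact small files even when their usage is above the usual threshold".
public final class SmallFileCompactionCheck {

    /**
     * @param totalSizeBytes      total size of the entry log file
     * @param usage               fraction of the file still referenced (0.0 - 1.0)
     * @param usageThreshold      regular compaction threshold, e.g. 0.9
     * @param gcEntryLogSizeRatio small-file ratio; 0.0 disables this check
     * @param logSizeLimitBytes   configured entry log size limit (logSizeLimit)
     */
    static boolean shouldCompact(long totalSizeBytes, double usage, double usageThreshold,
                                 double gcEntryLogSizeRatio, long logSizeLimitBytes) {
        // Regular rule: compact once usage has dropped below the threshold.
        if (usage < usageThreshold) {
            return true;
        }
        // Small-file rule from this PR: a highly used but tiny file is still
        // worth compacting when it is far below the entry log size limit.
        return gcEntryLogSizeRatio > 0.0
                && totalSizeBytes < gcEntryLogSizeRatio * logSizeLimitBytes;
    }

    public static void main(String[] args) {
        long logSizeLimit = 1024L * 1024 * 1024;                 // 1 GiB limit
        // 5 MiB leftover from transactional compaction, 95% used: compacted.
        System.out.println(shouldCompact(5L * 1024 * 1024, 0.95, 0.9, 0.5, logSizeLimit));
        // Same file with the feature disabled (ratio 0.0): left alone.
        System.out.println(shouldCompact(5L * 1024 * 1024, 0.95, 0.9, 0.0, logSizeLimit));
    }
}
```

With the default `gcEntryLogSizeRatio = 0.0`, the small-file branch never fires, which is why the feature stays off until the ratio is raised.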

@hangc0276 hangc0276 self-assigned this Nov 10, 2022
@hangc0276 hangc0276 added this to the 4.16.0 milestone Nov 10, 2022
@zymap
Member

zymap commented Nov 11, 2022

Open an email to discuss it?

@hangc0276
Contributor Author

Open an email to discuss it?

@zymap Ok, I've sent an email.

@hangc0276
Contributor Author

ping @merlimat @eolivelli @dlg99 @zymap Please help take a look at this PR, thanks.

Contributor

@eolivelli eolivelli left a comment


great

+1

Contributor

@dlg99 dlg99 left a comment


overall LGTM, see my comment with some ideas for improvements

int[] compactedBuckets = new int[numBuckets];

ArrayList<LinkedList<Long>> compactableBuckets = new ArrayList<>(numBuckets);
ArrayList<LinkedList<Long>> smallFilesCompactableBuckets = new ArrayList<>(numBuckets);
Contributor

@dlg99 dlg99 Dec 7, 2022


I think you can avoid adding another type of bucket.
Have something like a useTargetEntryLogSizeForGc config;

Replace

calculateUsageIndex(numBuckets, meta.getUsage());

with something like

double usage = meta.getUsage();

if (conf.getUseTargetEntryLogSizeForGc() && usage < 1.0d) {
  usage = (double) meta.getRemainingSize() / Math.max(meta.getTotalSize(), conf.getEntryLogSizeLimit());
}

calculateUsageIndex(numBuckets, usage);

and the rest will happen naturally.

As an improvement, transactional compaction can fall back to a regular one if the estimated size of the entry log after compaction is below some limit.

Contributor Author


@dlg99 Good suggestion. I updated the code with the useTargetEntryLogSizeForGc flag. Please help take a look, thanks.

As an improvement, transactional compaction can fall back to a regular one if the estimated size of the entry log after compaction is below some limit.

For this one, I will open a new PR to handle it.

@hangc0276 hangc0276 force-pushed the chenhang/enhence_gc_small_files branch from 0ee446e to d6a2ccc Compare March 13, 2023 07:57
@codecov-commenter

codecov-commenter commented Mar 13, 2023

Codecov Report

Merging #3631 (c48bf69) into master (06c3cab) will decrease coverage by 19.68%.
The diff coverage is 0.00%.

@@              Coverage Diff              @@
##             master    #3631       +/-   ##
=============================================
- Coverage     68.33%   48.65%   -19.68%     
+ Complexity     6770     4781     -1989     
=============================================
  Files           473      473               
  Lines         40982    40994       +12     
  Branches       5241     5245        +4     
=============================================
- Hits          28005    19947     -8058     
- Misses        10722    19068     +8346     
+ Partials       2255     1979      -276     
| Flag | Coverage Δ |
| --- | --- |
| bookie | ? |
| client | 44.28% <0.00%> (+0.07%) ⬆️ |
| remaining | 29.41% <0.00%> (-0.10%) ⬇️ |
| replication | ? |
| tls | 20.99% <0.00%> (-0.07%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| ...ache/bookkeeper/bookie/GarbageCollectorThread.java | 34.34% <0.00%> (-42.89%) ⬇️ |
| ...rg/apache/bookkeeper/conf/ServerConfiguration.java | 53.03% <0.00%> (-24.47%) ⬇️ |

... and 251 files with indirect coverage changes


Member

@horizonzy horizonzy left a comment


LGTM.

@hangc0276 hangc0276 merged commit 2fad33b into apache:master Mar 16, 2023
merlimat added a commit to merlimat/bookkeeper that referenced this pull request Mar 21, 2023
* Fix memory leak issue of reading small entries (apache#3844)

* Make read entry request recyclable (apache#3842)

* Make read entry request recyclable

* Move recycle to finally block

* Fix test and comments

* Fix test

* Avoid unnecessary force write. (apache#3847)

* Avoid unnecessary force write.

* code clean.

* fix style

* Correct the running job name for the test group (apache#3851)

---

### Motivation

The running tests job name doesn't match the tests. Correct
the job name.

* add timeout for two flaky timeout tests (apache#3855)

* add V2 protocol and warmupMessages support for benchMark (apache#3856)

* disable trimStackTrace for code-coverage profile (apache#3854)

* Fix bkperf log directory not found (apache#3858)

### Motivation
Running the bkperf command `bin/bkperf journal append -j data -n 100000000 --sync true` to test the BookKeeper journal performance failed with the following exception:
```
[0.002s][error][logging] Error opening log file '/Users/hangc/Downloads/tmp/tc/batch/ta/bookkeeper-all-4.16.0-SNAPSHOT/logs/bkperf-gc.log': No such file or directory
[0.002s][error][logging] Initialization of output 'file=/Users/hangc/Downloads/tmp/tc/batch/ta/bookkeeper-all-4.16.0-SNAPSHOT/logs/bkperf-gc.log' using options 'filecount=5,filesize=64m' failed.
Invalid -Xlog option '-Xlog:gc=info:file=/Users/hangc/Downloads/tmp/tc/batch/ta/bookkeeper-all-4.16.0-SNAPSHOT/logs/bkperf-gc.log::filecount=5,filesize=64m', see error log for details.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
```

The root cause is that the `logs` directory was not created.

### Modifications
Create the `logs` directory before bkperf starts.

* [improve] Fix indexDirs upgrade failed (apache#3762)

* fix indexDirs upgrade failed

* Bump checkstyle-plugin from 3.1.2 to 3.2.1 (apache#3850)

* [Flaky] Fix flaky test in testRaceGuavaEvictAndReleaseBeforeRetain (apache#3857)

* Fix flaky test in testRaceGuavaEvictAndReleaseBeforeRetain

* format code

* Fix NPE in BenchThroughputLatency (apache#3859)

* Update website to 4.15.4 (apache#3862)

---

### Motivation

Update website to 4.15.4

* change rocksDB config level_compaction_dynamic_level_bytes to CFOptions (apache#3860)

### Motivation
After PR apache#3056, BookKeeper set `level_compaction_dynamic_level_bytes=true` under `TableOptions` in `entry_location_rocksdb.conf.default`, which makes `level_compaction_dynamic_level_bytes` take no effect and can leave the RocksDB .sst file compaction ordering in a chaotic state after a bookie release upgrade.
Per the RocksDB option-file format, `level_compaction_dynamic_level_bytes` needs to be set under `CFOptions`: https://github.com/facebook/rocksdb/blob/master/examples/rocksdb_option_file_example.ini

<img width="703" alt="image" src="https://user-images.githubusercontent.com/84127069/224640399-d5481fe5-7b75-4229-ac06-3d280aa9ae6d.png">


<img width="240" alt="image" src="https://user-images.githubusercontent.com/84127069/224640621-737d0a42-4e01-4f38-bd5a-862a93bc4b32.png">

### Changes

1. Change `level_compaction_dynamic_level_bytes=true` from `TableOptions` to `CFOptions` in `entry_location_rocksdb.conf.default` (see the sketch below);
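For illustration only, a hedged sketch of where the option ends up; the section names follow RocksDB's option-file format linked above, and all other options in `entry_location_rocksdb.conf.default` are omitted:

```ini
# Hedged sketch, not the complete shipped default file.
[CFOptions "default"]
  # Moved here so RocksDB actually applies it as a column-family option.
  level_compaction_dynamic_level_bytes=true

[TableOptions/BlockBasedTable "default"]
  # Block-based-table settings stay here; the option above no longer belongs in this section.
```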

* Correct the running job flag for the test group. (apache#3865)

* Release note for 4.15.4 (apache#3831)

---

### Motivation

Release note for 4.15.4

* Add trigger entry location index rocksDB compact interface. (apache#3802)

### Motivation
After a bookie instance has been running for a long time, the entry location index RocksDB `.sst` files for a single ledger data dir may in some cases grow to 20-30 GB, which makes RocksDB scans take longer and can cause bookie client request timeouts.

Add a REST API that triggers entry location index RocksDB compaction and reports the compaction status.

A full-range entry location index RocksDB compaction puts higher I/O load on the entry location index dir, so it is better to trigger the compaction through this API during low-traffic periods.

**Some case before rocksDB compact:**
<img width="232" alt="image" src="https://user-images.githubusercontent.com/84127069/220893469-e6fbc1a3-c767-4ffe-8ae9-f05ad1833c50.png">


<img width="288" alt="image" src="https://user-images.githubusercontent.com/84127069/220891359-dc37e139-37b0-461b-8001-dcc48517366c.png">

**After rocksDB compact:**
<img width="255" alt="image" src="https://user-images.githubusercontent.com/84127069/220891419-24267fa7-348c-4fbd-8b3e-70a99840bce5.png">

### Changes
1. Add a REST API to trigger entry location index RocksDB compaction.

* Pick the higher leak detection level between netty and bookkeeper. (apache#3794)

### Motivation
1. Pick the higher leak detection level between Netty and BookKeeper.
2. Enhance the BookKeeper leak detection value matching rule; it is now case-insensitive.

There are detailed information about it: https://lists.apache.org/thread/d3zw8bxhlg0wxfhocyjglq0nbxrww3sg
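As a rough, hedged illustration of "pick the higher level", using Netty's public `ResourceLeakDetector` API; the `bookkeeperLevel` string parameter stands in for whatever value the BookKeeper configuration supplies and is an assumption, not the project's actual configuration plumbing:

```java
import io.netty.util.ResourceLeakDetector;
import io.netty.util.ResourceLeakDetector.Level;

import java.util.Locale;

public final class LeakDetectionLevelPicker {

    /**
     * Return the stricter of Netty's current level and the level requested by
     * BookKeeper's configuration, parsed case-insensitively.
     */
    static Level pickHigher(String bookkeeperLevel) {
        Level nettyLevel = ResourceLeakDetector.getLevel();
        Level bkLevel = Level.valueOf(bookkeeperLevel.trim().toUpperCase(Locale.ROOT));
        // Level is ordered DISABLED < SIMPLE < ADVANCED < PARANOID, so the
        // larger ordinal is the stricter setting.
        return nettyLevel.ordinal() >= bkLevel.ordinal() ? nettyLevel : bkLevel;
    }

    public static void main(String[] args) {
        ResourceLeakDetector.setLevel(pickHigher("advanced"));
        System.out.println(ResourceLeakDetector.getLevel());
    }
}
```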

* Disable code coverage and codecov report (apache#3863)

### Motivation

There are two reasons we want to disable code coverage:

1. The current report results are not accurate.
2. We can't get a PR's unit-test coverage because of the Apache Codecov permissions.

* Add small files check in garbage collection (apache#3631)

### Motivation
When we use `TransactionalEntryLogCompactor` to compact entry log files, it generates a lot of small entry log files whose usage is usually greater than 90%, so they cannot be compacted unless their usage drops.

![image](https://user-images.githubusercontent.com/5436568/201135615-4d6072f5-e353-483d-9afb-48fad8134044.png)


### Changes
We introduce an entry log file size check during compaction, controlled by `gcEntryLogSizeRatio`.
If the total entry log file size is less than `gcEntryLogSizeRatio * logSizeLimit`, the entry log file is compacted even though its usage is greater than 90%. This feature is disabled by default: `gcEntryLogSizeRatio` defaults to `0.0`.

* [improvement] Delay all audit tasks when there is an already delayed bookie check task (apache#3818)

### Motivation

Fixes apache#3817 

For details, see: apache#3817 

### Changes

When there is an `auditTask` during the `lostBookieRecoveryDelay` delay, other detection tasks should be skipped.
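A minimal, hypothetical sketch of that skip rule; the class and member names (`AuditScheduler`, `auditTask`, `onBookieLost`) are placeholders and do not mirror the Auditor's real implementation:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: while a delayed audit is already pending within the
// lostBookieRecoveryDelay window, further check tasks are skipped.
public final class AuditScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final long lostBookieRecoveryDelaySeconds;
    private ScheduledFuture<?> auditTask;   // pending delayed audit, if any

    AuditScheduler(long lostBookieRecoveryDelaySeconds) {
        this.lostBookieRecoveryDelaySeconds = lostBookieRecoveryDelaySeconds;
    }

    synchronized void onBookieLost(Runnable audit) {
        if (auditTask != null && !auditTask.isDone()) {
            // An audit is already scheduled: let it cover this event too.
            return;
        }
        auditTask = executor.schedule(audit, lostBookieRecoveryDelaySeconds, TimeUnit.SECONDS);
    }
}
```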

* Change order of doGcLedgers and extractMetaFromEntryLogs (apache#3869)

* [Bugfix] make metadataDriver initialization more robust (apache#3873)

Co-authored-by: zengqiang.xu <zengqiang.xu@shopee.com>

* Enable CI for the streamstorage python client (apache#3875)

* Fix compaction threshold default value precision problem. (apache#3871)

* Fix compaction threshold precision problem.

* Fix compaction threshold precision problem.

* Single buffer for small add requests (apache#3783)

* Single buffer for small add requests

* Fixed checkstyle

* Fixed treating of CompositeByteBuf

* Fixed merge issues

* Fixed merge issues

* WIP

* Fixed test and removed dead code

* Removed unused import

* Fixed BookieJournalTest

* removed unused import

* fix the checkstyle

* fix failed test

* fix failed test

---------

Co-authored-by: chenhang <chenhang@apache.org>

* Add log for entry log file delete. (apache#3872)

* Add log for entry log file delete.

* add log info.

* Address the comment.

* Address the comment.

* revert the code.

* Improve group and flush add-responses after journal sync (apache#3848)

Descriptions of the changes in this PR:
This is an improvement for apache#3837

### Motivation
1. Currently, once `maxPendingResponsesSize` has grown large it never shrinks back. => Make it adaptive so it can decrease again.
2. Currently, after `prepareSendResponseV2` writes to a channel, we trigger all channels to flush their `pendingSendResponses`, even though only a few channels may actually need flushing; triggering all of them is wasteful. => Only flush the channels that ran `prepareSendResponseV2`.

---------

Co-authored-by: Penghui Li <penghui@apache.org>
Co-authored-by: Yong Zhang <zhangyong1025.zy@gmail.com>
Co-authored-by: Hang Chen <chenhang@apache.org>
Co-authored-by: wenbingshen <oliver.shen999@gmail.com>
Co-authored-by: ZhangJian He <shoothzj@gmail.com>
Co-authored-by: lixinyang <84127069+Nicklee007@users.noreply.github.com>
Co-authored-by: YANGLiiN <ielin@qq.com>
Co-authored-by: Lishen Yao <yaalsn@gmail.com>
Co-authored-by: Andrey Yegorov <8622884+dlg99@users.noreply.github.com>
Co-authored-by: ZanderXu <zanderxu@apache.org>
Co-authored-by: zengqiang.xu <zengqiang.xu@shopee.com>
Co-authored-by: Matteo Merli <mmerli@apache.org>
@hangc0276
Contributor Author

This PR is hard to cherry-pick to branch-4.14; I will push a separate PR for branch-4.14.

hangc0276 added a commit to hangc0276/bookkeeper that referenced this pull request Jun 27, 2023
When we use `TransactionalEntryLogCompactor` to compact entry log files, it generates a lot of small entry log files whose usage is usually greater than 90%, so they cannot be compacted unless their usage drops.

![image](https://user-images.githubusercontent.com/5436568/201135615-4d6072f5-e353-483d-9afb-48fad8134044.png)

We introduce an entry log file size check during compaction, controlled by `gcEntryLogSizeRatio`.
If the total entry log file size is less than `gcEntryLogSizeRatio * logSizeLimit`, the entry log file is compacted even though its usage is greater than 90%. This feature is disabled by default: `gcEntryLogSizeRatio` defaults to `0.0`.

(cherry picked from commit 2fad33b)
@zymap
Member

zymap commented Dec 6, 2023

Cherry-picked this change without the test class; the test class depends on the direct IO improvement.

zymap pushed a commit that referenced this pull request Dec 6, 2023
### Motivation
When we use `TransactionalEntryLogCompactor` to compact entry log files, it generates a lot of small entry log files whose usage is usually greater than 90%, so they cannot be compacted unless their usage drops.

![image](https://user-images.githubusercontent.com/5436568/201135615-4d6072f5-e353-483d-9afb-48fad8134044.png)

### Changes
We introduce an entry log file size check during compaction, controlled by `gcEntryLogSizeRatio`.
If the total entry log file size is less than `gcEntryLogSizeRatio * logSizeLimit`, the entry log file is compacted even though its usage is greater than 90%. This feature is disabled by default: `gcEntryLogSizeRatio` defaults to `0.0`.

(cherry picked from commit 2fad33b)
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
### Motivation
When we use `TransactionalEntryLogCompactor` to compact entry log files, it generates a lot of small entry log files whose usage is usually greater than 90%, so they cannot be compacted unless their usage drops.

![image](https://user-images.githubusercontent.com/5436568/201135615-4d6072f5-e353-483d-9afb-48fad8134044.png)


### Changes
We introduce an entry log file size check during compaction, controlled by `gcEntryLogSizeRatio`.
If the total entry log file size is less than `gcEntryLogSizeRatio * logSizeLimit`, the entry log file is compacted even though its usage is greater than 90%. This feature is disabled by default: `gcEntryLogSizeRatio` defaults to `0.0`.