KAFKA-4850: RocksDb cannot use Bloom Filters #3048
bharatviswa504 wants to merge 3 commits into apache:trunk
@enothereska @mjsax @guozhangwang Could you please review?
|
Refer to this link for build results (access rights to CI server needed): |
|
retest this please
|
Tests passed locally.
|
The test failure on JDK 7 is happening on other PRs too.
|
retest this please
|
Thank you. I think it looks good. I've started some system tests to see what impact this has on performance and will post the link here once they are done. @bharatviswa504 do you know what the memory overhead might be when turning on Bloom filters?
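As a rough way to reason about the memory question above, the size of the filter bits themselves can be estimated as number of keys times bits per key. The sketch below is a back-of-envelope calculation only: the class name, method name, and the key count of one million are hypothetical, and it ignores any per-block metadata or caching behaviour in RocksDB, so measured overhead may differ.

```java
// Back-of-envelope sketch; names and numbers are illustrative assumptions,
// not taken from the PR or from RocksDB.
final class BloomFilterMemory {

    /** Bytes needed for the filter bits alone, rounded up to a whole byte. */
    static long filterBytes(final long numKeys, final int bitsPerKey) {
        return (numKeys * bitsPerKey + 7) / 8;
    }

    public static void main(final String[] args) {
        final long numKeys = 1_000_000L; // hypothetical store size
        final int bitsPerKey = 10;       // the value used in the PR
        System.out.println("Estimated filter size in bytes: "
                + filterBytes(numKeys, bitsPerKey));
    }
}
```

Under these assumptions the filter costs a little over one byte per key, which is usually small next to the keys and values themselves.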
final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE);
tableConfig.setBlockSize(BLOCK_SIZE);
tableConfig.setFilter(new BloomFilter(10));
What does the 10 mean? Could you extract it to a constant so it is named?
@dguy It is bits per key. Added a constant to name it.
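To give a feel for what a bits-per-key value of 10 buys, the textbook false-positive estimate for a Bloom filter can be computed directly. This is a minimal sketch assuming an idealized filter with the optimal number of hash functions (bits per key times ln 2); the class and constant names are hypothetical, and RocksDB's actual filter rounds the hash count and uses its own layout, so its real rate will differ somewhat. Under this model, 10 bits per key gives a false-positive rate of a little under 1%.

```java
// Illustrative sketch; BloomFilterEstimate and BITS_PER_KEY are hypothetical
// names, not the identifiers used in the PR.
final class BloomFilterEstimate {

    // A named constant in place of a bare 10, as asked for in the review.
    static final int BITS_PER_KEY = 10;

    /**
     * Theoretical false-positive rate for an idealized Bloom filter,
     * assuming the optimal hash count k = bitsPerKey * ln 2:
     * p = (1 - e^(-k / bitsPerKey))^k
     */
    static double falsePositiveRate(final int bitsPerKey) {
        final double k = bitsPerKey * Math.log(2);
        return Math.pow(1 - Math.exp(-k / bitsPerKey), k);
    }

    public static void main(final String[] args) {
        System.out.printf("bits per key = %d, estimated false-positive rate = %.4f%n",
                BITS_PER_KEY, falsePositiveRate(BITS_PER_KEY));
    }
}
```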
|
System test passed: https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/291/console. I don't necessarily see a perf improvement, but I'm not sure the tests are designed to show any improvement. @bharatviswa504 any suggestions on a good test/benchmark?
|
@enothereska Do you think increasing the block size will improve performance? I don't know RocksDB in much detail, so let me know if I am missing something.
dguy left a comment
LGTM, though it would be good to understand how this impacts memory usage, performance, etc.
|
My understanding is that Bloom filters are only beneficial for single key-value lookups, so the current system tests may not be the best fit for analyzing their impact. More specifically, in a windowed RocksDB store we always store the window start timestamp, which is 64 bits, as the prefix of the key, so a Bloom filter over the first 10 bits would not help much in filtering those keys; for such cases it is better to have prefix seeking support, as discussed here:
The code itself LGTM, but I think it is better to execute a benchmark with the following settings before merging the PR:
@bharatviswa504 Do you want to do this benchmark?
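The point about single key-value lookups can be illustrated with a toy Bloom filter. The sketch below is not RocksDB's implementation; the class name, hash scheme, and key format are all assumptions made for illustration. It shows that the filter answers "might this exact key be present?" by hashing the whole key, which is why it can skip work on point lookups but has nothing to offer a range or prefix scan.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Toy Bloom filter for illustration only; not RocksDB's filter.
final class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(final int expectedKeys, final int bitsPerKey) {
        this.numBits = Math.max(64, expectedKeys * bitsPerKey);
        this.bits = new BitSet(numBits);
        // Near-optimal hash count: bitsPerKey * ln 2, but at least one.
        this.numHashes = Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
    }

    void add(final String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(final String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th probe is derived from two base hashes of the full key.
    private int index(final String key, final int i) {
        int h1 = 0x811C9DC5; // FNV-1a over the UTF-8 bytes
        for (final byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xff);
            h1 *= 0x01000193;
        }
        final int h2 = key.hashCode() | 1; // odd step so probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public static void main(final String[] args) {
        final ToyBloomFilter filter = new ToyBloomFilter(1000, 10);
        for (int i = 0; i < 1000; i++) {
            filter.add("key-" + i);
        }
        // Point lookup: the filter can rule a key out without touching the data.
        System.out.println("key-42 possibly present: " + filter.mightContain("key-42"));
        System.out.println("missing-key possibly present: " + filter.mightContain("missing-key"));
        // A range scan ("all keys from key-10 to key-20") has no single key to hash,
        // so a whole-key filter like this one cannot narrow it down.
    }
}
```

A benchmark meant to show the effect of the filter would therefore need a workload dominated by point lookups, ideally including lookups for keys that are not in the store.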
|
Replaced by #6012
Added BloomFilter to speed up RocksDB lookups.