Skip to content

KAFKA-4850:RocksDb cannot use Bloom Filters#3048

Closed
bharatviswa504 wants to merge 3 commits intoapache:trunkfrom
bharatviswa504:KAFKA-4850
Closed

KAFKA-4850:RocksDb cannot use Bloom Filters#3048
bharatviswa504 wants to merge 3 commits intoapache:trunkfrom
bharatviswa504:KAFKA-4850

Conversation

@bharatviswa504
Copy link
Copy Markdown

Added BloomFilter to speedup rocksdb lookup.

@bharatviswa504
Copy link
Copy Markdown
Author

bharatviswa504 commented May 14, 2017

@enothereska @mjsax @guozhangwang Could you please review.

@asfbot
Copy link
Copy Markdown

asfbot commented May 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3882/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented May 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3871/
Test FAILed (JDK 8 and Scala 2.12).

@bharatviswa504
Copy link
Copy Markdown
Author

bharatviswa504 commented May 14, 2017

retest this please

@asfbot
Copy link
Copy Markdown

asfbot commented May 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3884/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented May 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3873/
Test FAILed (JDK 8 and Scala 2.12).

@bharatviswa504
Copy link
Copy Markdown
Author

bharatviswa504 commented May 14, 2017

tests passed locally.

@asfbot
Copy link
Copy Markdown

asfbot commented May 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3895/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented May 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3884/
Test PASSed (JDK 8 and Scala 2.12).

@bharatviswa504
Copy link
Copy Markdown
Author

The test failure on JDK 7 is happening on other PR's too.
Test failure is not related to this change.

@bharatviswa504
Copy link
Copy Markdown
Author

retest this please

@asfbot
Copy link
Copy Markdown

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3957/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3946/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link
Copy Markdown

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3963/
Test PASSed (JDK 7 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3952/
Test PASSed (JDK 8 and Scala 2.12).

@enothereska
Copy link
Copy Markdown
Contributor

Thank you. I think it looks good. I've started some system tests to see what impact this has on performance, will post link here once done. @bharatviswa504 do you know what the memory overhead might be when turning on bloom filters?

final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE);
tableConfig.setBlockSize(BLOCK_SIZE);
tableConfig.setFilter(new BloomFilter(10));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the 10 mean? Could you extract it to a constant so it is named?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dguy It is bits per key. Added a constant to name it.

@enothereska
Copy link
Copy Markdown
Contributor

system test passed: https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/291/console. I don't necessarily see a perf improvement, but I'm not sure the tests are designed for showing off any improvement. @bharatviswa504 any suggestions on a good test/benchmark?

@asfbot
Copy link
Copy Markdown

asfbot commented May 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4014/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented May 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4000/
Test PASSed (JDK 8 and Scala 2.12).

@bharatviswa504
Copy link
Copy Markdown
Author

bharatviswa504 commented May 16, 2017

@enothereska
https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#memory-usage

Do you think increasing block size will improve performance?

I don't have much details with rocksdb, let me know if I am missing something.

Copy link
Copy Markdown
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - though would be good to understand how this impacts memory usage, performance etc

@guozhangwang
Copy link
Copy Markdown
Contributor

My understanding is that bloom filters is only beneficial for single key-value lookups, so the current system tests may not be best fit for analyzing its impacts. More specifically, for windowed rocksdb store since we always store the window start timestamp as prefix of the key, which is 64bits, hence first 10bits bloom filter would not help much on filtering those keys; for such cases it is better to have the prefix seeking support as discussed here:

https://confluentinc.atlassian.net/wiki/display/Engineering/Discussion%3A+Non-key+KTable-KTable+Joins#Discussion:Non-keyKTable-KTableJoins-Option1:useRocksDBprefixseeking

The code itself LGTM, but I think it is better to execute a benchmark with the following settings before merging the PR:

  1. have a running aggregate in Streams with RocksDBStore (i.e. not windowed store).
  2. have another thread continuously query the rocksDB store on single key lookup.
  3. compare the query QPS with / wo this PR; expect to have non-neglectible perf diffs.

@bharatviswa504 Do you want to do this benchmark?

@mjsax mjsax added the streams label Jan 30, 2018
@mjsax mjsax mentioned this pull request Jan 14, 2019
3 tasks
@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jan 15, 2019

Replaced by #6012

@mjsax mjsax closed this Jan 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants