KAFKA-4850: Enable bloomfilters by bbejeck · Pull Request #6012 · apache/kafka

bbejeck · 2018-12-07T15:47:45Z

This PR enables BloomFilters for RocksDB to speed up point lookups.
The request for this has been around for some time - https://issues.apache.org/jira/browse/KAFKA-4850

For testing, I've done the following:

Ran the standard streams suite of unit and integration tests
Kicked off the simple benchmark test with bloom filters enabled
Kicked off the simple benchmark test with bloom filters not enabled
Kicked off streams system tests

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

bbejeck · 2018-12-07T15:49:37Z

ping @guozhangwang, @mjsax, and @vvcephei for reviews

bbejeck · 2018-12-07T18:48:49Z

-    @SuppressWarnings("unchecked")
-    public void openDB(final ProcessorContext context) {
-        // initialize the default rocksdb options
+    protected TableFormatConfig getTableConfig() {


Refactor table config to method

bbejeck · 2018-12-07T18:58:52Z

    }

+    @Override
+    protected TableFormatConfig getTableConfig() {


For windowed stores, don't enable bloom filters

Why? Is it about range queries? Recall that we convert range queries into multiple point lookups.

Furthermore, range queries could happen via IQ on key-value-stores, too.

> Why? Is it about range queries? Recall that we convert range queries into multiple point lookups.

I took a cursory look at the code, but you raise a good point. Overall, I'm thinking we need to check whether Bloom filters affect range queries and, if not, just enable them across the board.

bbejeck · 2018-12-07T19:01:06Z

kicked off both streams simple benchmark tests

vvcephei · 2018-12-10T15:08:21Z

Thanks, @bbejeck !

Do you know if Rocks will automatically upgrade existing stores to add the bloom filter, or, if not, will it gracefully handle their absence?

bbejeck · 2018-12-10T17:42:58Z

> Do you know if Rocks will automatically upgrade existing stores to add the bloom filter, or, if not, will it gracefully handle their absence?

@vvcephei - I think so, but I'll add a test to an existing unit/integration test to confirm

vvcephei · 2019-01-08T21:54:40Z

        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE);
        tableConfig.setBlockSize(BLOCK_SIZE);
+        tableConfig.setFilter(new BloomFilter());


Hi @bbejeck,

Should we also consider options.optimizeFiltersForHits() to save memory on the bloom filter in exchange for one I/O for each get on a missing key?

(see https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks)

I just ran across this while reading about caching in Rocks.

@vvcephei yeah that makes sense.

cc @guozhangwang @mjsax WDYT?

FWIW, here's what I'm thinking...
Bloom filters can only tell you if the key is definitely not in the set: they can only answer "no" or "maybe". So if the filter says the key isn't in some SST, we don't have to do the I/O to check, but if it says "maybe", we still have to check.
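
To make the "no" or "maybe" behavior concrete, here is a minimal sketch of a Bloom filter in plain Java. It is only an illustration and not RocksDB's implementation: the class name `BloomFilterSketch`, the bit-array size, and the double-hashing scheme are all assumptions chosen for brevity.

```java
import java.util.BitSet;

// Illustrative only: a tiny Bloom filter, not the one RocksDB uses.
public class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    public BloomFilterSketch(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // force a non-zero step between probes
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false means "definitely not present"; true only means "maybe present".
    public boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("user-1");
        filter.add("user-2");
        // An added key always yields "maybe" (there are no false negatives).
        System.out.println("user-1: " + filter.mightContain("user-1"));
        // A key that was never added usually yields "no", but a false positive is possible.
        System.out.println("user-999: " + filter.mightContain("user-999"));
    }
}
```

The property that matters for the discussion is that a "no" answer is always safe to trust, while a "maybe" still requires reading the data.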

This optimization would add a bloom filter to every level in the SST hierarchy except the last level.

The rationale is that if you're pretty sure the keys you get are in the db, the bloom filters at the higher levels would let you skip querying the SSTs that don't contain your key. If you get all the way to the bottom level, we're pretty sure the key is there (via our prior assumption), so checking the bloom filter isn't that valuable, since it would rarely answer "no".
If it answers "maybe", we have to check anyway. In other words, the filter only saves I/O in the rare case that it does say "no".
On the other hand, the last level has the most keys in it, so those are the most expensive filters. By dropping that last level of filters, we save a bunch of memory in exchange for rare extra I/Os.
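
The trade-off can be sketched by counting simulated reads per get. This is a toy model under loud assumptions: each level is a plain set of keys, the filter is idealized (it answers "no" exactly when the key is absent, so no false positives), and `readsForGet` is a hypothetical helper, not a RocksDB API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: counts how many levels must be read to answer a get.
public class LevelLookupSketch {

    static int readsForGet(List<Set<String>> levels, String key, boolean filterOnLastLevel) {
        int reads = 0;
        for (int i = 0; i < levels.size(); i++) {
            Set<String> level = levels.get(i);
            boolean isLast = (i == levels.size() - 1);
            boolean hasFilter = !isLast || filterOnLastLevel;
            // Idealized filter (assumption): says "no" exactly when the key is absent.
            if (hasFilter && !level.contains(key)) {
                continue; // filter said "no", so the read is skipped
            }
            reads++; // filter said "maybe", or the level has no filter: do the read
            if (level.contains(key)) {
                return reads; // found the key, stop searching
            }
        }
        return reads;
    }

    private static Set<String> level(String... keys) {
        return new HashSet<>(Arrays.asList(keys));
    }

    public static void main(String[] args) {
        // The last level holds the most keys, as in the discussion above.
        List<Set<String>> levels = Arrays.asList(level("a"), level("b"), level("c", "d", "e"));

        System.out.println("hit, filters on all levels:  " + readsForGet(levels, "d", true));
        System.out.println("hit, no last-level filter:   " + readsForGet(levels, "d", false));
        System.out.println("miss, filters on all levels: " + readsForGet(levels, "zzz", true));
        System.out.println("miss, no last-level filter:  " + readsForGet(levels, "zzz", false));
    }
}
```

In this model a hit costs the same either way, and only a miss pays one extra read when the last level has no filter, which is the exchange described above.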

Do we have a prior assumption that the keys we query for are rarely missing? I think so...
In general, Streams only does a get while computing an aggregation value, etc. In this case, it does a get followed by a put. Therefore, the only get that might return missing is the very first one for each key.

Factors that would cause more missed gets would be stuff like:

IQ

data sets that have a lot of thrash in the key space
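
The get-followed-by-put pattern described above can be sketched with an in-memory map standing in for the state store. This is an assumption-laden illustration: `countMissedGets` is a hypothetical helper and the count aggregation is only an example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a HashMap stands in for the RocksDB-backed state store.
public class GetThenPutSketch {

    // Runs a count aggregation over the keys and returns how many gets missed.
    static int countMissedGets(String[] keys) {
        Map<String, Long> store = new HashMap<>();
        int misses = 0;
        for (String key : keys) {
            Long current = store.get(key); // the get
            if (current == null) {
                misses++; // only possible the first time a key is seen
                current = 0L;
            }
            store.put(key, current + 1); // the put that follows every get
        }
        return misses;
    }

    public static void main(String[] args) {
        // Repeated keys: one miss per distinct key.
        System.out.println(countMissedGets(new String[] {"a", "b", "a", "a", "b"}));
        // Heavy thrash in the key space: every get misses.
        System.out.println(countMissedGets(new String[] {"a", "b", "c", "d", "e"}));
    }
}
```

The second call models the "thrash in the key space" case, where the assumption that gets rarely miss no longer holds.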

Thanks for the more in-depth explanation, @vvcephei. Sounds good to me.

Thanks @vvcephei for the explanation. I agree that in most cases we expect multiple gets on each key, so only the first get will miss. Besides the exceptions you already listed, another case is the windowed stream-stream join, which will always have distinct keys. Given that this may be fixed in the future, I'm in favor of adding it as well.

mjsax · 2019-01-14T17:17:19Z

@bbejeck Should this be targeted to https://issues.apache.org/jira/browse/KAFKA-4850 instead of being a "MINOR" change?

mjsax · 2019-01-14T17:20:35Z

Seems there is another similar PR: #3048 -- should we close the other PR after this one gets merged?

cc @guozhangwang

bbejeck · 2019-01-15T18:13:50Z

> @bbejeck Should this be targeted to https://issues.apache.org/jira/browse/KAFKA-4850 instead of being a "MINOR" change?

@mjsax - done

guozhangwang

Hi @bbejeck , I've got two meta comments:

1. Could you upload the benchmark results (the original URL has expired) as images to the PR here, as a record of the motivation for other readers?
2. As discussed in the PR, I think it is better to enable options.optimizeFiltersForHits() as suggested by @vvcephei.

guozhangwang · 2019-01-15T18:38:44Z

    private static final long WRITE_BUFFER_SIZE = 16 * 1024 * 1024L;
-    private static final long BLOCK_CACHE_SIZE = 50 * 1024 * 1024L;
-    private static final long BLOCK_SIZE = 4096L;
+    protected static final long BLOCK_CACHE_SIZE = 50 * 1024 * 1024L;


Why do these two need to be protected now?

Oversight from a previous change; I'll revert.

guozhangwang · 2019-01-15T18:47:00Z

        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE);
        tableConfig.setBlockSize(BLOCK_SIZE);
+        tableConfig.setFilter(new BloomFilter());


Thanks @vvcephei for the explanation. I agree that in most cases we expect multiple gets on each key, so only the first get will miss. Besides the exceptions you already listed, another case is the windowed stream-stream join, which will always have distinct keys. Given that this may be fixed in the future, I'm in favor of adding it as well.

bbejeck · 2019-01-16T15:31:26Z

Rebased with trunk and enabled optimizeFiltersForHits. I'll add a unit test next; I pushed the updated branch now to kick off smoke tests for perf numbers.

guozhangwang · 2019-01-19T19:14:45Z

Thanks @bbejeck ! Please ping the reviewers whenever you think it's ready for another look.

bbejeck · 2019-01-22T01:53:25Z

Updated this with a unit test that saves records with bloom filters off, closes RocksDB, then reopens it with bloom filters enabled. There are no errors and the previous records can be retrieved successfully.

bbejeck · 2019-01-22T01:58:11Z

Here are performance numbers based on 5 runs of StreamsSimpleBenchmark:

| Benchmark (process-rate) | Bloom Filter Enabled | Bloom Filter Disabled |
|---|---|---|
| streamcount | 76084 | 71721 |
| streamcountwindowed | 29442 | 29025 |
| streamprocess | 89753 | 90035 |
| streamprocesswithsink | 64727 | 69181 |
| streamprocesswithstatestore | 54120 | 51843 |
| streamprocesswithwindowstore | 276 | 296 |
| streamstreamjoin | 9022 | 8564 |
| streamtablejoin | 62758 | 61632 |
| tabletablejoin | 23131 | 20248 |

| Benchmark (process-rate) | Bloom Filter Enabled | Bloom Filter Disabled |
|---|---|---|
| streamcount | 77640 | 72629 |
| streamcountwindowed | 31144 | 29514 |
| streamprocess | 89927 | 90174 |
| streamprocesswithsink | 67203 | 71722 |
| streamprocesswithstatestore | 55822 | 53375 |
| streamprocesswithwindowstore | 306 | 298 |
| streamstreamjoin | 8829 | 8565 |
| streamtablejoin | 63545 | 61512 |
| tabletablejoin | 22557 | 20816 |

| Benchmark (process-rate) | Bloom Filter Enabled | Bloom Filter Disabled |
|---|---|---|
| streamcount | 79359 | 68352 |
| streamcountwindowed | 30884 | 27044 |
| streamprocess | 89071 | 87730 |
| streamprocesswithsink | 67792 | 68265 |
| streamprocesswithstatestore | 54194 | 49070 |
| streamprocesswithwindowstore | 295 | 245 |
| streamstreamjoin | 8906 | 8294 |
| streamtablejoin | 63767 | 54966 |
| tabletablejoin | 22190 | 18780 |

| Benchmark (process-rate) | Bloom Filter Enabled | Bloom Filter Disabled |
|---|---|---|
| streamcount | 78089 | 71954 |
| streamcountwindowed | 30700 | 28342 |
| streamprocess | 89418 | 88664 |
| streamprocesswithsink | 66792 | 68618 |
| streamprocesswithstatestore | 54254 | 51045 |
| streamprocesswithwindowstore | 304 | 254 |
| streamstreamjoin | 9020 | 8298 |
| streamtablejoin | 61304 | 61538 |
| tabletablejoin | 20046 | 22182 |

| Benchmark (process-rate) | Bloom Filter Enabled | Bloom Filter Disabled |
|---|---|---|
| streamcount | 72707 | 75692 |
| streamcountwindowed | 26947 | 30048 |
| streamprocess | 89815 | 88445 |
| streamprocesswithsink | 65672 | 71350 |
| streamprocesswithstatestore | 51210 | 53989 |
| streamprocesswithwindowstore | 294 | 263 |
| streamstreamjoin | 9317 | 8978 |
| streamtablejoin | 62071 | 60340 |
| tabletablejoin | 22024 | 21077 |
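
Since the five runs vary noticeably, one way to read them is to average each benchmark across runs and compare. The sketch below does this for two rows only (streamcount and streamprocesswithsink), using the numbers posted above; the class and method names are hypothetical, and a plain mean over five runs is a rough summary, not a statistically rigorous comparison.

```java
// Illustrative only: averages the posted process-rate values across the five runs.
public class BenchmarkSummary {

    static double mean(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    // Percent change of the enabled mean relative to the disabled mean.
    static double percentChange(long[] enabled, long[] disabled) {
        double base = mean(disabled);
        return 100.0 * (mean(enabled) - base) / base;
    }

    public static void main(String[] args) {
        // Values copied from the five runs above.
        long[] countEnabled = {76084, 77640, 79359, 78089, 72707};
        long[] countDisabled = {71721, 72629, 68352, 71954, 75692};
        long[] sinkEnabled = {64727, 67203, 67792, 66792, 65672};
        long[] sinkDisabled = {69181, 71722, 68265, 68618, 71350};

        System.out.printf("streamcount: %.1f vs %.1f (%.2f%%)%n",
                mean(countEnabled), mean(countDisabled),
                percentChange(countEnabled, countDisabled));
        System.out.printf("streamprocesswithsink: %.1f vs %.1f (%.2f%%)%n",
                mean(sinkEnabled), mean(sinkDisabled),
                percentChange(sinkEnabled, sinkDisabled));
    }
}
```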

bbejeck · 2019-01-22T02:01:02Z

ping @ableegoldman, @guozhangwang, @mjsax, and @vvcephei for another review

vvcephei

Hi @bbejeck ,

Thanks for adding the test and the optimization. It looks good overall.

I'm just curious about the motivation for changing the vagrant config...

Thanks,
-John

vvcephei · 2019-01-22T15:27:11Z

Just curious, why did you need to update the JVM?

The branch builder was failing without this update, which comes from the tools team, but I meant to pull this commit out. I'll rebase.

vvcephei

Looks good to me! Thanks, @bbejeck

guozhangwang

LGTM. Thanks @bbejeck .

One interesting observation though is that for process-with-sink, which does not use state stores at all, the new code is consistently worse than current trunk... we need to look into a better way for consistent perf regression testing :)

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>

mjsax added the streams label Dec 9, 2018

vvcephei reviewed Jan 8, 2019

mjsax mentioned this pull request Jan 15, 2019

KAFKA-4850:RocksDb cannot use Bloom Filters #3048

Closed

bbejeck changed the title ~~MINOR: Enable bloomfilters~~ KAFKA-4850: Enable bloomfilters Jan 15, 2019

guozhangwang reviewed Jan 15, 2019

bbejeck force-pushed the MINOR_enable_bloom_filters branch from 1db37c0 to 75a9ea6 Compare January 16, 2019 15:29

vvcephei reviewed Jan 22, 2019

bbejeck added 7 commits January 22, 2019 10:39

Enable bloomfilters

fb3a2aa

Segment should not use bloom filter as it does range lookups

fc77ea8

MINOR: Enable bloom filter for all RocksDB instances

b7041c0

KAFKA-4850: Updates for comments, enable optimize for hits

725cdac

KAFKA-4850: Apply git diff file from tools for branch builder

d0ea3e3

KAFKA-4850: added test showing can toggle bloom filters on/off

56da867

KAFKA-4850: revert changes to base.sh for branch builder

846ea8d

bbejeck force-pushed the MINOR_enable_bloom_filters branch from 46bd465 to 846ea8d Compare January 22, 2019 15:43

vvcephei approved these changes Jan 22, 2019

guozhangwang approved these changes Jan 23, 2019

guozhangwang merged commit 0efed12 into apache:trunk Jan 24, 2019

bbejeck deleted the MINOR_enable_bloom_filters branch July 10, 2024 13:59
