KAFKA-4850: Enable bloomfilters (#6012)
Conversation
ping @guozhangwang, @mjsax, and @vvcephei for reviews
```java
@SuppressWarnings("unchecked")
public void openDB(final ProcessorContext context) {
    // initialize the default rocksdb options
```

```java
protected TableFormatConfig getTableConfig() {
```
Refactor table config to method
```java
@Override
protected TableFormatConfig getTableConfig() {
```
For windowed stores, don't enable bloom filters
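As a sketch of the refactor under discussion (class names are hypothetical and the constants illustrative; this assumes the rocksdbjni API, not the PR's actual code), the base store could expose the table config as an overridable hook so a windowed-store subclass can opt out of the bloom filter:

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.TableFormatConfig;

// Base store: enables a bloom filter for point lookups by default.
class RocksDBStoreSketch {
    protected TableFormatConfig getTableConfig() {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setFilter(new BloomFilter());
        return tableConfig;
    }
}

// Windowed store: overrides the hook to skip the bloom filter,
// which is the choice being questioned in this review thread.
class RocksDBWindowStoreSketch extends RocksDBStoreSketch {
    @Override
    protected TableFormatConfig getTableConfig() {
        return new BlockBasedTableConfig();
    }
}
```

The template-method shape keeps the rest of `openDB` shared while letting each store type pick its own table format.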
Why? Is it about range queries? Recall that we convert range queries into multiple point lookups.
Furthermore, range queries could happen via IQ on key-value stores, too.
> Why? Is it about range queries? Recall that we convert range queries into multiple point lookups.

I took a cursory look at the code, but you raise a good point. Overall I'm thinking we need to see whether bloom filters affect range queries and, if not, maybe just enable them across the board.
Kicked off both Streams simple benchmark tests.
Thanks, @bbejeck ! Do you know if Rocks will automatically upgrade existing stores to add the bloom filter, or, if not, will it gracefully handle their absence?
@vvcephei - I think so, but I'll add a test to an existing unit/integration test to confirm |
```java
final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE);
tableConfig.setBlockSize(BLOCK_SIZE);
tableConfig.setFilter(new BloomFilter());
```
Hi @bbejeck,
Should we also consider options.optimizeFiltersForHits() to save memory on the bloom filter in exchange for one I/O for each get on a missing key?
(see https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks)
I just ran across this while reading about caching in Rocks.
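For context, a minimal sketch (assuming the rocksdbjni API; the class name and cache/block constants are illustrative, not the PR's actual code) of what combining the block-based bloom filter with `optimizeFiltersForHits` could look like:

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.Options;

public class RocksConfigSketch {
    static Options buildOptions() {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(50 * 1024 * 1024L);
        tableConfig.setBlockSize(4096L);
        // 10 bits per key gives roughly a 1% false-positive rate.
        tableConfig.setFilter(new BloomFilter(10));

        final Options options = new Options();
        options.setTableFormatConfig(tableConfig);
        // Skip filters for the bottommost level: saves filter memory in
        // exchange for one extra I/O per get on a missing key.
        options.setOptimizeFiltersForHits(true);
        return options;
    }
}
```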
@vvcephei yeah that makes sense.
\cc @guozhangwang @mjsax WDYT?
FWIW, here's what I'm thinking...
Bloom filters can only tell you if the key is definitely not in the set: they can only answer "no" or "maybe". So if the filter says the key isn't in some SST file, we don't have to actually do the I/O to check, but if it says "maybe", we do still have to check.
This optimization would add a bloom filter to every level in the SST hierarchy except the last level.
The rationale is that if you're pretty sure the keys you get are in the db, the bloom filters at the higher levels would let you skip querying the SSTs that don't contain your key. If you get all the way to the bottom level, we're pretty sure the key is there (via our prior assumption), so checking the bloom filter isn't that valuable, since it would rarely answer "no".
If it answers "maybe", we have to check anyway. In other words, the filter only saves I/O in the rare case that it does say "no".
On the other hand, the last level has the most keys in it, so those are the most expensive filters. By dropping that last level of filters, we save a bunch of memory in exchange for rare extra I/Os.
Do we have a prior assumption that the keys we query for are rarely missing? I think so...
In general, Streams only does a get while computing an aggregation value, etc. In this case, it does a get followed by a put. Therefore, the only get that might return missing is the very first one for each key.
Factors that would cause more missed gets would be stuff like:
- IQ
- data sets that have a lot of thrash in the key space
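To make the "no"/"maybe" semantics concrete, here is a toy bloom filter in plain Java (purely illustrative, not RocksDB's implementation): keys that were added are never reported absent, while absent keys may produce a false "maybe".

```java
import java.util.BitSet;

public class ToyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public ToyBloomFilter(final int size, final int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive k cheap hash functions from hashCode by salting with i.
    private int index(final String key, final int i) {
        final int h = key.hashCode() * 31 + i * 0x9E3779B9;
        return Math.floorMod(h, size);
    }

    public void add(final String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false => definitely absent; true => "maybe", caller must still check.
    public boolean mightContain(final String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

The one-sided guarantee is exactly why dropping the last level's filters is cheap when most gets hit: a filter only saves I/O when it answers "no".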
Thanks for the more in-depth explanation, @vvcephei. Sounds good to me.
Thanks @vvcephei for the explanation. I agree that in most cases we expect multiple gets on each key (only the first get will miss). Besides the cases you already listed where that may not hold, another is that windowed stream-stream joins will always have distinct keys; but given that this may be fixed in the future, I'm in favor of adding it as well.
@bbejeck Should this be targeted to https://issues.apache.org/jira/browse/KAFKA-4850 instead of being a "MINOR" change?
Seems there is another similar PR: #3048 -- should we close the other PR after this one gets merged? \cc @guozhangwang
@mjsax - done |
guozhangwang
left a comment
Hi @bbejeck , I've got two meta comments:

- Could you upload the benchmark results (the original url has expired) as images to the PR here, just for the record of the motivation for other readers?
- As discussed in the PR, I think it is better to enable `options.optimizeFiltersForHits()` as suggested by @vvcephei.
```java
private static final long WRITE_BUFFER_SIZE = 16 * 1024 * 1024L;
private static final long BLOCK_CACHE_SIZE = 50 * 1024 * 1024L;
private static final long BLOCK_SIZE = 4096L;
protected static final long BLOCK_CACHE_SIZE = 50 * 1024 * 1024L;
```
Why do these two need to be protected now?
oversight from previous change, I'll revert
Force-pushed from 1db37c0 to 75a9ea6.
Rebased with trunk and enabled `options.optimizeFiltersForHits()`.
Thanks @bbejeck ! Please ping the reviewers whenever you think it's ready for another look.
Updated this with a unit test: it saves records with bloom filters off, closes RocksDB, then reopens it with bloom filters enabled; there are no errors, and the previous records can be retrieved successfully.
Here are performance numbers based on 5 runs of
ping @ableegoldman, @guozhangwang, @mjsax, and @vvcephei for another review
Just curious, why did you need to update the jvm?
The branch builder was failing without this update (it's from the tools team), but I meant to pull this commit out; I'll rebase.
Force-pushed from 46bd465 to 846ea8d.
guozhangwang
left a comment
LGTM. Thanks @bbejeck .
One interesting observation though is that for process-with-sink, which does not use state stores at all, the new code is consistently worse than current trunk... we need to look into a better way for consistent perf regression testing :)
Merged ak/trunk:

- MINOR: fix race condition in KafkaStreamsTest (apache#6185)
- KAFKA-4850: Enable bloomfilters (apache#6012)
- MINOR: ducker-ak: add down -f, avoid using a terminal in ducker test
- KAFKA-5117: Stop resolving externalized configs in Connect REST API
- MINOR: Cleanup handling of mixed transactional/idempotent records (apache#6172)
- KAFKA-7844: Use regular subproject for generator to fix *All targets (apache#6182)
- Fix Documentation for cleanup.policy is out of date (apache#6181)
- MINOR: increase timeouts for KafkaStreamsTest (apache#6178)
- MINOR: Rejoin split ssl principal mapping rules (apache#6099)
- MINOR: Handle case where connector status endpoints returns 404 (apache#6176)
- MINOR: Remove unused imports, exceptions, and values (apache#6117)
- KAFKA-3522: Add internal RecordConverter interface (apache#6150)
- Fix Javadoc of KafkaConsumer (apache#6155)
- KAFKA-6455: Extend CacheFlushListener to forward timestamp (apache#6147)
- MINOR: Log partition info when creating new request batch in controller (apache#6145)
- KAFKA-7652: Part I; Fix SessionStore's findSession(single-key) (apache#6134)
- MINOR: Remove the InvalidTopicException handling in InternalTopicManager (apache#6167)
- KAFKA-7024: Rocksdb state directory should be created before opening the DB (apache#6138)
- MINOR: Fix typos (apache#6079)
This PR enables BloomFilters for RocksDB to speed up point lookups. The request for this has been around for some time: https://issues.apache.org/jira/browse/KAFKA-4850

For testing, I've done the following:

- Ran the standard Streams suite of unit and integration tests
- Kicked off the simple benchmark test with bloom filters enabled
- Kicked off the simple benchmark test with bloom filters not enabled
- Kicked off Streams system tests

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>