use a sha512 hash of bloom filter for cache key instead of filter bytes #6488
Merged
b-slim merged 7 commits into apache:master on Oct 22, 2018
b-slim reviewed Oct 19, 2018
jon-wei reviewed Oct 19, 2018
jon-wei approved these changes Oct 19, 2018
gianm reviewed Oct 20, 2018
… use hash instead of bloomKFilter which has no tostring or equals of its own
84341e1 to 5a4d10e
Contributor
@clintropolis could you do a backport to 0.13.0 for this?
clintropolis added a commit to clintropolis/druid that referenced this pull request Nov 13, 2018
…es (apache#6488)
* use a sha512 hash of bloom filter for cache key instead of filter bytes
* make serde private, BloomDimFilter.toString and BloomDimFilter.equals use hash instead of bloomKFilter which has no tostring or equals of its own
* keep and use HashCode object instead of converting to bytes up front
* uneeded imports oops
* tweaks from review
* refactor dupe code
* refactor
dclim pushed a commit that referenced this pull request Nov 13, 2018
…es (#6488) (#6618)
* use a sha512 hash of bloom filter for cache key instead of filter bytes
* make serde private, BloomDimFilter.toString and BloomDimFilter.equals use hash instead of bloomKFilter which has no tostring or equals of its own
* keep and use HashCode object instead of converting to bytes up front
* uneeded imports oops
* tweaks from review
* refactor dupe code
* refactor
`BloomKFilter` objects with a large `maxNumEntries` can be sized in the tens to hundreds of megabytes. Requests this size already put significant pressure on the heap, but having giant cache keys is brutal.
This PR changes `BloomDimFilter` to deserialize the `bloomKFilter` property into a `BloomKFilterHolder`, which both deserializes the `BloomKFilter` and computes a `sha512` hash from the raw filter bytes to be used for the cache key.
As an example, using a 10 million entry bloom filter, the cache key size shrinks from 7794075 bytes to 86 bytes.
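To illustrate the idea of hashing the raw filter bytes once at deserialization time, here is a minimal sketch. It is not the code from this PR: `FilterBytesHolder`, `fromBytes`, and `cacheKeyBytes` are hypothetical names, the sketch keeps the raw bytes where the real holder would keep the deserialized `BloomKFilter`, and it uses the JDK's `MessageDigest` rather than whatever hashing utility the PR uses.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for a holder like BloomKFilterHolder (names are assumptions).
final class FilterBytesHolder {
    // Stand-in for the deserialized filter; the real holder would keep a BloomKFilter.
    private final byte[] filterBytes;
    // SHA-512 digest of the raw serialized filter, computed once up front.
    private final byte[] hash;

    private FilterBytesHolder(byte[] filterBytes, byte[] hash) {
        this.filterBytes = filterBytes;
        this.hash = hash;
    }

    // Builds the holder from the raw bytes, hashing them while they are already in hand
    // so that nothing has to be re-serialized later to build a cache key.
    static FilterBytesHolder fromBytes(byte[] rawFilterBytes) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            return new FilterBytesHolder(rawFilterBytes.clone(), digest.digest(rawFilterBytes));
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the running JDK ships a SHA-512 provider.
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    int filterSizeInBytes() {
        return filterBytes.length;
    }

    // The cache key contribution is the fixed-size digest, not the filter bytes.
    byte[] cacheKeyBytes() {
        return hash.clone();
    }
}
```

A SHA-512 digest is always 64 bytes, so the filter's contribution to the cache key no longer grows with `maxNumEntries`.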
Additionally, this PR changes `BloomDimFilter.toString` and `BloomDimFilter.equals` to use the `hash` value instead of the `BloomKFilter`, which does not have its own overrides for `equals` and whose `toString` only prints the size of the filter.
We may want to consider modifying `CacheKeyBuilder` to always compute a hash for a cache key if it is over a certain threshold, but not in this PR. I think that approach would still want to deserialize the value in this manner, so as to not have to re-serialize it in order to create the cache key.
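The `equals` and `toString` change described above can be sketched roughly as follows. This is only an illustration under assumptions, not the PR's implementation: `HashedFilterSketch` is a hypothetical class, it carries just a dimension name and the digest, and the `toString` format is invented for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical simplified filter whose identity is defined by the hash of the filter bytes.
final class HashedFilterSketch {
    private final String dimension;
    private final byte[] hash;

    HashedFilterSketch(String dimension, byte[] rawFilterBytes) {
        this.dimension = Objects.requireNonNull(dimension);
        this.hash = sha512(rawFilterBytes);
    }

    private static byte[] sha512(byte[] bytes) {
        try {
            return MessageDigest.getInstance("SHA-512").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the running JDK ships a SHA-512 provider.
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    // Equality compares the digests, so it does not rely on the wrapped
    // filter type providing its own equals().
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof HashedFilterSketch)) {
            return false;
        }
        HashedFilterSketch that = (HashedFilterSketch) o;
        return dimension.equals(that.dimension) && Arrays.equals(hash, that.hash);
    }

    @Override
    public int hashCode() {
        return 31 * dimension.hashCode() + Arrays.hashCode(hash);
    }

    // Prints the digest as hex instead of the (potentially huge) filter contents.
    @Override
    public String toString() {
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return dimension + " = " + hex;
    }
}
```

Two instances built from identical filter bytes compare equal and print the same string, which is the behavior the wrapped filter cannot offer without its own `equals` override.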