Skip to content

use a sha512 hash of bloom filter for cache key instead of filter bytes#6488

Merged
b-slim merged 7 commits intoapache:masterfrom
clintropolis:bloom-cache-key-fix
Oct 22, 2018
Merged

use a sha512 hash of bloom filter for cache key instead of filter bytes#6488
b-slim merged 7 commits intoapache:masterfrom
clintropolis:bloom-cache-key-fix

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Oct 18, 2018

BloomKFilter objects with large maxNumEntries can be sized in the tens to hundreds of megabytes. Requests this size already put significant pressure on the heap, but having giant cache keys is brutal.

This PR changes BloomDimFilter to deserialize the bloomKFilter property into a BloomKFilterHolder in order to both deserialize the BloomKFilter and compute a sha512 hash from the raw filter bytes to be used for a cache key.

For an example using a 10 million entry bloom filter, cache key size in bytes shrinks from 7794075 bytes to 86 bytes.

Additionally, this PR changes BloomDimFilter.toString and BloomDimFilter.equals to use the hash value instead of the BloomKFilter, which does not have it's own overrides for equals and whose toString only prints the size of the filter.

We may want to consider modifying CacheKeyBuilder to always compute a hash for a cache key if over a certain threshold, but not in this PR, which I think will still want to deserialize the value in this manner so as to not have to re-serialize in order to create the cache key.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private maybe?

Copy link
Copy Markdown
Contributor

@b-slim b-slim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good test name

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good other than the two comments.

@b-slim b-slim merged commit e83cc22 into apache:master Oct 22, 2018
@clintropolis clintropolis deleted the bloom-cache-key-fix branch October 22, 2018 19:27
@dclim dclim added this to the 0.13.0 milestone Oct 31, 2018
@jon-wei jon-wei removed their assignment Nov 5, 2018
@dclim
Copy link
Copy Markdown
Contributor

dclim commented Nov 13, 2018

@clintropolis could you do a backport to 0.13.0 for this?

clintropolis added a commit to clintropolis/druid that referenced this pull request Nov 13, 2018
…es (apache#6488)

* use a sha512 hash of bloom filter for cache key instead of filter bytes

* make serde private, BloomDimFilter.toString and BloomDimFilter.equals use hash instead of bloomKFilter which has no tostring or equals of its own

* keep and use HashCode object instead of converting to bytes up front

* uneeded imports oops

* tweaks from review

* refactor dupe code

* refactor
dclim pushed a commit that referenced this pull request Nov 13, 2018
…es (#6488) (#6618)

* use a sha512 hash of bloom filter for cache key instead of filter bytes

* make serde private, BloomDimFilter.toString and BloomDimFilter.equals use hash instead of bloomKFilter which has no tostring or equals of its own

* keep and use HashCode object instead of converting to bytes up front

* uneeded imports oops

* tweaks from review

* refactor dupe code

* refactor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants