Skip to content

Move caffeine out of extension and make it the default cache implementation.#4810

Merged
drcrallen merged 11 commits intoapache:masterfrom
metamx:caffeineBase
Sep 22, 2017
Merged

Move caffeine out of extension and make it the default cache implementation.#4810
drcrallen merged 11 commits intoapache:masterfrom
metamx:caffeineBase

Conversation

@drcrallen
Copy link
Copy Markdown
Contributor

@drcrallen drcrallen commented Sep 15, 2017

The caffeine library itself is actually something that could be used outside of just the extension. This allows other things to pull in caffeine, as well as puts caffeine as a core cache in druid without the need for an extension. This also sets Caffeine as the default cache and deprecates local cache.

The setting for druid.cache.sizeInBytes should transparently carry over to the new cache, but the other two settings for local cache do not have analogs in the caffeine cache.

@ben-manes
Copy link
Copy Markdown

I think that newer Caffeine versions are better for the JRE bug, but definitely it is not ideal. Previously the write buffer was unbounded, so when investigating we switched to being bounded (planned, but hadn't been a priority). Then writers spun waiting for the state condition, assuming the task was executing or queue'd to run, and a small busy wait would be good enough. But a full executor, lost task, write storms, etc. made that a poor idea.

Now the writers help out by busy waiting and then falling back to perform the maintenance work (acquire the lock). This improves write throughput in a synthetic load by descheduling threads and lets progress occur if the executor has been swamped (so our task can't get run). For lost tasks before 8u60, it gets reset by the writer doing the work. So this solves the problem, but means the write buffer has to be full before the back pressure kicks in. That could be good enough, but does mean the cache would exceed capacity as write notifications queue'd up and delayed eviction.

@drcrallen
Copy link
Copy Markdown
Contributor Author

Thank you @ben-manes ! Under write buffers being full I would be surprised if this performed worse than the synchronized map implementation the current default uses. So if fixes in the latest prevent the lost tasks issue from persisting forever, and an upgrade in java version is a "performance boost", then I'm good with making it default and will update the PR with the latest and make caffeine the default (pending other community comments).

@drcrallen drcrallen changed the title Move caffeine out of extension. Move caffeine out of extension and make it the default cache implementation. Sep 15, 2017
@ben-manes
Copy link
Copy Markdown

Yes, it would definitely be faster. The full buffer would only mean that the next few writes would be slightly penalized. The maintenance is fast and is used to batch the work, so the buffer is drained letting writes queue up again. The idea being that a cheap buffer write and letting the caller return is faster than blocking on a lock (just like how a write-ahead log speeds db performance).

On the read side, its buffer is lossy and assumes the task will get run properly. It discards new read events in favor of concurrency, since by the power law most frequent would probably get enqueued already. That means only when the write buffer is forced to be processed would the read buffer be, which would have dropped most events. Then the hit rate is diminished since fewer events were recorded & replied against eviction policy, so worse decisions might be made.

So the worst cases are pretty acceptable. They are a tad annoying, but easy to rectify by using an updated JRE.

@drcrallen drcrallen closed this Sep 16, 2017
@drcrallen drcrallen reopened this Sep 16, 2017
Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of the change (fwiw our distro at Imply already sets caffeine as the cache in the template configs that we ship).

@drcrallen are you wanting to get this in for 0.11?

Comment thread docs/content/configuration/caching.md Outdated

#### Local Cache

(*DEPRECATED: Use caffeine instead*)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest this instead.

<div class="note caution">
some nice note
</div>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread docs/content/configuration/caching.md Outdated

### Caffeine Cache

A highly performant local cache implementation for Druid based on [Caffeine](https://github.com/ben-manes/caffeine). Requires a JRE8u60 or higher
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires a JRE8u60 or higher

Does it really, or is that just if you use COMMON_FJP?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for COMMON_FJP

Comment thread docs/content/configuration/caching.md Outdated

Here are the possible values for `druid.cache.cacheExecutorFactory`, which controls how maintenance tasks are run

* `COMMON_FJP` (default) use the common ForkJoinPool. Do NOT use this option unless you are running 8u60 or higher
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with COMMON_FJP + 8u60? Will you get crashes? Slow behavior? (the docs should say)

And would caffeine with one of the other cacheExecutorFactorys work worse or better than local? (would be great if the docs say this too, if it's something we know)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not something that is known for sure. Added a bit of docs about the suggested impact.

Comment thread server/pom.xml Outdated
<dependency>
<groupId>com.github.ben-manes.caffeine</groupId>
<artifactId>caffeine</artifactId>
<version>2.3.1</version>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version should not be specified here, it should come in through dependencyManagement from the main pom.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

import com.google.inject.Provider;

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = LocalCacheProvider.class)
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = CaffeineCacheProvider.class)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please edit the first comment of this PR to indicate that you do want to change the default? This threw me for a loop since it said you weren't going to, and I hadn't read the conversation between you and Ben yet where you changed your mind.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

|`query/cache/caffeine/*/requests`|Count of hits or misses|hit + miss|
|`query/cache/caffeine/*/loadTime`|Length of time caffeine spends loading new values (unused feature)|0|
|`query/cache/caffeine/*/evictionBytes`|Size in bytes that have been evicted from the cache|Varies, should tune cache `sizeInBytes` so that `sizeInBytes`/`evictionBytes` is approximately the rate of cache churn you desire|
[Moved](../../configuration/caching.html) No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add this to _redirects.json instead and get an browser redirect, so people don't have to click the link.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@drcrallen
Copy link
Copy Markdown
Contributor Author

@gianm It can go out in 0.11.0 or later depending on when it gets merged, I'm not in a hurry since this is a maintenance patch rather than a code change or bug fix.

@himanshug
Copy link
Copy Markdown
Contributor

👍

@drcrallen drcrallen added this to the 0.11.1 milestone Sep 21, 2017
Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, suggested a minor doc change.

#### Local Cache

<div class="note caution">
DEPRECATED: Use caffeine instead
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please expand on this a little bit? Imagine a reader that doesn't know why it's deprecated, or when it will be removed, how urgent the switch is, and so on. Maybe something like:

The local cache is deprecated in favor of the Caffeine cache, and may be removed in a future version of Druid. The Caffeine cache affords better performance and control over eviction behavior, and is recommended in any situation where you are using JRE 8u60 or higher.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

The local cache is deprecated in favor of the Caffeine cache, and may be removed in a future version of Druid. The Caffeine cache affords significantly better performance and control over eviction behavior compared to local cache, and is recommended in any situation where you are using JRE 8u60 or higher.

@drcrallen drcrallen merged commit a6470c1 into apache:master Sep 22, 2017
@drcrallen drcrallen deleted the caffeineBase branch September 22, 2017 17:47
gianm pushed a commit to implydata/druid-public that referenced this pull request Oct 30, 2017
…tation. (apache#4810)

* Move caffeine out of extension.

* Remove `JsonTypeName` from the class itself

* Fix bad docs

* Fix distribution pom

* Fix unused import

* Make caffeine default

* Address code comments

* Add more description around the jre version in the readme

* Add suggested comments
gianm pushed a commit to implydata/druid-public that referenced this pull request Nov 14, 2017
…tation. (apache#4810)

* Move caffeine out of extension.

* Remove `JsonTypeName` from the class itself

* Fix bad docs

* Fix distribution pom

* Fix unused import

* Make caffeine default

* Address code comments

* Add more description around the jre version in the readme

* Add suggested comments
gianm pushed a commit to implydata/druid-public that referenced this pull request Dec 5, 2017
…tation. (apache#4810)

* Move caffeine out of extension.

* Remove `JsonTypeName` from the class itself

* Fix bad docs

* Fix distribution pom

* Fix unused import

* Make caffeine default

* Address code comments

* Add more description around the jre version in the readme

* Add suggested comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants