2 changes: 1 addition & 1 deletion distribution/pom.xml
@@ -58,7 +58,7 @@
<classpath/>
<argument>-Ddruid.extensions.loadList=[]</argument>
<argument>-Ddruid.extensions.directory=${project.build.directory}/extensions</argument>
<argument>-Ddruid.extensions.hadoopDependenciesDir=${project.build.directory}/hadoop_dependencies</argument>
<argument>-Ddruid.extensions.hadoopDependenciesDir=${project.build.directory}/hadoop-dependencies</argument>
<argument>io.druid.cli.Main</argument>
<argument>tools</argument>
<argument>pull-deps</argument>
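The renamed directory above feeds the `pull-deps` tool, which the build invokes with these arguments. As an illustrative sketch only (the classpath and Hadoop coordinate below are assumptions, not taken from this PR), an equivalent manual invocation might look like:

```bash
# Sketch: run pull-deps against the renamed hadoop-dependencies directory.
# Classpath, target paths, and the coordinate are illustrative assumptions.
java -classpath "lib/*" \
  -Ddruid.extensions.loadList=[] \
  -Ddruid.extensions.directory=target/extensions \
  -Ddruid.extensions.hadoopDependenciesDir=target/hadoop-dependencies \
  io.druid.cli.Main tools pull-deps \
  -h org.apache.hadoop:hadoop-client:2.3.0
```

Dependencies for each Hadoop coordinate would then land under `target/hadoop-dependencies`, matching the directory name the assembly packages below.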
119 changes: 93 additions & 26 deletions distribution/src/assembly/assembly.xml
@@ -34,85 +34,152 @@
</excludes>
<outputDirectory>extensions</outputDirectory>
</fileSet>

<fileSet>
<directory>${project.build.directory}/hadoop_dependencies</directory>
<directory>${project.build.directory}/hadoop-dependencies</directory>
<includes>
<include>*/*/*</include>
</includes>
<outputDirectory>hadoop_dependencies</outputDirectory>
<outputDirectory>hadoop-dependencies</outputDirectory>
</fileSet>

<fileSet>
<directory>../examples/config</directory>
<directory>../examples/quickstart/</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>config</outputDirectory>
<outputDirectory>quickstart</outputDirectory>
</fileSet>

<fileSet>
<directory>../examples/config/_common</directory>
<directory>../examples/conf-quickstart</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>config/_common</outputDirectory>
<outputDirectory>conf-quickstart</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/config/broker</directory>
<directory>../examples/conf-quickstart/druid</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>config/broker</outputDirectory>
<outputDirectory>conf-quickstart/druid</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/config/coordinator</directory>
<directory>../examples/conf-quickstart/druid/_common</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>config/coordinator</outputDirectory>
<outputDirectory>conf-quickstart/druid/_common/</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/config/realtime</directory>
<directory>../examples/conf-quickstart/druid/broker</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>config/realtime</outputDirectory>
<outputDirectory>conf-quickstart/druid/broker</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/config/historical</directory>
<directory>../examples/conf-quickstart/druid/coordinator</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>config/historical</outputDirectory>
<outputDirectory>conf-quickstart/druid/coordinator</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/config/overlord</directory>
<directory>../examples/conf-quickstart/druid/historical</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>config/overlord</outputDirectory>
<outputDirectory>conf-quickstart/druid/historical</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/bin</directory>
<directory>../examples/conf-quickstart/druid/overlord</directory>
<includes>
<include>*sh</include>
<include>*</include>
</includes>
<fileMode>744</fileMode>
<outputDirectory>/</outputDirectory>
<outputDirectory>conf-quickstart/druid/overlord</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/conf-quickstart/druid/middleManager</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>conf-quickstart/druid/middleManager</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/conf-quickstart/tranquility</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>conf-quickstart/tranquility</outputDirectory>
</fileSet>

<fileSet>
<directory>../examples/conf</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>conf</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/conf/druid/_common</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>conf/druid/_common</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/conf/druid/broker</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>conf/druid/broker</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/conf/druid/coordinator</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>conf/druid/coordinator</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/conf/druid/historical</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>conf/druid/historical</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/bin/examples</directory>
<directory>../examples/conf/druid/overlord</directory>
<includes>
<include>**</include>
<include>*</include>
</includes>
<outputDirectory>conf/druid/overlord</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/conf/druid/middleManager</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>conf/druid/middleManager</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/conf/tranquility</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>examples</outputDirectory>
<outputDirectory>conf/tranquility</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/bin/examples/twitter</directory>
<directory>../examples/bin</directory>
<includes>
<include>*sh</include>
<include>*</include>
</includes>
<fileMode>744</fileMode>
<outputDirectory>examples/twitter</outputDirectory>
<outputDirectory>bin</outputDirectory>
</fileSet>

<fileSet>
<directory>../</directory>
<includes>
2 changes: 1 addition & 1 deletion docs/content/configuration/index.md
@@ -22,7 +22,7 @@ Many of Druid's external dependencies can be plugged in as modules. Extensions c
|Property|Description|Default|
|--------|-----------|-------|
|`druid.extensions.directory`|The root extension directory where user can put extensions related files. Druid will load extensions stored under this directory.|`extensions` (This is a relative path to Druid's working directory)|
|`druid.extensions.hadoopDependenciesDir`|The root hadoop dependencies directory where user can put hadoop related dependencies files. Druid will load the dependencies based on the hadoop coordinate specified in the hadoop index task.|`hadoop_dependencies` (This is a relative path to Druid's working directory|
|`druid.extensions.hadoopDependenciesDir`|The root hadoop dependencies directory where user can put hadoop related dependencies files. Druid will load the dependencies based on the hadoop coordinate specified in the hadoop index task.|`hadoop-dependencies` (This is a relative path to Druid's working directory)|
|`druid.extensions.loadList`|A JSON array of extensions to load from extension directories by Druid. If it is not specified, its value will be `null` and Druid will load all the extensions under `druid.extensions.directory`. If its value is empty list `[]`, then no extensions will be loaded at all.|null|
|`druid.extensions.searchCurrentClassloader`|This is a boolean flag that determines if Druid will search the main classloader for extensions. It defaults to true but can be turned off if you have reason to not automatically add all modules on the classpath.|true|

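Putting the table above together, a minimal `common.runtime.properties` fragment could look like the following sketch (the extension names are illustrative assumptions, not part of this PR):

```properties
# Load extensions from the default relative directories.
druid.extensions.directory=extensions
druid.extensions.hadoopDependenciesDir=hadoop-dependencies
# Load only these extensions; an empty list [] loads none, and omitting
# the property loads everything under druid.extensions.directory.
druid.extensions.loadList=["druid-s3-extensions", "mysql-metadata-storage"]
druid.extensions.searchCurrentClassloader=true
```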
35 changes: 27 additions & 8 deletions docs/content/configuration/production-cluster.md
@@ -4,21 +4,40 @@ layout: doc_page
Production Cluster Configuration
================================

__This configuration is an example of what a production cluster could look like. Many other hardware combinations are possible! Cheaper hardware is absolutely possible.__
```note-info
This configuration is an example of what a production cluster could look like. Many other hardware combinations are
possible! Cheaper hardware is absolutely possible.
```

This production Druid cluster assumes that metadata storage and Zookeeper are already set up. The deep storage that is used for examples is S3 and memcached is used as a distributed cache.
This production Druid cluster assumes that metadata storage and Zookeeper are already set up. The deep storage that is
used for examples is [S3](https://aws.amazon.com/s3/) and [memcached](http://memcached.org/) is used for a distributed cache.

The nodes that respond to queries (Historical, Broker, and Middle manager nodes) will use as many cores as are available, depending on usage, so it is best to keep these on dedicated machines. The upper limit of effectively utilized cores is not well characterized yet and would depend on types of queries, query load, and the schema. Historical daemons should have a heap a size of at least 1GB per core for normal usage, but could be squeezed into a smaller heap for testing. Since in-memory caching is essential for good performance, even more RAM is better. Broker nodes will use RAM for caching, so they do more than just route queries. SSDs are highly recommended for Historical nodes not all data is loaded in available memory.
```note-info
The nodes in this example do not need to be on their own individual servers. Overlord and Coordinator nodes should be
co-located on the same hardware.
```

**Review thread on the note above:**
- **Member:** should be -> can be ?
- **Contributor Author:** should, there's no reason not to
- **Member:** To me this line gives an impression that they "have" to be on the same machine, but theoretically they can be on separate machines... anyways I am OK with it, there is no need to change it.
- **Contributor Author:** I think I'd rather recommend they be in the same server always because I want them to be the same process long term :)
- **Member:** I see that's why :)

The nodes that are responsible for coordination (Coordinator and Overlord nodes) require much less processing.
The nodes that respond to queries (Historical, Broker, and MiddleManager nodes) will use as many cores as are available,
depending on usage, so it is best to keep these on dedicated machines. The upper limit of effectively utilized cores is
not well characterized yet and would depend on types of queries, query load, and the schema. Historical daemons should
have a heap size of at least 1GB per core for normal usage, but could be squeezed into a smaller heap for testing.
Since in-memory caching is essential for good performance, even more RAM is better.
Broker nodes will use RAM for caching, so they do more than just route queries.
SSDs are highly recommended for Historical nodes when they have more segments loaded than available memory.
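As a rough sketch of the 1GB-of-heap-per-core guidance above, a Historical node on an 8-core machine might use JVM flags like these (all numbers are illustrative assumptions, not values from this PR):

```
-server
# 8 cores x 1GB per core, fixed so the heap never resizes under load.
-Xms8g
-Xmx8g
# Room for off-heap processing and segment buffers; tune to the machine.
-XX:MaxDirectMemorySize=16g
```

Remaining RAM beyond the heap and direct memory is left to the OS page cache, which is what makes the in-memory caching mentioned above effective.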

The effective utilization of cores by Zookeeper, metadata storage, and Coordinator nodes is likely to be between 1 and 2 for each process/daemon, so these could potentially share a machine with lots of cores. These daemons work with heap a size between 500MB and 1GB.
The nodes that are responsible for coordination (Coordinator and Overlord nodes) require much less processing.

We'll use r3.8xlarge nodes for query facing nodes and m1.xlarge nodes for coordination nodes. The following examples work relatively well in production, however, a more optimized tuning for the nodes we selected and more optimal hardware for a Druid cluster are both definitely possible.
The effective utilization of cores by Zookeeper, metadata storage, and Coordinator nodes is likely to be between 1 and 2
for each process/daemon, so these could potentially share a machine with lots of cores. These daemons work with a heap
size between 500MB and 1GB.

For general purposes of high availability, there should be at least 2 of every node type.
We'll use [EC2](https://aws.amazon.com/ec2/) r3.8xlarge nodes for query-facing nodes and m1.xlarge nodes for coordination nodes.
The following examples work relatively well in production; however, more optimized tuning for the nodes we selected and
more optimal hardware for a Druid cluster are both definitely possible.

To setup a local Druid cluster, see [Simple Cluster Configuration](../configuration/simple-cluster.html).
```note-caution
For high availability, there should be at least a redundant copy of every process running on separate hardware.
```

### Common Configuration (common.runtime.properties)

7 changes: 7 additions & 0 deletions docs/content/configuration/realtime.md
@@ -1,6 +1,13 @@

---
layout: doc_page
---

```note-caution
If you are doing stream-pull based ingestion, we suggest using [stream-push](../ingestion/stream-push.html) based
ingestion instead of running real-time nodes.
```

Realtime Node Configuration
==============================
For general Realtime Node information, see [here](../design/realtime.html).
Expand Down
115 changes: 0 additions & 115 deletions docs/content/configuration/simple-cluster.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/content/design/coordinator.md
@@ -260,7 +260,7 @@ Disables a datasource.

* `/druid/coordinator/v1/datasources/{dataSourceName}?kill=true&interval={myISO8601Interval}>`

Runs a [Kill task](../misc/tasks.html) for a given interval and datasource.
Runs a [Kill task](../ingestion/tasks.html) for a given interval and datasource.
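For illustration, the kill endpoint above could be invoked like this sketch (the host, port, datasource, and interval are all hypothetical, and the HTTP method should be checked against the Coordinator docs for your version):

```bash
# Hypothetical coordinator host, datasource, and interval;
# the interval separator is URL-encoded with an underscore.
curl -X DELETE \
  "http://coordinator.example.com:8081/druid/coordinator/v1/datasources/wikipedia?kill=true&interval=2016-01-01_2016-02-01"
```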

* `/druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
