2 changes: 1 addition & 1 deletion docs/content/comparisons/druid-vs-redshift.md
@@ -42,7 +42,7 @@ Druid’s write semantics are not as fluid and does not support full joins (we s

### Data distribution model

Druid’s data distribution is segment-based and leverages a highly available "deep" storage such as S3 or HDFS. Scaling up (or down) does not require massive copy actions or downtime; in fact, losing any number of historical nodes does not result in data loss because new historical nodes can always be brought up by reading data from "deep" storage.
Druid’s data distribution is segment-based and leverages a highly available "deep" storage such as S3 or HDFS. Scaling up (or down) does not require massive copy actions or downtime; in fact, losing any number of Historical nodes does not result in data loss because new Historical nodes can always be brought up by reading data from "deep" storage.
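
For context, the "deep" storage referenced here is selected through runtime properties. A minimal sketch, assuming the S3 extension; the bucket and path values are placeholders, not recommendations:

```properties
# Load the S3 deep storage extension (HDFS or local storage are configured similarly)
druid.extensions.loadList=["druid-s3-extensions"]

# Segments are written to highly available "deep" storage
druid.storage.type=s3
druid.storage.bucket=example-druid-segments
druid.storage.baseKey=druid/segments
```
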
Contributor:
@gianm @jon-wei what do you think of universally replacing "node" with "process" everywhere?

Contributor:

I think it would help but would rather do it in a different PR.

Contributor Author:

that sounds good to me, but I'll do that in a later patch to keep this PR scoped down


To contrast, ParAccel’s data distribution model is hash-based. Expanding the cluster requires re-hashing the data across the nodes, making it difficult to perform without taking downtime. Amazon’s Redshift works around this issue with a multi-step process:

532 changes: 274 additions & 258 deletions docs/content/configuration/index.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/content/configuration/realtime.md
@@ -61,7 +61,7 @@ The realtime node uses several of the global configs in [Configuration](../confi
|Property|Description|Default|
|--------|-----------|-------|
|`druid.processing.buffer.sizeBytes`|This specifies a buffer size for the storage of intermediate results. The computation engine in both the Historical and Realtime nodes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed.|auto (max 1GB)|
|`druid.processing.formatString`|Realtime and historical nodes use this format string to name their processing threads.|processing-%s|
|`druid.processing.formatString`|Realtime and Historical nodes use this format string to name their processing threads.|processing-%s|
Contributor:

have we not deprecated real-time processes yet?

Contributor Author:

They've been deprecated but not removed from the docs yet

|`druid.processing.numMergeBuffers`|The number of direct memory buffers available for merging query results. The buffers are sized by `druid.processing.buffer.sizeBytes`. This property is effectively a concurrency limit for queries that require merging buffers. If you are using any queries that require merge buffers (currently, just groupBy v2) then you should have at least two of these.|`max(2, druid.processing.numThreads / 4)`|
|`druid.processing.numThreads`|The number of processing threads to have available for parallel processing of segments. Our rule of thumb is `num_cores - 1`, which means that even under heavy load there will still be one core available to do background tasks like talking with ZooKeeper and pulling down segments. If only one core is available, this property defaults to the value `1`.|Number of cores - 1 (or 1)|
|`druid.processing.columnCache.sizeBytes`|Maximum size in bytes for the dimension value lookup cache. Any value greater than `0` enables the cache. It is currently disabled by default. Enabling the lookup cache can significantly improve the performance of aggregators operating on dimension values, such as the JavaScript aggregator, or cardinality aggregator, but can slow things down if the cache hit rate is low (i.e. dimensions with few repeating values). Enabling it may also require additional garbage collection tuning to avoid long GC pauses.|`0` (disabled)|
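
To illustrate how these settings fit together, here is a minimal sketch of the processing section of a runtime properties file; the values shown are illustrative assumptions, not recommendations:

```properties
# Off-heap scratch buffer used for intermediate results, per processing thread
druid.processing.buffer.sizeBytes=536870912

# Direct memory buffers for merging query results (required by groupBy v2)
druid.processing.numMergeBuffers=2

# Rule of thumb: number of cores - 1 (this assumes an 8-core machine)
druid.processing.numThreads=7

# Naming pattern for processing threads
druid.processing.formatString=processing-%s
```
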
4 changes: 2 additions & 2 deletions docs/content/dependencies/cassandra-deep-storage.md
@@ -29,7 +29,7 @@ title: "Cassandra Deep Storage"
Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
`index_storage` and `descriptor_storage`. Underneath the hood, the Cassandra integration leverages Astyanax. The
index storage table is a [Chunked Object](https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store) repository. It contains
compressed segments for distribution to historical nodes. Since segments can be large, the Chunked Object storage allows the integration to multi-thread
compressed segments for distribution to Historical nodes. Since segments can be large, the Chunked Object storage allows the integration to multi-thread
Contributor:

I think we should delete this page. I'm pretty sure C* doesn't even work as a deep storage.

Contributor:

IMO if we're going to delete this page we should delete the extension too. So it's not relevant to this patch.

Contributor:

Okay - we can do this as part of a later update.

the write to Cassandra, and spreads the data across all the nodes in a cluster. The descriptor storage table is a normal C* table that
stores the segment metadata.

@@ -52,7 +52,7 @@ CREATE TABLE descriptor_storage(key varchar,
First create the schema above. I use a new keyspace called `druid` for this purpose, which can be created using the
[Cassandra CQL `CREATE KEYSPACE`](http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/create_keyspace_r.html) command.

Then, add the following to your historical and realtime runtime properties files to enable a Cassandra backend.
Then, add the following to your Historical and realtime runtime properties files to enable a Cassandra backend.

```properties
druid.extensions.loadList=["druid-cassandra-storage"]
```
3 changes: 1 addition & 2 deletions docs/content/dependencies/metadata-storage.md
@@ -122,8 +122,7 @@ parameters across the cluster at runtime.

### Task-related Tables

There are also a number of tables created and used by the [Indexing
Service](../design/indexing-service.html) in the course of its work.
There are also a number of tables created and used by the [Overlord](../design/overlord.html) and [MiddleManager](../design/middlemanager.html) when managing tasks.

### Audit Table

6 changes: 3 additions & 3 deletions docs/content/dependencies/zookeeper.md
@@ -29,8 +29,8 @@ Druid uses [ZooKeeper](http://zookeeper.apache.org/) (ZK) for management of curr
1. [Coordinator](../design/coordinator.html) leader election
2. Segment "publishing" protocol from [Historical](../design/historical.html) and [Realtime](../design/realtime.html)
3. Segment load/drop protocol between [Coordinator](../design/coordinator.html) and [Historical](../design/historical.html)
4. [Overlord](../design/indexing-service.html) leader election
5. [Indexing Service](../design/indexing-service.html) task management
4. [Overlord](../design/overlord.html) leader election
5. [Overlord](../design/overlord.html) and [MiddleManager](../design/middlemanager.html) task management
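
For context, all of these coordination mechanisms rely on the same ZooKeeper connection settings in each process's runtime properties. A minimal sketch; the hosts and base path below are assumed values:

```properties
# ZooKeeper ensemble that every Druid process connects to
druid.zk.service.host=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

# Base znode under which Druid creates its coordination paths
druid.zk.paths.base=/druid
```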

### Coordinator Leader Election

@@ -74,4 +74,4 @@ When the [Coordinator](../design/coordinator.html) decides that a [Historical](.
```
${druid.zk.paths.loadQueuePath}/_host_of_historical_node/_segment_identifier
```

This node will contain a payload that indicates to the historical node what it should do with the given segment. When the historical node is done with the work, it will delete the znode in order to signify to the Coordinator that it is complete.
This node will contain a payload that indicates to the Historical node what it should do with the given segment. When the Historical node is done with the work, it will delete the znode in order to signify to the Coordinator that it is complete.
2 changes: 1 addition & 1 deletion docs/content/design/auth.md
@@ -80,7 +80,7 @@ druid.auth.authenticator.anonymous.authorizerName=myBasicAuthorizer

## Escalator
The `druid.escalator.type` property determines what authentication scheme should be used for internal Druid cluster communications (such as when a broker node communicates with historical nodes for query processing).
The `druid.escalator.type` property determines what authentication scheme should be used for internal Druid cluster communications (such as when a Broker process communicates with Historical processes for query processing).

The Escalator chosen for this property must use an authentication scheme that is supported by an Authenticator in `druid.auth.authenticationChain`. Authenticator extension implementors must also provide a corresponding Escalator implementation if they intend to use a particular authentication scheme for internal Druid communications.
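
As a sketch, an escalator might be configured along these lines; the property names below assume the basic-security extension's escalator, and the credentials are placeholders:

```properties
# Authenticate internal (process-to-process) requests with basic auth
druid.escalator.type=basic
druid.escalator.internalClientUsername=druid_system
druid.escalator.internalClientPassword=changeme
druid.escalator.authorizerName=myBasicAuthorizer
```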

6 changes: 3 additions & 3 deletions docs/content/design/broker.md
@@ -47,9 +47,9 @@ org.apache.druid.cli.Main server broker

Most druid queries contain an interval object that indicates a span of time for which data is requested. Likewise, Druid [Segments](../design/segments.html) are partitioned to contain data for some interval of time and segments are distributed across a cluster. Consider a simple datasource with 7 segments where each segment contains data for a given day of the week. Any query issued to the datasource for more than one day of data will hit more than one segment. These segments will likely be distributed across multiple nodes, and hence, the query will likely hit multiple nodes.

To determine which nodes to forward queries to, the Broker node first builds a view of the world from information in Zookeeper. Zookeeper maintains information about [Historical](../design/historical.html) and [Realtime](../design/realtime.html) nodes and the segments they are serving. For every datasource in Zookeeper, the Broker node builds a timeline of segments and the nodes that serve them. When queries are received for a specific datasource and interval, the Broker node performs a lookup into the timeline associated with the query datasource for the query interval and retrieves the nodes that contain data for the query. The Broker node then forwards down the query to the selected nodes.
To determine which nodes to forward queries to, the Broker node first builds a view of the world from information in Zookeeper. Zookeeper maintains information about [Historical](../design/historical.html) and [Realtime](../design/realtime.html) nodes and the segments they are serving. For every datasource in Zookeeper, the Broker node builds a timeline of segments and the nodes that serve them. When queries are received for a specific datasource and interval, the Broker node performs a lookup into the timeline associated with the query datasource for the query interval and retrieves the nodes that contain data for the query. The Broker process then forwards down the query to the selected nodes.

### Caching

Broker nodes employ a cache with a LRU cache invalidation strategy. The broker cache stores per-segment results. The cache can be local to each broker node or shared across multiple nodes using an external distributed cache such as [memcached](http://memcached.org/). Each time a broker node receives a query, it first maps the query to a set of segments. A subset of these segment results may already exist in the cache and the results can be directly pulled from the cache. For any segment results that do not exist in the cache, the broker node will forward the query to the
historical nodes. Once the historical nodes return their results, the broker will store those results in the cache. Real-time segments are never cached and hence requests for real-time data will always be forwarded to real-time nodes. Real-time data is perpetually changing and caching the results would be unreliable.
Broker nodes employ a cache with a LRU cache invalidation strategy. The Broker cache stores per-segment results. The cache can be local to each Broker process or shared across multiple processes using an external distributed cache such as [memcached](http://memcached.org/). Each time a broker node receives a query, it first maps the query to a set of segments. A subset of these segment results may already exist in the cache and the results can be directly pulled from the cache. For any segment results that do not exist in the cache, the broker node will forward the query to the
Historical nodes. Once the Historical processes return their results, the Broker will store those results in the cache. Real-time segments are never cached and hence requests for real-time data will always be forwarded to real-time nodes. Real-time data is perpetually changing and caching the results would be unreliable.
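
For illustration, per-segment result caching on the Broker is controlled through runtime properties. A minimal sketch, assuming a shared memcached backend; the hosts are placeholders:

```properties
# Read from and write to the per-segment result cache on the Broker
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true

# Use an external distributed cache shared across Broker processes
druid.cache.type=memcached
druid.cache.hosts=memcached1.example.com:11211,memcached2.example.com:11211
```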