22 changes: 6 additions & 16 deletions docs/content/dependencies/deep-storage.md
@@ -1,7 +1,9 @@
---
layout: doc_page
---

# Deep Storage

Deep storage is where segments are stored. It is a storage mechanism that Druid does not provide. This deep storage infrastructure defines the level of durability of your data: as long as Druid nodes can see this storage infrastructure and access the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented.

## Local Mount
@@ -21,24 +23,12 @@ If you are using the Hadoop indexer in local mode, then just give it a local file

## S3-compatible

S3-compatible deep storage is either S3 itself or a service, such as Google Storage, that exposes the same API as S3.

The S3 configuration parameters are:

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.s3.accessKey`||S3 access key.|Must be set.|
|`druid.s3.secretKey`||S3 secret key.|Must be set.|
|`druid.storage.bucket`||Bucket to store in.|Must be set.|
|`druid.storage.baseKey`||Base key prefix to use, i.e. what directory.|Must be set.|

See [druid-s3-extensions extension documentation](../development/extensions-core/s3.html).
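
For illustration, a minimal sketch of common runtime properties for S3 deep storage might look like the following; the credentials, bucket, and key prefix are placeholders, and the authoritative property list is in the extension documentation linked above.

```properties
# Load the S3 extension and select S3 as the deep storage type
druid.extensions.loadList=["druid-s3-extensions"]
druid.storage.type=s3

# Placeholder credentials and bucket layout -- replace with your own values
druid.s3.accessKey=YOUR_ACCESS_KEY
druid.s3.secretKey=YOUR_SECRET_KEY
druid.storage.bucket=your-druid-bucket
druid.storage.baseKey=druid/segments
```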

## HDFS

In order to use HDFS for deep storage, you need to set the following configuration in your common configs.
See [druid-hdfs-storage extension documentation](../development/extensions-core/hdfs.html).

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.storage.type`|hdfs||Must be set.|
|`druid.storage.storageDirectory`||Directory for storing segments.|Must be set.|
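
For illustration, a minimal sketch of these properties, assuming the `druid-hdfs-storage` extension is loaded and using a made-up segment directory:

```properties
# Load the HDFS storage extension and select HDFS as the deep storage type
druid.extensions.loadList=["druid-hdfs-storage"]
druid.storage.type=hdfs

# Illustrative segment directory -- replace with a path on your HDFS cluster
druid.storage.storageDirectory=/druid/segments
```
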
## Additional Deep Stores

If you are using the Hadoop indexer, set your output directory to be a location on Hadoop and it will work.

For additional deep stores, please see our [extensions list](../development/extensions.html).
129 changes: 14 additions & 115 deletions docs/content/dependencies/metadata-storage.md
@@ -1,128 +1,19 @@
---
layout: doc_page
---

# Metadata Storage

The Metadata Storage is an external dependency of Druid. Druid uses it to store
various metadata about the system, but not to store the actual data. There are
a number of tables used for various purposes described below.

## Supported Metadata Storages

The following metadata storage engines are supported:

* Derby (default, but not suitable for production)
* MySQL
* PostgreSQL

Even though Derby is the default, it works only if you have all Druid
processes running on the same host, and should be used only for experimentation.
For production, MySQL or PostgreSQL should be used.

To choose the metadata storage type, set `druid.metadata.storage.type` to
`mysql`, `postgres` or `derby`.
Set other `druid.metadata.storage` configuration
keywords as shown below to give Druid information about how to connect to
the database.

As discussed in [Including Extensions](../operations/including-extensions.html),
there are two ways to give Druid the extension files it needs for the
database you are using.
The first is to put the extension files in the classpath. The second is to
put the extension files in a subdirectory of
`druid.extensions.directory` (by default `extensions` under the Druid working directory) and list the subdirectory name in
`druid.extensions.loadList`. The example properties below show the second
way.

## Setting up MySQL

1. Install MySQL

Use your favorite package manager to install mysql, e.g.:
- on Ubuntu/Debian using apt `apt-get install mysql-server`
- on OS X, using [Homebrew](http://brew.sh/) `brew install mysql`

Alternatively, download and follow installation instructions for MySQL
Community Server here:
[http://dev.mysql.com/downloads/mysql/](http://dev.mysql.com/downloads/mysql/)

2. Create a druid database and user

Connect to MySQL from the machine where it is installed.

```bash
mysql -u root
```

Paste the following snippet into the mysql prompt:

```sql
-- create a druid database, make sure to use utf8 as encoding
CREATE DATABASE druid DEFAULT CHARACTER SET utf8;

-- create a druid user, and grant it all permissions on the database we just created
GRANT ALL ON druid.* TO 'druid'@'localhost' IDENTIFIED BY 'diurd';
```

3. Configure your Druid metadata storage extension:

Add the following parameters to your Druid configuration, replacing `<host>`
with the location (host name and port) of the database.

Derby is the default metadata store for Druid; however, it is not suitable for production.
[MySQL](../development/extensions-core/mysql.html) and [PostgreSQL](../development/extensions-core/postgresql.html) are more suitable metadata stores for production.

```properties
druid.extensions.loadList=["mysql-metadata-storage"]
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://<host>/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
```

Note: the metadata storage extension is not packaged within the main Druid tarball; it is
packaged in a separate tarball that can be downloaded from [here](http://druid.io/downloads.html).
You can also get it using [pull-deps](../pull-deps.html), or you can build
it from source code; see [Build from Source](../development/build.html).

## Setting up PostgreSQL

1. Install PostgreSQL

Use your favorite package manager to install PostgreSQL, e.g.:
- on Ubuntu/Debian using apt `apt-get install postgresql`
- on OS X, using [Homebrew](http://brew.sh/) `brew install postgresql`

2. Create a druid database and user

On the machine where PostgreSQL is installed, using an account with proper
postgresql permissions:

Create a druid user; enter `diurd` when prompted for the password.

```bash
createuser druid -P
```

Create a druid database owned by the user we just created.

```bash
createdb druid -O druid
```

*Note:* On Ubuntu / Debian you may have to prefix the `createuser` and
`createdb` commands with `sudo -u postgres` in order to gain proper
permissions.

3. Configure your Druid metadata storage extension:

Add the following parameters to your Druid configuration, replacing `<host>`
with the location (host name and port) of the database.

```properties
druid.extensions.loadList=["postgresql-metadata-storage"]
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://<host>/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
```
<div class="note caution">
Derby is not suitable for production use as a metadata store. Use MySQL or PostgreSQL instead.
</div>

## Using Derby

@@ -132,6 +23,14 @@ way.
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527//home/y/var/druid_state/derby;create=true
```

## MySQL

See [mysql-metadata-storage extension documentation](../development/extensions-core/mysql.html).

## PostgreSQL

See [postgresql-metadata-storage extension documentation](../development/extensions-core/postgresql.html).

## Metadata Storage Tables

@@ -2,7 +2,9 @@
layout: doc_page
---

# Approximate Histogram aggregator

Make sure to [include](../../operations/including-extensions.html) `druid-histogram` as an extension.
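
For example, a minimal sketch of loading it via `druid.extensions.loadList` (assuming the default extensions directory layout described in the linked page):

```
druid.extensions.loadList=["druid-histogram"]
```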

This aggregator is based on
[http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf)
114 changes: 114 additions & 0 deletions docs/content/development/extensions-core/avro.md
@@ -0,0 +1,114 @@
---
layout: doc_page
---

# Avro

This extension enables Druid to ingest and understand the Apache Avro data format. Make sure to [include](../../operations/including-extensions.html) `druid-avro-extensions` as an extension.
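
For example, a minimal sketch of loading it via `druid.extensions.loadList` (assuming the default extensions directory layout):

```
druid.extensions.loadList=["druid-avro-extensions"]
```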

### Avro Stream Parser

This is for streaming/realtime ingestion.

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| type | String | This should say `avro_stream`. | no |
| avroBytesDecoder | JSON Object | Specifies how to decode bytes to Avro record. | yes |
| parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. Should be a timeAndDims parseSpec. | yes |

For example, using Avro stream parser with schema repo Avro bytes decoder:

```json
"parser" : {
"type" : "avro_stream",
"avroBytesDecoder" : {
"type" : "schema_repo",
"subjectAndIdConverter" : {
"type" : "avro_1124",
"topic" : "${YOUR_TOPIC}"
},
"schemaRepository" : {
"type" : "avro_1124_rest_client",
"url" : "${YOUR_SCHEMA_REPO_END_POINT}",
}
},
"parseSpec" : {
"type": "timeAndDims",
"timestampSpec": <standard timestampSpec>,
"dimensionsSpec": <standard dimensionsSpec>
}
}
```

#### Avro Bytes Decoder

If `type` is not included, the avroBytesDecoder defaults to `schema_repo`.

##### SchemaRepo Based Avro Bytes Decoder

This Avro bytes decoder first extracts `subject` and `id` from the input message bytes, then uses them to look up the Avro schema with which to decode the Avro record from those bytes. Details can be found in [schema repo](https://github.com/schema-repo/schema-repo) and [AVRO-1124](https://issues.apache.org/jira/browse/AVRO-1124). You will need an HTTP service like schema repo to hold the Avro schema. For schema registration on the message producer side, you can refer to `io.druid.data.input.AvroStreamInputRowParserTest#testParse()`.

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| type | String | This should say `schema_repo`. | no |
| subjectAndIdConverter | JSON Object | Specifies how to extract the subject and id from the message bytes. | yes |
| schemaRepository | JSON Object | Specifies how to look up the Avro schema from the subject and id. | yes |

##### Avro-1124 Subject And Id Converter

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| type | String | This should say `avro_1124`. | no |
| topic | String | Specifies the topic of your kafka stream. | yes |


##### Avro-1124 Schema Repository

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| type | String | This should say `avro_1124_rest_client`. | no |
| url | String | Specifies the endpoint url of your Avro-1124 schema repository. | yes |

### Avro Hadoop Parser

This is for batch ingestion using the HadoopDruidIndexer. The `inputFormat` of `inputSpec` in `ioConfig` must be set to `"io.druid.data.input.avro.AvroValueInputFormat"`. You may want to set the Avro reader's schema in `jobProperties` in `tuningConfig`, e.g. `"avro.schema.path.input.value": "/path/to/your/schema.avsc"` or `"avro.schema.input.value": "your_schema_JSON_object"`. If the reader's schema is not set, the schema in the Avro object container file will be used; see the [Avro specification](http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution). Make sure to include "io.druid.extensions:druid-avro-extensions" as an extension.

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| type | String | This should say `avro_hadoop`. | no |
| parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. Should be a timeAndDims parseSpec. | yes |
| fromPigAvroStorage | Boolean | Specifies whether the data file is stored using AvroStorage. | no(default == false) |

For example, using Avro Hadoop parser with custom reader's schema file:

```json
{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "",
      "parser" : {
        "type" : "avro_hadoop",
        "parseSpec" : {
          "type": "timeAndDims",
          "timestampSpec": <standard timestampSpec>,
          "dimensionsSpec": <standard dimensionsSpec>
        }
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "inputFormat": "io.druid.data.input.avro.AvroValueInputFormat",
        "paths" : ""
      }
    },
    "tuningConfig" : {
      "jobProperties" : {
        "avro.schema.path.input.value" : "/path/to/my/schema.avsc"
      }
    }
  }
}
```
@@ -8,7 +8,7 @@ Druid aggregators based on [datasketches](http://datasketches.github.io/) library
At ingestion time, this aggregator creates the theta sketch objects which get stored in Druid segments. Logically speaking, a theta sketch object can be thought of as a Set data structure. At query time, sketches are read and aggregated (set unioned) together. In the end, by default, you receive the estimate of the number of unique entries in the sketch object. Also, you can use post aggregators to do union, intersection or difference on sketch columns in the same row.
Note that you can use the `thetaSketch` aggregator on a column which was not ingested using it; it will return the estimated cardinality of the column. It is recommended to use it at ingestion time as well to make querying faster.

To use the datasketch aggregators, make sure you [include](../operations/including-extensions.html) the extension in your config file:

```
druid.extensions.loadList=["druid-datasketches"]
```

@@ -20,26 +20,20 @@ druid.extensions.loadList=["druid-datasketches"]

```
{
  "type" : "thetaSketch",
  "name" : <output_name>,
  "fieldName" : <metric_name>,
  "isInputThetaSketch": false,
  "size": 16384
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "thetaSketch"|yes|
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the aggregator used at ingestion time.|yes|
|isInputThetaSketch|This should only be used at indexing time if your input data contains theta sketch objects. This would be the case if you use the datasketches library outside of Druid, say with Pig/Hive, to produce the data that you are ingesting into Druid.|no, defaults to false|
|size|Must be a power of 2. Internally, size refers to the maximum number of entries the sketch object will retain. A higher size means higher accuracy but more space to store sketches. Note that after you index with a particular size, Druid will persist the sketch in segments, and at query time you must use a size greater than or equal to that. See [theta-size](http://datasketches.github.io/docs/ThetaSize.html) for details. In general, we recommend sticking to the default size.|no, defaults to 16384|
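
For illustration, a hypothetical timeseries query that aggregates the sketch at query time might look like the sketch below; the datasource name, interval, and `user_id` column are made-up placeholders.

```
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "intervals": ["2016-01-01/2016-02-01"],
  "aggregations": [
    {
      "type": "thetaSketch",
      "name": "unique_users",
      "fieldName": "user_id"
    }
  ]
}
```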

### Post Aggregators

#### Sketch Estimator
25 changes: 25 additions & 0 deletions docs/content/development/extensions-core/examples.md
@@ -0,0 +1,25 @@
---
layout: doc_page
---

# Druid examples

## TwitterSpritzerFirehose

This firehose connects directly to the Twitter Spritzer data stream.

Sample spec:

```json
"firehose" : {
"type" : "twitzer",
"maxEventCount": -1,
"maxRunMinutes": 0
}
```

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be "twitzer"|N/A|yes|
|maxEventCount|Max events to receive; -1 is infinite, 0 means nothing is delivered. Use this to prevent infinite space consumption or to prevent getting throttled at an inconvenient time.|N/A|yes|
|maxRunMinutes|Maximum number of minutes to fetch Twitter events. Use this to prevent getting throttled at an inconvenient time. If zero or less, there is no time limit for the run.|N/A|yes|