Merged
3 changes: 0 additions & 3 deletions docs/design/architecture.md
@@ -235,9 +235,6 @@ transaction.
 metadata in a single transaction after the subtasks are finished. In simple (single-task) mode, the single task
 publishes all segment metadata in a single transaction after it is complete.
 
-[Tranquility](../ingestion/tranquility.md), a streaming ingestion method that is no longer recommended, does not perform
-transactional loading.
-
 Additionally, some ingestion methods offer an _idempotency_ guarantee. This means that repeated executions of the same
 ingestion will not cause duplicate data to be ingested:

37 changes: 18 additions & 19 deletions docs/ingestion/index.md
@@ -22,19 +22,18 @@ title: "Ingestion"
 ~ under the License.
 -->
 
 ## Overview
 
-All data in Druid is organized into _segments_, which are data files that generally have up to a few million rows each.
-Loading data in Druid is called _ingestion_ or _indexing_ and consists of reading data from a source system and creating
+All data in Druid is organized into _segments_, which are data files each of which may have up to a few million rows.
+Loading data in Druid is called _ingestion_ or _indexing_, and consists of reading data from a source system and creating
 segments based on that data.
 
-In most ingestion methods, the work of loading data is done by Druid [MiddleManager](../design/middlemanager.md) processes
-(or the [Indexer](../design/indexer.md) processes). One exception is
+In most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes
+(or the [Indexer](../design/indexer.md) processes) load your source data. One exception is
 Hadoop-based ingestion, where this work is instead done using a Hadoop MapReduce job on YARN (although MiddleManager or Indexer
-processes are still involved in starting and monitoring the Hadoop jobs). Once segments have been generated and stored
-in [deep storage](../dependencies/deep-storage.md), they will be loaded by Historical processes. For more details on
-how this works under the hood, see the [Storage design](../design/architecture.md#storage-design) section of Druid's design
-documentation.
+processes are still involved in starting and monitoring the Hadoop jobs).
 
+Once segments have been generated and stored in [deep storage](../dependencies/deep-storage.md), they are loaded by Historical processes.
+For more details on how this works, see the [Storage design](../design/architecture.md#storage-design) section
+of Druid's design documentation.

## How to use this documentation

@@ -57,17 +56,17 @@ page.
 ### Streaming
 
 The most recommended, and most popular, method of streaming ingestion is the
-[Kafka indexing service](../development/extensions-core/kafka-ingestion.md) that reads directly from Kafka. The Kinesis
-indexing service also works well if you prefer Kinesis.
+[Kafka indexing service](../development/extensions-core/kafka-ingestion.md) that reads directly from Kafka. Alternatively, the Kinesis
+indexing service works with Amazon Kinesis Data Streams.
 
-This table compares the major available options:
+This table compares the options:
 
-| **Method** | [Kafka](../development/extensions-core/kafka-ingestion.md) | [Kinesis](../development/extensions-core/kinesis-ingestion.md) | [Tranquility](tranquility.md) |
-|---|-----|--------------|------------|
-| **Supervisor type** | `kafka` | `kinesis` | N/A |
-| **How it works** | Druid reads directly from Apache Kafka. | Druid reads directly from Amazon Kinesis. | Tranquility, a library that ships separately from Druid, is used to push data into Druid. |
-| **Can ingest late data?** | Yes | Yes | No (late data is dropped based on the `windowPeriod` config) |
-| **Exactly-once guarantees?** | Yes | Yes | No |
+| **Method** | [Kafka](../development/extensions-core/kafka-ingestion.md) | [Kinesis](../development/extensions-core/kinesis-ingestion.md) |
+|---|-----|--------------|
+| **Supervisor type** | `kafka` | `kinesis` |
+| **How it works** | Druid reads directly from Apache Kafka. | Druid reads directly from Amazon Kinesis. |
+| **Can ingest late data?** | Yes | Yes |
+| **Exactly-once guarantees?** | Yes | Yes |
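The `kafka` supervisor type in the table above is configured through a supervisor spec submitted to the Overlord. The following is a minimal, hypothetical sketch of assembling such a spec; the topic name, broker address, datasource name, and dimension names are placeholder assumptions, not values from this PR, and a production spec would normally include a `tuningConfig` and a richer `dataSchema`:

```python
import json

def build_kafka_supervisor_spec(topic, brokers, datasource):
    """Assemble a minimal Kafka supervisor spec (illustrative only)."""
    return {
        # "kafka" matches the supervisor type shown in the comparison table.
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["page", "user"]},
                "granularitySpec": {
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": brokers},
            },
        },
    }

spec = build_kafka_supervisor_spec("wiki-edits", "kafka01:9092", "wikipedia")
payload = json.dumps(spec, indent=2)

# The JSON payload would then be POSTed to the Overlord's supervisor endpoint,
# for example:
#   curl -X POST -H 'Content-Type: application/json' -d @spec.json \
#        http://OVERLORD_HOST:8090/druid/indexer/v1/supervisor
```

Because the supervisor runs continuously and tracks Kafka offsets in the same transaction that publishes segments, this path provides the exactly-once behavior claimed in the table.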

### Batch

1 change: 1 addition & 0 deletions docs/ingestion/standalone-realtime.md
@@ -1,4 +1,5 @@
 ---
+id: standalone-realtime
 layout: doc_page
 title: "Realtime Process"
 ---
4 changes: 2 additions & 2 deletions docs/ingestion/tasks.md
@@ -294,7 +294,7 @@ Once `forceTimeChunkLock` is unset, the task will choose a proper lock type to u
 Note that segment lock is not always available. The most common case where time chunk lock is enforced is
 when an overwriting task changes the segment granularity.
 Also, segment locking is supported only by native indexing tasks and Kafka/Kinesis indexing tasks.
-Hadoop indexing tasks and `index_realtime` tasks (used by [Tranquility](tranquility.md)) don't support it yet.
+Hadoop indexing tasks don't support it.

`forceTimeChunkLock` in the task context applies only to individual tasks.
To unset it for all tasks, set `druid.indexer.tasklock.forceTimeChunkLock` to false in the [overlord configuration](../configuration/index.md#overlord-operations).
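A quick sketch of the two places this setting can live, per the paragraph above. The task payload below is a hypothetical skeleton (its `dataSchema` and `ioConfig` are omitted for brevity); the context key and the overlord property name are the ones named in the text:

```python
import json

# Per-task override: set forceTimeChunkLock in the "context" map of an
# individual task payload. This affects only this one task.
task = {
    "type": "index_parallel",
    "spec": {},  # dataSchema/ioConfig omitted in this sketch
    "context": {"forceTimeChunkLock": False},  # allow segment locks for this task
}
task_json = json.dumps(task)

# Cluster-wide default instead goes in the overlord runtime properties file:
overlord_runtime_property = "druid.indexer.tasklock.forceTimeChunkLock=false"
```

The per-task context wins for that task; the overlord property changes the default for every task that does not set the context key itself.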
@@ -384,7 +384,7 @@ Submitted automatically, on your behalf, by a

### `index_realtime`

-Submitted automatically, on your behalf, by [Tranquility](tranquility.md).
+Submitted automatically, on your behalf, by [Tranquility](tranquility.md).

### `compact`

8 changes: 4 additions & 4 deletions docs/ingestion/tranquility.md
@@ -22,11 +22,11 @@ title: "Tranquility"
~ under the License.
-->

-[Tranquility](https://github.com/druid-io/tranquility/) is a package for pushing
-streams to Druid in real-time. Druid does not come bundled with Tranquility; it is available as a separate download.
+[Tranquility](https://github.com/druid-io/tranquility/) is a separately distributed package for pushing
+streams to Druid in real-time.
 
-Note that as of this writing, the latest available version of Tranquility is built against the rather old Druid 0.9.2
-release. It will still work with the latest Druid servers, but not all features and functionality will be available
+Tranquility has not been built against a version of Druid later than the Druid 0.9.2
+release. It may still work with the latest Druid servers, but not all features and functionality will be available
 due to limitations of older Druid APIs on the Tranquility side.

For new projects that require streaming ingestion, we recommend using Druid's native support for
6 changes: 6 additions & 0 deletions website/i18n/en.json
@@ -134,6 +134,9 @@
"development/extensions-contrib/opentsdb-emitter": {
"title": "OpenTSDB Emitter"
},
"development/extensions-contrib/prometheus": {
"title": "Prometheus Emitter"
},
"development/extensions-contrib/redis-cache": {
"title": "Druid Redis Cache"
},
@@ -501,6 +504,9 @@
"title": "TopN queries",
"sidebar_label": "TopN"
},
"querying/using-caching": {
"title": "Using query caching"
},
"querying/virtual-columns": {
"title": "Virtual columns"
},
1 change: 0 additions & 1 deletion website/sidebars.json
@@ -41,7 +41,6 @@
"ids": [
"development/extensions-core/kafka-ingestion",
"development/extensions-core/kinesis-ingestion",
"ingestion/tranquility",
"ingestion/standalone-realtime"
]
},