From dee52861ddfd200df33855c723604116f35d5adc Mon Sep 17 00:00:00 2001
From: Steve Hetland
Date: Mon, 19 Apr 2021 15:39:14 -0700
Subject: [PATCH 1/2] demote tranquility doc

---
 docs/design/architecture.md           |  3 ---
 docs/ingestion/index.md               | 37 +++++++++++++--------------
 docs/ingestion/standalone-realtime.md |  1 +
 docs/ingestion/tasks.md               |  4 +--
 docs/ingestion/tranquility.md         |  8 +++---
 website/i18n/en.json                  |  6 +++++
 website/sidebars.json                 |  1 -
 7 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/docs/design/architecture.md b/docs/design/architecture.md
index 0711f39a8659..217bd14b764d 100644
--- a/docs/design/architecture.md
+++ b/docs/design/architecture.md
@@ -235,9 +235,6 @@ transaction.
 metadata in a single transaction after the subtasks are finished. In simple (single-task) mode, the single task
 publishes all segment metadata in a single transaction after it is complete.
 
-[Tranquility](../ingestion/tranquility.md), a streaming ingestion method that is no longer recommended, does not perform
-transactional loading.
-
 Additionally, some ingestion methods offer an _idempotency_ guarantee. This means that repeated executions of the same
 ingestion will not cause duplicate data to be ingested:
 
diff --git a/docs/ingestion/index.md b/docs/ingestion/index.md
index ccc9f0b29ac6..207fee495e0c 100644
--- a/docs/ingestion/index.md
+++ b/docs/ingestion/index.md
@@ -22,19 +22,18 @@ title: "Ingestion"
 ~ under the License.
 -->
 
-## Overview
-
-All data in Druid is organized into _segments_, which are data files that generally have up to a few million rows each.
-Loading data in Druid is called _ingestion_ or _indexing_ and consists of reading data from a source system and creating
+All data in Druid is organized into _segments_, which are data files, each of which may have up to a few million rows.
+Loading data in Druid is called _ingestion_ or _indexing_, and consists of reading data from a source system and creating
 segments based on that data.
-In most ingestion methods, the work of loading data is done by Druid [MiddleManager](../design/middlemanager.md) processes
-(or the [Indexer](../design/indexer.md) processes). One exception is
+In most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes
+(or the [Indexer](../design/indexer.md) processes) do the work of loading data. One exception is
 Hadoop-based ingestion, where this work is instead done using a Hadoop MapReduce job on YARN (although MiddleManager or Indexer
-processes are still involved in starting and monitoring the Hadoop jobs). Once segments have been generated and stored
-in [deep storage](../dependencies/deep-storage.md), they will be loaded by Historical processes. For more details on
-how this works under the hood, see the [Storage design](../design/architecture.md#storage-design) section of Druid's design
-documentation.
+processes are still involved in starting and monitoring the Hadoop jobs).
+
+Once segments have been generated and stored in [deep storage](../dependencies/deep-storage.md), they are loaded by Historical processes.
+For more details on how this works, see the [Storage design](../design/architecture.md#storage-design) section
+of Druid's design documentation.
 
 ## How to use this documentation
 
@@ -57,17 +56,17 @@ page.
 ### Streaming
 
 The most recommended, and most popular, method of streaming ingestion is the
-[Kafka indexing service](../development/extensions-core/kafka-ingestion.md) that reads directly from Kafka. The Kinesis
-indexing service also works well if you prefer Kinesis.
+[Kafka indexing service](../development/extensions-core/kafka-ingestion.md) that reads directly from Kafka. Alternatively, the Kinesis
+indexing service works with Amazon Kinesis Data Streams.
 
-This table compares the major available options:
+This table compares the options:
 
-| **Method** | [Kafka](../development/extensions-core/kafka-ingestion.md) | [Kinesis](../development/extensions-core/kinesis-ingestion.md) | [Tranquility](tranquility.md) |
-|---|-----|--------------|------------|
-| **Supervisor type** | `kafka` | `kinesis` | N/A |
-| **How it works** | Druid reads directly from Apache Kafka. | Druid reads directly from Amazon Kinesis. | Tranquility, a library that ships separately from Druid, is used to push data into Druid. |
-| **Can ingest late data?** | Yes | Yes | No (late data is dropped based on the `windowPeriod` config) |
-| **Exactly-once guarantees?** | Yes | Yes | No |
+| **Method** | [Kafka](../development/extensions-core/kafka-ingestion.md) | [Kinesis](../development/extensions-core/kinesis-ingestion.md) |
+|---|-----|--------------|
+| **Supervisor type** | `kafka` | `kinesis`|
+| **How it works** | Druid reads directly from Apache Kafka. | Druid reads directly from Amazon Kinesis.|
+| **Can ingest late data?** | Yes | Yes |
+| **Exactly-once guarantees?** | Yes | Yes |
 
 ### Batch
 
diff --git a/docs/ingestion/standalone-realtime.md b/docs/ingestion/standalone-realtime.md
index b8ba92a8211c..7a3a9e0e35a6 100644
--- a/docs/ingestion/standalone-realtime.md
+++ b/docs/ingestion/standalone-realtime.md
@@ -1,4 +1,5 @@
 ---
+id: standalone-realtime
 layout: doc_page
 title: "Realtime Process"
 ---
diff --git a/docs/ingestion/tasks.md b/docs/ingestion/tasks.md
index 3b96e759a35c..9f8239910727 100644
--- a/docs/ingestion/tasks.md
+++ b/docs/ingestion/tasks.md
@@ -294,7 +294,7 @@ Once `forceTimeChunkLock` is unset, the task will choose a proper lock type to u
 Please note that segment lock is not always available. The most common use case where time chunk lock is enforced is
 when an overwriting task changes the segment granularity.
 Also, the segment locking is supported by only native indexing tasks and Kafka/Kinesis indexing tasks.
-Hadoop indexing tasks and `index_realtime` tasks (used by [Tranquility](tranquility.md)) don't support it yet.
+Hadoop indexing tasks don't support it.
 
 `forceTimeChunkLock` in the task context is only applied to individual tasks. If you want to unset it for all tasks, you would want to set `druid.indexer.tasklock.forceTimeChunkLock` to false in the [overlord configuration](../configuration/index.md#overlord-operations).
 
@@ -384,7 +384,7 @@ Submitted automatically, on your behalf, by a
 
 ### `index_realtime`
 
-Submitted automatically, on your behalf, by [Tranquility](tranquility.md).
+Submitted automatically, on your behalf, by [Tranquility](tranquility.md).
 
 ### `compact`
 
diff --git a/docs/ingestion/tranquility.md b/docs/ingestion/tranquility.md
index b50ac1956046..f66464456188 100644
--- a/docs/ingestion/tranquility.md
+++ b/docs/ingestion/tranquility.md
@@ -22,11 +22,11 @@ title: "Tranquility"
 ~ under the License.
 -->
 
-[Tranquility](https://github.com/druid-io/tranquility/) is a package for pushing
-streams to Druid in real-time. Druid does not come bundled with Tranquility; it is available as a separate download.
+[Tranquility](https://github.com/druid-io/tranquility/) is a separately distributed package for pushing
+streams to Druid in real-time.
 
-Note that as of this writing, the latest available version of Tranquility is built against the rather old Druid 0.9.2
-release. It will still work with the latest Druid servers, but not all features and functionality will be available
+Tranquility has not been built against a version of Druid later than the Druid 0.9.2
+release. It may still work with the latest Druid servers, but not all features and functionality will be available
 due to limitations of older Druid APIs on the Tranquility side.
 
 For new projects that require streaming ingestion, we recommend using Druid's native support for
diff --git a/website/i18n/en.json b/website/i18n/en.json
index ae666f4c94c4..e3c8b34f8166 100644
--- a/website/i18n/en.json
+++ b/website/i18n/en.json
@@ -134,6 +134,9 @@
     "development/extensions-contrib/opentsdb-emitter": {
       "title": "OpenTSDB Emitter"
     },
+    "development/extensions-contrib/prometheus": {
+      "title": "Prometheus Emitter"
+    },
     "development/extensions-contrib/redis-cache": {
       "title": "Druid Redis Cache"
     },
@@ -501,6 +504,9 @@
       "title": "TopN queries",
       "sidebar_label": "TopN"
     },
+    "querying/using-caching": {
+      "title": "Using query caching"
+    },
     "querying/virtual-columns": {
       "title": "Virtual columns"
     },
diff --git a/website/sidebars.json b/website/sidebars.json
index 86ade793f815..8505bb2008d3 100644
--- a/website/sidebars.json
+++ b/website/sidebars.json
@@ -41,7 +41,6 @@
       "ids": [
         "development/extensions-core/kafka-ingestion",
         "development/extensions-core/kinesis-ingestion",
-        "ingestion/tranquility",
        "ingestion/standalone-realtime"
       ]
     },

From 1da7e685ce3d84d5d81641fad259e2a7fb5a9170 Mon Sep 17 00:00:00 2001
From: sthetland
Date: Tue, 27 Apr 2021 16:20:41 -0700
Subject: [PATCH 2/2] Update docs/ingestion/index.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
---
 docs/ingestion/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ingestion/index.md b/docs/ingestion/index.md
index 207fee495e0c..b628eea4d14e 100644
--- a/docs/ingestion/index.md
+++ b/docs/ingestion/index.md
@@ -27,7 +27,7 @@ Loading data in Druid is called _ingestion_ or _indexing_, and consists of readi
 segments based on that data.
 
 In most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes
-(or the [Indexer](../design/indexer.md) processes) do the work of loading data. One exception is
+(or the [Indexer](../design/indexer.md) processes) load your source data. One exception is
 Hadoop-based ingestion, where this work is instead done using a Hadoop MapReduce job on YARN (although MiddleManager or Indexer
 processes are still involved in starting and monitoring the Hadoop jobs).
 
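The streaming comparison table in the first commit boils down to choosing a supervisor type, `kafka` or `kinesis`. As a sketch only (not part of either patch, and with a hypothetical datasource name, topic name, and broker address), a minimal spec of the kind the Kafka indexing service's supervisor consumes looks roughly like:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "example-metrics",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": [] },
      "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR" }
    },
    "ioConfig": {
      "topic": "example-topic",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "localhost:9092" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

Such a spec is typically POSTed to the Overlord's `/druid/indexer/v1/supervisor` endpoint; see the Kafka indexing service page referenced above for the authoritative field list.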