From 82be0fd4766eb2c31a70a8374899a59e5be391d9 Mon Sep 17 00:00:00 2001
From: Charles Smith
Date: Mon, 15 Mar 2021 17:19:20 -0700
Subject: [PATCH 1/3] remove experimental from Kinesis with caveats

---
 .../extensions-core/kinesis-ingestion.md | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/docs/development/extensions-core/kinesis-ingestion.md b/docs/development/extensions-core/kinesis-ingestion.md
index 03b5416ba1ee..34b1c39b0b92 100644
--- a/docs/development/extensions-core/kinesis-ingestion.md
+++ b/docs/development/extensions-core/kinesis-ingestion.md
@@ -24,16 +24,15 @@ sidebar_label: "Amazon Kinesis"
   -->
 
 
-Similar to the [Kafka indexing service](./kafka-ingestion.md), the Kinesis indexing service enables the configuration of *supervisors* on the Overlord, which facilitate ingestion from
-Kinesis by managing the creation and lifetime of Kinesis indexing tasks. These indexing tasks read events using Kinesis's own
+Similar to the [Kafka indexing service](./kafka-ingestion.md), the Kinesis indexing service for Apache Druid enables the configuration of *supervisors* on the Overlord. These supervisors facilitate ingestion from Kinesis by managing the creation and lifetime of Kinesis indexing tasks. These indexing tasks read events using Kinesis's own
 Shards and Sequence Number mechanism and are therefore able to provide guarantees of exactly-once ingestion.
 The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures,
 and ensure that the scalability and replication requirements are maintained.
 
 The Kinesis indexing service is provided as the `druid-kinesis-indexing-service` core Apache Druid extension (see
-[Including Extensions](../../development/extensions.md#loading-extensions)). Please note that this is
-currently designated as an *experimental feature* and is subject to the usual
-[experimental caveats](../experimental.md).
+[Including Extensions](../../development/extensions.md#loading-extensions)).
+
+> Before you deploy the Kinesis extension to production, read the [Kinesis known issues](#kinesis-known-issues).
 
 ## Submitting a Supervisor Spec
 
@@ -471,3 +470,13 @@ with an assignment of closed shards that have been fully read and to ensure a ba
 This window with early task shutdowns and possible task failures will conclude when:
 - All closed shards have been fully read and the Kinesis ingestion tasks have published the data from those shards, committing the "closed" state to metadata storage
 - Any remaining tasks that had inactive shards in the assignment have been shutdown (these tasks would have been created before the closed shards were completely drained)
+
+## Kinesis known issues
+
+Before you deploy the Kinesis extension to production, consider the following known issues:
+
+- Avoid implementing more than one Kinesis supervisor that read from the same Kinesis stream for ingestion. Kinesis has a per-shard read throughput limit and having multiple supervisors on the same stream can reduce available read throughput for an individual Supervisor's tasks. Additionally, multiple Supervisors ingesting to the same Druid Datasource can cause increased contention for locks on the Datasource.
+- The only way to change the stream reset policy is to submit a new ingestion spec and set up a new supervisor.
+- Timeouts for retrieving earliest sequence number will cause a reset of the supervisor. The job will resume own its own eventually, but it can trigger alerts.
+- The Kinesis supervisor will not make progress if you have empty shards. Make sure you have at least 1 record in the shard.
+- If ingestion tasks get stuck, the supervisor does not automatically recover. You should monitor ingestion tasks and investigate if your ingestion falls behind.
From 20f211cc639084a6955d6a8017e9f9221224b67f Mon Sep 17 00:00:00 2001
From: Charles Smith
Date: Thu, 25 Mar 2021 13:33:46 -0700
Subject: [PATCH 2/3] add suggested known issue

---
 docs/development/extensions-core/kinesis-ingestion.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/development/extensions-core/kinesis-ingestion.md b/docs/development/extensions-core/kinesis-ingestion.md
index 34b1c39b0b92..542a4f6092a3 100644
--- a/docs/development/extensions-core/kinesis-ingestion.md
+++ b/docs/development/extensions-core/kinesis-ingestion.md
@@ -480,3 +480,4 @@ Before you deploy the Kinesis extension to production, consider the following kn
 - Timeouts for retrieving earliest sequence number will cause a reset of the supervisor. The job will resume own its own eventually, but it can trigger alerts.
 - The Kinesis supervisor will not make progress if you have empty shards. Make sure you have at least 1 record in the shard.
 - If ingestion tasks get stuck, the supervisor does not automatically recover. You should monitor ingestion tasks and investigate if your ingestion falls behind.
+- A Kinesis supervisor can sometimes check if the checkpointed offset has fallen behind the retention window of the stream. These checks fetch the earliest sequence number for Kinesis which can result in `IteratorAgeMilliseconds` becoming very high in AWS CloudWatch.
From 6c807c1ac3bc4287471136c4fdcdda2119fe2e33 Mon Sep 17 00:00:00 2001
From: Charles Smith
Date: Fri, 26 Mar 2021 16:04:27 -0700
Subject: [PATCH 3/3] spelling fixes

---
 docs/development/extensions-core/kinesis-ingestion.md | 2 +-
 website/.spelling | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/development/extensions-core/kinesis-ingestion.md b/docs/development/extensions-core/kinesis-ingestion.md
index 542a4f6092a3..e25be080f08e 100644
--- a/docs/development/extensions-core/kinesis-ingestion.md
+++ b/docs/development/extensions-core/kinesis-ingestion.md
@@ -480,4 +480,4 @@ Before you deploy the Kinesis extension to production, consider the following kn
 - Timeouts for retrieving earliest sequence number will cause a reset of the supervisor. The job will resume own its own eventually, but it can trigger alerts.
 - The Kinesis supervisor will not make progress if you have empty shards. Make sure you have at least 1 record in the shard.
 - If ingestion tasks get stuck, the supervisor does not automatically recover. You should monitor ingestion tasks and investigate if your ingestion falls behind.
-- A Kinesis supervisor can sometimes check if the checkpointed offset has fallen behind the retention window of the stream. These checks fetch the earliest sequence number for Kinesis which can result in `IteratorAgeMilliseconds` becoming very high in AWS CloudWatch.
+- A Kinesis supervisor can sometimes compare the checkpoint offset to retention window of the stream to see if it has fallen behind. These checks fetch the earliest sequence number for Kinesis which can result in `IteratorAgeMilliseconds` becoming very high in AWS CloudWatch.
diff --git a/website/.spelling b/website/.spelling
index 3a85dc6ad361..39ec30bc4a13 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -38,10 +38,11 @@ Base64-encoded
 ByteBuffer
 CIDR
 CORS
+CNF
 CPUs
 CSVs
 Ceph
-CNF
+CloudWatch
 ColumnDescriptor
 Corretto
 DDL