From f300d8bc13f3071742f3e9092cd1d5e006208c8e Mon Sep 17 00:00:00 2001
From: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Date: Wed, 16 Aug 2023 18:18:21 +0530
Subject: [PATCH 1/3] Add clarification to the docs

---
 .../kafka-supervisor-reference.md | 21 ++++++++++---------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/docs/development/extensions-core/kafka-supervisor-reference.md b/docs/development/extensions-core/kafka-supervisor-reference.md
index 536f1ade95be..6a17af5aaa8b 100644
--- a/docs/development/extensions-core/kafka-supervisor-reference.md
+++ b/docs/development/extensions-core/kafka-supervisor-reference.md
@@ -37,8 +37,8 @@ This topic contains configuration reference information for the Apache Kafka sup
 
 |Field|Type|Description|Required|
 |-----|----|-----------|--------|
-|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use this setting when you want to ingest from a single kafka topic.|yes|
-|`topicPattern`|String|A regex pattern that can used to select multiple kafka topics to ingest data from. Either this or `topic` can be used in a spec. See [Ingesting from multiple topics](#ingesting-from-multiple-topics) for more details.|yes|
+|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use this setting when you want to ingest from a single Kafka topic.|yes|
+|`topicPattern`|String|A regex pattern that can used to select multiple Kafka topics to ingest data from. Either this or `topic` can be used in a spec. See [Ingesting from multiple topics](#ingesting-from-multiple-topics) for more details.|yes|
 |`inputFormat`|Object|`inputFormat` to define input data parsing. See [Specifying data format](#specifying-data-format) for details about specifying the input format.|yes|
 |`consumerProperties`|Map|A map of properties to pass to the Kafka consumer. See [More on consumer properties](#more-on-consumerproperties).|yes|
 |`pollTimeout`|Long|The length of time to wait for the Kafka consumer to poll records, in milliseconds|no (default == 100)|
@@ -144,14 +144,15 @@ Multiple topics can be passed as a regex pattern as the value for `topicPattern`
 ingest data from clicks and impressions, you will set `topicPattern` to `clicks|impressions` in the IO config.
 Similarly, you can use `metrics-.*` as the value for `topicPattern` if you want to ingest from all the topics that
 start with `metrics-`. If new topics are added to the cluster that match the regex, Druid will automatically start
-ingesting from those new topics. If you enable multi-topic ingestion for a datasource, downgrading to a version
-lesser than 28.0.0 will cause the ingestion for that datasource to fail.
-
-When ingesting data from multiple topics, the partitions are assigned based on the hashcode of topic and the id of the
-partition within that topic. The partition assignment might not be uniform across all the tasks. It's also assumed
-that partitions across individual topics have similar load. It is recommended that you have a higher number of
-partitions for a high load topic and a lower number of partitions for a low load topic. Assuming that you want to
-ingest from both high and low load topic in the same supervisor.
+ingesting from those new topics. A topic name that only matches partially such as `my-metrics-12` will not be
+included for ingestion. If you enable multi-topic ingestion for a datasource, downgrading to a version lesser than
+28.0.0 will cause the ingestion for that datasource to fail.
+
+When ingesting data from multiple topics, the partitions are assigned based on the hashcode of the topic name and the
+id of the partition within that topic. The partition assignment might not be uniform across all the tasks. It's also
+assumed that partitions across individual topics have similar load. It is recommended that you have a higher number of
+partitions for a high load topic and a lower number of partitions for a low load topic. Assuming that you want to
+ingest from both high and low load topic in the same supervisor.
 
 ## More on consumerProperties
 

From 02b8bb6bd59db01519ce6aca63c026969d617fc7 Mon Sep 17 00:00:00 2001
From: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Date: Wed, 16 Aug 2023 22:15:29 +0530
Subject: [PATCH 2/3] comments

---
 .../development/extensions-core/kafka-supervisor-reference.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/development/extensions-core/kafka-supervisor-reference.md b/docs/development/extensions-core/kafka-supervisor-reference.md
index 6a17af5aaa8b..935912df5efd 100644
--- a/docs/development/extensions-core/kafka-supervisor-reference.md
+++ b/docs/development/extensions-core/kafka-supervisor-reference.md
@@ -37,8 +37,8 @@ This topic contains configuration reference information for the Apache Kafka sup
 
 |Field|Type|Description|Required|
 |-----|----|-----------|--------|
-|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use this setting when you want to ingest from a single Kafka topic.|yes|
-|`topicPattern`|String|A regex pattern that can used to select multiple Kafka topics to ingest data from. Either this or `topic` can be used in a spec. See [Ingesting from multiple topics](#ingesting-from-multiple-topics) for more details.|yes|
+|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use this setting when you want to ingest from a single Kafka topic.|yes if `topicPattern` is not set|
+|`topicPattern`|String|A regex pattern that can used to select multiple Kafka topics to ingest data from. Either this or `topic` can be used in a spec. See [Ingesting from multiple topics](#ingesting-from-multiple-topics) for more details.|yes if `topic` is not set|
 |`inputFormat`|Object|`inputFormat` to define input data parsing. See [Specifying data format](#specifying-data-format) for details about specifying the input format.|yes|
 |`consumerProperties`|Map|A map of properties to pass to the Kafka consumer. See [More on consumer properties](#more-on-consumerproperties).|yes|
 |`pollTimeout`|Long|The length of time to wait for the Kafka consumer to poll records, in milliseconds|no (default == 100)|

From 03bf38f6064f774271d274e9ba0f8a6a3c68236a Mon Sep 17 00:00:00 2001
From: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Date: Thu, 17 Aug 2023 12:02:20 +0530
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Kashif Faraz
---
 .../extensions-core/kafka-supervisor-reference.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/development/extensions-core/kafka-supervisor-reference.md b/docs/development/extensions-core/kafka-supervisor-reference.md
index 935912df5efd..d2d059ceb0a7 100644
--- a/docs/development/extensions-core/kafka-supervisor-reference.md
+++ b/docs/development/extensions-core/kafka-supervisor-reference.md
@@ -37,8 +37,8 @@ This topic contains configuration reference information for the Apache Kafka sup
 
 |Field|Type|Description|Required|
 |-----|----|-----------|--------|
-|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use this setting when you want to ingest from a single Kafka topic.|yes if `topicPattern` is not set|
-|`topicPattern`|String|A regex pattern that can used to select multiple Kafka topics to ingest data from. Either this or `topic` can be used in a spec. See [Ingesting from multiple topics](#ingesting-from-multiple-topics) for more details.|yes if `topic` is not set|
+|`topic`|String|The Kafka topic to read from. Must be a specific topic. Use this setting when you want to ingest from a single Kafka topic.|yes, only if `topicPattern` is not set|
+|`topicPattern`|String|A regex pattern that can used to select multiple Kafka topics to ingest data from. Either this or `topic` can be used in a spec. See [Ingesting from multiple topics](#ingesting-from-multiple-topics) for more details.|yes, only if `topic` is not set|
 |`inputFormat`|Object|`inputFormat` to define input data parsing. See [Specifying data format](#specifying-data-format) for details about specifying the input format.|yes|
 |`consumerProperties`|Map|A map of properties to pass to the Kafka consumer. See [More on consumer properties](#more-on-consumerproperties).|yes|
 |`pollTimeout`|Long|The length of time to wait for the Kafka consumer to poll records, in milliseconds|no (default == 100)|
@@ -145,10 +145,10 @@ ingest data from clicks and impressions, you will set `topicPattern` to `clicks|
 Similarly, you can use `metrics-.*` as the value for `topicPattern` if you want to ingest from all the topics that
 start with `metrics-`. If new topics are added to the cluster that match the regex, Druid will automatically start
 ingesting from those new topics. A topic name that only matches partially such as `my-metrics-12` will not be
-included for ingestion. If you enable multi-topic ingestion for a datasource, downgrading to a version lesser than
+included for ingestion. If you enable multi-topic ingestion for a datasource, downgrading to a version older than
 28.0.0 will cause the ingestion for that datasource to fail.
 
-When ingesting data from multiple topics, the partitions are assigned based on the hashcode of the topic name and the
+When ingesting data from multiple topics, partitions are assigned based on the hashcode of the topic name and the
 id of the partition within that topic. The partition assignment might not be uniform across all the tasks. It's also
 assumed that partitions across individual topics have similar load. It is recommended that you have a higher number of
 partitions for a high load topic and a lower number of partitions for a low load topic. Assuming that you want to