From e823b0543b73f50979231c187c49f5650954277c Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Thu, 3 Jul 2025 11:04:15 -0700 Subject: [PATCH 01/10] hadoop and java11 --- docs/ingestion/data-formats.md | 8 ++++++-- docs/ingestion/faq.md | 4 ---- docs/ingestion/hadoop.md | 7 +++++++ docs/ingestion/index.md | 5 ++--- docs/operations/java.md | 8 ++------ docs/operations/other-hadoop.md | 6 ++++++ docs/tutorials/cluster.md | 4 ++-- docs/tutorials/index.md | 4 ++-- docs/tutorials/tutorial-batch-hadoop.md | 5 +++++ docs/tutorials/tutorial-kerberos-hadoop.md | 6 ++++++ docs/tutorials/tutorial-query.md | 1 - 11 files changed, 38 insertions(+), 20 deletions(-) diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index 1bb2de9918ff..bb9b94953714 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -962,11 +962,15 @@ Each line can be further parsed using [`parseSpec`](#parsespec). ### Avro Hadoop Parser -:::info - You need to include the [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser. +:::caution[Deprecated] + +Hadoop-based ingestion is deprecated. + ::: :::info + You need to include the [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser. + See the [Avro Types](../development/extensions-core/avro.md#avro-types) section for how Avro types are handled in Druid ::: diff --git a/docs/ingestion/faq.md b/docs/ingestion/faq.md index 24e119585ab6..3fab83f0ea99 100644 --- a/docs/ingestion/faq.md +++ b/docs/ingestion/faq.md @@ -49,10 +49,6 @@ Other common reasons that hand-off fails are as follows: 4) Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the Coordinator logs have no errors. -## How do I get HDFS to work? 
- -Make sure to include the `druid-hdfs-storage` and all the hadoop configuration, dependencies (that can be obtained by running command `hadoop classpath` on a machine where hadoop has been setup) in the classpath. And, provide necessary HDFS settings as described in [deep storage](../design/deep-storage.md) . - ## How do I know when I can make query to Druid after submitting batch ingestion task? You can verify if segments created by a recent ingestion task are loaded onto historicals and available for querying using the following workflow. diff --git a/docs/ingestion/hadoop.md b/docs/ingestion/hadoop.md index db665f9769db..29694387266f 100644 --- a/docs/ingestion/hadoop.md +++ b/docs/ingestion/hadoop.md @@ -23,6 +23,13 @@ sidebar_label: "Hadoop-based" ~ under the License. --> +:::caution[Deprecated] + +Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. + +::: + + Apache Hadoop-based batch ingestion in Apache Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running instance of a Druid [Overlord](../design/overlord.md). Please refer to our [Hadoop-based vs. native batch comparison table](index.md#batch) for comparisons between Hadoop-based, native batch (simple), and native batch (parallel) ingestion. diff --git a/docs/ingestion/index.md b/docs/ingestion/index.md index b2c9002df927..de90051fca09 100644 --- a/docs/ingestion/index.md +++ b/docs/ingestion/index.md @@ -28,8 +28,7 @@ your source system and stores it in data files called [_segments_](../design/seg In general, segment files contain a few million rows each. For most ingestion methods, the Druid [Middle Manager](../design/middlemanager.md) processes or the -[Indexer](../design/indexer.md) processes load your source data. The sole exception is Hadoop-based ingestion, which -uses a Hadoop MapReduce job on YARN. 
+[Indexer](../design/indexer.md) processes load your source data. During ingestion, Druid creates segments and stores them in [deep storage](../design/deep-storage.md). Historical nodes load the segments into memory to respond to queries. For streaming ingestion, the Middle Managers and indexers can respond to queries in real-time with arriving data. For more information, see [Storage overview](../design/storage.md). @@ -66,7 +65,7 @@ supervisor. There are three available options for batch ingestion. Batch ingestion jobs are associated with a controller task that runs for the duration of the job. -| **Method** | [Native batch](./native-batch.md) | [SQL](../multi-stage-query/index.md) | [Hadoop-based](hadoop.md) | +| **Method** | [Native batch](./native-batch.md) | [SQL](../multi-stage-query/index.md) | [Hadoop-based (deprecated)](hadoop.md) | |---|-----|--------------|------------| | **Controller task type** | `index_parallel` | `query_controller` | `index_hadoop` | | **How you submit it** | Send an `index_parallel` spec to the [Tasks API](../api-reference/tasks-api.md). | Send an [INSERT](../multi-stage-query/concepts.md#load-data-with-insert) or [REPLACE](../multi-stage-query/concepts.md#overwrite-data-with-replace) statement to the [SQL task API](../api-reference/sql-ingestion-api.md#submit-a-query). | Send an `index_hadoop` spec to the [Tasks API](../api-reference/tasks-api.md). | diff --git a/docs/operations/java.md b/docs/operations/java.md index d16f78c8abe1..efb49b7bc773 100644 --- a/docs/operations/java.md +++ b/docs/operations/java.md @@ -27,11 +27,7 @@ a Java runtime for Druid. ## Selecting a Java runtime -Druid fully supports Java 11 and Java 17. The project team recommends Java 17. - -:::info -Note: Starting with Apache Druid 32.0.0, support for Java 8 has been removed. -::: + The project team recommends Java 17. Although, you can use Java 11, support for it is deprecated. The project team recommends using an OpenJDK-based Java distribution. 
There are many free and actively-supported distributions available, including @@ -74,7 +70,7 @@ Exception in thread "main" java.lang.ExceptionInInitializerError ``` Druid's out-of-box configuration adds these parameters transparently when you use the bundled `bin/start-druid` or -similar commands. In this case, there is nothing special you need to do to run successfully on Java 11 or 17. However, +similar commands. In this case, there is nothing special you need to do to run successfully. However, if you have customized your Druid service launching system, you will need to ensure the required Java parameters are added. There are many ways of doing this. Choose the one that works best for you. diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md index ba19a8326435..7a55295d55e1 100644 --- a/docs/operations/other-hadoop.md +++ b/docs/operations/other-hadoop.md @@ -23,6 +23,12 @@ title: "Working with different versions of Apache Hadoop" --> +:::caution[Deprecated] + +Hadoop-based ingestion is deprecated. + +::: + Apache Druid can interact with Hadoop in two ways: 1. [Use HDFS for deep storage](../development/extensions-core/hdfs.md) using the druid-hdfs-storage extension. diff --git a/docs/tutorials/cluster.md b/docs/tutorials/cluster.md index f2128489216c..8ef37a01196e 100644 --- a/docs/tutorials/cluster.md +++ b/docs/tutorials/cluster.md @@ -133,8 +133,8 @@ The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) has info We recommend running your favorite Linux distribution. 
You will also need -* [Java 11 or 17](../operations/java.md) -* Python 2 or Python 3 +* [17](../operations/java.md) +* Python 3 :::info If needed, you can specify where to find Java using the environment variables diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index 187f3cb952b0..c7d6fd091992 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -40,8 +40,8 @@ You can follow these steps on a relatively modest machine, such as a workstation The software requirements for the installation machine are: * Linux, Mac OS X, or other Unix-like OS. (Windows is not supported) -* [Java 11 or 17](../operations/java.md) -* Python 3 (preferred) or Python 2 +* [17](../operations/java.md) +* Python 3 * Perl 5 Java must be available. Either it is on your path, or set one of the `JAVA_HOME` or `DRUID_JAVA_HOME` environment variables. diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md index a71823544af5..1ea7cb4bb290 100644 --- a/docs/tutorials/tutorial-batch-hadoop.md +++ b/docs/tutorials/tutorial-batch-hadoop.md @@ -23,6 +23,11 @@ sidebar_label: Load from Apache Hadoop ~ under the License. --> +:::caution[Deprecated] + +Hadoop-based ingestion is deprecated. + +::: This tutorial shows you how to load data files into Apache Druid using a remote Hadoop cluster. diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md index 24fc290b6a6d..16b428fd9745 100644 --- a/docs/tutorials/tutorial-kerberos-hadoop.md +++ b/docs/tutorials/tutorial-kerberos-hadoop.md @@ -23,6 +23,12 @@ sidebar_label: Kerberized HDFS deep storage ~ under the License. --> +:::caution[Deprecated] + +Hadoop-based ingestion is deprecated. 
Use SQL-based ingestion instead of MapReduce or MiddleManager-less ingestion with Kubernetes instead of YARN + +::: + ## Hadoop Setup diff --git a/docs/tutorials/tutorial-query.md b/docs/tutorials/tutorial-query.md index 1ad7e8e28bf9..9fd65fbaf79c 100644 --- a/docs/tutorials/tutorial-query.md +++ b/docs/tutorials/tutorial-query.md @@ -32,7 +32,6 @@ by following one of them: * [Load a file](../tutorials/tutorial-batch.md) * [Load stream data from Kafka](../tutorials/tutorial-kafka.md) -* [Load a file using Hadoop](../tutorials/tutorial-batch-hadoop.md) There are various ways to run Druid SQL queries: from the web console, using a command line utility and by posting the query by HTTP. We'll look at each of these. From 6acc37e5cab2cf6b6e0ae64acc9484f121143981 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Fri, 18 Jul 2025 10:32:48 -0700 Subject: [PATCH 02/10] update wording --- docs/ingestion/data-formats.md | 2 +- docs/operations/other-hadoop.md | 2 +- docs/tutorials/tutorial-batch-hadoop.md | 2 +- docs/tutorials/tutorial-kerberos-hadoop.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index bb9b94953714..7b6d38564b0d 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -964,7 +964,7 @@ Each line can be further parsed using [`parseSpec`](#parsespec). :::caution[Deprecated] -Hadoop-based ingestion is deprecated. +Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. 
::: diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md index 7a55295d55e1..360d6256869c 100644 --- a/docs/operations/other-hadoop.md +++ b/docs/operations/other-hadoop.md @@ -25,7 +25,7 @@ title: "Working with different versions of Apache Hadoop" :::caution[Deprecated] -Hadoop-based ingestion is deprecated. +Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. ::: diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md index 1ea7cb4bb290..666a3a3ec831 100644 --- a/docs/tutorials/tutorial-batch-hadoop.md +++ b/docs/tutorials/tutorial-batch-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Load from Apache Hadoop :::caution[Deprecated] -Hadoop-based ingestion is deprecated. +Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. ::: diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md index 16b428fd9745..85dceb05a620 100644 --- a/docs/tutorials/tutorial-kerberos-hadoop.md +++ b/docs/tutorials/tutorial-kerberos-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Kerberized HDFS deep storage :::caution[Deprecated] -Hadoop-based ingestion is deprecated. Use SQL-based ingestion instead of MapReduce or MiddleManager-less ingestion with Kubernetes instead of YARN +Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. 
::: From cdf14918a7ac596283efe15334e5f256fa445826 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Mon, 21 Jul 2025 16:42:18 -0700 Subject: [PATCH 03/10] Update docs/operations/java.md --- docs/operations/java.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/operations/java.md b/docs/operations/java.md index efb49b7bc773..0b8a474ac950 100644 --- a/docs/operations/java.md +++ b/docs/operations/java.md @@ -27,7 +27,7 @@ a Java runtime for Druid. ## Selecting a Java runtime - The project team recommends Java 17. Although, you can use Java 11, support for it is deprecated. + The project team recommends Java 17. Although you can use Java 11, support for it is deprecated. The project team recommends using an OpenJDK-based Java distribution. There are many free and actively-supported distributions available, including From ce5ce3aa342a6dabae444cf30da431e728267796 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Mon, 21 Jul 2025 16:52:21 -0700 Subject: [PATCH 04/10] Apply suggestions from code review --- docs/ingestion/data-formats.md | 2 +- docs/ingestion/hadoop.md | 2 +- docs/operations/other-hadoop.md | 2 +- docs/tutorials/tutorial-batch-hadoop.md | 2 +- docs/tutorials/tutorial-kerberos-hadoop.md | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index 7b6d38564b0d..0dd8ee28dc6c 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -964,7 +964,7 @@ Each line can be further parsed using [`parseSpec`](#parsespec). :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion deprecated. 
Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. ::: diff --git a/docs/ingestion/hadoop.md b/docs/ingestion/hadoop.md index 29694387266f..b71662df192c 100644 --- a/docs/ingestion/hadoop.md +++ b/docs/ingestion/hadoop.md @@ -25,7 +25,7 @@ sidebar_label: "Hadoop-based" :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. ::: diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md index 360d6256869c..9f090b55a276 100644 --- a/docs/operations/other-hadoop.md +++ b/docs/operations/other-hadoop.md @@ -25,7 +25,7 @@ title: "Working with different versions of Apache Hadoop" :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. ::: diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md index 666a3a3ec831..56aef3a69ee0 100644 --- a/docs/tutorials/tutorial-batch-hadoop.md +++ b/docs/tutorials/tutorial-batch-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Load from Apache Hadoop :::caution[Deprecated] -Hadoop-based ingestion deprecated. 
Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. ::: diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md index 85dceb05a620..0da99cbc471b 100644 --- a/docs/tutorials/tutorial-kerberos-hadoop.md +++ b/docs/tutorials/tutorial-kerberos-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Kerberized HDFS deep storage :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use SQL-based ingestion instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. ::: From 8cf383cc821f3d923cc7132e74448ed1d24efb45 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Tue, 22 Jul 2025 10:06:52 -0700 Subject: [PATCH 05/10] add 18239 --- docs/ingestion/data-formats.md | 2 ++ docs/ingestion/hadoop.md | 2 ++ docs/operations/other-hadoop.md | 2 ++ docs/tutorials/tutorial-batch-hadoop.md | 2 ++ docs/tutorials/tutorial-kerberos-hadoop.md | 2 ++ 5 files changed, 10 insertions(+) diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index 0dd8ee28dc6c..d4b17d382f1d 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -966,6 +966,8 @@ Each line can be further parsed using [`parseSpec`](#parsespec). Hadoop-based ingestion deprecated. 
Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) + ::: :::info diff --git a/docs/ingestion/hadoop.md b/docs/ingestion/hadoop.md index b71662df192c..aab3cae0edc5 100644 --- a/docs/ingestion/hadoop.md +++ b/docs/ingestion/hadoop.md @@ -27,6 +27,8 @@ sidebar_label: "Hadoop-based" Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) + ::: diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md index 9f090b55a276..91065bda5a33 100644 --- a/docs/operations/other-hadoop.md +++ b/docs/operations/other-hadoop.md @@ -27,6 +27,8 @@ title: "Working with different versions of Apache Hadoop" Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
For more information, see [#18239](https://github.com/apache/druid/pull/18239) + ::: Apache Druid can interact with Hadoop in two ways: diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md index 56aef3a69ee0..fae1c4cb14ae 100644 --- a/docs/tutorials/tutorial-batch-hadoop.md +++ b/docs/tutorials/tutorial-batch-hadoop.md @@ -27,6 +27,8 @@ sidebar_label: Load from Apache Hadoop Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) + ::: This tutorial shows you how to load data files into Apache Druid using a remote Hadoop cluster. diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md index 0da99cbc471b..ea5752b87a11 100644 --- a/docs/tutorials/tutorial-kerberos-hadoop.md +++ b/docs/tutorials/tutorial-kerberos-hadoop.md @@ -27,6 +27,8 @@ sidebar_label: Kerberized HDFS deep storage Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
For more information, see [#18239](https://github.com/apache/druid/pull/18239) + ::: From 9355fb4c24bcdb1d7345841ea31435b09ad9d51d Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Wed, 6 Aug 2025 14:40:07 -0700 Subject: [PATCH 06/10] Apply suggestions from code review Co-authored-by: Victoria Lim --- docs/ingestion/data-formats.md | 6 +++--- docs/ingestion/hadoop.md | 2 +- docs/operations/other-hadoop.md | 2 +- docs/tutorials/cluster.md | 2 +- docs/tutorials/index.md | 2 +- docs/tutorials/tutorial-batch-hadoop.md | 2 +- docs/tutorials/tutorial-kerberos-hadoop.md | 2 +- 7 files changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index d4b17d382f1d..22a42f306e27 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -964,14 +964,14 @@ Each line can be further parsed using [`parseSpec`](#parsespec). :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. -You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) +You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239). 
::: :::info - You need to include the [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser. +You need to include [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser. See the [Avro Types](../development/extensions-core/avro.md#avro-types) section for how Avro types are handled in Druid ::: diff --git a/docs/ingestion/hadoop.md b/docs/ingestion/hadoop.md index aab3cae0edc5..e64c8c458d8e 100644 --- a/docs/ingestion/hadoop.md +++ b/docs/ingestion/hadoop.md @@ -25,7 +25,7 @@ sidebar_label: "Hadoop-based" :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md index 91065bda5a33..04101ab10f9c 100644 --- a/docs/operations/other-hadoop.md +++ b/docs/operations/other-hadoop.md @@ -25,7 +25,7 @@ title: "Working with different versions of Apache Hadoop" :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. 
Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/tutorials/cluster.md b/docs/tutorials/cluster.md index 8ef37a01196e..cd435c5e1ceb 100644 --- a/docs/tutorials/cluster.md +++ b/docs/tutorials/cluster.md @@ -133,7 +133,7 @@ The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) has info We recommend running your favorite Linux distribution. You will also need -* [17](../operations/java.md) +* [Java 17](../operations/java.md) * Python 3 :::info diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index c7d6fd091992..f55b26e60f6c 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -40,7 +40,7 @@ You can follow these steps on a relatively modest machine, such as a workstation The software requirements for the installation machine are: * Linux, Mac OS X, or other Unix-like OS. (Windows is not supported) -* [17](../operations/java.md) +* [Java 17](../operations/java.md) * Python 3 * Perl 5 diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md index fae1c4cb14ae..aca76645ad03 100644 --- a/docs/tutorials/tutorial-batch-hadoop.md +++ b/docs/tutorials/tutorial-batch-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Load from Apache Hadoop :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. 
Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md index ea5752b87a11..5ccd0cd5789c 100644 --- a/docs/tutorials/tutorial-kerberos-hadoop.md +++ b/docs/tutorials/tutorial-kerberos-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Kerberized HDFS deep storage :::caution[Deprecated] -Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
For more information, see [#18239](https://github.com/apache/druid/pull/18239) From bf192cbc1d3d888c6a6d1acf9a5875317aadfa37 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Fri, 8 Aug 2025 10:20:39 -0700 Subject: [PATCH 07/10] Update docs/ingestion/hadoop.md Co-authored-by: Lucas Capistrant --- docs/ingestion/hadoop.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ingestion/hadoop.md b/docs/ingestion/hadoop.md index e64c8c458d8e..af86c38bde3b 100644 --- a/docs/ingestion/hadoop.md +++ b/docs/ingestion/hadoop.md @@ -25,7 +25,7 @@ sidebar_label: "Hadoop-based" :::caution[Deprecated] -Hadoop-based ingestion is deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
For more information, see [#18239](https://github.com/apache/druid/pull/18239) From 44ff2c1cdd288da7cdb56b7ef0a1e5a891963307 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Fri, 8 Aug 2025 10:22:13 -0700 Subject: [PATCH 08/10] Apply suggestions from code review --- docs/ingestion/data-formats.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index 22a42f306e27..a63a33682987 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -964,9 +964,9 @@ Each line can be further parsed using [`parseSpec`](#parsespec). :::caution[Deprecated] -Hadoop-based ingestion is deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) -You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239). +You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
For more information, see [#18239](https://github.com/apache/druid/pull/18239) ::: From 42820b397418b7a61559e00abfb23222fb767e19 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Fri, 8 Aug 2025 10:23:23 -0700 Subject: [PATCH 09/10] Apply suggestions from code review --- docs/operations/other-hadoop.md | 2 +- docs/tutorials/tutorial-batch-hadoop.md | 2 +- docs/tutorials/tutorial-kerberos-hadoop.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md index 04101ab10f9c..20a55cd9587d 100644 --- a/docs/operations/other-hadoop.md +++ b/docs/operations/other-hadoop.md @@ -25,7 +25,7 @@ title: "Working with different versions of Apache Hadoop" :::caution[Deprecated] -Hadoop-based ingestion is deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md index aca76645ad03..762c5e81caf6 100644 --- a/docs/tutorials/tutorial-batch-hadoop.md +++ b/docs/tutorials/tutorial-batch-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Load from Apache Hadoop :::caution[Deprecated] -Hadoop-based ingestion is deprecated. 
Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md index 5ccd0cd5789c..37ca8c388a17 100644 --- a/docs/tutorials/tutorial-kerberos-hadoop.md +++ b/docs/tutorials/tutorial-kerberos-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Kerberized HDFS deep storage :::caution[Deprecated] -Hadoop-based ingestion is deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
For more information, see [#18239](https://github.com/apache/druid/pull/18239) From 0f1a4d90ca4d9271fb5d1249e6cc4a196118f917 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Fri, 8 Aug 2025 10:27:44 -0700 Subject: [PATCH 10/10] Apply suggestions from code review --- docs/ingestion/data-formats.md | 2 +- docs/ingestion/hadoop.md | 2 +- docs/operations/other-hadoop.md | 2 +- docs/tutorials/tutorial-batch-hadoop.md | 2 +- docs/tutorials/tutorial-kerberos-hadoop.md | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index a63a33682987..a595b1c6ded7 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -964,7 +964,7 @@ Each line can be further parsed using [`parseSpec`](#parsespec). :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/ingestion/hadoop.md b/docs/ingestion/hadoop.md index af86c38bde3b..3dd738f78910 100644 --- a/docs/ingestion/hadoop.md +++ b/docs/ingestion/hadoop.md @@ -25,7 +25,7 @@ sidebar_label: "Hadoop-based" :::caution[Deprecated] -Hadoop-based ingestion is deprecated.
We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md index 20a55cd9587d..a82b331de4bb 100644 --- a/docs/operations/other-hadoop.md +++ b/docs/operations/other-hadoop.md @@ -25,7 +25,7 @@ title: "Working with different versions of Apache Hadoop" :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file.
For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md index 762c5e81caf6..c75fc7d35e89 100644 --- a/docs/tutorials/tutorial-batch-hadoop.md +++ b/docs/tutorials/tutorial-batch-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Load from Apache Hadoop :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) +Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md index 37ca8c388a17..cace9b8794f4 100644 --- a/docs/tutorials/tutorial-kerberos-hadoop.md +++ b/docs/tutorials/tutorial-kerberos-hadoop.md @@ -25,7 +25,7 @@ sidebar_label: Kerberized HDFS deep storage :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion patterns, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) +Hadoop-based ingestion is deprecated.
We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239)
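For reference, the opt-in that the deprecation notices describe comes down to a single entry in `common.runtime.properties`. The following is a minimal sketch. It assumes the standard Java properties `key=value` syntax that Druid runtime properties files use, and it includes only the property name and value given in the notices:

```properties
# Explicitly allow the deprecated index_hadoop task type (see apache/druid#18239).
# Without this opt-in, index_hadoop tasks are not allowed to run.
druid.indexer.task.allowHadoopTaskExecution=true
```

Druid generally reads runtime properties when a service starts. You will therefore most likely need to restart the affected services before the setting takes effect.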