Merged
10 changes: 8 additions & 2 deletions docs/ingestion/data-formats.md
@@ -962,11 +962,17 @@ Each line can be further parsed using [`parseSpec`](#parsespec).

### Avro Hadoop Parser

:::info
You need to include the [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser.
:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).

You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).

:::

:::info
You need to include [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser.

See the [Avro Types](../development/extensions-core/avro.md#avro-types) section for how Avro types are handled in Druid.
:::

4 changes: 0 additions & 4 deletions docs/ingestion/faq.md
@@ -49,10 +49,6 @@ Other common reasons that hand-off fails are as follows:

4) Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the Coordinator logs have no errors.

## How do I get HDFS to work?

Make sure to include the `druid-hdfs-storage` extension and all the Hadoop configuration and dependencies (which you can obtain by running `hadoop classpath` on a machine where Hadoop is set up) in the classpath, and provide the necessary HDFS settings as described in [deep storage](../design/deep-storage.md).

## How do I know when I can query Druid after submitting a batch ingestion task?

You can verify if segments created by a recent ingestion task are loaded onto historicals and available for querying using the following workflow.
9 changes: 9 additions & 0 deletions docs/ingestion/hadoop.md
@@ -23,6 +23,15 @@ sidebar_label: "Hadoop-based"
~ under the License.
-->

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).

You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).

:::
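As a concrete sketch, the opt-in flag is a single line in `common.runtime.properties` (the file's location depends on your deployment layout):

```properties
# Opt in to the deprecated index_hadoop task type.
druid.indexer.task.allowHadoopTaskExecution=true
```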


Apache Hadoop-based batch ingestion in Apache Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running
instance of a Druid [Overlord](../design/overlord.md). Please refer to our [Hadoop-based vs. native batch comparison table](index.md#batch) for
comparisons between Hadoop-based, native batch (simple), and native batch (parallel) ingestion.
5 changes: 2 additions & 3 deletions docs/ingestion/index.md
@@ -28,8 +28,7 @@ your source system and stores it in data files called [_segments_](../design/seg
In general, segment files contain a few million rows each.

For most ingestion methods, the Druid [Middle Manager](../design/middlemanager.md) processes or the
[Indexer](../design/indexer.md) processes load your source data. The sole exception is Hadoop-based ingestion, which
uses a Hadoop MapReduce job on YARN.
[Indexer](../design/indexer.md) processes load your source data.

During ingestion, Druid creates segments and stores them in [deep storage](../design/deep-storage.md). Historical nodes load the segments into memory to respond to queries. For streaming ingestion, the Middle Managers and indexers can respond to queries in real-time with arriving data. For more information, see [Storage overview](../design/storage.md).

@@ -66,7 +65,7 @@ supervisor.
There are three available options for batch ingestion. Batch ingestion jobs are associated with a controller task that
runs for the duration of the job.

| **Method** | [Native batch](./native-batch.md) | [SQL](../multi-stage-query/index.md) | [Hadoop-based](hadoop.md) |
| **Method** | [Native batch](./native-batch.md) | [SQL](../multi-stage-query/index.md) | [Hadoop-based (deprecated)](hadoop.md) |
|---|-----|--------------|------------|
| **Controller task type** | `index_parallel` | `query_controller` | `index_hadoop` |
| **How you submit it** | Send an `index_parallel` spec to the [Tasks API](../api-reference/tasks-api.md). | Send an [INSERT](../multi-stage-query/concepts.md#load-data-with-insert) or [REPLACE](../multi-stage-query/concepts.md#overwrite-data-with-replace) statement to the [SQL task API](../api-reference/sql-ingestion-api.md#submit-a-query). | Send an `index_hadoop` spec to the [Tasks API](../api-reference/tasks-api.md). |
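As a sketch of the native batch column above, the following writes a minimal `index_parallel` spec and shows where it would be submitted. The datasource name, dimensions, input directory, and host/port are illustrative placeholders, not values from this page.

```shell
# Write a minimal index_parallel spec (datasource name, dimensions, and
# input paths here are placeholders).
cat > /tmp/index-parallel-spec.json <<'EOF'
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "example_datasource",
      "timestampSpec": {"column": "timestamp", "format": "iso"},
      "dimensionsSpec": {"dimensions": ["page", "user"]},
      "granularitySpec": {"segmentGranularity": "day"}
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {"type": "local", "baseDir": "/tmp/data", "filter": "*.json"},
      "inputFormat": {"type": "json"}
    },
    "tuningConfig": {"type": "index_parallel"}
  }
}
EOF

# The spec would then be sent to the Tasks API (host and port are illustrative):
# curl -X POST -H 'Content-Type: application/json' \
#      -d @/tmp/index-parallel-spec.json http://localhost:8888/druid/indexer/v1/task
echo "Spec written to /tmp/index-parallel-spec.json"
```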
8 changes: 2 additions & 6 deletions docs/operations/java.md
@@ -27,11 +27,7 @@ a Java runtime for Druid.

## Selecting a Java runtime

Druid fully supports Java 11 and Java 17. The project team recommends Java 17.

:::info
Note: Starting with Apache Druid 32.0.0, support for Java 8 has been removed.
:::
The project team recommends Java 17. Although you can use Java 11, support for it is deprecated.

The project team recommends using an OpenJDK-based Java distribution. There are many free and actively-supported
distributions available, including
@@ -74,7 +70,7 @@ Exception in thread "main" java.lang.ExceptionInInitializerError
```

Druid's out-of-box configuration adds these parameters transparently when you use the bundled `bin/start-druid` or
similar commands. In this case, there is nothing special you need to do to run successfully on Java 11 or 17. However,
similar commands. In this case, there is nothing special you need to do to run successfully. However,
if you have customized your Druid service launching system, you will need to ensure the required Java parameters are
added. There are many ways of doing this. Choose the one that works best for you.
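For example, with a custom launch setup you might keep the parameters in a `jvm.config`-style file, one argument per line. This is a sketch: the two flags shown are illustrative only, so copy the exact set from the configuration bundled with your Druid release.

```
-server
--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
```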

8 changes: 8 additions & 0 deletions docs/operations/other-hadoop.md
@@ -23,6 +23,14 @@ title: "Working with different versions of Apache Hadoop"
-->


:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).

You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).

:::

Apache Druid can interact with Hadoop in two ways:

1. [Use HDFS for deep storage](../development/extensions-core/hdfs.md) using the druid-hdfs-storage extension.
4 changes: 2 additions & 2 deletions docs/tutorials/cluster.md
@@ -133,8 +133,8 @@ The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) has info

We recommend running your favorite Linux distribution. You will also need:

* [Java 11 or 17](../operations/java.md)
* Python 2 or Python 3
* [Java 17](../operations/java.md)
* Python 3

:::info
If needed, you can specify where to find Java using the environment variables
4 changes: 2 additions & 2 deletions docs/tutorials/index.md
@@ -40,8 +40,8 @@ You can follow these steps on a relatively modest machine, such as a workstation
The software requirements for the installation machine are:

* Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
* [Java 11 or 17](../operations/java.md)
* Python 3 (preferred) or Python 2
* [Java 17](../operations/java.md)
* Python 3
* Perl 5

Java must be available: either put it on your path, or set one of the `JAVA_HOME` or `DRUID_JAVA_HOME` environment variables.
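A sketch of that lookup in shell, mirroring (but not copied from) the precedence the bundled scripts use; treat the ordering as an assumption:

```shell
# Pick the Java binary the way a launch script might (a sketch):
# prefer DRUID_JAVA_HOME, then JAVA_HOME, then `java` on the PATH.
if [ -n "${DRUID_JAVA_HOME:-}" ]; then
  JAVA_BIN="$DRUID_JAVA_HOME/bin/java"
elif [ -n "${JAVA_HOME:-}" ]; then
  JAVA_BIN="$JAVA_HOME/bin/java"
else
  JAVA_BIN="java"
fi
echo "Using Java binary: $JAVA_BIN"
```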
7 changes: 7 additions & 0 deletions docs/tutorials/tutorial-batch-hadoop.md
@@ -23,6 +23,13 @@ sidebar_label: Load from Apache Hadoop
~ under the License.
-->

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).

You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).

:::

This tutorial shows you how to load data files into Apache Druid using a remote Hadoop cluster.

8 changes: 8 additions & 0 deletions docs/tutorials/tutorial-kerberos-hadoop.md
@@ -23,6 +23,14 @@ sidebar_label: Kerberized HDFS deep storage
~ under the License.
-->

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).

You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).

:::


## Hadoop Setup

1 change: 0 additions & 1 deletion docs/tutorials/tutorial-query.md
@@ -32,7 +32,6 @@ by following one of them:

* [Load a file](../tutorials/tutorial-batch.md)
* [Load stream data from Kafka](../tutorials/tutorial-kafka.md)
* [Load a file using Hadoop](../tutorials/tutorial-batch-hadoop.md)

There are various ways to run Druid SQL queries: from the web console, using a command-line utility,
and by posting the query over HTTP. We'll look at each of these.