Merged
27 changes: 24 additions & 3 deletions docs/ingestion/data-formats.md
@@ -986,12 +986,11 @@ Each line can be further parsed using [`parseSpec`](#parsespec).

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md)

You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239)
Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion).

:::


:::info
You need to include [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser.

@@ -1053,6 +1052,14 @@ For example, using Avro Hadoop parser with custom reader's schema file:

### ORC Hadoop Parser

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion).

:::

> **Contributor:** I see this config paragraph repeated multiple times. Can we create a single page, capture the contents of the paragraph there, and have each config block explicitly point to that single page? That way, any changes need to be made in only one place.
>
> **Contributor (author):** Sure.


:::info
You need to include the [`druid-orc-extensions`](../development/extensions-core/orc.md) as an extension to use the ORC Hadoop Parser.
:::
@@ -1298,6 +1305,13 @@ setting `"mapreduce.job.user.classpath.first": "true"`, then this will not be an

### Parquet Hadoop Parser

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion).

:::


:::info
You need to include the [`druid-parquet-extensions`](../development/extensions-core/parquet.md) as an extension to use the Parquet Hadoop Parser.
:::
@@ -1442,6 +1456,13 @@ However, the Parquet Avro Hadoop Parser was the original basis for supporting th

### Parquet Avro Hadoop Parser

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion).

:::


:::info
Consider using the [Parquet Hadoop Parser](#parquet-hadoop-parser) over this parser to ingest
Parquet files. See [Parquet Hadoop Parser vs Parquet Avro Hadoop Parser](#parquet-hadoop-parser-vs-parquet-avro-hadoop-parser)
7 changes: 5 additions & 2 deletions docs/ingestion/hadoop.md
@@ -25,13 +25,16 @@ sidebar_label: "Hadoop-based"

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md)
Hadoop-based ingestion is deprecated and is scheduled to be removed in Druid 37.0.0.

You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239)
We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).

You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).

:::



Apache Hadoop-based batch ingestion in Apache Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running
instance of a Druid [Overlord](../design/overlord.md). Please refer to our [Hadoop-based vs. native batch comparison table](index.md#batch) for
comparisons between Hadoop-based, native batch (simple), and native batch (parallel) ingestion.
7 changes: 5 additions & 2 deletions docs/operations/other-hadoop.md
@@ -25,12 +25,15 @@ title: "Working with different versions of Apache Hadoop"

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md)
Hadoop-based ingestion is deprecated and is scheduled to be removed in Druid 37.0.0.

You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239)
We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).

You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).

:::


Apache Druid can interact with Hadoop in two ways:

1. [Use HDFS for deep storage](../development/extensions-core/hdfs.md) using the druid-hdfs-storage extension.
79 changes: 79 additions & 0 deletions docs/release-info/upgrade-notes.md
@@ -38,6 +38,85 @@ For more information, see [Migration guide: front-coded dictionaries](./migr-fro

If you're already using this feature, you don't need to take any action.

## 34.0.0

### Upgrade notes

#### Hadoop-based ingestion

Hadoop-based ingestion has been deprecated since Druid 32.0 and is scheduled to be removed in Druid 37.0.0.

We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).

As part of this change, you must now explicitly opt in to using the deprecated `index_hadoop` task type. If you don't, your Hadoop-based ingestion tasks will fail.

To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).
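For illustration, the opt-in looks like this in `common.runtime.properties` (a minimal sketch; only this one property is required):

```properties
# Explicitly allow the deprecated index_hadoop task type to run.
druid.indexer.task.allowHadoopTaskExecution=true
```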

#### `groupBy` and `topN` queries

Druid now uses the `groupBy` native query type, rather than `topN`, for SQL queries that group
by and order by the same column, have `LIMIT`, and don't have `HAVING`. This speeds up execution
of such queries since `groupBy` is vectorized while `topN` is not.

You can restore the previous behavior by setting the query context parameter `useLexicographicTopN` to `true`. Behavior for `useApproximateTopN` is unchanged, and the default remains `true`.
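As a sketch, a SQL API request that opts such a query back into `topN` planning might look like the following. The `wikipedia` datasource and `channel` column are made-up examples:

```json
{
  "query": "SELECT channel, COUNT(*) AS cnt FROM wikipedia GROUP BY channel ORDER BY channel LIMIT 10",
  "context": {
    "useLexicographicTopN": true
  }
}
```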

#### `IS_INCREMENTAL_HANDOFF_SUPPORTED` config removed

Removed the `IS_INCREMENTAL_HANDOFF_SUPPORTED` context reference from supervisors, as incremental publishing has been the default behavior since version 0.16.0. This context was originally introduced to support rolling back to `LegacyKafkaIndexTaskRunner`, which was used in versions earlier than 0.16.0 and has since been removed.

#### `useMaxMemoryEstimates` config removed

Removed the `useMaxMemoryEstimates` config. When this config was set to `false`, Druid used the much more accurate memory estimation method introduced in Druid 0.23.0. That method is now the only one available. The config has defaulted to `false` for several releases.

[#17936](https://github.com/apache/druid/pull/17936)

## 33.0.0

### Upgrade notes

#### `useMaxMemoryEstimates`

`useMaxMemoryEstimates` is now set to false for MSQ task engine tasks. Additionally, the property has been deprecated and will be removed in a future release. Setting this to false allows for better on-heap memory estimation.

[#17792](https://github.com/apache/druid/pull/17792)

#### Automatic kill tasks interval

Automatic kill tasks are now limited to at most 30 days' worth of segments per task.

The previous behavior (no limit on interval per kill task) can be restored by setting `druid.coordinator.kill.maxInterval = P0D`.

[#17680](https://github.com/apache/druid/pull/17680)
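As a sketch, restoring the unlimited behavior in the Coordinator's runtime properties:

```properties
# P0D removes the per-task interval limit for automatic kill tasks.
druid.coordinator.kill.maxInterval=P0D
```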

#### Kubernetes deployments

By default, the Docker image now uses the canonical hostname when you're running Druid in Kubernetes. Otherwise, it uses the IP address. [#17697](https://github.com/apache/druid/pull/17697)

#### Updated configs

Various configs were deprecated in a previous release and have now been removed. The following table lists the removed configs and their replacements:

| Removed config | Replacement config |
|-|-|
|`druid.processing.merge.task.initialYieldNumRows`|`druid.processing.merge.initialYieldNumRows`|
|`druid.processing.merge.task.targetRunTimeMillis`|`druid.processing.merge.targetRunTimeMillis`|
|`druid.processing.merge.task.smallBatchNumRows`|`druid.processing.merge.smallBatchNumRows`|
|`druid.processing.merge.pool.awaitShutdownMillis`|`druid.processing.merge.awaitShutdownMillis`|
|`druid.processing.merge.pool.parallelism`|`druid.processing.merge.parallelism`|
|`druid.processing.merge.pool.defaultMaxQueryParallelism`|`druid.processing.merge.defaultMaxQueryParallelism`|

[#17776](https://github.com/apache/druid/pull/17776)
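As an illustrative sketch of one such rename in a runtime properties file (the value `4` is made up):

```properties
# Removed in this release:
# druid.processing.merge.pool.parallelism=4
# Use instead:
druid.processing.merge.parallelism=4
```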

#### Segment metadata cache configs

If you need to downgrade to a version of Druid that doesn't support the segment metadata cache, you must set the `druid.manager.segments.useCache` config to `false`, or remove it, before you downgrade.

This feature was introduced in Druid 33.0.

[#17653](https://github.com/apache/druid/pull/17653)
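For example, before downgrading you might set the following in `common.runtime.properties`:

```properties
# Disable the segment metadata cache before downgrading to a pre-33.0 version.
druid.manager.segments.useCache=false
```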

## 32.0.0

### Incompatible changes
5 changes: 2 additions & 3 deletions docs/tutorials/tutorial-batch-hadoop.md
@@ -25,12 +25,11 @@ sidebar_label: Load from Apache Hadoop

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md)

You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239)
Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion).

:::


This tutorial shows you how to load data files into Apache Druid using a remote Hadoop cluster.

For this tutorial, we'll assume that you've already completed the previous
4 changes: 1 addition & 3 deletions docs/tutorials/tutorial-kerberos-hadoop.md
@@ -25,9 +25,7 @@ sidebar_label: Kerberized HDFS deep storage

:::caution[Deprecated]

Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md)

You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239)
Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion).

:::
