diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index b36ab87d6545..4bdf99a4ea22 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -986,12 +986,11 @@ Each line can be further parsed using [`parseSpec`](#parsespec). :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) - -You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) +Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion). ::: + :::info You need to include [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser. @@ -1053,6 +1052,14 @@ For example, using Avro Hadoop parser with custom reader's schema file: ### ORC Hadoop Parser +:::caution[Deprecated] + +Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion). + + +::: + + :::info You need to include the [`druid-orc-extensions`](../development/extensions-core/orc.md) as an extension to use the ORC Hadoop Parser. ::: @@ -1298,6 +1305,13 @@ setting `"mapreduce.job.user.classpath.first": "true"`, then this will not be an ### Parquet Hadoop Parser +:::caution[Deprecated] + +Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion). 
+ +::: + + :::info You need to include the [`druid-parquet-extensions`](../development/extensions-core/parquet.md) as an extension to use the Parquet Hadoop Parser. ::: @@ -1442,6 +1456,13 @@ However, the Parquet Avro Hadoop Parser was the original basis for supporting th ### Parquet Avro Hadoop Parser +:::caution[Deprecated] + +Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion). + +::: + + :::info Consider using the [Parquet Hadoop Parser](#parquet-hadoop-parser) over this parser to ingest Parquet files. See [Parquet Hadoop Parser vs Parquet Avro Hadoop Parser](#parquet-hadoop-parser-vs-parquet-avro-hadoop-parser) diff --git a/docs/ingestion/hadoop.md b/docs/ingestion/hadoop.md index 3dd738f78910..af416c877375 100644 --- a/docs/ingestion/hadoop.md +++ b/docs/ingestion/hadoop.md @@ -25,13 +25,16 @@ sidebar_label: "Hadoop-based" :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) +Hadoop-based ingestion is deprecated and scheduled to be removed with Druid 37.0.0. -You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) +We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). + +You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
For more information, see [#18239](https://github.com/apache/druid/pull/18239). ::: + Apache Hadoop-based batch ingestion in Apache Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running instance of a Druid [Overlord](../design/overlord.md). Please refer to our [Hadoop-based vs. native batch comparison table](index.md#batch) for comparisons between Hadoop-based, native batch (simple), and native batch (parallel) ingestion. diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md index a82b331de4bb..7d13a406c1ea 100644 --- a/docs/operations/other-hadoop.md +++ b/docs/operations/other-hadoop.md @@ -25,12 +25,15 @@ title: "Working with different versions of Apache Hadoop" :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) +Hadoop-based ingestion is deprecated and scheduled to be removed with Druid 37.0.0. -You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) +We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). + +You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239). ::: + Apache Druid can interact with Hadoop in two ways: 1. 
[Use HDFS for deep storage](../development/extensions-core/hdfs.md) using the druid-hdfs-storage extension. diff --git a/docs/release-info/upgrade-notes.md b/docs/release-info/upgrade-notes.md index a12ffef38ad2..440eb7d77efa 100644 --- a/docs/release-info/upgrade-notes.md +++ b/docs/release-info/upgrade-notes.md @@ -38,6 +38,84 @@ For more information, see [Migration guide: front-coded dictionaries](./migr-fro If you're already using this feature, you don't need to take any action. +## 34.0.0 + +### Upgrade notes + +#### Hadoop-based ingestion + +Hadoop-based ingestion has been deprecated since Druid 32.0 and is scheduled to be removed in Druid 37.0.0. + +We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). + +As part of this change, you must now opt in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. + +To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. +[#18239](https://github.com/apache/druid/pull/18239) + +#### `groupBy` and `topN` queries + +Druid now uses the `groupBy` native query type, rather than `topN`, for SQL queries that group +by and order by the same column, have `LIMIT`, and don't have `HAVING`. This speeds up execution +of such queries since `groupBy` is vectorized while `topN` is not. + +You can restore the previous behavior by setting the query context parameter `useLexicographicTopN` to `true`. Behavior for `useApproximateTopN` is unchanged, and the default remains `true`. + +#### `IS_INCREMENTAL_HANDOFF_SUPPORTED` config removed + +Removed the `IS_INCREMENTAL_HANDOFF_SUPPORTED` context reference from supervisors, as incremental publishing has been the default behavior since version 0.16.0. 
This context was originally introduced to support rollback to `LegacyKafkaIndexTaskRunner` in versions earlier than 0.16.0, which has since been removed. + +#### `useMaxMemoryEstimates` config removed + +Removed the `useMaxMemoryEstimates` config. When set to false, Druid used a much more accurate memory estimate that was introduced in Druid 0.23.0. That more accurate method is the only available method now. The config has defaulted to false for several releases. + +[#17936](https://github.com/apache/druid/pull/17936) + +## 33.0.0 + +### Upgrade notes + +#### `useMaxMemoryEstimates` + +`useMaxMemoryEstimates` is now set to false for MSQ task engine tasks. Additionally, the property has been deprecated and will be removed in a future release. Setting this to false allows for better on-heap memory estimation. + +[#17792](https://github.com/apache/druid/pull/17792) + +#### Automatic kill tasks interval + +Automatic kill tasks are now limited to 30 days or fewer worth of segments per task. + +The previous behavior (no limit on interval per kill task) can be restored by setting `druid.coordinator.kill.maxInterval = P0D`. + +[#17680](https://github.com/apache/druid/pull/17680) + +#### Kubernetes deployments + +By default, the Docker image now uses the canonical hostname if you're running Druid in Kubernetes. Otherwise, it uses the IP address. [#17697](https://github.com/apache/druid/pull/17697) + +#### Updated configs + +Various configs were deprecated in a previous release and have now been removed. 
The following table lists the removed configs and their replacements: + +| Removed config | Replacement config| +|-|-| +|`druid.processing.merge.task.initialYieldNumRows`|`druid.processing.merge.initialYieldNumRows`| +|`druid.processing.merge.task.targetRunTimeMillis`|`druid.processing.merge.targetRunTimeMillis`| +|`druid.processing.merge.task.smallBatchNumRows`|`druid.processing.merge.smallBatchNumRows`| +|`druid.processing.merge.pool.awaitShutdownMillis`|`druid.processing.merge.awaitShutdownMillis`| +|`druid.processing.merge.pool.parallelism`|`druid.processing.merge.parallelism`| +|`druid.processing.merge.pool.defaultMaxQueryParallelism`|`druid.processing.merge.defaultMaxQueryParallelism`| + +[#17776](https://github.com/apache/druid/pull/17776) + +#### Segment metadata cache configs + +If you need to downgrade to a version where Druid doesn't support the segment metadata cache, you must set the `druid.manager.segments.useCache` config to false or remove it prior to the downgrade. + +This feature was introduced in Druid 33.0. + +[#17653](https://github.com/apache/druid/pull/17653) + ## 32.0.0 ### Incompatible changes diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md index c75fc7d35e89..a39957368855 100644 --- a/docs/tutorials/tutorial-batch-hadoop.md +++ b/docs/tutorials/tutorial-batch-hadoop.md @@ -25,12 +25,11 @@ sidebar_label: Load from Apache Hadoop :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) - -You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
For more information, see [#18239](https://github.com/apache/druid/pull/18239) +Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion). ::: + This tutorial shows you how to load data files into Apache Druid using a remote Hadoop cluster. For this tutorial, we'll assume that you've already completed the previous diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md index cace9b8794f4..0ec798e34a39 100644 --- a/docs/tutorials/tutorial-kerberos-hadoop.md +++ b/docs/tutorials/tutorial-kerberos-hadoop.md @@ -25,9 +25,7 @@ sidebar_label: Kerberized HDFS deep storage :::caution[Deprecated] -Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) - -You must now explicitly opt-in to using the deprecated `index_hadoop` task type. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239) +Hadoop-based ingestion is deprecated. For more information, see the [upgrade notes](../release-info/upgrade-notes.md#hadoop-based-ingestion). :::