From ec18e5e5bcbfe8028e2ff3f2436ba757f9af24c4 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Thu, 10 Jul 2025 10:44:01 -0700 Subject: [PATCH 01/17] docs: 34.0.0 release notes --- docs/release-info/release-notes.md | 228 ++++++++++++++++++++++++++++- 1 file changed, 220 insertions(+), 8 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index d0573c5cbcc7..1f4fcee13627 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -24,7 +24,7 @@ title: "Release notes" -Apache Druid \{\{DRUIDVERSION}} contains over $NUMBER_FEATURES new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from $NUMBER_OF_CONTRIBUTORS contributors. +Apache Druid 34.0.0 contains over $NUMBER_FEATURES new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from $NUMBER_OF_CONTRIBUTORS contributors. +You can restore the previous behavior by setting the query context parameter `useLexicographicTopN` to `true`. Behavior for `useApproximateTopN` is unchanged, and the default remains `true`. -In Druid 32.0.0, the front coded dictionaries feature will be turned on by default. Front-coded dictionaries reduce storage and improve performance by optimizing for strings where the front part looks similar. +#### `IS_INCREMENTAL_HANDOFF_SUPPORTED` config removed -Once this feature is on, you cannot easily downgrade to an earlier version that does not support the feature. +Removed the `IS_INCREMENTAL_HANDOFF_SUPPORTED` context reference from supervisors, as incremental publishing has been the default behavior since version 0.16.0. This context was originally introduced to support rollback to LegacyKafkaIndexTaskRunner in versions earlier than 0.16.0, which has since been removed. -For more information, see [Migration guide: front-coded dictionaries](./migr-front-coded-dict.md). 
+#### `useMaxMemoryEstimates` config removed -If you're already using this feature, you don't need to take any action. +Removed the `useMaxMemoryEstimates` config. When set to false, Druid used a much more accurate memory estimate that was introduced in Druid 0.23.0. That more accurate method is the only available method now. The config has defaulted to false for several releases. +[#17936](https://github.com/apache/druid/pull/17936) ### Incompatible changes ### Developer notes +- Some maven plugins no longer use hardcoded version numbers. Instead, they now pull from the Apache parent [#18138](https://github.com/apache/druid/pull/18138) + #### Dependency updates -The following dependencies have had their versions bumped: \ No newline at end of file +The following dependencies have had their versions bumped: + +- `aws.sdk` for Java from `1.12.638` to `1.12.784` [#18068](https://github.com/apache/druid/pull/18068) +- `fabric8` from `6.7.2` to `6.13.1`. The updated `fabric8` version uses `Vert.x` as an HTTP client instead of `OkHttp` [#17913](https://github.com/apache/druid/pull/17913) +- Curator from `5.5.0` to `5.8.0` [#17857](https://github.com/apache/druid/pull/17857) +- `com.fasterxml.jackson.core` from `2.12.7.1` to `2.18.4` [#18013](https://github.com/apache/druid/pull/18013) +- `fabric8` from `6.13.1` to `7.2.0` +- `org.apache.parquet:parquet-avro` from `1.15.1` to `1.15.2` [#18131](https://github.com/apache/druid/pull/18131) +- `commons-beanutils:commons-beanutils` from `1.9.4` to `1.11.0` [#18132](https://github.com/apache/druid/pull/18132) +- \ No newline at end of file From 7bdf76fe0d92ec0bc97e535ebb88a750cf67a27d Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Wed, 16 Jul 2025 22:35:47 -0700 Subject: [PATCH 02/17] july 9 --- docs/release-info/release-notes.md | 43 ++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 5 deletions(-) diff --git a/docs/release-info/release-notes.md 
b/docs/release-info/release-notes.md index 1f4fcee13627..682824a82999 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -112,7 +112,12 @@ As part of this feature, [new metrics](#overlord-kill-task-metrics) have been ad [#18028](https://github.com/apache/druid/pull/18028) -### Dart improvements +### Preferred tier selection +You can now configure the Broker service to prefer Historicals on a specific tier. This can help ensure Druid executes queries within the same availability zone if you have Druid deployed across multiple availability zones. + +[#18136](https://github.com/apache/druid/pull/18136) + +### Dart improvements NEED TO WRITE Dart specific endpoints have been removed and folded into SqlResource. [#18003](https://github.com/apache/druid/pull/18003) Added a new engine QueryContext parameter. The value can be native or msq-dart. The value determines the engine used to run the query. The default value is native. [#18003](https://github.com/apache/druid/pull/18003) @@ -139,6 +144,8 @@ The web console supports using SET statements to specify query context parameter - You can now assign tiered replications to tiers that aren't currently online [#18050](https://github.com/apache/druid/pull/18050) - You can now filter tasks by the error in the Task view [#18057](https://github.com/apache/druid/pull/18057) +- Improved SQL autocomplete and added JSON autocomplete [#18126](https://github.com/apache/druid/pull/18126) +- Updated the web console to use the Overlord APIs instead of Coordinator APIs when managing segments, such as marking them as unused [#18172](https://github.com/apache/druid/pull/18172) ### Ingestion @@ -181,6 +188,7 @@ You can use a segment metadata query to find the list of projections attached to #### Other querying improvements +- You can now perform big decimal aggregations using the MSQ task engine [#18164](https://github.com/apache/druid/pull/18164) - Changed `MV_OVERLAP` and `MV_CONTAINS` functions now 
aligns more closely with the native `inType` filter [#18084](https://github.com/apache/druid/pull/18084) - Improved query handling when some segments are missing on Historicals. Druid no longer incorrectly returns partial results [#18025](https://github.com/apache/druid/pull/18025) @@ -207,6 +215,8 @@ The following Coordinator APIs are now available: - Added the optional `taskCountStart` property to the lag based auto scaler. Use it to specify the initial task count for the supervisor to be submitted with [#17900](https://github.com/apache/druid/pull/17900) - Added audit logs for the following `BasicAuthorizerResource` update methods: `authorizerUserUpdateListener`, `authorizerGroupMappingUpdateListener`, `authorizerUpdateListener` (deprecated) [#17916](https://github.com/apache/druid/pull/17916) +- Added support for streaming task logs to Indexers [#18170](https://github.com/apache/druid/pull/18170) +- Improved how MSQ task engine tasks get canceled, speeding it up and freeing up resources sooner [#18095](https://github.com/apache/druid/pull/18095) ### Data management @@ -269,17 +279,39 @@ The following metrics that correspond to Kafka metrics have been added: [#18028](https://github.com/apache/druid/pull/18028) +#### MSQ task engine metrics + +The MSQ task engine now supports the following metrics: + +- `query/time`: Reported by controller and worker at the end of the query. +- `query/cpu/time`: Reported by each worker at the end of the query. + +Additionally, MSQ task engine metrics now include the following dimensions: + +- `queryId` +- `sqlQueryId` +- `engine`: Denotes the engine used for the query, `msq-dart` or `msq-task`. 
+- `dartQueryId`: (Dart only) +- `type`: Always `msq` +- `dataSource` +- `interval` +- `duration` +- `success` + +[#18121](https://github.com/apache/druid/pull/18121) + #### Other metrics and monitoring improvements - Added the `description` dimension for the `task/run/time` metric - Added a metric for how long it takes to complete an autoscale action: `task/autoScaler/scaleActionTime` [#17971](https://github.com/apache/druid/pull/17971) - Added a `taskType` dimension to Overlord-emitted task count metrics [#18032](https://github.com/apache/druid/pull/18032) - Added the following groupBy metrics to the Prometheus emitter: `mergeBuffer/used`, `mergeBuffer/acquisitionTimeNs`, `mergeBuffer/acquisition`, `groupBy/spilledQueries`, `groupBy/spilledBytes`, and `groupBy/mergeDictionarySize` [#17929](https://github.com/apache/druid/pull/17929) -- Changed the logging level for query cancelations from `warn` to `info` to reduce noise [#18046](https://github.com/apache/druid/pull/18046) -- Changed query logging so that SQL queries that can't be3 parsed are no longer logged and don't emit metrics [#18102](https://github.com/apache/druid/pull/18102) +- Changed the logging level for query cancellation from `warn` to `info` to reduce noise [#18046](https://github.com/apache/druid/pull/18046) +- Changed query logging so that SQL queries that can't be parsed are no longer logged and don't emit metrics [#18102](https://github.com/apache/druid/pull/18102) - Changed the logging level for lifecycle from `debug` to `info` [#17884](https://github.com/apache/druid/pull/17884) - Added `groupId` and `tasks` to Overlord logs [#17980](https://github.com/apache/druid/pull/17980) - You can now use the `druid.request.logging.rollPeriod` to configure the log rotation period (default 1 day) [#17976](https://github.com/apache/druid/pull/17976) +- Improved metric emission on the Broker to include per-query result-level caching (`query/resultCache/hit` returning `1` means the cache was used) 
[#18063](https://github.com/apache/druid/pull/18063) ### Extensions @@ -303,7 +335,7 @@ You can restore the previous behavior by setting the query context parameter `us #### `IS_INCREMENTAL_HANDOFF_SUPPORTED` config removed -Removed the `IS_INCREMENTAL_HANDOFF_SUPPORTED` context reference from supervisors, as incremental publishing has been the default behavior since version 0.16.0. This context was originally introduced to support rollback to LegacyKafkaIndexTaskRunner in versions earlier than 0.16.0, which has since been removed. +Removed the `IS_INCREMENTAL_HANDOFF_SUPPORTED` context reference from supervisors, as incremental publishing has been the default behavior since version 0.16.0. This context was originally introduced to support rollback to `LegacyKafkaIndexTaskRunner` in versions earlier than 0.16.0, which has since been removed. #### `useMaxMemoryEstimates` config removed @@ -315,12 +347,13 @@ Removed the `useMaxMemoryEstimates` config. When set to false, Druid used a much ### Developer notes -- Some maven plugins no longer use hardcoded version numbers. Instead, they now pull from the Apache parent [#18138](https://github.com/apache/druid/pull/18138) +- Some maven plugins no longer use hard-coded version numbers. Instead, they now pull from the Apache parent [#18138](https://github.com/apache/druid/pull/18138) #### Dependency updates The following dependencies have had their versions bumped: +- `apache.kafka` from `3.9.0` to `3.9.1` [#18178](https://github.com/apache/druid/pull/18178) - `aws.sdk` for Java from `1.12.638` to `1.12.784` [#18068](https://github.com/apache/druid/pull/18068) - `fabric8` from `6.7.2` to `6.13.1`. 
The updated `fabric8` version uses `Vert.x` as an HTTP client instead of `OkHttp` [#17913](https://github.com/apache/druid/pull/17913) - Curator from `5.5.0` to `5.8.0` [#17857](https://github.com/apache/druid/pull/17857) From 7a7c6125f1d0b6622f6432d31bb2580d4ed7e8e3 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Mon, 21 Jul 2025 15:45:39 -0700 Subject: [PATCH 03/17] Apply suggestions from code review Co-authored-by: Karan Kumar Co-authored-by: Kashif Faraz --- docs/release-info/release-notes.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 682824a82999..35c38eae1bc7 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -93,9 +93,9 @@ When you query your data using the native query engine, you can prefer (`preferC As part of this change, new Coordinator APIs are available. For more information, see [Coordinator APIs for clones](#coordinator-apis-for-clones). [#17863](https://github.com/apache/druid/pull/17863) [#17899](https://github.com/apache/druid/pull/17899) [#17956](https://github.com/apache/druid/pull/17956) -### Overlord kill tasks +### Embedded kill tasks on the Overlord (Experimental) -You can now run kill tasks directly on the Overlord itself. Running kill tasks on the Overlord provides the following benefits: +You can now run kill tasks directly on the Overlord itself. Embedded kill tasks provide several benefits as they: - Unused segments are killed as soon as they're eligible and are killed faster - Doesn't require a task slot @@ -160,7 +160,7 @@ The web console supports using SET statements to specify query context parameter ##### Multi-stream supervisors (experimental) -You can now use more than one supervisor to ingest data into the same datasource. Include the `spec.dataSchema.dataSource` field to help identify the supervisor. 
+You can now use more than one supervisor to ingest data into the same datasource. Use the `id` field to distinguish between supervisors ingesting into the same datasource (identified by `spec.dataSchema.dataSource` for streaming supervisors). When using this feature, make sure you set `useConcurrentLocks` to `true` for the `context` field in the supervisor spec. @@ -168,7 +168,7 @@ When using this feature, make sure you set `useConcurrentLocks` to `true` for th ##### Supervisors and the underlying input stream -Seekable stream supervisors (Kafka, Kinesis, and Rabbit) can no longer update the underlying input stream (such as a topic for Kafka) that is persisted for it. This action was previously allowed by the API, but it isn't fully supported by the underlying system. Going forward, a request to make such a change results in a 400 error from the Supervisor API that explains why it isn't allowed. +Seekable stream supervisors (Kafka, Kinesis, and Rabbit) can no longer be updated to ingest from a different input stream (such as a topic for Kafka). Since such a change is not fully supported by the underlying system, a request to make such a change will result in a 400 error. [#17955](https://github.com/apache/druid/pull/17955) [#17975](https://github.com/apache/druid/pull/17975) @@ -198,7 +198,7 @@ You can use a segment metadata query to find the list of projections attached to You can now configure a timeout for `index_parallel` and `compact` type tasks. Set the context parameter `subTaskTimeoutMillis` to the maximum time in milliseconds you want to wait before a subtask gets canceled. By default, there's no timeout. -Using this config helps parent tasks fail sooner instead of getting stuck and can free up tasks slots from zombie tasks. +Using this config helps parent tasks fail sooner instead of being stuck running zombie sub-tasks. 
[#18039](https://github.com/apache/druid/pull/18039) From 19d205b90ac4c7b3355bf951851fb061006c5a6b Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Mon, 21 Jul 2025 16:53:24 -0700 Subject: [PATCH 04/17] updates --- docs/release-info/release-notes.md | 87 +++++++++++++++++++++++------ docs/release-info/upgrade-notes.md | 88 ++++++++++++++++++++++++++++-- 2 files changed, 154 insertions(+), 21 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 35c38eae1bc7..9c4109d4a0c4 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -57,25 +57,69 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. -#### Improved HTTP endpoints +### Hadoop-based ingestion -You can now use raw SQL in the HTTP body for `/druid/v2/sql` endpoints. You can set `Content-Type` to `text/plain` instead of `application/json`, so you can provide raw text that isn't escaped. +Hadoop-based ingestion has been deprecated since Druid 32.0. You must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. + +To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. + +[#18239](https://github.com/apache/druid/pull/18239) + +### Use SET statements for query context parameters -[#17937](https://github.com/apache/druid/pull/17937) +You can now use SET statements to define query context parameters for a query through the [Druid console](#set-statements-in-the-druid-console) or the [API](#set-statements-with-the-api). -Additionally, SQL requests can now include multiple SET statements to build up context for the final statement. 
For example, the following query results in a statement that includes the `timeout`, `useCache`, `populateCache`, and `vectorize` query context parameters: +[#17894](https://github.com/apache/druid/pull/17894) [#17974](https://github.com/apache/druid/pull/17974) + +#### SET statements in the Druid console + +The web console now supports using SET statements to specify query context parameters. For example, if you include `SET timeout = 20000;` in your query, the timeout query context parameter is set: + +```sql +SET timeout = 20000; +SELECT "channel", "page", sum("added") from "wikipedia" GROUP BY 1, 2 +``` + +[#17966](https://github.com/apache/druid/pull/17966) + +#### SET statements with the API + +SQL queries issued to `/druid/v2/sql` can now include multiple SET statements to build up context for the final statement. For example, the following SQL query includes the `timeout`, `useCache`, `populateCache`, `vectorize`, and `engine` query context parameters: ```sql SET timeout = 20000; SET useCache = false; SET populateCache = false; SET vectorize = 'force'; +SET engine = 'msq-dart'; SELECT "channel", "page", sum("added") from "wikipedia" GROUP BY 1, 2 ``` +An API call that sets query context parameters in the same way looks like the following: + +```curl +curl --location 'http://HOST:PORT/druid/v2/sql' \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SET timeout=20000; SET useCache=false; SET populateCache=false; SET engine='\''msq-dart'\'';SELECT user, commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia GROUP BY 1, 2 ORDER BY 2 DESC", + "resultFormat": "array", + "header": true, + "typesHeader": true, + "sqlTypesHeader": true +}' +``` + + + This improvement also works for INSERT and REPLACE queries using the MSQ task engine. Note that JDBC isn't supported. -[#17974](https://github.com/apache/druid/pull/17974) +#### Improved HTTP endpoints + +You can now use raw SQL in the HTTP body for `/druid/v2/sql` endpoints. 
You can set `Content-Type` to `text/plain` instead of `application/json`, so you can provide raw text that isn't escaped. + + + [#17937](https://github.com/apache/druid/pull/17937) + ### Cloning Historicals You can now configure clones for Historicals using the dynamic Coordinator configuration `cloneServers`. Cloned Historicals are useful for situations such as rolling updates where you want to launch a new Historical as a replacement for an existing one. @@ -95,19 +139,23 @@ As part of this change, new Coordinator APIs are available. For more information [#17863](https://github.com/apache/druid/pull/17863) [#17899](https://github.com/apache/druid/pull/17899) [#17956](https://github.com/apache/druid/pull/17956) ### Embedded kill tasks on the Overlord (Experimental) -You can now run kill tasks directly on the Overlord itself. Embedded kill tasks provide several benefits as they: +You can now run kill tasks directly on the Overlord itself. Embedded kill tasks provide several benefits; they: -- Unused segments are killed as soon as they're eligible and are killed faster -- Doesn't require a task slot -- Locked intervals are automatically skipped -- Configuration is simpler -- A large number of unused segments doesn't cause issues for them +- Kill segments as soon as they're eligible +- Don't take up task slots +- Finish faster since they use optimized metadata queries and don't launch a new JVM +- Kill a small number of segments per task, ensuring locks on an interval aren't held for too long +- Skip locked intervals to avoid head-of-line blocking +- Require minimal configuration +- Can keep up with a large number of unused segments in the cluster This feature is controlled by the following configs: - `druid.manager.segments.killUnused.enabled` - Whether the feature is enabled or not - `druid.manager.segments.killUnused.bufferPeriod` - The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. 
This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused. +T use embedded kill tasks, you need to have segment metadata cache enabled. + As part of this feature, [new metrics](#overlord-kill-task-metrics) have been added. [#18028](https://github.com/apache/druid/pull/18028) @@ -134,12 +182,6 @@ This section contains detailed release notes separated by areas. ### Web console -#### SET statements - -The web console supports using SET statements to specify query context parameters. For example, if you include `SET timeout = 20000;` in your query, the timeout query context parameter is set. - -[#17966](https://github.com/apache/druid/pull/17966) - #### Other web console improvements - You can now assign tiered replications to tiers that aren't currently online [#18050](https://github.com/apache/druid/pull/18050) @@ -325,6 +367,17 @@ Additionally, MSQ task engine metrics now include the following dimensions: ### Upgrade notes +#### Hadoop-based ingestion + +Hadoop-based ingestion has been deprecated since Druid 32.0. You must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. + +To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. + +Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. 
+ + +[#18239](https://github.com/apache/druid/pull/18239) + #### `groupBy` and `topN` queries Druid now uses the `groupBy` native query type, rather than `topN`, for SQL queries that group diff --git a/docs/release-info/upgrade-notes.md b/docs/release-info/upgrade-notes.md index a12ffef38ad2..fd2412e2d03c 100644 --- a/docs/release-info/upgrade-notes.md +++ b/docs/release-info/upgrade-notes.md @@ -38,11 +38,91 @@ For more information, see [Migration guide: front-coded dictionaries](./migr-fro If you're already using this feature, you don't need to take any action. +## 34.0.0 + +### Upgrade notes + +#### Hadoop-based ingestion + +Hadoop-based ingestion has been deprecated since Druid 32.0. You must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. + +To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. + +Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. + +[#18239](https://github.com/apache/druid/pull/18239) + +#### `groupBy` and `topN` queries + +Druid now uses the `groupBy` native query type, rather than `topN`, for SQL queries that group +by and order by the same column, have `LIMIT`, and don't have `HAVING`. This speeds up execution +of such queries since `groupBy` is vectorized while `topN` is not. + +You can restore the previous behavior by setting the query context parameter `useLexicographicTopN` to `true`. Behavior for `useApproximateTopN` is unchanged, and the default remains `true`. + +#### `IS_INCREMENTAL_HANDOFF_SUPPORTED` config removed + +Removed the `IS_INCREMENTAL_HANDOFF_SUPPORTED` context reference from supervisors, as incremental publishing has been the default behavior since version 0.16.0. 
This context was originally introduced to support rollback to `LegacyKafkaIndexTaskRunner` in versions earlier than 0.16.0, which has since been removed. + +#### `useMaxMemoryEstimates` config removed + +Removed the `useMaxMemoryEstimates` config. When set to false, Druid used a much more accurate memory estimate that was introduced in Druid 0.23.0. That more accurate method is the only available method now. The config has defaulted to false for several releases. + +[#17936](https://github.com/apache/druid/pull/17936) + +## 33.0.0 + +### Upgrade notes + +#### `useMaxMemoryEstimates` + +`useMaxMemoryEstimates` is now set to false for MSQ task engine tasks. Additionally, the property has been deprecated and will be removed in a future release. Setting this to false allows for better on-heap memory estimation. + +[#17792](https://github.com/apache/druid/pull/17792) + +#### Automatic kill tasks interval + +Automatic kill tasks are now limited to 30 days or fewer worth of segments per task. + +The previous behavior (no limit on interval per kill task) can be restored by setting `druid.coordinator.kill.maxInterval = P0D`. + +[#17680](https://github.com/apache/druid/pull/17680) + +#### Kubernetes deployments + +By default, the Docker image now uses the canonical hostname if you're running Druid in Kubernetes. Otherwise, it uses the IP address [#17697](https://github.com/apache/druid/pull/17697) + +#### Updated configs + +Various configs were deprecated in a previous release and have now been removed. 
The following table lists the removed configs and their replacements: + +| Removed config | Replacement config| +|-|-| +|`druid.processing.merge.task.initialYieldNumRows`|`druid.processing.merge.initialYieldNumRows`| +|`druid.processing.merge.task.targetRunTimeMillis`|`druid.processing.merge.targetRunTimeMillis`| +|`druid.processing.merge.task.smallBatchNumRows`|`druid.processing.merge.smallBatchNumRows`| +|`druid.processing.merge.pool.awaitShutdownMillis`|`druid.processing.merge.awaitShutdownMillis`| +|`druid.processing.merge.pool.parallelism`|`druid.processing.merge.parallelism`| +|`druid.processing.merge.pool.defaultMaxQueryParallelism`|`druid.processing.merge.defaultMaxQueryParallelism`| + +[#17776](https://github.com/apache/druid/pull/17776) + +#### Segment metadata cache configs + +If you need to downgrade to a version where Druid doesn't support the segment metadata cache, you must set the `druid.manager.segments.useCache` config to false or remove it prior to the upgrade. + +This feature is introduced in Druid 33.0. + +[#17653](https://github.com/apache/druid/pull/17653) + + + ## 32.0.0 ### Incompatible changes -### ANSI-SQL compatibility and query results +#### ANSI-SQL compatibility and query results Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant have been removed: @@ -60,7 +140,7 @@ For more information about how to update your queries, see the [migration guide] [#17568](https://github.com/apache/druid/pull/17568) [#17609](https://github.com/apache/druid/pull/17609) -### Java support +#### Java support Java support in Druid has been updated: @@ -71,13 +151,13 @@ We recommend that you upgrade to Java 17. [#17466](https://github.com/apache/druid/pull/17466) -### Javascript support +#### Javascript support - Javascript tiered broker selector strategy and Javascript filters currently do not work on Java 17. 
### Deprecations -### Hadoop-based ingestion +#### Hadoop-based ingestion Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion. From f01ff0af1fa3094f3d5c77ba49fe9cc2f0ed858f Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Mon, 21 Jul 2025 16:55:54 -0700 Subject: [PATCH 05/17] address comment for 18121 --- docs/release-info/release-notes.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 9c4109d4a0c4..26b19f583f1f 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -321,9 +321,9 @@ The following metrics that correspond to Kafka metrics have been added: [#18028](https://github.com/apache/druid/pull/18028) -#### MSQ task engine metrics +#### New task metrics -The MSQ task engine now supports the following metrics: +The MSQ task engine and Dart now support the following metrics: - `query/time`: Reported by controller and worker at the end of the query. - `query/cpu/time`: Reported by each worker at the end of the query. 
From 6fdb43f444179a1b4d5055bec90e4675d5bd131c Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Tue, 22 Jul 2025 10:07:37 -0700 Subject: [PATCH 06/17] Apply suggestions from code review Co-authored-by: Abhishek Radhakrishnan --- docs/release-info/release-notes.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 26b19f583f1f..7c6fba4f810c 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -154,11 +154,11 @@ This feature is controlled by the following configs: - `druid.manager.segments.killUnused.enabled` - Whether the feature is enabled or not - `druid.manager.segments.killUnused.bufferPeriod` - The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused. -T use embedded kill tasks, you need to have segment metadata cache enabled. +To use embedded kill tasks, you need to have segment metadata cache enabled. As part of this feature, [new metrics](#overlord-kill-task-metrics) have been added. -[#18028](https://github.com/apache/druid/pull/18028) +[#18028](https://github.com/apache/druid/pull/18028) [#18124](https://github.com/apache/druid/pull/18124) ### Preferred tier selection You can now configure the Broker service to prefer Historicals on a specific tier. This can help ensure Druid executes queries within the same availability zone if you have Druid deployed across multiple availability zones. 
@@ -232,7 +232,7 @@ You can use a segment metadata query to find the list of projections attached to - You can now perform big decimal aggregations using the MSQ task engine [#18164](https://github.com/apache/druid/pull/18164) - Changed `MV_OVERLAP` and `MV_CONTAINS` functions now aligns more closely with the native `inType` filter [#18084](https://github.com/apache/druid/pull/18084) -- Improved query handling when some segments are missing on Historicals. Druid no longer incorrectly returns partial results [#18025](https://github.com/apache/druid/pull/18025) +- Improved query handling when segments are temporarily missing on Historicals but not detected by Brokers. Druid doesn't return partial results incorrectly in such cases. [#18025](https://github.com/apache/druid/pull/18025) ### Cluster management From 7f19f08a6e39341d90004465fdaa3c4f31b03090 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Tue, 22 Jul 2025 10:17:51 -0700 Subject: [PATCH 07/17] add 17983 --- docs/release-info/release-notes.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 7c6fba4f810c..24d49e0dada7 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -227,6 +227,9 @@ You can use a segment metadata query to find the list of projections attached to [#18119](https://github.com/apache/druid/pull/18119) +#### `json_merge()` improvement + +`json_merge()` is now SQL-compliant when arguments are null. The function now returns null if any argument is null. For example, queries like `SELECT JSON_MERGE(null, null)` and `SELECT JSON_MERGE(null, '{}')` return null instead of throwing an error. 
#### Other querying improvements From 62902870999a2d3f811d13459a49dbd907a68957 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Wed, 23 Jul 2025 14:19:32 -0700 Subject: [PATCH 08/17] Update docs/release-info/release-notes.md --- docs/release-info/release-notes.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 24d49e0dada7..1c04f81b6156 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -151,8 +151,8 @@ You can now run kill tasks directly on the Overlord itself. Embedded kill tasks This feature is controlled by the following configs: -- `druid.manager.segments.killUnused.enabled` - Whether the feature is enabled or not -- `druid.manager.segments.killUnused.bufferPeriod` - The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused. +- `druid.manager.segments.killUnused.enabled` - Whether the feature is enabled or not (Defaults to `false`) +- `druid.manager.segments.killUnused.bufferPeriod` - The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused (Defaults to `P30D`) To use embedded kill tasks, you need to have segment metadata cache enabled. 
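As a minimal sketch of the two configs listed above, the fragment below opts in to embedded kill tasks and writes out the documented default buffer period explicitly. It assumes these properties go in the Overlord's runtime properties, since the feature runs on the Overlord. The segment metadata cache that the feature requires must be enabled separately and is not shown here.

```
# Opt in to embedded kill tasks on the Overlord (defaults to false)
druid.manager.segments.killUnused.enabled = true
# How long a segment must stay unused before it can be permanently removed (default P30D)
druid.manager.segments.killUnused.bufferPeriod = P30D
```

A longer `bufferPeriod` leaves more time to recover data that was marked unused by mistake, at the cost of keeping it in metadata and deep storage longer.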
From 234a69482061e68cfc1fa44ae619ac728585262b Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Wed, 23 Jul 2025 16:16:03 -0700 Subject: [PATCH 09/17] july 23 --- docs/release-info/release-notes.md | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 1c04f81b6156..f244979bc0a1 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -30,7 +30,7 @@ Apache Druid 34.0.0 contains over $NUMBER_FEATURES new features, bug fixes, perf Replace {{MILESTONE}} with the correct milestone number. For example: https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A28.0+sort%3Aupdated-desc+ --> -See the [complete set of changes](https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A{{MILESTONE}}+sort%3Aupdated-desc+) for additional details, including bug fixes. +See the [complete set of changes](https://github.com/apache/druid/milestone/62?closed=1) for additional details, including bug fixes. Review the [upgrade notes](#upgrade-notes) and [incompatible changes](#incompatible-changes) before you upgrade to Druid \{\{DRUIDVERSION}}. If you are upgrading across multiple versions, see the [Upgrade notes](upgrade-notes.md) page, which lists upgrade notes for the most recent Druid versions. @@ -109,15 +109,12 @@ curl --location 'http://HOST:PORT/druid/v2/sql' \ }' ``` - - This improvement also works for INSERT and REPLACE queries using the MSQ task engine. Note that JDBC isn't supported. #### Improved HTTP endpoints You can now use raw SQL in the HTTP body for `/druid/v2/sql` endpoints. You can set `Content-Type` to `text/plain` instead of `application/json`, so you can provide raw text that isn't escaped. 
- [#17937](https://github.com/apache/druid/pull/17937) ### Cloning Historicals @@ -137,6 +134,7 @@ When you query your data using the native query engine, you can prefer (`preferC As part of this change, new Coordinator APIs are available. For more information, see [Coordinator APIs for clones](#coordinator-apis-for-clones). [#17863](https://github.com/apache/druid/pull/17863) [#17899](https://github.com/apache/druid/pull/17899) [#17956](https://github.com/apache/druid/pull/17956) + ### Embedded kill tasks on the Overlord (Experimental) You can now run kill tasks directly on the Overlord itself. Embedded kill tasks provide several benefits; they: @@ -170,7 +168,9 @@ You can now configure the Broker service to prefer Historicals on a specific ti Dart specific endpoints have been removed and folded into SqlResource. [#18003](https://github.com/apache/druid/pull/18003) Added a new engine QueryContext parameter. The value can be native or msq-dart. The value determines the engine used to run the query. The default value is native. [#18003](https://github.com/apache/druid/pull/18003) -MSQ Dart is now able to query real-time tasks by setting the query context parameter includeSegmentSource to realtime, in a similar way to MSQ tasks. [#18076](https://github.com/apache/druid/pull/18076) +MSQ Dart can now query real-time tasks by setting the query context parameter `includeSegmentSource` to `realtime`, in a similar way to MSQ tasks. You can run synchronous or asynchronous queries. + +[#18076](https://github.com/apache/druid/pull/18076) [#18241](https://github.com/apache/druid/pull/18241) ### `SegmentMetadataCache` on the Coordinator @@ -182,11 +182,10 @@ This section contains detailed release notes separated by areas. 
### Web console -#### Other web console improvements - - You can now assign tiered replications to tiers that aren't currently online [#18050](https://github.com/apache/druid/pull/18050) - You can now filter tasks by the error in the Task view [#18057](https://github.com/apache/druid/pull/18057) - Improved SQL autocomplete and added JSON autocomplete [#18126](https://github.com/apache/druid/pull/18126) +- Changed how the web console determines what functions are available, improving things like autocompletion [#18214](https://github.com/apache/druid/pull/18214) - Updated the web console to use the Overlord APIs instead of Coordinator APIs when managing segments, such as marking them as unused [#18172](https://github.com/apache/druid/pull/18172) ### Ingestion @@ -194,10 +193,6 @@ This section contains detailed release notes separated by areas. - Improved concurrency for batch and streaming ingestion tasks [#17828](https://github.com/apache/druid/pull/17828) - Removed the `useMaxMemoryEstimates` config. When set to false, Druid used a much more accurate memory estimate that was introduced in Druid 0.23.0. That more accurate method is the only available method now. The config has defaulted to false for several releases [#17936](https://github.com/apache/druid/pull/17936) -#### SQL-based ingestion - -##### Other SQL-based ingestion improvements - #### Streaming ingestion ##### Multi-stream supervisors (experimental) @@ -225,7 +220,7 @@ Seekable stream supervisors (Kafka, Kinesis, and Rabbit) can no longer be update You can use a segment metadata query to find the list of projections attached to a segment. 
-[#18119](https://github.com/apache/druid/pull/18119) +[#18119](https://github.com/apache/druid/pull/18119) [#18223](https://github.com/apache/druid/pull/) #### `json_merge()` improvement @@ -417,4 +412,4 @@ The following dependencies have had their versions bumped: - `fabric8` from `6.13.1` to `7.2.0` - `org.apache.parquet:parquet-avro` from `1.15.1` to `1.15.2` [#18131](https://github.com/apache/druid/pull/18131) - `commons-beanutils:commons-beanutils` from `1.9.4` to `1.11.0` [#18132](https://github.com/apache/druid/pull/18132) -- \ No newline at end of file +- `form-data` from `4.0.0` to `4.0.4` [18310](https://github.com/apache/druid/pull/18310) \ No newline at end of file From 06c197ddea4e2e5b394cdb633c9cc876c7dda68a Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Wed, 23 Jul 2025 16:21:56 -0700 Subject: [PATCH 10/17] fix typos --- docs/release-info/release-notes.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index f244979bc0a1..acccba49a982 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -165,10 +165,17 @@ You can now configure the Broker service to prefer Historicals on a specific ti ### Dart improvements NEED TO WRITE -Dart specific endpoints have been removed and folded into SqlResource. [#18003](https://github.com/apache/druid/pull/18003) -Added a new engine QueryContext parameter. The value can be native or msq-dart. The value determines the engine used to run the query. The default value is native. [#18003](https://github.com/apache/druid/pull/18003) +The Dart query engine now uses the `/druid/v2/sql` endpoint like other SQL query engines. The former Dart specific endpoint is no longer supported. To use Dart for a query, include the `engine` query context parameter and set it to `msq-dart`. 
-MSQ Dart can now query real-time tasks by setting the query context parameter `includeSegmentSource` to `realtime`, in a similar way to MSQ tasks. You can run synchronous or asynchronous queries. +[#18003](https://github.com/apache/druid/pull/18003) + +Enabling Dart remains the same: add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +Additionally, Dart can now query real-time tasks by setting the query context parameter `includeSegmentSource` to `realtime`, in a similar way to MSQ tasks. You can run synchronous or asynchronous queries. [#18076](https://github.com/apache/druid/pull/18076) [#18241](https://github.com/apache/druid/pull/18241) @@ -185,7 +192,7 @@ This section contains detailed release notes separated by areas. - You can now assign tiered replications to tiers that aren't currently online [#18050](https://github.com/apache/druid/pull/18050) - You can now filter tasks by the error in the Task view [#18057](https://github.com/apache/druid/pull/18057) - Improved SQL autocomplete and added JSON autocomplete [#18126](https://github.com/apache/druid/pull/18126) -- Changed how the web console determines what functions are available, improving things like autocompletion [#18214](https://github.com/apache/druid/pull/18214) +- Changed how the web console determines what functions are available, improving things like auto-completion [#18214](https://github.com/apache/druid/pull/18214) - Updated the web console to use the Overlord APIs instead of Coordinator APIs when managing segments, such as marking them as unused [#18172](https://github.com/apache/druid/pull/18172) ### Ingestion From 16d1f70bf36674a29d535f3c74f7199dc17e83ee Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Tue, 29 Jul 2025 16:44:47 -0700 Subject: [PATCH 11/17] add last prs ---
docs/release-info/release-notes.md | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index acccba49a982..95d527b1ca1a 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -409,6 +409,8 @@ Removed the `useMaxMemoryEstimates` config. When set to false, Druid used a much #### Dependency updates +- Added `j2objc-annotations` [#18154](https://github.com/apache/druid/pull/18154) + The following dependencies have had their versions bumped: - `apache.kafka` from `3.9.0` to `3.9.1` [#18178](https://github.com/apache/druid/pull/18178) @@ -419,4 +421,24 @@ The following dependencies have had their versions bumped: - `fabric8` from `6.13.1` to `7.2.0` - `org.apache.parquet:parquet-avro` from `1.15.1` to `1.15.2` [#18131](https://github.com/apache/druid/pull/18131) - `commons-beanutils:commons-beanutils` from `1.9.4` to `1.11.0` [#18132](https://github.com/apache/druid/pull/18132) -- `form-data` from `4.0.0` to `4.0.4` [18310](https://github.com/apache/druid/pull/18310) \ No newline at end of file +- `form-data` from `4.0.0` to `4.0.4` [18310](https://github.com/apache/druid/pull/18310) +- `guava` from `32.0.1` to `32.1.3` [#18154](https://github.com/apache/druid/pull/18154) +- `confluent` from `6.2.12` to `6.2.15` [#18154](https://github.com/apache/druid/pull/18154) +- `netty4` from `4.1.118.Final` to `4.1.122.Final` [#18154](https://github.com/apache/druid/pull/18154) +- `slf4j` from `1.7.36` to `2.0.16` [#18154](https://github.com/apache/druid/pull/18154) +- `commons-logging` from `1.1.1` to `1.3.5` [#18154](https://github.com/apache/druid/pull/18154) +- `commons-lang3` to `3.17.0` [#18154](https://github.com/apache/druid/pull/18154) +- `commons-text` to `1.13.1` [#18154](https://github.com/apache/druid/pull/18154) +- `json-smart` to `2.5.2` [#18154](https://github.com/apache/druid/pull/18154) +- `kotlin-stdlib` to `1.9.25` 
[#18154](https://github.com/apache/druid/pull/18154) +- `joda-time` to `2.14.0` [#18154](https://github.com/apache/druid/pull/18154) +- `com.google.code.findbugs` to `3.0.2` [#18154](https://github.com/apache/druid/pull/18154) +- `log4j-slf4j` updated to `log4j-slf4j2` [#18154](https://github.com/apache/druid/pull/18154) +- `snappy-java` to `1.1.10.7` [#18154](https://github.com/apache/druid/pull/18154) +- `httpcore` to `4.4.16` [#18154](https://github.com/apache/druid/pull/18154) +- `asm` to `9.8` [#18154](https://github.com/apache/druid/pull/18154) +- `async-http-client` to `3.0.2` [#18154](https://github.com/apache/druid/pull/18154) +- `plexus-utils` to `3.1.0` [#18154](https://github.com/apache/druid/pull/18154) +- `equalsverifier` to `3.15.8` [#18154](https://github.com/apache/druid/pull/18154) +- `value-annotations` to `2.10.1` [#18154](https://github.com/apache/druid/pull/18154) +- `form-data` to `4.0.4` [#18310](https://github.com/apache/druid/pull/18310) \ No newline at end of file From 50033a65ba587816e092638d5cc3eb5e69c2d918 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Fri, 1 Aug 2025 10:28:17 -0700 Subject: [PATCH 12/17] cleanup --- docs/release-info/release-notes.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 95d527b1ca1a..c221197963d5 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -117,7 +117,7 @@ You can now use raw SQL in the HTTP body for `/druid/v2/sql` endpoints. You can [#17937](https://github.com/apache/druid/pull/17937) -### Cloning Historicals +### Cloning Historicals (experimental) You can now configure clones for Historicals using the dynamic Coordinator configuration `cloneServers`. Cloned Historicals are useful for situations such as rolling updates where you want to launch a new Historical as a replacement for an existing one. 
@@ -163,7 +163,7 @@ You can now configure the Broker service to prefer Historicals on a specific ti [#18136](https://github.com/apache/druid/pull/18136) -### Dart improvements NEED TO WRITE +### Dart improvements The Dart query engine now uses the `/druid/v2/sql` endpoint like other SQL query engines. The former Dart specific endpoint is no longer supported. To use Dart for a query, include the `engine` query context parameter and set it to `msq-dart`. @@ -175,7 +175,7 @@ Enabling Dart remains the same, add the following line to your `broker/runtime.p druid.msq.dart.enabled = true ``` -Additionally, Dart can now query real-time tasks by setting the query context parameter `includeSegmentSource` to `realtime`, in a similar way to MSQ tasks. You can run synchronous or asynchronous queries. +Additionally, Dart now queries real-time tasks by default. You can control this behavior by setting the query context parameter `includeSegmentSource` to `REALTIME` (default) or `NONE`, in a similar way to MSQ tasks. You can also run synchronous or asynchronous queries. [#18076](https://github.com/apache/druid/pull/18076) [#18241](https://github.com/apache/druid/pull/18241) From 0357c5e76bac63f728d357e29241721ec8f1e5cc Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Wed, 6 Aug 2025 14:44:33 -0700 Subject: [PATCH 13/17] Update docs/release-info/release-notes.md --- docs/release-info/release-notes.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index c221197963d5..d054da6514af 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -391,6 +391,7 @@ of such queries since `groupBy` is vectorized while `topN` is not. You can restore the previous behavior by setting the query context parameter `useLexicographicTopN` to `true`. Behavior for `useApproximateTopN` is unchanged, and the default remains `true`. 
+[#18074](https://github.com/apache/druid/pull/18074) #### `IS_INCREMENTAL_HANDOFF_SUPPORTED` config removed Removed the `IS_INCREMENTAL_HANDOFF_SUPPORTED` context reference from supervisors, as incremental publishing has been the default behavior since version 0.16.0. This context was originally introduced to support rollback to `LegacyKafkaIndexTaskRunner` in versions earlier than 0.16.0, which has since been removed. From ae5e0c268a7edf08c2ae45a0e92480e8a3095861 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Wed, 6 Aug 2025 14:45:32 -0700 Subject: [PATCH 14/17] Update docs/release-info/release-notes.md --- docs/release-info/release-notes.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index d054da6514af..b79e4b3ba17c 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -233,6 +233,7 @@ You can use a segment metadata query to find the list of projections attached to `json_merge()` is now SQL-compliant when arguments are null. The function now returns null if any argument is null. For example, queries like SELECT JSON_MERGE(null, null) and SELECT JSON_MERGE(null, '{}') will return null instead of throwing an error. 
+[#17983](https://github.com/apache/druid/pull/17983) #### Other querying improvements - You can now perform big decimal aggregations using the MSQ task engine [#18164](https://github.com/apache/druid/pull/18164) From 31685b5e6f0d2c4b264c7b2d4a98f1684e6bfe31 Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Fri, 8 Aug 2025 10:17:04 -0700 Subject: [PATCH 15/17] Update docs/release-info/release-notes.md Co-authored-by: Lucas Capistrant --- docs/release-info/release-notes.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index b79e4b3ba17c..9e13c304efd9 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -159,7 +159,9 @@ As part of this feature, [new metrics](#overlord-kill-task-metrics) have been ad [#18028](https://github.com/apache/druid/pull/18028) [#18124](https://github.com/apache/druid/pull/18124) ### Preferred tier selection -You can now configure the Broker service to prefer Historicals on a specific tier. This can help ensure Druid executes queries within the same availability zone if you have Druid deployed across multiple availability zones. +You can now configure the Broker service to prefer Historicals on a specific tier. This is useful for across availability zone deployment. Brokers in one AZ select historicals in the same AZ by default but still keeps the ability to select historical nodes in another AZ if historicals in the same AZ are not available. + +To enable, set property `druid.broker.select.tier` to `perferred` in Broker runtime properties. You can then configure `druid.broker.select.tier.preferred.tier` to the tier you want each broker to prefer (i.e. for brokers in AZ1, you could set this to the tier name of your AZ1 historical servers). 
[#18136](https://github.com/apache/druid/pull/18136) From 869c7ff01fa7b438d765239d6995d1e71dc84d1e Mon Sep 17 00:00:00 2001 From: 317brian <53799971+317brian@users.noreply.github.com> Date: Fri, 8 Aug 2025 12:40:23 -0700 Subject: [PATCH 16/17] update hadoop blurbs --- docs/release-info/release-notes.md | 14 ++++++++------ docs/release-info/upgrade-notes.md | 7 ++++--- 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index 9e13c304efd9..d574ddc2f6d1 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -59,7 +59,10 @@ This section contains important information about new and existing features. ### Hadoop-based ingestion -Hadoop-based ingestion has been deprecated since Druid 32.0. You must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. +Hadoop-based ingestion has been deprecated since Druid 32.0 and will be removed as early as Druid 35.0.0. +We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). + +As part of this change, you must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. @@ -377,13 +380,12 @@ Additionally, MSQ task engine metrics now include the following dimensions: #### Hadoop-based ingestion -Hadoop-based ingestion has been deprecated since Druid 32.0. You must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. - -To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. 
- -Hadoop-based ingestion deprecated. Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +Hadoop-based ingestion has been deprecated since Druid 32.0 and will be removed as early as Druid 35.0.0. +We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). +As part of this change, you must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. +To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. [#18239](https://github.com/apache/druid/pull/18239) #### `groupBy` and `topN` queries diff --git a/docs/release-info/upgrade-notes.md b/docs/release-info/upgrade-notes.md index fd2412e2d03c..6aad866df5b2 100644 --- a/docs/release-info/upgrade-notes.md +++ b/docs/release-info/upgrade-notes.md @@ -44,12 +44,13 @@ If you're already using this feature, you don't need to take any action. #### Hadoop-based ingestion -Hadoop-based ingestion has been deprecated since Druid 32.0. You must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. +Hadoop-based ingestion has been deprecated since Druid 32.0 and will be removed as early as Druid 35.0.0. -To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. +We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md). 
-Use [SQL-based ingestion](../multi-stage-query/index.md) instead of MapReduce or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md) instead of YARN. +As part of this change, you must now opt-in to using the deprecated `index_hadoop` task type. If you don't do this, your Hadoop-based ingestion tasks will fail. +To opt-in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. [#18239](https://github.com/apache/druid/pull/18239) #### `groupBy` and `topN` queries From 588347b6d639f2cf42ab3b1c96581d3cfbc256bb Mon Sep 17 00:00:00 2001 From: Lucas Capistrant Date: Fri, 8 Aug 2025 18:04:23 -0500 Subject: [PATCH 17/17] Update docs/release-info/release-notes.md fix spellchecker --- docs/release-info/release-notes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/release-info/release-notes.md b/docs/release-info/release-notes.md index d574ddc2f6d1..b14f7c127d4e 100644 --- a/docs/release-info/release-notes.md +++ b/docs/release-info/release-notes.md @@ -164,7 +164,7 @@ As part of this feature, [new metrics](#overlord-kill-task-metrics) have been ad ### Preferred tier selection You can now configure the Broker service to prefer Historicals on a specific tier. This is useful for across availability zone deployment. Brokers in one AZ select historicals in the same AZ by default but still keeps the ability to select historical nodes in another AZ if historicals in the same AZ are not available. -To enable, set property `druid.broker.select.tier` to `perferred` in Broker runtime properties. You can then configure `druid.broker.select.tier.preferred.tier` to the tier you want each broker to prefer (i.e. for brokers in AZ1, you could set this to the tier name of your AZ1 historical servers). +To enable, set property `druid.broker.select.tier` to `preferred` in Broker runtime properties.
You can then set `druid.broker.select.tier.preferred.tier` to the tier you want each broker to prefer (for example, for brokers in `AZ1`, you could set this to the tier name of your `AZ1` historical servers). [#18136](https://github.com/apache/druid/pull/18136)
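The two Broker properties described above could be combined as in the sketch below, for a Broker running in `AZ1`. It assumes the intended strategy value is `preferred`, matching the `preferred` segment of the second property name. The tier name `az1_tier` is a hypothetical placeholder; substitute the tier name that your `AZ1` Historicals actually use.

```
# Broker runtime properties for a Broker deployed in AZ1
# Select Historicals by preferred tier (value assumed to be spelled "preferred")
druid.broker.select.tier = preferred
# Hypothetical tier name: use the tier configured on your AZ1 Historicals
druid.broker.select.tier.preferred.tier = az1_tier
```

Brokers in other availability zones would use the same first property and point the second one at their own local tier.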