apache · kfaraz · Jun 12, 2025 · Jun 11, 2025 · Jun 11, 2025 · Jun 12, 2025
diff --git a/docs/api-reference/tasks-api.md b/docs/api-reference/tasks-api.md
@@ -1601,7 +1601,7 @@ Content-Length: 134
 
 Manually clean up pending segments table in metadata storage for `datasource`. It returns a JSON object response with
 `numDeleted` for the number of rows deleted from the pending segments table. This API is used by the
-`druid.coordinator.kill.pendingSegments.on` [Coordinator setting](../configuration/index.md#coordinator-operation)
+`druid.coordinator.kill.pendingSegments.on` [Coordinator setting](../configuration/index.md#data-management)
 which automates this operation to perform periodically.
 
 #### URL

diff --git a/docs/configuration/index.md b/docs/configuration/index.md
@@ -878,9 +878,19 @@ These Coordinator static configurations can be defined in the `coordinator/runti
 |Property|Description|Default|
 |--------|-----------|-------|
 |`druid.coordinator.period`|The run period for the Coordinator. The Coordinator operates by maintaining the current state of the world in memory and periodically looking at the set of "used" segments and segments being served to make decisions about whether any changes need to be made to the data topology. This property sets the delay between each of these runs.|`PT60S`|
-|`druid.coordinator.period.indexingPeriod`|How often to send compact/merge/conversion tasks to the indexing service. It's recommended to be longer than `druid.manager.segments.pollDuration`|`PT1800S` (30 mins)|
 |`druid.coordinator.startDelay`|The operation of the Coordinator works on the assumption that it has an up-to-date view of the state of the world when it runs, the current ZooKeeper interaction code, however, is written in a way that doesn’t allow the Coordinator to know for a fact that it’s done loading the current state of the world. This delay is a hack to give it enough time to believe that it has all the data.|`PT300S`|
 |`druid.coordinator.load.timeout`|The timeout duration for when the Coordinator assigns a segment to a Historical service.|`PT15M`|
+|`druid.coordinator.balancer.strategy`|The [balancing strategy](../design/coordinator.md#balancing-segments-in-a-tier) used by the Coordinator to distribute segments among the Historical servers in a tier. The `cost` strategy distributes segments by minimizing a cost function, `diskNormalized` weights these costs with the disk usage ratios of the servers and `random` distributes segments randomly.|`cost`|
+|`druid.coordinator.loadqueuepeon.http.repeatDelay`|The start and repeat delay (in milliseconds) for the load queue peon, which manages the load/drop queue of segments for any server.|1 minute|
+|`druid.coordinator.loadqueuepeon.http.batchSize`|Number of segment load/drop requests to batch in one HTTP request. Note that it must be smaller than or equal to the `druid.segmentCache.numLoadingThreads` config on Historical service. If this value is not configured, the coordinator uses the value of the `numLoadingThreads` for the respective server. | `druid.segmentCache.numLoadingThreads` |
+|`druid.coordinator.asOverlord.enabled`|Boolean value for whether this Coordinator service should act like an Overlord as well. This configuration allows users to simplify a Druid cluster by not having to deploy any standalone Overlord services. If set to true, then Overlord console is available at `http://coordinator-host:port/console.html` and be sure to set `druid.coordinator.asOverlord.overlordService` also.|false|
+|`druid.coordinator.asOverlord.overlordService`| Required, if `druid.coordinator.asOverlord.enabled` is `true`. This must be same value as `druid.service` on standalone Overlord services and `druid.selectors.indexing.serviceName` on Middle Managers.|NULL|
+
+##### Data management
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.coordinator.period.indexingPeriod`|Period at which the Coordinator runs data management duties, including launching compaction tasks and cleaning up unused data. It is recommended to keep this value longer than `druid.manager.segments.pollDuration`.|`PT1800S` (30 mins)|
 |`druid.coordinator.kill.pendingSegments.on`|Boolean flag for whether or not the Coordinator clean up old entries in the `pendingSegments` table of metadata store. If set to true, Coordinator will check the created time of most recently complete task. If it doesn't exist, it finds the created time of the earliest running/pending/waiting tasks. Once the created time is found, then for all datasources not in the `killPendingSegmentsSkipList` (see [Dynamic configuration](#dynamic-configuration)), Coordinator will ask the Overlord to clean up the entries 1 day or more older than the found created time in the `pendingSegments` table. This will be done periodically based on `druid.coordinator.period.indexingPeriod` specified.|true|
 |`druid.coordinator.kill.on`|Boolean flag to enable the Coordinator to submit a kill task for unused segments and delete them permanently from the metadata store and deep storage.|false|
 |`druid.coordinator.kill.period`| The frequency of sending kill tasks to the indexing service. The value must be greater than or equal to `druid.coordinator.period.indexingPeriod`. Only applies if kill is turned on.|Same as `druid.coordinator.period.indexingPeriod`|
@@ -889,11 +899,6 @@ These Coordinator static configurations can be defined in the `coordinator/runti
 |`druid.coordinator.kill.bufferPeriod`|The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused.|`P30D`|
 |`druid.coordinator.kill.maxSegments`|The number of unused segments to kill per kill task. This number must be greater than 0. This only applies when `druid.coordinator.kill.on=true`.|100|
 |`druid.coordinator.kill.maxInterval`|The largest interval, as an [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations), of segments to delete per kill task. Set to zero, e.g. `PT0S`, for unlimited. This only applies when `druid.coordinator.kill.on=true`.|`P30D`|
-|`druid.coordinator.balancer.strategy`|The [balancing strategy](../design/coordinator.md#balancing-segments-in-a-tier) used by the Coordinator to distribute segments among the Historical servers in a tier. The `cost` strategy distributes segments by minimizing a cost function, `diskNormalized` weights these costs with the disk usage ratios of the servers and `random` distributes segments randomly.|`cost`|
-|`druid.coordinator.loadqueuepeon.http.repeatDelay`|The start and repeat delay (in milliseconds) for the load queue peon, which manages the load/drop queue of segments for any server.|1 minute|
-|`druid.coordinator.loadqueuepeon.http.batchSize`|Number of segment load/drop requests to batch in one HTTP request. Note that it must be smaller than or equal to the `druid.segmentCache.numLoadingThreads` config on Historical service. If this value is not configured, the coordinator uses the value of the `numLoadingThreads` for the respective server. | `druid.segmentCache.numLoadingThreads` |
-|`druid.coordinator.asOverlord.enabled`|Boolean value for whether this Coordinator service should act like an Overlord as well. This configuration allows users to simplify a Druid cluster by not having to deploy any standalone Overlord services. If set to true, then Overlord console is available at `http://coordinator-host:port/console.html` and be sure to set `druid.coordinator.asOverlord.overlordService` also.|false|
-|`druid.coordinator.asOverlord.overlordService`| Required, if `druid.coordinator.asOverlord.enabled` is `true`. This must be same value as `druid.service` on standalone Overlord services and `druid.selectors.indexing.serviceName` on Middle Managers.|NULL|
 
 ##### Metadata management
 
@@ -1186,6 +1191,16 @@ The following properties pertain to segment metadata caching on the Overlord tha
 |`druid.manager.segments.useIncrementalCache`|Denotes the usage mode of the segment metadata incremental cache. Possible modes are: (a) `never`: Cache is disabled. (b) `always`: Reads are always done from the cache. Service start-up will be blocked until cache has synced with the metadata store at least once. Transactions will block until cache has synced with the metadata store at least once after becoming leader. (c) `ifSynced`: Reads are done from the cache only if it has already synced with the metadata store. This mode does not block service start-up or transactions.|`never`|
 |`druid.manager.segments.pollDuration`|Duration (in ISO 8601 format) between successive syncs of the cache with the metadata store. This property is used only when `druid.manager.segments.useIncrementalCache` is set to `always` or `ifSynced`.|`PT1M` (1 minute)|
 
+##### Auto-kill unused segments (Experimental)
+
+These configs pertain to the new embedded mode of running [kill tasks on the Overlord](../data-management/delete.md#auto-kill-data-on-the-overlord-experimental).
+None of the configs that apply to [auto-kill performed by the Coordinator](../data-management/delete.md#auto-kill-data-using-coordinator-duties) are used by this feature.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.manager.segments.killUnused.enabled`|Boolean flag to enable auto-kill of eligible unused segments on the Overlord. This feature can be used only when [segment metadata caching](#segment-metadata-cache-experimental) is enabled on the Overlord and MUST NOT be enabled if `druid.coordinator.kill.on` is already set to `true` on the Coordinator.|`true`|
+|`druid.manager.segments.killUnused.bufferPeriod`|Period after which a segment marked as unused becomes eligible for auto-kill on the Overlord. This config is effective only if `druid.manager.segments.killUnused.enabled` is set to `true`.|`P30D` (30 days)|
+
 #### Overlord dynamic configuration
 
 The Overlord has dynamic configurations to tune how Druid assigns tasks to workers.

diff --git a/docs/data-management/automatic-compaction.md b/docs/data-management/automatic-compaction.md
@@ -85,7 +85,7 @@ For more details on each of the specs in an auto-compaction configuration, see [
 
 ## Auto-compaction using Coordinator duties
 
-You can control how often the Coordinator checks to see if auto-compaction is needed. The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks.
+You can control how often the Coordinator checks to see if auto-compaction is needed. The Coordinator [indexing period](../configuration/index.md#data-management), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks.
 The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled.
 This time period also affects other Coordinator duties such as cleanup of unused segments and stale pending segments.
 To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#change-compaction-frequency).

diff --git a/docs/data-management/delete.md b/docs/data-management/delete.md
@@ -22,7 +22,7 @@ title: "Data deletion"
   ~ under the License.
   -->
 
-## By time range, manually
+## Delete data for a time range manually
 
 Apache Druid stores data [partitioned by time chunk](../design/storage.md) and supports
 deleting data for time chunks by dropping segments. This is a fast, metadata-only operation.
@@ -42,17 +42,17 @@ For documentation on disabling segments using the Coordinator API, see the
 
 A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/tutorial-delete-data.md).
 
-## By time range, automatically
+## Delete data automatically using drop rules
 
 Druid supports [load and drop rules](../operations/rule-configuration.md), which are used to define intervals of time
 where data should be preserved, and intervals where data should be discarded. Data that falls under a drop rule is
-marked unused, in the same manner as if you [manually mark that time range unused](#by-time-range-manually). This is a
+marked unused, in the same manner as if you [manually mark that time range unused](#delete-data-for-a-time-range-manually). This is a
 fast, metadata-only operation.
 
 Data that is dropped in this way is marked unused, but remains in deep storage. To permanently delete it, use a
 [`kill` task](#kill-task).
 
-## Specific records
+## Delete specific records
 
 Druid supports deleting specific records using [reindexing](update.md#reindex) with a filter. The filter specifies which
 data remains after reindexing, so it must be the inverse of the data you want to delete. Because segments must be
@@ -74,15 +74,15 @@ used to filter, modify, or enrich the data during the reindexing job.
 Data that is deleted in this way is marked unused, but remains in deep storage. To permanently delete it, use a [`kill`
 task](#kill-task).
 
-## Entire table
+## Delete an entire table
 
-Deleting an entire table works the same way as [deleting part of a table by time range](#by-time-range-manually). First,
+Deleting an entire table works the same way as [deleting part of a table by time range](#delete-data-for-a-time-range-manually). First,
 mark all segments unused using the Coordinator API or web console. Then, optionally, delete it permanently using a
 [`kill` task](#kill-task).
 
 <a name="kill-task"></a>
 
-## Permanently (`kill` task)
+## Delete data permanently using `kill` tasks
 
 Data that has been overwritten or soft-deleted still remains as segments that have been marked unused. You can use a
 `kill` task to permanently delete this data.
@@ -116,3 +116,33 @@ Some of the parameters used in the task payload are further explained below:
 **WARNING:** The `kill` task permanently removes all information about the affected segments from the metadata store and
 deep storage. This operation cannot be undone.
 
+### Auto-kill data using Coordinator duties
+
+Instead of submitting `kill` tasks manually to permanently delete data for a given interval, you can enable auto-kill of unused segments on the Coordinator.
+The Coordinator runs a duty periodically to identify intervals containing unused segments that are eligible for kill. It then launches a `kill` task for each of these intervals.
+
+Refer to [Data management on the Coordinator](../configuration/index.md#data-management) to configure auto-kill of unused segments on the Coordinator.
+
+### Auto-kill data on the Overlord (Experimental)
+
+:::info
+This is an experimental feature that:
+- Can be used only if [segment metadata caching](../configuration/index.md#segment-metadata-cache-experimental) is enabled on the Overlord.
+- MUST NOT be used if auto-kill of unused segments is already enabled on the Coordinator.
+:::
+
+This feature runs kill tasks in an "embedded" mode on the Overlord itself.
+
+These embedded tasks offer several advantages over auto-kill performed by the Coordinator as they:
+- avoid a lot of unnecessary REST API calls to the Overlord from tasks or the Coordinator.
+- kill unused segments as soon as they become eligible.
+- run on the Overlord and do not take up task slots.
+- finish faster as they save on the overhead of launching a task process.
+- kill a small number of segments per task, to ensure that locks on an interval are not held for too long.
+- skip locked intervals to avoid head-of-line blocking in kill tasks.
+- require little to no configuration.
+- can keep up with a large number of unused segments in the cluster.
+- take advantage of the segment metadata cache on the Overlord.
+
+Refer to [Auto-kill unused segments on the Overlord](../configuration/index.md#auto-kill-unused-segments-experimental) to configure auto-kill of unused segments on the Overlord.
+See [Auto-kill metrics](../operations/metrics.md#auto-kill-unused-segments) for the metrics emitted by embedded kill tasks.
diff --git a/docs/operations/clean-metadata-store.md b/docs/operations/clean-metadata-store.md
@@ -79,16 +79,7 @@ Segment records and segments in deep storage become eligible for deletion when b
 - When they meet the eligibility requirement of kill task datasource configuration according to `killDataSourceWhitelist` set in the Coordinator dynamic configuration. See [Dynamic configuration](../configuration/index.md#dynamic-configuration).
 - When the `durationToRetain` time has passed since their creation.
 
-Kill tasks use the following configuration:
-- `druid.coordinator.kill.on`: When `true`, enables the Coordinator to submit a kill task for unused segments, which deletes them completely from metadata store and from deep storage.
-Only applies to the specified datasources in the dynamic configuration parameter `killDataSourceWhitelist`.
-If `killDataSourceWhitelist` is not set or empty, then kill tasks can be submitted for all datasources.
-- `druid.coordinator.kill.period`: Defines the frequency in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601#Durations) for the cleanup job to check for and delete eligible segments. Defaults to `druid.coordinator.period.indexingPeriod`. Must be greater than or equal to `druid.coordinator.period.indexingPeriod`.
-- `druid.coordinator.kill.durationToRetain`: Defines the retention period in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601#Durations) after creation that segments become eligible for deletion.
-- `druid.coordinator.kill.ignoreDurationToRetain`: A way to override `druid.coordinator.kill.durationToRetain`. When enabled, the coordinator considers all unused segments as eligible to be killed.
-- `druid.coordinator.kill.bufferPeriod`: Defines the amount of time that a segment must be unused before it can be permanently removed from metadata and deep storage. This serves as a buffer period to prevent data loss if data ends up being needed after being marked unused.
-- `druid.coordinator.kill.maxSegments`: Defines the maximum number of segments to delete per kill task.
-- `druid.coordinator.kill.maxInterval`: Defines the largest interval, as an [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations), of segments to delete per kill task. Set to zero, e.g. `PT0S`, for unlimited.
+Refer to [Data management on the Coordinator](../configuration/index.md#data-management) to configure auto-kill of unused segments on the Coordinator.
 
 ### Audit records
 

diff --git a/docs/operations/metrics.md b/docs/operations/metrics.md
@@ -354,6 +354,20 @@ The following metrics are emitted only when [segment metadata caching](../config
 |`segment/metadataCache/pending/updated`|Number of pending segments updated in the cache during the latest sync.|`dataSource`|
 |`segment/metadataCache/pending/skipped`|Number of unparseable pending segment records that were skipped in the latest sync.|`dataSource`|
 
+### Auto-kill unused segments
+
+These metrics are emitted only if [auto-kill of unused segments](../data-management/delete.md#auto-kill-data-on-the-overlord-experimental) is enabled on the Overlord.
+
+|Metric|Description|Dimensions|
+|------|-----------|----------|
+|`segment/killed/metadataStore/count`|Number of segments permanently deleted from the metadata store.|`taskId`, `groupId`, `taskType`(=`kill`), `dataSource`|
+|`segment/killed/deepStorage/count`|Number of segments permanently deleted from deep storage.|`taskId`, `groupId`, `taskType`(=`kill`), `dataSource`|
+|`segment/kill/unusedIntervals/count`|Number of intervals containing unused segments for a given datasource.|`dataSource`|
+|`segment/kill/skippedIntervals/count`|Number of intervals that were skipped for kill because they were already locked by another task.|`taskId`, `groupId`, `taskType`(=`kill`), `dataSource`|
+|`segment/kill/queueReset/time`|Time taken in milliseconds to reset the kill queue.||
+|`segment/kill/queueProcess/time`|Time taken in milliseconds to fully process the kill queue.||
+|`segment/kill/jobsProcessed/count`|Number of jobs processed from the kill queue for a given datasource.|`dataSource`|
+
 ## Shuffle metrics (Native parallel task)
 
 The shuffle metrics can be enabled by adding `org.apache.druid.indexing.worker.shuffle.ShuffleMonitor` in `druid.monitoring.monitors`.
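
For reference, the Overlord auto-kill settings documented in this change might be combined as in the following sketch. It is illustrative only and not part of the diff. The split into two `runtime.properties` files and the choice of `always` for the cache mode are assumptions. The property names and the `P30D` buffer period are taken from the tables above.

```properties
# Overlord runtime.properties (illustrative sketch, values are assumptions)
# Auto-kill on the Overlord requires the segment metadata cache to be enabled.
druid.manager.segments.useIncrementalCache=always
druid.manager.segments.killUnused.enabled=true
# Segments must stay unused for this long before they become eligible for kill.
druid.manager.segments.killUnused.bufferPeriod=P30D

# Coordinator runtime.properties (illustrative sketch)
# Coordinator auto-kill must not be enabled together with Overlord auto-kill.
druid.coordinator.kill.on=false
```

Leaving `druid.coordinator.kill.on` at `false` reflects the documented constraint that the two auto-kill mechanisms must not run together.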