2 changes: 1 addition & 1 deletion docs/configuration/index.md
@@ -997,7 +997,7 @@ These Overlord static configurations can be defined in the `overlord/runtime.pro
|`druid.indexer.server.maxConcurrentActions`|Maximum number of concurrent action requests (such as getting locks, creating segments, fetching segments etc) that the Overlord will process simultaneously. This prevents thread exhaustion while preserving access to health check endpoints. Set to `0` to disable quality of service filtering entirely. If not specified, defaults to `max(1, max(serverHttpNumThreads - 4, serverHttpNumThreads * 0.8))`.|`max(1, max(serverHttpNumThreads - 4, serverHttpNumThreads * 0.8))`|
|`druid.indexer.storage.type`|Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. One of `local` or `metadata`. `local` is mainly for internal testing while `metadata` is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|`local`|
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|`PT24H`|
|`druid.indexer.tasklock.forceTimeChunkLock`|**Setting this to false is still experimental**<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context-parameters). See [Task lock system](../ingestion/tasks.md#task-lock-system) for more details about locking in tasks.|true|
|`druid.indexer.tasklock.forceTimeChunkLock`|If set to true, all tasks are forced to use time chunk locks. If set to false, each task automatically chooses a lock type and may select the deprecated segment lock. You can override this configuration by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context-parameters). See [Task lock system](../ingestion/tasks.md#task-lock-system) for more details about locking in tasks.|true|
|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average `task/action/run/time`. See [batching `segmentAllocate` actions](../ingestion/tasks.md#batching-segmentallocate-actions) for details.|true|
|`druid.indexer.tasklock.batchAllocationWaitTime`|Number of milliseconds after Druid adds the first segment allocate action to a batch, until it executes the batch. Allows the batch to add more requests and improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled.|0|
|`druid.indexer.tasklock.batchAllocationNumThreads`|Number of worker threads to use for batch segment allocation. This represents the maximum number of allocation batches that can be processed in parallel for distinct datasources. Batches for a single datasource are always processed sequentially. This configuration takes effect only if `batchSegmentAllocation` is enabled.|5|
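
As a rough sketch, the lock- and storage-related settings above might be combined in `overlord/runtime.properties` as follows. The values are illustrative, not tuning recommendations:

```properties
# Keep tasks in metadata storage so they can resume after an Overlord failover.
druid.indexer.storage.type=metadata

# Force time chunk locking for all tasks (the default). Individual tasks can
# still override this with forceTimeChunkLock in their task context.
druid.indexer.tasklock.forceTimeChunkLock=true

# Batch segmentAllocate actions; wait up to 250 ms to fill each batch.
druid.indexer.tasklock.batchSegmentAllocation=true
druid.indexer.tasklock.batchAllocationWaitTime=250
druid.indexer.tasklock.batchAllocationNumThreads=5
```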
2 changes: 1 addition & 1 deletion docs/data-management/compaction.md
@@ -61,7 +61,7 @@ See [Setting up a manual compaction task](./manual-compaction.md#setting-up-manu

## Data handling with compaction

During compaction, Druid overwrites the original set of segments with the compacted set. Druid also locks the segments for the time interval being compacted to ensure data consistency. By default, compaction tasks do not modify the underlying data. You can configure the compaction task to change the query granularity or add or remove dimensions in the compaction task. This means that the only changes to query results should be the result of intentional, not automatic, changes.
During compaction, Druid overwrites the original set of segments with the compacted set. Druid also locks the time interval being compacted to ensure data consistency. By default, compaction tasks do not modify the underlying data. You can configure the compaction task to change the query granularity or to add or remove dimensions. As a result, any changes to query results should be the product of intentional configuration, not automatic behavior.

You can set `dropExisting` in `ioConfig` to "true" in the compaction task to configure Druid to replace all existing segments fully contained by the interval. See the suggestion for reindexing with finer granularity under [Implementation considerations](../ingestion/native-batch.md#implementation-considerations) for an example.
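
For illustration, a minimal manual compaction task that replaces all existing segments fully contained by the interval might look like the sketch below; the datasource name and interval are invented for the example:

```json
{
  "type": "compact",
  "dataSource": "wikipedia",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2020-01-01/2020-02-01"
    },
    "dropExisting": true
  }
}
```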
:::info
2 changes: 1 addition & 1 deletion docs/design/storage.md
@@ -134,7 +134,7 @@ Note that atomic replacement happens for each time chunk individually. If a batc
Typically, atomic replacement in Druid is based on a core set concept that works in conjunction with segment versions.
When a time chunk is overwritten, a new core set of segments is created with a higher version number. The core set must all be available before the Broker will use them instead of the older set. There can also only be one core set per version per time chunk. Druid will also only use a single version at a time per time chunk. Together, these properties provide Druid's atomic replacement guarantees.

Druid also supports an experimental segment locking mode that is activated by setting
Druid also supports a deprecated segment locking mode that is activated by setting
[`forceTimeChunkLock`](../ingestion/tasks.md#context-parameters) to false in the context of an ingestion task. In this case, Druid creates an atomic update group using the existing version for the time chunk, instead of creating a new core set with a new version number. There can be multiple atomic update groups with the same version number per time chunk. Each one replaces a specific set of earlier segments in the same time chunk and with the same version number. Druid will query the latest one that is fully available. This is a more powerful version of the core set concept, because it enables atomically replacing a subset of data for a time chunk, as well as doing atomic replacement and appending simultaneously.

If segments become unavailable due to multiple Historicals going offline simultaneously (beyond your replication factor), then Druid queries will include only the segments that are still available. In the background, Druid will reload these unavailable segments on other Historicals as quickly as possible, at which point they will be included in queries again.
2 changes: 1 addition & 1 deletion docs/ingestion/tasks.md
@@ -378,7 +378,7 @@ The reason for this is because a Kafka indexing task always appends new segments
The segments created with segment locking have the _same_ major version and a _higher_ minor version.

:::info
The segment locking is still experimental. It could have unknown bugs which potentially lead to incorrect query results.
Segment locking has been deprecated. It may have unknown bugs that can lead to incorrect query results.
:::

To enable segment locking, you may need to set `forceTimeChunkLock` to `false` in the [task context](#context).
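
For example, a parallel ingestion task might opt in to segment locking through its context, as in the following sketch; the `spec` body is elided for brevity:

```json
{
  "type": "index_parallel",
  "spec": {},
  "context": {
    "forceTimeChunkLock": false
  }
}
```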
LockGranularity.java
@@ -25,5 +25,9 @@
public enum LockGranularity
{
  TIME_CHUNK,
  /**
   * @deprecated use TIME_CHUNK instead.
   */
  @Deprecated
  SEGMENT
}
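
As a minimal, hypothetical illustration (this helper is not part of the change, and it assumes it compiles with `LockGranularity` on the classpath), code that still selects a granularity from the `forceTimeChunkLock` flag now compiles against a deprecated constant:

```java
public class LockGranularityExample
{
  // Hypothetical helper: maps the forceTimeChunkLock context flag to a
  // LockGranularity. The SEGMENT branch now triggers a deprecation warning
  // at compile time; TIME_CHUNK is the supported choice.
  static LockGranularity chooseGranularity(boolean forceTimeChunkLock)
  {
    return forceTimeChunkLock ? LockGranularity.TIME_CHUNK : LockGranularity.SEGMENT;
  }

  public static void main(String[] args)
  {
    System.out.println(chooseGranularity(true)); // prints TIME_CHUNK
  }
}
```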
NumberedOverwriteShardSpec.java
@@ -32,16 +32,18 @@
import java.util.Objects;

/**
 * @deprecated Segment lock is deprecated; NumberedOverwriteShardSpec should only be used for backward compatibility.
 * <p>
 * This shardSpec is used only for the segments created by overwriting tasks with segment lock enabled.
 * When the segment lock is used, there is a concept of an atomic update group, which is a set of segments atomically
 * becoming queryable together in Brokers. It is a similar concept to the core partition set (explained in
 * {@link NumberedShardSpec}), but different in the sense that there is only one core partition set per time chunk
 * while there can be multiple atomic update groups in one time chunk.
 *
 * <p>
 * The atomic update group has the root partition range and the minor version to determine the visibility between
 * atomic update groups; the group with the highest minor version in the same root partition range becomes queryable
 * when they have the same major version ({@link DataSegment#getVersion()}).
 *
 * <p>
 * Note that this shardSpec is used only when you overwrite existing segments with segment lock enabled.
 * If the task doesn't overwrite segments, it will use NumberedShardSpec instead even when segment lock is used.
 * Similar to NumberedShardSpec, the size of the atomic update group is determined when the task publishes segments
@@ -51,6 +53,7 @@
 *
 * @see AtomicUpdateGroup
 */
@Deprecated
public class NumberedOverwriteShardSpec implements OverwriteShardSpec
{
  private final int partitionId;