Limit max batch size for segment allocation, add docs #13503

kfaraz merged 4 commits into apache:master from
Conversation
| ### Batching `segmentAllocate` actions
| In a cluster with a large number of concurrent tasks (say more than 1000), `segmentAllocate` actions on the Overlord may take very long to finish, causing spikes in `task/action/run/time`. This may result in lag building up while a task waits for a segment to get allocated.
I think this is going to be useful even at much lower concurrent task counts. I have seen people have issues with allocation timings around task rollover even with just 10s of tasks, requiring a scale-up of metadata store size.
| These are various Overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
| - `lockAcquire`: acquires a time-chunk lock on an interval for the task
| - `lockRelease`: releases a lock acquired by the task on an interval
| - `segmentInsertion`: inserts segments into the metadata store
Worth mentioning segmentTransactionalInsert too?
| |`task/action/log/time`|Milliseconds taken to log a task action to the audit log.|`dataSource`, `taskId`, `taskType`, `taskActionType`|< 1000 (subsecond)|
| |`task/action/run/time`|Milliseconds taken to execute a task action.|`dataSource`, `taskId`, `taskType`, `taskActionType`|Varies from subsecond to a few seconds, based on action type.|
| |`task/action/success/count`|Number of task actions that were executed successfully during the emission period. Currently only emitted for batched `segmentAllocate` actions.|`dataSource`, `taskId`, `taskType`, `taskActionType`|Varies|
| |`task/action/failed/count`|Number of task actions that failed during the emission period. Currently only emitted for batched `segmentAllocate` actions.|`dataSource`, `taskId`, `taskType`, `taskActionType`|Varies|
Would be good to link all mentions of "batched segmentAllocate actions" to ../ingestion/tasks.md#batching-segmentallocate-actions.
| |`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
| |`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
| |`druid.indexer.tasklock.batchSegmentAllocation`|If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`.|false|
| |`druid.indexer.tasklock.batchAllocationWaitTime`|Milliseconds to wait between adding the first segment allocate action to a batch and executing that batch. The waiting time allows the batch to add more requests and thus improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled.<br/> This value should be decreased __only if__ there are failures while allocating segments due to metadata operations on a very large batch.|500|
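Taken together, the two batching properties from the table above can be enabled with a small Overlord config snippet. This is an illustrative fragment, not a recommended production setting; the values shown are just the documented defaults with batching switched on:

```properties
# Enable batched segment allocation on the Overlord
druid.indexer.tasklock.batchSegmentAllocation=true
# Wait up to 500 ms after the first allocate action before executing the batch
druid.indexer.tasklock.batchAllocationWaitTime=500
```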
500 milliseconds already seems pretty short. If people are getting failures due to too-large batches, I wonder if reducing this will reliably fix their problem. Maybe it's better to suggest they turn off batch segment allocation completely.
Ideally, the batching would be robust to this case, and avoid generating oversized metadata operations. Is that possible?
Yes, we can limit the batch size to a (hard-coded) max value, say 500. We have seen clusters where batches of 700-800 have executed successfully.
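The cap being discussed can be sketched roughly as follows. Only the max value of 500 comes from the comment above; the class and method names are hypothetical, and real allocation requests would of course be richer objects than plain list elements. Once a batch reaches the cap, further requests simply spill into a new batch:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchCapSketch {
    // Hard-coded cap discussed in the review (value from the comment above)
    static final int MAX_BATCH_SIZE = 500;

    // Split incoming allocation requests into batches of at most MAX_BATCH_SIZE.
    // Requests beyond the cap roll over into subsequent batches rather than
    // producing one oversized metadata operation.
    static <T> List<List<T>> toBatches(List<T> requests) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < requests.size(); i += MAX_BATCH_SIZE) {
            batches.add(new ArrayList<>(
                requests.subList(i, Math.min(i + MAX_BATCH_SIZE, requests.size()))));
        }
        return batches;
    }
}
```

With this shape, 1200 pending requests would produce three batches of 500, 500, and 200 instead of a single 1200-row metadata operation.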
| |`druid.indexer.storage.type`|Choices are "local" or "metadata". Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. "local" is mainly for internal testing while "metadata" is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|local|
| |`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
| |`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
| |`druid.indexer.tasklock.batchSegmentAllocation`|If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`.|false|
Would be good to link to ../ingestion/tasks.md#batching-segmentallocate-actions.
Thanks a lot for the feedback, @gianm! I have made the following changes:
I wonder if we should just deprecate and remove the
writer-jill left a comment
Added a few suggestions for style & clarity.
Co-authored-by: Jill Osborne <jill.osborne@imply.io>
| }
| }
| throw new ISE("Allocation queue is at capacity, all batches are full.");
Is MAX_QUEUE_SIZE measured in batches or allocations? What happens if it gets hit? How likely is it that we will hit it?
MAX_QUEUE_SIZE applies to batches. I am using it in the loop above because that is the maximum possible number of batches, even for the same datasource, group id, etc.
It is extremely unlikely to hit the max queue limit. It can happen if, say, we have 2000 datasources all doing streaming ingestion concurrently. If the queue is full, new incoming allocate actions are rejected.
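The reject-when-full behavior described above can be sketched like this. The class, field, and capacity value here are illustrative assumptions, not Druid's actual `SegmentAllocationQueue` code; only the error message and the "full queue rejects new batches" behavior come from the discussion:

```java
import java.util.ArrayDeque;

// Illustrative sketch of a bounded queue of pending allocation batches.
public class AllocationQueueSketch {
    private final int maxQueueSize;                         // capacity measured in batches
    private final ArrayDeque<Object> batches = new ArrayDeque<>();

    public AllocationQueueSketch(int maxQueueSize) {
        this.maxQueueSize = maxQueueSize;
    }

    // Accepts a batch, or rejects it when the queue is already at capacity,
    // mirroring the behavior described in the review comment above.
    public void addBatch(Object batch) {
        if (batches.size() >= maxQueueSize) {
            throw new IllegalStateException(
                "Allocation queue is at capacity, all batches are full.");
        }
        batches.add(batch);
    }

    public int size() {
        return batches.size();
    }
}
```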
Merging this as the failure is unrelated. The same test has passed for jdk11.
Changes: - Limit max batch size in `SegmentAllocationQueue` to 500 - Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual wait time may exceed this configured value. - Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
Follow up to #13369
Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since it was a little misleading and the actual wait time may exceed this configured value.
- Remove usage of `SegmentInsertAction` from `TaskToolbox`