
Limit max batch size for segment allocation, add docs #13503

Merged: kfaraz merged 4 commits into apache:master from kfaraz:docs_seg_alloc on Dec 7, 2022

Conversation

@kfaraz (Contributor) commented Dec 6, 2022

Follow-up to #13369

Changes:

  • Limit the max batch size in `SegmentAllocationQueue` to 500
  • Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime`, since the old name was misleading: the actual wait time may exceed the configured value.
  • Minor refactor: remove the usage of `SegmentInsertAction` from `TaskToolbox`

@kfaraz kfaraz added this to the 25.0 milestone Dec 6, 2022
Comment thread on docs/ingestion/tasks.md (outdated)

### Batching `segmentAllocate` actions

In a cluster with a large number of concurrent tasks (say, more than 1000), `segmentAllocate` actions on the Overlord may take a long time to finish, causing spikes in `task/action/run/time`. This may result in lag building up while a task waits for a segment to be allocated.
Contributor:

I think this is going to be useful even at much lower concurrent task counts. I have seen people have issues with allocation timings around task rollover even with just tens of tasks, requiring a scale-up of the metadata store.

Comment thread on docs/ingestion/tasks.md (outdated)
These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
- `lockAcquire`: acquires a time-chunk lock on an interval for the task
- `lockRelease`: releases a lock acquired by the task on an interval
- `segmentInsertion`: inserts segments into metadata store
Contributor:

Worth mentioning `segmentTransactionalInsert` too?

Comment thread on docs/operations/metrics.md (outdated)
|`task/action/log/time`|Milliseconds taken to log a task action to the audit log.| `dataSource`, `taskId`, `taskType`, `taskActionType`|< 1000 (subsecond)|
|`task/action/run/time`|Milliseconds taken to execute a task action.| `dataSource`, `taskId`, `taskType`, `taskActionType`|Varies from subsecond to a few seconds, based on action type.|
|`task/action/success/count`|Number of task actions that were executed successfully during the emission period. Currently only being emitted for batched `segmentAllocate` actions.| `dataSource`, `taskId`, `taskType`, `taskActionType`|Varies|
|`task/action/failed/count`|Number of task actions that failed during the emission period. Currently only being emitted for batched `segmentAllocate` actions.| `dataSource`, `taskId`, `taskType`, `taskActionType`|Varies|
Contributor:

Would be good to link all mentions of "batched `segmentAllocate` actions" to ../ingestion/tasks.md#batching-segmentallocate-actions.

Comment thread on docs/configuration/index.md (outdated)
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
|`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`.|false|
|`druid.indexer.tasklock.batchAllocationWaitTime`|Milliseconds to wait between adding the first segment allocate action to a batch and executing that batch. The waiting time allows the batch to add more requests and thus improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled. <br> This value should be decreased __only if__ there are failures while allocating segments due to metadata operations on a very large batch.|500|
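For context, the two `tasklock` settings from the table above would typically be set together in the Overlord runtime properties. This is an illustrative sketch only; the values shown are the defaults discussed in this PR, not tuning recommendations:

```properties
# Enable batched segment allocation on the Overlord
druid.indexer.tasklock.batchSegmentAllocation=true
# Wait 500 ms after the first allocate request arrives before executing the batch
druid.indexer.tasklock.batchAllocationWaitTime=500
```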
Contributor:

500 milliseconds already seems pretty short. If people are getting failures due to too-large batches, I wonder if reducing this will reliably fix their problem. Maybe it's better to suggest they turn off batch segment allocation completely.

Ideally, the batching would be robust to this case, and avoid generating oversized metadata operations. Is that possible?

Contributor Author:

Yes, we can limit the batch size to a (hard-coded) max value, say 500. We have seen clusters where batches of 700-800 have executed successfully.
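To make the capping behavior concrete, here is a minimal, self-contained sketch (illustrative only, not Druid's actual `SegmentAllocationQueue` code) of how a hard-coded cap of 500 splits a burst of allocate requests into multiple batches:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: splits a burst of pending allocate requests
// into batches of at most MAX_BATCH_SIZE, mirroring the hard-coded cap
// discussed above. Class and method names are hypothetical.
public class BatchCapSketch
{
  static final int MAX_BATCH_SIZE = 500;

  // Represent each request by its index and group them into capped batches.
  static List<List<Integer>> coalesce(int requestCount)
  {
    List<List<Integer>> batches = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    for (int i = 0; i < requestCount; i++) {
      if (current.size() == MAX_BATCH_SIZE) {
        batches.add(current);
        current = new ArrayList<>();
      }
      current.add(i);
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args)
  {
    // A burst of 1200 requests yields batches of 500, 500 and 200.
    List<List<Integer>> batches = coalesce(1200);
    System.out.println(batches.size());        // 3
    System.out.println(batches.get(2).size()); // 200
  }
}
```

With this cap in place, an oversized burst simply produces more batches rather than one metadata operation that is too large to execute.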

Comment thread on docs/configuration/index.md (outdated)
|`druid.indexer.storage.type`|Choices are "local" or "metadata". Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. "local" is mainly for internal testing while "metadata" is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|local|
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
|`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`.|false|
Contributor:

Would be good to link to ../ingestion/tasks.md#batching-segmentallocate-actions.

@kfaraz changed the title from "Docs: Add changes for batched segment allocation" to "Limit max batch size for segment allocation, add docs" on Dec 6, 2022
@kfaraz (Contributor Author) commented Dec 6, 2022

Thanks a lot for the feedback, @gianm !

I have made the following changes:

  • Limited the max batch size to 500
  • Updated links and docs based on feedback
  • Added docs for `segmentTransactionalInsert` and removed the docs for `segmentInsert`. A `segmentInsert` action always translates to a `segmentTransactionalInsert` action (see snippet below). It was being used only in the `TaskToolbox`, and I have updated that usage.

I wonder if we should just deprecate and remove the `segmentInsert` action.
Please let me know if the changes make sense.

```java
@Override
public Set<DataSegment> perform(Task task, TaskActionToolbox toolbox)
{
  return SegmentTransactionalInsertAction
      .appendAction(segments, null, null)
      .perform(task, toolbox)
      .getSegments();
}
```

@kfaraz kfaraz requested a review from gianm December 6, 2022 08:16
@writer-jill (Contributor) left a comment:

Added a few suggestions for style & clarity.

Resolved comment threads (outdated): docs/configuration/index.md (2), docs/ingestion/tasks.md (6)
Co-authored-by: Jill Osborne <jill.osborne@imply.io>
```java
    }
  }

  throw new ISE("Allocation queue is at capacity, all batches are full.");
```
Contributor:

Is MAX_QUEUE_SIZE measured in batches or allocations? What happens if it gets hit? How likely is it that we will hit it?

Contributor Author:

MAX_QUEUE_SIZE applies to batches. I am using it in the loop above because that is the maximum possible number of batches, even for the same datasource, group ID, etc.

It is extremely unlikely to hit the max queue limit. It could happen if, say, we had 2000 datasources all doing streaming ingestion concurrently. If the queue is full, new incoming allocate actions are rejected.
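The rejection behavior described here can be sketched as follows. This is a hypothetical illustration, not Druid's actual code; the class name, method name, and capacity value are all assumptions made for the example:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded batch queue: it holds at most
// MAX_QUEUE_SIZE batches, and once full, new batches are rejected
// rather than queued indefinitely. All names and the capacity value
// are illustrative, not taken from Druid.
public class AllocationQueueSketch
{
  static final int MAX_QUEUE_SIZE = 2000; // assumed value for illustration

  private final Deque<Object> batches = new ArrayDeque<>();

  // Returns false when the queue is at capacity; the caller would then
  // surface this as a failed segmentAllocate action.
  boolean offerBatch(Object batch)
  {
    if (batches.size() >= MAX_QUEUE_SIZE) {
      return false;
    }
    batches.addLast(batch);
    return true;
  }

  public static void main(String[] args)
  {
    AllocationQueueSketch queue = new AllocationQueueSketch();
    for (int i = 0; i < MAX_QUEUE_SIZE; i++) {
      queue.offerBatch(new Object());
    }
    System.out.println(queue.offerBatch(new Object())); // false: queue is full
  }
}
```

Rejecting at capacity, rather than blocking, keeps tasks from silently waiting forever when the Overlord is overwhelmed.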

Contributor:

Got it. Sounds good to me.

@kfaraz (Contributor Author) commented Dec 7, 2022

Merging this, as the failure is unrelated. The same test has passed on JDK 11.

@kfaraz kfaraz merged commit c7229fc into apache:master Dec 7, 2022
@kfaraz kfaraz deleted the docs_seg_alloc branch December 7, 2022 04:37
kfaraz added a commit to kfaraz/druid that referenced this pull request Dec 7, 2022
Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual
wait time may exceed this configured value.
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
kfaraz added a commit that referenced this pull request Dec 8, 2022
Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual
wait time may exceed this configured value.
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
@maytasm (Contributor) commented Oct 10, 2024

@kfaraz @gianm Any reason why we did not make the batch size configurable? If we are hitting the batch size limit, then increasing `batchAllocationWaitTime` will not help us.
