
Make SegmentAllocationQueue multithreaded#18098

Merged
gianm merged 6 commits into apache:master from kfaraz:seg_alloc_queue_threads on Jun 11, 2025

Conversation

@kfaraz (Contributor) commented Jun 9, 2025

Description

Follow up to #17390

Once we start maintaining a TaskLockbox for each datasource, the single-threaded design of the SegmentAllocationQueue will become a bottleneck.

This patch makes the SegmentAllocationQueue multithreaded so that allocations for multiple datasources can happen in parallel.

Non-batch segment allocation is already multithreaded, as each allocation runs on its own Jetty thread.

Changes

  • Add config druid.indexer.tasklock.batchAllocationNumThreads with default value 5
  • Use worker threads in SegmentAllocationQueue to perform segment allocation
  • Add a manager thread which polls the allocation queue and submits jobs to workers
  • Skip a job if another job for the same datasource is already in progress
  • Emit metric task/action/batch/submitted for the count of submitted jobs
  • Emit metric task/action/batch/skipped for the count of skipped jobs
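The changes above can be sketched as follows. This is a minimal illustration of the manager-thread skip logic, not the PR's actual code: the class and method names (AllocationQueueSketch, pollNext, onBatchFinished) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a manager thread polls the queue and hands batches to workers,
// skipping any datasource that already has an allocation in flight.
public class AllocationQueueSketch
{
  private final Set<String> runningDatasources = ConcurrentHashMap.newKeySet();
  private final Queue<String> processingQueue = new ArrayDeque<>();

  public void submit(String dataSource)
  {
    processingQueue.add(dataSource);
  }

  /** Returns the datasource handed to a worker, or null if empty or skipped. */
  public String pollNext()
  {
    final String dataSource = processingQueue.peek();
    if (dataSource == null || runningDatasources.contains(dataSource)) {
      // Queue is empty, or another batch for this datasource is in progress.
      return null;
    }
    processingQueue.remove();
    runningDatasources.add(dataSource);
    return dataSource;
  }

  /** Called by a worker thread when its batch completes. */
  public void onBatchFinished(String dataSource)
  {
    runningDatasources.remove(dataSource);
  }
}
```

In this sketch, a second batch for a datasource stays queued until the worker running the first batch calls onBatchFinished.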

Release note

Add config druid.indexer.tasklock.batchAllocationNumThreads (default 5) to control the number of segment allocation threads. This allows segment allocations for different datasources to run concurrently.
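As a sketch, the new property would be set in the Overlord runtime properties; the property name is from this PR, while the value shown is purely illustrative:

```properties
# Number of worker threads used by the batch segment allocation queue.
# Default is 5; raise cautiously, since each thread queries the metadata store.
druid.indexer.tasklock.batchAllocationNumThreads=8
```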

Note that setting this config to a very large value will put undue strain on the metadata store and is likely to hamper, rather than improve, performance.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

private boolean batchAllocationReduceMetadataIO = true;

@JsonProperty
private int batchAllocationNumThreads = 5;
Contributor

Should be documented.

Contributor Author


Added.

/**
* Thread-safe list of datasources for which a segment allocation is currently in-progress.
*/
private final List<String> runningDatasources = Collections.synchronizedList(new ArrayList<>());
Contributor


contains and remove run on this list. Could it be a Set?

Contributor Author


Updated.
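The Set-based version might look like the sketch below (names are illustrative, not the PR's actual code). A concurrent Set gives O(1) contains/remove instead of O(n) scans over a synchronized list, and add() doubles as an atomic check-and-insert.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: tracking datasources with in-flight allocations via a concurrent Set.
public class RunningDatasources
{
  private final Set<String> running = ConcurrentHashMap.newKeySet();

  /** Returns false if an allocation for this datasource is already running. */
  public boolean tryAcquire(String dataSource)
  {
    // add() is atomic, so there is no check-then-act race between
    // contains() and add() as there would be with two separate calls.
    return running.add(dataSource);
  }

  /** Called when the allocation for this datasource finishes. */
  public void release(String dataSource)
  {
    running.remove(dataSource);
  }
}
```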

final String dataSource = nextBatch.key.dataSource;
if (nextBatch.isDue()) {
  if (runningDatasources.contains(dataSource)) {
    // Skip this batch as another batch for the same datasource is in progress
Contributor


Will this cause a busy loop where we keep retrying this skipped batch over and over? I would think that since it remains in processingQueue, we'll do a scheduleQueuePoll at the end of this function, and the default maxWaitTimeMillis is zero (meaning another immediate poll).

Contributor Author


Yes, thanks for catching this! This was a mental TODO but I forgot to mark it.
I will check what we can do here.

Contributor Author


Updated. Added a small delay of 5 millis in case anything was skipped or if all threads are busy.

Contributor


Hmm. I think this would still lead to a ton of task/action/batch/skipped metrics being emitted if we have a long-running allocation in flight with another queued up. Fixing that by extending the min wait time would be bad, because that would slow down our responsiveness to allocation requests. Delays are undesirable anyway; we want everything to be as reactive as possible.

Is there an alternate approach you could go with? Maybe when we skip a batch, put it into a separate data structure keyed by datasource. Then when the current batch finishes, the worker thread running that batch could move the skipped batches back to the main queue.

Contributor Author


Thanks for the suggestion, @gianm!

I have updated the approach in the PR, with some modifications that seemed to adhere better to the current design of the class.

  • Remove the delay
  • When skipping a batch, mark it as "skipped" and emit the metric. Do not emit metric again if already skipped.
  • Do not reschedule queue poll if all workers are busy OR if queue is empty OR if all batches were skipped.
  • When a worker finishes, schedule a queue poll.
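The rescheduling rule in the bullets above can be sketched as a single predicate. This is a hypothetical illustration, not the PR's actual code; the parameter names mirror the variables mentioned in this thread.

```java
// Sketch: poll the queue again only while a worker is free and at least one
// queued batch has not been skipped. An empty queue needs no special case,
// since numSkippedBatches >= queueSize already holds when both are zero.
public class PollScheduleSketch
{
  public static boolean shouldPollAgain(int queueSize, int numSkippedBatches, boolean allWorkersBusy)
  {
    return !allWorkersBusy && numSkippedBatches < queueSize;
  }
}
```

When all batches are skipped, the poll is not rescheduled; instead, the worker that finishes the in-flight batch triggers the next poll, avoiding both busy-waiting and fixed delays.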

  // All remaining entries in the queue were skipped
  log.debug("Not scheduling again since datasources are already being processed.");
} else if (processingQueue.isEmpty()) {
  log.debug("Not scheduling again since queue is empty.");
Contributor


I think this would be caught by the previous line: if the queue is empty, numSkippedBatches and processingQueue.size() are both zero, and 0 >= 0. Consider collapsing them both into a single block with a message like "Not scheduling again since there are no eligible batches (skipped[%d])".

Contributor Author


Collapsed the condition but retained the check on processingQueue.isEmpty(), since it keeps the compiler happy; otherwise it warns about a potential NPE.

[Screenshot: IDE warning about a potential NPE]

@gianm gianm merged commit 608abc6 into apache:master Jun 11, 2025
134 of 138 checks passed
jtuglu1 pushed a commit to jtuglu1/druid that referenced this pull request Jun 17, 2025
* Make SegmentAllocationQueue multithreaded

* Do not run multiple jobs for the same datasource

* Add docs, min schedule delay to avoid busy waiting

* Trigger queue poll when worker finishes

* Emit skip metric once per queued batch

* Simplify scheduling condition
@capistrant capistrant added this to the 34.0.0 milestone Jul 22, 2025