Poll from memory before fetching task information from DB#18448

Merged
jtuglu1 merged 5 commits into apache:master from jtuglu1:use-taskqueue-as-caching-layer-for-more-task-methods
Aug 31, 2025


@jtuglu1 jtuglu1 commented Aug 28, 2025

Description

When running a large number of tasks in a cluster, the Overlord can become bottlenecked by the /task/{taskid}/status endpoint.
Because taskQueryTool.getTaskInfo(taskid) currently always reads directly from the underlying task storage, each request incurs heavy I/O and serde to pull the associated task information from the metadata store.

This change stores and updates the TaskInfo inside the TaskEntry type of the activeTasks queue, so requests are served from memory if the task entry exists and fall back to the DB otherwise.
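The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: SketchTaskInfo, SketchTaskStorage, and MemoryFirstLookupSketch are hypothetical stand-ins for TaskInfo, the task storage, and the TaskQueue/TaskQueryTool pair, and the read counter exists only to make the saved DB round trip visible.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the task information record; not Druid's TaskInfo.
final class SketchTaskInfo {
  final String taskId;
  final String status;
  final long createdTimeMillis;

  SketchTaskInfo(String taskId, String status, long createdTimeMillis) {
    this.taskId = taskId;
    this.status = status;
    this.createdTimeMillis = createdTimeMillis;
  }
}

// Hypothetical stand-in for the metadata store; counts reads so the effect is observable.
final class SketchTaskStorage {
  private final Map<String, SketchTaskInfo> rows = new ConcurrentHashMap<>();
  final AtomicInteger reads = new AtomicInteger();

  void insert(SketchTaskInfo info) {
    rows.put(info.taskId, info);
  }

  Optional<SketchTaskInfo> fetch(String taskId) {
    reads.incrementAndGet();
    return Optional.ofNullable(rows.get(taskId));
  }
}

// Hypothetical lookup helper: check the in-memory active tasks first, then the store.
final class MemoryFirstLookupSketch {
  private final Map<String, SketchTaskInfo> activeTasks = new ConcurrentHashMap<>();
  private final SketchTaskStorage storage;

  MemoryFirstLookupSketch(SketchTaskStorage storage) {
    this.storage = storage;
  }

  void addActive(SketchTaskInfo info) {
    activeTasks.put(info.taskId, info);
  }

  Optional<SketchTaskInfo> getTaskInfo(String taskId) {
    SketchTaskInfo cached = activeTasks.get(taskId);
    if (cached != null) {
      // Served from memory: no I/O and no deserialization.
      return Optional.of(cached);
    }
    // Not an active task (or not yet synced): fall back to the store.
    return storage.fetch(taskId);
  }
}
```

In this sketch, status requests for active tasks never touch the store; only completed or unknown task IDs pay for a read.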

Some lingering questions:

Code Style/Smell

I didn't really like the idea of shoving the TaskInfo into the TaskEntry, but it felt like the most convenient way of storing all the necessary information. We're really just relying on the TaskInfo for the creation time and an up-to-date value of the current TaskStatus. It might be worth standardizing on TaskInfo, as Task and TaskStatus are often paired with each other.

Consistency

TaskStatus

Even though the task status is updated on TaskQueue::notifyStatus callbacks, I wonder whether the status held in task storage and the status held in memory could diverge for a given task.

  • In the normal state, my initial thought is that TaskQueryTool::getActiveTaskInfo and TaskQueryTool::getTaskInfo should always return state that is at least as recent as the latest commit to the DB, since the status updates to the DB in TaskQueue::notifyStatus happen in the same critical section as those to the TaskEntry value. There might be merit, however, in updating the task status in the critical section once it has been committed to the DB.
  • In a failure scenario, the Overlord could receive requests before its initial (or most recent) sync with the DB, resulting in partial or incorrect results.
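
To make the ordering question concrete, here is a rough sketch of the "commit to the DB first, then update memory, all under one lock" variant. It is an assumption-laden illustration rather than the actual TaskQueue code: FlakyStatusStore and CommitThenCacheSketch are hypothetical names, and statuses are plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical metadata store that can be told to fail once, to simulate a failed commit.
final class FlakyStatusStore {
  private final Map<String, String> committed = new HashMap<>();
  boolean failNextWrite = false;

  void setStatus(String taskId, String status) {
    if (failNextWrite) {
      failNextWrite = false;
      throw new IllegalStateException("simulated metadata store failure");
    }
    committed.put(taskId, status);
  }

  String getStatus(String taskId) {
    return committed.get(taskId);
  }
}

// Hypothetical queue: the DB write and the in-memory update share one critical section.
final class CommitThenCacheSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Map<String, String> inMemoryStatus = new HashMap<>();
  private final FlakyStatusStore store;

  CommitThenCacheSketch(FlakyStatusStore store) {
    this.store = store;
  }

  void notifyStatus(String taskId, String status) {
    lock.lock();
    try {
      // Commit first; if this throws, the in-memory value is left untouched.
      store.setStatus(taskId, status);
      // Only reached after a successful commit, so memory never runs ahead of the DB.
      inMemoryStatus.put(taskId, status);
    } finally {
      lock.unlock();
    }
  }

  String getStatusFromMemory(String taskId) {
    lock.lock();
    try {
      return inMemoryStatus.get(taskId);
    } finally {
      lock.unlock();
    }
  }
}
```

With this ordering, a failed commit leaves both copies at the previous status, at the cost of holding the lock for the duration of the DB write.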

TaskQueryTool::getAllActiveTasks

Prior to this change, the function used a dummy timestamp for both the creation and queue insertion timestamps in the returned payload. Now that we can serve accurate timestamps, I believe we can still use the same timestamp for both fields.
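
As a small illustration of what "the same timestamp for both fields" means, the sketch below builds a payload from a single recorded creation time. StatusPayloadSketch and its field names are hypothetical and only mirror the two timestamps discussed here; they are not the actual payload class returned by the endpoint.

```java
import java.time.Instant;

// Hypothetical payload shape carrying the two timestamp fields discussed above.
final class StatusPayloadSketch {
  final String taskId;
  final Instant createdTime;
  final Instant queueInsertionTime;

  private StatusPayloadSketch(String taskId, Instant createdTime, Instant queueInsertionTime) {
    this.taskId = taskId;
    this.createdTime = createdTime;
    this.queueInsertionTime = queueInsertionTime;
  }

  // Reuses the one accurate timestamp we have for both fields instead of a dummy value.
  static StatusPayloadSketch fromCreatedTime(String taskId, Instant createdTime) {
    return new StatusPayloadSketch(taskId, createdTime, createdTime);
  }
}
```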

Release note

Poll from memory before fetching task information from DB



This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 jtuglu1 force-pushed the use-taskqueue-as-caching-layer-for-more-task-methods branch from cb5caaa to e99d16e Compare August 28, 2025 06:46
@jtuglu1 jtuglu1 force-pushed the use-taskqueue-as-caching-layer-for-more-task-methods branch from e99d16e to 68d0116 Compare August 28, 2025 06:56
@jtuglu1 jtuglu1 changed the title from "Poll from TaskQueue::activeTasks before fetching from DB" to "Poll from memory before fetching task information from DB" Aug 28, 2025
@jtuglu1 jtuglu1 force-pushed the use-taskqueue-as-caching-layer-for-more-task-methods branch from f481a0c to bc4578b Compare August 28, 2025 09:43
@jtuglu1 jtuglu1 force-pushed the use-taskqueue-as-caching-layer-for-more-task-methods branch 2 times, most recently from b011d13 to 7bbc2f3 Compare August 28, 2025 18:33
@jtuglu1 jtuglu1 marked this pull request as ready for review August 28, 2025 20:19
@jtuglu1 jtuglu1 requested a review from kfaraz August 28, 2025 22:57
@jtuglu1 jtuglu1 force-pushed the use-taskqueue-as-caching-layer-for-more-task-methods branch from 7bbc2f3 to ff4deb6 Compare August 29, 2025 01:35
@jtuglu1 jtuglu1 force-pushed the use-taskqueue-as-caching-layer-for-more-task-methods branch from ff4deb6 to 0b5aced Compare August 29, 2025 01:58
@jtuglu1 jtuglu1 requested a review from maytasm August 29, 2025 05:19

@kfaraz kfaraz left a comment

Thanks, @jtuglu1 ! Patch looks good.
I have left some minor suggestions in a couple of places.

Outdated comment threads: processing/src/main/java/org/apache/druid/indexer/TaskInfo.java (2), indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java (4)
@jtuglu1 jtuglu1 requested a review from kfaraz August 29, 2025 15:51
Outdated comment threads: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java (9)
@jtuglu1 jtuglu1 force-pushed the use-taskqueue-as-caching-layer-for-more-task-methods branch from e7c6086 to e3177e4 Compare August 30, 2025 06:50

@maytasm maytasm left a comment

LGTM. Please wait for @kfaraz and @samarthjain reviews too.


@kfaraz kfaraz left a comment

LGTM 👍🏻

@jtuglu1 jtuglu1 merged commit 6d7c378 into apache:master Aug 31, 2025
70 checks passed
@jtuglu1 jtuglu1 deleted the use-taskqueue-as-caching-layer-for-more-task-methods branch September 1, 2025 22:26
@cecemei cecemei added this to the 35.0.0 milestone Oct 21, 2025
5 participants