Poll from memory before fetching task information from DB#18448
Merged
jtuglu1 merged 5 commits intoapache:masterfrom Aug 31, 2025
Merged
Conversation
cb5caaa to
e99d16e
Compare
e99d16e to
68d0116
Compare
f481a0c to
bc4578b
Compare
b011d13 to
7bbc2f3
Compare
7bbc2f3 to
ff4deb6
Compare
ff4deb6 to
0b5aced
Compare
kfaraz
reviewed
Aug 29, 2025
kfaraz
reviewed
Aug 29, 2025
kfaraz
reviewed
Aug 30, 2025
e7c6086 to
e3177e4
Compare
maytasm
approved these changes
Aug 30, 2025
Contributor
maytasm
left a comment
There was a problem hiding this comment.
LGTM. Please wait for @kfaraz and @samarthjain reviews too.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
When running high #s of tasks in a cluster, the Overlord can become bottlenecked by the endpoint
/task/{taskid}/status.Because
taskQueryTool.getTaskInfo(taskid)currently only issues a direct I/O to the underlying task storage, this results in heavy I/O + serde from the metadata store to pull the associated task information.This change stores/updates the
TaskInfoinformation inside theactiveTasksqueue'sTaskEntrytype, allowing the requests to fetch from memory if the task entry exists, otherwise falling back to the DB.Some lingering questions:
Code Style/Smell
I didn't really like the idea of shoving the
TaskInfointo theTaskEntry, but it felt like the most convenient way of storing all the necessary information needed. We're really just relying on theTaskInfofor the creation time as well as some up-to-date value of the currentTaskStatus. It might be worth standardizing onTaskInfoasTaskandTaskStatusare often paired with each other.Consistency
TaskStatusDespite updating the task status on
TaskQueue::notifyStatuscallbacks, I wonder whether it may be possible for the respective states in task storage and memory to diverge for a given task's status.TaskQueryTool::getActiveTaskInfoandTaskQueryTool::getTaskInfoshould always return state that is at least as recent as the latest commit to the DB, since the status updates to the DB inTaskQueue::notifyStatusare in the same critical section as those to theTaskEntryvalue. There might be merit in updating the task status in the critical section once it's been committed to the DB, however.TaskQueryTool::getAllActiveTasksPrior to this change, the function would use a dummy timestamp for both creation/queue insertion timestamps in the returned payload. Now that we can serve accurate timestamps, I believe we can still use the same timestamp for both fields.
Release note
Poll from memory before fetching task information from DB
Key changed/added classes in this PR
MyFooOurBarTheirBazThis PR has: