Maintain TaskLockbox at datasource level for higher concurrency #17390
gianm merged 18 commits into apache:master from
Conversation
This pull request has been marked as stale due to 60 days of inactivity.
    dataSource,
    (ds, resource) -> {
      if (resource != null && resourcePredicate.test(resource)) {
        // TODO: what if a bad runaway operation is holding the TaskLockbox.giant?
This is only a concern on shutdown, right? Since at other times, clear() will only be called when the refcount is zero, meaning no locks should be held. Just trying to make sure I understand.
Yes, it's a concern only on shutdown(), which is invoked on loss of leadership.
We would want the leadership change listener to return quickly.
True, we do, although we also want the old leader to fully stand down before the new one stands up, to avoid races between the leaders. The ideal thing would be to interrupt other in-flight operations, if that's feasible. If it isn't feasible right now, then please update this comment to describe the concern and to not be a TODO comment.
Updated and moved the comment to shutdown() method as it seems more relevant there.
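The refcounting behavior discussed above can be illustrated with a small sketch. This is hypothetical code, not the actual Druid implementation: the class and method names (`LockboxRegistry`, `DatasourceResource`, `acquire`, `release`) are invented for illustration. It shows why, outside of `shutdown()`, `clear()` only runs when the reference count has dropped to zero, so no locks should be held at that point.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch of refcounted per-datasource resource cleanup.
// Not the actual Druid code; names are illustrative.
class LockboxRegistry
{
  static class DatasourceResource
  {
    int referenceCount;
    boolean cleared;

    void clear()
    {
      // In the real patch this would delegate to TaskLockbox.clear(); a
      // long-running operation holding the lockbox giant could delay this
      // when invoked from shutdown() on loss of leadership.
      cleared = true;
    }
  }

  private final ConcurrentHashMap<String, DatasourceResource> resources = new ConcurrentHashMap<>();

  DatasourceResource acquire(String dataSource)
  {
    // Atomically create the resource on first use and bump the refcount.
    return resources.compute(dataSource, (ds, resource) -> {
      if (resource == null) {
        resource = new DatasourceResource();
      }
      resource.referenceCount++;
      return resource;
    });
  }

  void release(String dataSource, Predicate<DatasourceResource> resourcePredicate)
  {
    // Atomically decrement and remove the entry once no operation is using it.
    // clear() runs only at refcount zero, so no locks should be held then.
    resources.computeIfPresent(dataSource, (ds, resource) -> {
      resource.referenceCount--;
      if (resource.referenceCount == 0 && resourcePredicate.test(resource)) {
        resource.clear();
        return null; // returning null removes the mapping
      }
      return resource;
    });
  }

  int size()
  {
    return resources.size();
  }
}
```

The `compute`/`computeIfPresent` callbacks run atomically per key, which is what makes the decrement-and-remove safe without an extra lock.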
    ) throws T
    {
      // Verify that sync is complete
      if (!syncComplete.get()) {
Check again in the critical section of getLockboxResource? This code seems potentially racy, since syncComplete could be set to false right after this check.
Thanks for the tip!
I didn't initially put the check in getLockboxResource() because that method is called from syncFromStorage() as well before the syncComplete flag has been set.
I guess I can have two variants of the getLockboxResource() method or just pass a boolean
to decide whether to perform the check or not.
A parameter about performing the check sounds reasonable to me.
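The pattern agreed on above can be sketched as follows. This is a hypothetical simplification, not the actual PR code: `SyncGuard`, `doInCriticalSection`, and the field names are invented. The key point is that the `syncComplete` check happens inside the same lock that `syncFromStorage()` holds while flipping the flag, and a boolean parameter lets `syncFromStorage()` itself bypass the check.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of re-checking the sync flag inside the critical
// section, with a parameter to skip the check during the sync itself.
class SyncGuard
{
  private final AtomicBoolean syncComplete = new AtomicBoolean(false);
  private final ReentrantLock giant = new ReentrantLock();

  <T> T doInCriticalSection(boolean verifySynced, Supplier<T> action)
  {
    giant.lock();
    try {
      // Checked inside the lock: syncFromStorage() also takes the lock
      // before flipping the flag, so this cannot race with a resync.
      if (verifySynced && !syncComplete.get()) {
        throw new IllegalStateException("Sync with storage is not complete yet");
      }
      return action.get();
    }
    finally {
      giant.unlock();
    }
  }

  void syncFromStorage()
  {
    giant.lock();
    try {
      // Mark as unsynced while rebuilding in-memory state, then flip back.
      syncComplete.set(false);
      // ... rebuild state from the metadata store ...
      syncComplete.set(true);
    }
    finally {
      giant.unlock();
    }
  }
}
```

Passing `verifySynced = false` corresponds to the code path where `getLockboxResource()` is invoked from `syncFromStorage()` before the flag has been set.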
Thanks for the review, @gianm!
Description
In clusters with a large number of datasources, task operations block each other owing to the
giant lock in the TaskLockbox. This can become particularly problematic in cases of streaming ingestion being done into multiple datasources, where segment allocations become very slow since all of them must pass through a single queue. None of the task operations share a critical section across datasources, and it should suffice to perform the locking at the datasource level.
This patch is an attempt to remediate this problem.
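The remediation can be sketched minimally as follows. This is an illustrative simplification, not the PR's actual code: `GlobalLockboxSketch` and `DatasourceLockbox` are invented names, and the real `GlobalTaskLockbox` also manages resource lifecycles and sync state. The sketch shows the core idea only: each operation takes a per-datasource lock, so operations on different datasources never contend on one giant lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Minimal sketch of datasource-level locking: a global facade routes each
// operation to a per-datasource lockbox. Names are illustrative.
class GlobalLockboxSketch
{
  static class DatasourceLockbox
  {
    private final Object giant = new Object();

    <T> T doInCriticalSection(Supplier<T> action)
    {
      // Serializes operations on this datasource only.
      synchronized (giant) {
        return action.get();
      }
    }
  }

  private final ConcurrentHashMap<String, DatasourceLockbox> lockboxes = new ConcurrentHashMap<>();

  <T> T perform(String dataSource, Function<DatasourceLockbox, T> operation)
  {
    // Operations on different datasources use different locks and can
    // proceed concurrently; lazily create the lockbox on first use.
    DatasourceLockbox lockbox = lockboxes.computeIfAbsent(dataSource, ds -> new DatasourceLockbox());
    return operation.apply(lockbox);
  }
}
```

Under the previous design, both calls below would have queued on a single giant lock; here they contend only if they target the same datasource.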
Changes
- Restrict a TaskLockbox to a single datasource.
- Add GlobalTaskLockbox, which delegates to the respective datasource-level TaskLockbox for any operation.
- GlobalTaskLockbox.syncFromStorage() must be mutually exclusive with any operation being performed on any of the datasources.
  - Since syncFromStorage() is called only from TaskQueue.start() on becoming leader, using a ReadWriteLock didn't seem necessary.
  - syncFromStorage() marks the GlobalTaskLockbox as "unsynced" at the start of the method. Once the sync finishes, the GlobalTaskLockbox is marked "synced" again and operations can proceed as normal.
- Add GlobalTaskLockbox.shutdown() and TaskLockbox.clear() to clean up unused resources upon loss of leadership.
- The TaskLockbox of a datasource is removed when it is not needed anymore.
Pending
Follow up
After this patch, the bottleneck for segment allocation will be the single-threaded
SegmentAllocationQueue. One approach could be to maintain a separate
SegmentAllocationQueue for each datasource, but that would drastically increase the pressure on the metadata store in case of multiple datasources.
A better alternative would be to make the segment allocation queue multithreaded.
Release note
Improve concurrency on Overlord by ensuring that task actions on one datasource do not block actions on other datasources.