[WIP] Deactivate ThreadPool local queues when idle #21713
Conversation
Doesn't this potentially increase both allocation and contention significantly if a thread repeatedly adds its local queue and then removes it, which would happen whenever it processed a work item that queued a single local work item? It'd register its list, process its own item, and then remove the list. This seems like it could result in significant regressions. What am I missing? How are you validating this change, both for the desired improvements and against such regressions?
I have been looking at #19088 and similar issues for a long time. It is not just having to scan through empty queues; queues that have only a few items are also a problem. Ultimately you have to scan the queues for correctness, and no matter how you spread the cost, when there are lots of queues it gets expensive. Short queues hurt in particular because of contention and false sharing. I think I am getting ready to propose #18403 as a more permanent solution.
Will close this and see where that goes :) |
Can you elaborate? You stop scanning the moment you successfully remove an item from a queue. Are you saying you're seeing contention on queues with, say, only one item causing a problem, as multiple threads all try to take from it, fail, and then continue scanning? I'd have guessed that condition would be relatively rare, in particular with threads all starting their scan at different locations. |
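To make the scan being debated concrete, here is a toy model of a work-stealing pass: a thief starts at a random offset into the list of registered local queues and stops as soon as it takes an item. It is a sketch only, not CoreCLR's implementation; all names (`WorkStealingPool`, `try_steal`, etc.) are hypothetical.

```python
import random
from collections import deque
from threading import Lock

class WorkStealingPool:
    """Toy model of a steal scan: visit the registered local queues
    starting at a random offset and stop at the first item taken.
    Names are illustrative, not CoreCLR's."""

    def __init__(self):
        self.queues = []  # all currently registered local queues
        self.lock = Lock()

    def register(self, q):
        with self.lock:
            self.queues.append(q)

    def try_steal(self):
        with self.lock:
            snapshot = list(self.queues)
        if not snapshot:
            return None
        start = random.randrange(len(snapshot))
        # Every empty queue visited here is pure overhead: the scan
        # cannot end until it finds an item or has seen every queue.
        for i in range(len(snapshot)):
            q = snapshot[(start + i) % len(snapshot)]
            try:
                return q.popleft()  # take from the queue
            except IndexError:
                continue            # empty queue: a wasted visit
        return None
```

The point of the exchange above is the cost inside that loop: with many registered-but-empty (or one-item) queues, most visits are wasted, and on a real machine each visit also touches a cache line that other threads may be contending on.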
Just an observation from work-stealing theory:
Empty queues are worse than short ones, obviously :-), since the work spent scanning them cannot yield a work item.
To be sure, we are not talking about common cases. Under typical loads our thread pool works wonderfully. In fact, according to the literature, most thread scheduling strategies are adequate in the common case, simply because the actual "work" dominates the scheduling by a wide margin. This is mostly about tolerance of the less common, inconvenient cases where the thread pool may become a bottleneck.
Our "inconvenient" case is where we need more TP workers than CPU cores to compensate for worker latencies and bursty loads. Sometimes we need many more workers than cores. The changes in #18403 are trying to address the root cause by keeping the …

It looks like the extra complexity generally pays for itself, and benchmark performance is roughly the same or better. I think the changes are very promising, but they could use more testing/tuning, of course.
1. Defer adding a thread's local queue to the list of active queues (the queues other threads may steal from) until the first local item is queued, rather than doing it at thread creation.
2. Remove the thread's local queue from the list of active queues if the thread finds no work to do, i.e. nothing in the global queue, its own local queue, or any other local queue (no missed steal); in other words, the thread is idling.
3. Re-add the queue when the next local item is queued (back to step 1).
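The steps above can be sketched as a small state machine on the local queue: it joins the pool's active list only once it actually holds work, and leaves the list when its owner goes idle. This is a minimal illustration with hypothetical names (`LocalQueue`, `on_idle`, `activate`/`deactivate`), not the actual ThreadPool code.

```python
from collections import deque
from threading import Lock

class Pool:
    """Holds the list of active local queues that thieves scan."""
    def __init__(self):
        self.active_queues = []
        self.lock = Lock()

    def activate(self, q):
        with self.lock:
            self.active_queues.append(q)

    def deactivate(self, q):
        with self.lock:
            self.active_queues.remove(q)

class LocalQueue:
    """A thread's local queue under the lazy-activation scheme."""
    def __init__(self, pool):
        self.pool = pool
        self.items = deque()
        self.active = False

    def enqueue(self, item):
        self.items.append(item)
        if not self.active:
            # Steps 1 and 3: (re)activate on the first local item,
            # instead of registering at thread creation.
            self.pool.activate(self)
            self.active = True

    def on_idle(self):
        # Step 2: the owner found no work anywhere, so stop offering
        # this (empty) queue to thieves.
        if self.active and not self.items:
            self.pool.deactivate(self)
            self.active = False
```

This also makes the reviewer's concern visible: a workload that alternates one local enqueue with going idle would bounce the queue in and out of `active_queues`, paying the `activate`/`deactivate` synchronization cost on every cycle.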
Resolves: #19088