[WIP] Lazy alloc ThreadPool local queues#21695
Conversation
How? Under the assumption that most pool threads never have items queued and thus don't need to be checked for steals? |
Yes, more or less. Under the assumption that if you are using a MinWorkerThreads an order of magnitude higher than CPU count (x10 - x20) to get work done then you are less likely to be queuing local items to these 160-320 threads; so adding their queues to the list of queues to check can be deferred until something is queued. Deferring the queue creation to first queue on that thread rather than thread creation shouldn't be a significant performance impact to regular behaviour? |
|
Also am only going for "Contributes to" rather than "Resolves" as it could quickly return to every thread having a queue based on the workload. |
I don't know. Hopefully not, but there is an extra null check / branch now on every local queuing. |
5f199d8 to
bac0251
Compare
|
Updated summary |
I'm not understanding this part. Why is it less likely? You're assuming some kind of heterogeneous workload where only some work items queue locally and they somehow aren't involved in bursty work? I'd guess that if any pool thread had it's queue Initialized, most would. I see the value of a change like this, but primarily only if nothing in the app ever queues to a local queue, such that no local queues are Initialized. Is that the case in a bare-bones kestrel app? MVC? Typical app that lead to #19088? If there's no measurable cost to doing this, then ok. I'm just skeptical it'll have the desired impact on real apps. Happy to be proven wrong. |
Yes(ish) currently; though there are many paths that end up queuing locally and the local queues are a good mechanism. The main issue I see (with this change) is they only ratchet up, so once you have a local queue its always there to be checked. |
The Platform benchmarks do occasionally from what looks to be a TCS with Although its only occasional, the main issue is the queues ratchet up, but never down (in this PR). |
|
I think can just go full on, and use idle time to remove the queues #21713 |
Defer local queue creation (and add to list of local queues to steal from) to first local item queued to that thread rather than thread creation.
This means if 300 threadpool threads are active but no local queuing has happened; there will be no time spent checking if there is something to steal from other threads.
This is using an assumption that for the scenario outlined in https://github.com/dotnet/coreclr/issues/19088; that if you are using a MinWorkerThreads an order of magnitude higher than CPU count (x10 - x20) to get work done then you are less likely to be queuing local items to these 160-320 threads; so adding their queues to the list of queues to check can be deferred until something is queued.
Also am only going for "Contributes to" rather than "Resolves" as it could quickly return to every thread having a queue based on the workload (e.g. releases of blocked
SemaphoreSlim.WaitAsyncwhich queue locally)Should also improve the performance for apps that use the threadpool, but only use the global queue and never end up queuing items to the thread's local queues for the app lifetime - though not sure how common that would be?*
*Could perhaps remove a local queue if its not had anything on it for a while.
Contributes to https://github.com/dotnet/coreclr/issues/19088
/cc @stephentoub @vancem @kouvel @VSadov