Some more perf tweaks to ThreadPool #9279
@@ -534,81 +536,66 @@ internal static bool Dispatch()
        // Dequeue and EnsureThreadRequested must be protected from ThreadAbortException.
    }

    // Simple random number generator. We don't need great randomness, we just need a little and for it to be fast.
    internal sealed class FastRandom // xorshift prng
Updated from b875c42 to dd13b29.
@jkotas, I've not seen this error before (in the arm Cross Debug Build leg); I assume this has nothing to do with my PR?

I do not think it has anything to do with your PR. I have not seen it before.

@dotnet-bot test Windows_NT arm Cross Debug Build please
            _w = _w ^ (_w >> 19) ^ (t ^ (t >> 8));

            int r = (int)_w & int.MaxValue;
            return (int)(r * (1.0 / ((double)int.MaxValue + 1)) * maxValue);
As much as idiv bothers me, I wonder about the cost of casting to double, doing the math, and casting back. Since vectors are increasingly in play, it may be better just to use `% maxValue` (with an early exit for `maxValue == 0`)?
I'll switch to mod. Looks like it might be 1-2% faster, but regardless it's simpler.
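For comparison, here is a sketch of the two range reductions discussed above as standalone helpers. `RangeReduction`, `ByScaling`, and `ByModulo` are hypothetical names: `ByScaling` reproduces the double-based line from the diff, and `ByModulo` is the suggested `% maxValue` with the early exit for `maxValue == 0`.

```csharp
using System;

// Two ways to map a raw 32-bit generator output onto [0, maxValue).
// Hypothetical helper names; assumes maxValue >= 0 (e.g. a count of queues).
public static class RangeReduction
{
    // Double-based version from the diff: clear the sign bit, scale into
    // [0, 1) by dividing by 2^31, multiply by maxValue, and truncate.
    public static int ByScaling(uint w, int maxValue)
    {
        int r = (int)w & int.MaxValue;
        return (int)(r * (1.0 / ((double)int.MaxValue + 1)) * maxValue);
    }

    // Integer remainder. maxValue == 0 would throw DivideByZeroException,
    // so exit early and return 0, which is also what ByScaling yields for 0.
    public static int ByModulo(uint w, int maxValue)
    {
        if (maxValue == 0)
        {
            return 0;
        }
        return (int)(w % (uint)maxValue);
    }
}
```

Both are slightly biased whenever `maxValue` does not evenly divide the generator's output range; that seems acceptable here, since the goal is only a little randomness, cheaply.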
I still wonder whether light coordination between the threads using `Interlocked` might be better than random at avoiding collisions.
As in, rather than `int i = tl.random.Next(c);`, do `int i = NextCounter.Next(c);`,
with `NextCounter` being something like https://gist.github.com/benaadams/9a66811bc54e7126a9cd8c181bbe6f53
Maybe something to look at in the future.
Thanks, Ben. Please let me know what you find (if you look).
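A minimal sketch of the interlocked-counter idea, assuming a single process-wide counter. The linked gist's contents aren't reproduced here; `NextCounter` below is a hypothetical implementation that hands out consecutive tickets with `Interlocked.Increment`, so threads that start searching at about the same time begin at different queue indices.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the NextCounter idea: instead of each thread
// drawing a random starting index, threads take consecutive tickets from a
// shared counter.
public static class NextCounter
{
    private static int s_counter;

    // Returns an index in [0, maxValue). Assumes maxValue > 0.
    public static int Next(int maxValue)
    {
        // Interlocked.Increment wraps from int.MaxValue to int.MinValue;
        // reinterpreting the result as uint keeps the tickets consecutive.
        uint ticket = (uint)Interlocked.Increment(ref s_counter);
        return (int)(ticket % (uint)maxValue);
    }
}
```

The trade-off is that every call writes to one shared cache line, which under heavy contention could cost more than the occasional collision it avoids; it would need measuring against the xorshift approach.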
LGTM
Updated from dd13b29 to e361700.
The current data structure used to store the list of local work-stealing queues maintains a sparse array: entries can be null, and removals null out entries in an active array that another thread may be reading concurrently. This means threads looking for work must use volatile reads and null checks on each element. Further, because the array doubles in size when it grows and never shrinks, we often end up with many empty slots that threads must examine as they look for work. It's actually relatively rare for threads to come and go: although the thread pool does take threads in and out of service, it only rarely terminates a thread or asks the OS for a new one, so additions to and removals from the list are uncommon. Given that, we can simply use immutable arrays, creating a new array of exactly the right size whenever a thread is added or removed. Iteration can then be done without volatile reads and without null checks, because the contents of the array being read never change and are never null.
And clean up the code around it.
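A minimal copy-on-write sketch of the immutable-array approach, assuming concurrent writers use a compare-and-swap retry loop (the description doesn't say how writers are synchronized). `CopyOnWriteList<T>` is a hypothetical, generic stand-in for the list of work-stealing queues.

```csharp
using System;
using System.Threading;

// Writers publish a brand-new array of the exact size; readers take one
// snapshot and iterate it with plain reads and no null checks, because a
// published array is never modified afterwards.
public sealed class CopyOnWriteList<T> where T : class
{
    private T[] _items = Array.Empty<T>();

    // Snapshot for readers; callers must treat it as read-only.
    public T[] Items => Volatile.Read(ref _items);

    public void Add(T item)
    {
        while (true)
        {
            T[] oldItems = Volatile.Read(ref _items);
            var newItems = new T[oldItems.Length + 1];
            Array.Copy(oldItems, newItems, oldItems.Length);
            newItems[newItems.Length - 1] = item;
            // Publish only if no other writer got in first; otherwise retry.
            if (Interlocked.CompareExchange(ref _items, newItems, oldItems) == oldItems)
                return;
        }
    }

    public void Remove(T item)
    {
        while (true)
        {
            T[] oldItems = Volatile.Read(ref _items);
            int pos = Array.IndexOf(oldItems, item);
            if (pos < 0)
                return; // not present, nothing to publish

            // Copy everything except the removed slot into an exact-size array.
            var newItems = new T[oldItems.Length - 1];
            Array.Copy(oldItems, 0, newItems, 0, pos);
            Array.Copy(oldItems, pos + 1, newItems, pos, oldItems.Length - pos - 1);
            if (Interlocked.CompareExchange(ref _items, newItems, oldItems) == oldItems)
                return;
        }
    }
}
```

Readers would grab `Items` once and loop over it, e.g. `foreach (var q in list.Items) { /* try to steal from q */ }`, with no volatile read or null check per element; writers pay an O(n) copy, which is cheap given how rarely threads are added or removed.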
Allow more methods to be inlined
We need some randomness, but it doesn't need to be particularly good, just fast. A simple xorshift prng is 3-4x faster than Random, in particular because it avoids multiple virtual method calls that don't get inlined. It's also a bit lighter in terms of memory (though there's only one of these per thread, so the savings are small).
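A sketch of such a generator, assuming Marsaglia's four-word xorshift128. Only the final `_w` update line appears in the review snippet earlier; the `<< 11` step and the seeding constants are Marsaglia's published defaults and are assumptions here, as is using the modulo reduction the review thread settled on.

```csharp
using System;

// Minimal xorshift128 generator (Marsaglia's four-word variant).
// Seeding constants are Marsaglia's defaults, used here as an assumption.
internal sealed class FastRandom
{
    private uint _x, _y, _z, _w;

    public FastRandom(int seed)
    {
        _x = (uint)seed;
        _y = 362436069;
        _z = 521288629;
        _w = 88675123;
    }

    // Returns a value in [0, maxValue). Assumes maxValue > 0.
    public int Next(int maxValue)
    {
        uint t = _x ^ (_x << 11);
        _x = _y;
        _y = _z;
        _z = _w;
        _w = _w ^ (_w >> 19) ^ (t ^ (t >> 8));
        return (int)(_w % (uint)maxValue);
    }
}
```

With one instance per thread (for example, held in thread-local state), `Next` needs no synchronization, and as a non-virtual method on a sealed class it is a reasonable inlining candidate for the JIT.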
- Change LocalPop to be parameterless and return the IThreadPoolWorkItem, rather than returning a bool and having the IThreadPoolWorkItem in an out arg.
- Refactor LocalPop slightly to have an inlineable upfront check for whether the local queue is empty, and only call the non-inlineable method if there's something that could potentially be popped.
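A sketch of the LocalPop split described above, on a deliberately simplified queue: fixed capacity and a lock instead of the thread pool's lock-free work-stealing protocol. `LocalQueue<T>`, `LocalPopCore`, and `TrySteal` are hypothetical names; only the shape of the fast and slow paths is the point.

```csharp
using System;
using System.Runtime.CompilerServices;

// Simplified, lock-based queue (not the real work-stealing queue).
public sealed class LocalQueue<T> where T : class
{
    private readonly T[] _array = new T[32];
    private readonly object _lock = new object();
    private volatile int _head; // oldest item; stealers take from here
    private volatile int _tail; // one past the newest item; the owner pushes/pops here

    public void LocalPush(T item)
    {
        lock (_lock)
        {
            if (_tail == _array.Length)
                throw new InvalidOperationException("Queue full (this sketch has a fixed capacity).");
            _array[_tail] = item;
            _tail++;
        }
    }

    // Parameterless, returns the item or null. Small enough that the JIT can
    // inline it: it only checks whether there is anything that could be popped.
    public T LocalPop() => _head < _tail ? LocalPopCore() : null;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private T LocalPopCore()
    {
        lock (_lock)
        {
            // Re-check under the lock: a stealer may have drained the queue
            // since the unsynchronized check in LocalPop.
            if (_head >= _tail) return null;
            int newTail = _tail - 1;
            T item = _array[newTail];
            _array[newTail] = null;
            _tail = newTail;
            if (_head == _tail) { _head = 0; _tail = 0; } // reset indices when empty
            return item;
        }
    }

    // Other threads steal from the opposite (oldest) end.
    public T TrySteal()
    {
        lock (_lock)
        {
            if (_head >= _tail) return null;
            T item = _array[_head];
            _array[_head] = null;
            _head++;
            if (_head == _tail) { _head = 0; _tail = 0; }
            return item;
        }
    }
}
```

A thread whose local queue is empty pays only for the two field reads and the comparison; the locked slow path runs only when there may be something to pop.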
Updated from e361700 to 70aae70.
Some more perf tweaks to ThreadPool

Commit migrated from dotnet/coreclr@cfd28c5
Main changes:
cc: @jkotas, @kouvel, @benaadams