Making performance improvements to sockets#2366
Conversation
|
Failures:
I believe this is one that @pakrym encountered when porting to the latest pipelines
|
|
Also the OSX logs had this: |
7f29d52 to
c7044fe
Compare
| var t = (Tuple<Action<object>, object>)s; | ||
| t.Item1(t.Item2); | ||
| }, | ||
| Tuple.Create(action, state)); |
There was a problem hiding this comment.
For now this is only used per connection so not a big deal
|
There is something wrong with this (in principle) :( Numbers are good though :) I think corefx needs a fix |
| _error = socketError; | ||
| _bytesTransfered = bytesTransferred; | ||
| Interlocked.Exchange(ref _callback, _callbackCompleted)?.Invoke(); | ||
| var continuation = Interlocked.Exchange(ref _callback, _callbackCompleted); |
There was a problem hiding this comment.
Isn't this double scheduling true-async reads? If so, shouldn't this readerScheduler be switched to PipeScheduler.Inline for the socket transport to avoid double dispatching (or even triple dispatching by the time the data flows through the request body pipe as currently implemented).
|
So I talked to @davidfowl about the purpose of double dispatching, and his theory is that without this change, a single socket could dominate the usage of the limited I/O threads for as long as there's data already available in the OS's input buffer by the time ReceiveAsync is called. I figured if that's the case, certain connections must be starving others of resources in dev, and that would mean this change should reduce wrk's reported standard deviation for latency. It looks like @davidfowl is on to something, because the latency numbers for the Socket transport are immensely better with this PR. Json WindowsSocketsBeforeAfterSo not only did this chage reduce the standard deviation for latency from 93.79ms to 14.06ms, it also reduced the average latency from 7.18ms to 1.36ms. The 99th latency percentile is even more impressive. The 50/75/90th percentiles are slightly better than before (note the difference between ms and us), but the 99th percentile went from 10.06ms to 0.99ms! It's interesting that the 99th percentile for latency is still lower than the average after than this change. It really shows that my stats prof was right about the mean being heavily impacted by outliers. |
|
@davidfowl @halter73 very interesting find 👍 |
Bumping up IO threads doesn't solve the problem. We just need to efficiently get off the IO thread if the next read is synchronous. @halter73 As an alternative, we could try dispatching if we're not on a thread pool thread and IO completes synchronously. Edit: Thinking about it though, it's pretty similar to what's there today since the read loop is always going to be on an IO thread... |
|
cc: @vancem @geoffkizer |
- Run SocketAwaitable continuations asynchronously - Run using the global thread pool queues instead of using local queues.
c7044fe to
6d65f86
Compare
of using local queues.
Theories:
Plaintext Linux
Libuv:
Thread count default
Thread count 2
Sockets:
Before:
After:
Plaintext Windows
Libuv:
Thread count default
Thread count 2
Sockets:
Before
After
Json Linux
Libuv
Sockets
Before
After
Json Windows
Libuv
Sockets
Before
After