Use LibuvThread inspired schedulers in socket transport by halter73 · Pull Request #2368 · aspnet/KestrelHttpServer

halter73 · 2018-03-07T11:26:25Z

Inspired by #2366. Looks like it might be a small improvement. We'll need to test more scenarios.

Json Windows

Libuv

[03:23:17.655] RequestsPerSecond:           452637
[03:23:17.656] Latency on load (ms):        0.7
[03:23:17.656] Max CPU (%):                 85
[03:23:17.656] WorkingSet (MB):             386
[03:23:17.656] Startup Main (ms):           419
[03:23:17.656] First Request (ms):          232.8
[03:23:17.657] Latency (ms):                0.8
[03:23:17.657] Socket Errors:               0
[03:23:17.657] Bad Responses:               0
[03:23:17.657] SDK:                         2.2.0-preview1-007522
[03:23:17.657] Runtime:                     2.1.0-preview2-26225-03
[03:23:17.657] ASP.NET Core:                2.1.0-preview2-30256

Sockets

Before

[03:21:06.064] RequestsPerSecond:           287068
[03:21:06.065] Latency on load (ms):        3
[03:21:06.065] Max CPU (%):                 84
[03:21:06.065] WorkingSet (MB):             360
[03:21:06.065] Startup Main (ms):           382
[03:21:06.065] First Request (ms):          228.6
[03:21:06.066] Latency (ms):                0.7
[03:21:06.066] Socket Errors:               0
[03:21:06.066] Bad Responses:               0
[03:21:06.066] SDK:                         2.2.0-preview1-007522
[03:21:06.066] Runtime:                     2.1.0-preview2-26225-03
[03:21:06.066] ASP.NET Core:                2.1.0-preview2-30255

#2366

[03:18:42.707] RequestsPerSecond:           342340
[03:18:42.707] Latency on load (ms):        1.4
[03:18:42.708] Max CPU (%):                 85
[03:18:42.708] WorkingSet (MB):             363
[03:18:42.708] Startup Main (ms):           376
[03:18:42.708] First Request (ms):          228.9
[03:18:42.708] Latency (ms):                0.8
[03:18:42.708] Socket Errors:               0
[03:18:42.708] Bad Responses:               0
[03:18:42.708] SDK:                         2.2.0-preview1-007522
[03:18:42.708] Runtime:                     2.1.0-preview2-26225-03
[03:18:42.708] ASP.NET Core:                2.1.0-preview2-30255

After

[03:19:41.107] RequestsPerSecond:           376305
[03:19:41.107] Latency on load (ms):        0.8
[03:19:41.107] Max CPU (%):                 82
[03:19:41.107] WorkingSet (MB):             368
[03:19:41.108] Startup Main (ms):           389
[03:19:41.108] First Request (ms):          230.1
[03:19:41.108] Latency (ms):                0.9
[03:19:41.108] Socket Errors:               0
[03:19:41.108] Bad Responses:               0
[03:19:41.108] SDK:                         2.2.0-preview1-007522
[03:19:41.108] Runtime:                     2.1.0-preview2-26225-03
[03:19:41.108] ASP.NET Core:                2.1.0-preview2-30255

halter73 · 2018-03-07T11:27:18Z

 {
    internal sealed class SocketTransport : ITransport
    {
+        private const int _numSchedulers = 8;


TODO: Determine the best way to calculate the optimal number of schedulers.

davidfowl · 2018-03-07T18:51:23Z

Nice! The thing I don't like about it is the thing you commented on. I don't want to be in the business of tuning the thread count with the sockets transport. That's one of the things we wanted to get away from generally.

Can you run plain text as well to see what you get (both oses)

halter73 · 2018-03-07T19:38:41Z

@davidfowl The nice thing is that these schedulers aren't in the business of managing any heavy resources like Threads. The CoalescingScheduler is basically just two queues of Actions a lock. Otherwise it's the normal ThreadPool doing all the work.

My still unproven theory is that just using the same calculation we use for the libuv thread count will be reasonable for all of the scenarios I left to test. If we end up merging this PR, we will probably want to make the scheduler count configurable like we do for the libuv transport.

benaadams · 2018-03-07T20:55:41Z

So you are queuing batches to the ThreadPool rather than individual items; this might reduce the number of threadpool threads, thus reducing contention on the global queue, and local queues?

Locking is much higher for the queuing to the batch queues, than it is queuing to ThreadPool's global ConcurrentQueue (as each schedule takes a full lock).

In your output can you also dump thread counts?

/cc @stephentoub

stephentoub · 2018-03-07T21:07:03Z

I don't want to be in the business of tuning the thread count with the sockets transport.

The nice thing is that these schedulers aren't in the business of managing any heavy resources like Threads. The CoalescingScheduler is basically just two queues of Actions a lock. Otherwise it's the normal ThreadPool doing all the work.

But you are still responsible for tuning thread counts. The way this is structured, you'll only ever have at most one thread processing work from one of these coalescing schedulers, so you'll need to decide exactly how many of these you want and how you want to divvy work up between them. Isn't that what David was asking about?

halter73 · 2018-03-07T21:11:02Z

@stephentoub I completely agree with your assessment. That's why I put the TODO in the code to dynamically calculate the count and suggested giving a way for the user to configure the number of schedulers (i.e. DoWork loops) like we do with libuv.

I didn't mean to imply the number of schedulers doesn't matter. Even quick experimentation changing _numSchedulers values shows what a big impact it has.

stephentoub · 2018-03-07T21:30:10Z

+                {
+                    queue.Dequeue()();
+                }
+            }


I'm not clear on why this logic with multiple queues is being used. What are you trying to achieve with it? Why not something like:

internal sealed class SingleWorkerScheduler { private static readonly WaitCallback s_worker = s => ((SingleWorkerScheduler)s).ProcessItems(); private readonly ConcurrentQueue<Action> _actions = new ConcurrentQueue<Action>(); private bool _hasWorker; public void Schedule(Action action) { _actions.Enqueue(action); lock (_actions) { if (!_hasWorker) { ThreadPool.QueueUserWorkItem(s_worker, this); _hasWorker = true; } } } private void ProcessItems() { try { while (_actions.TryDequeue(out Action item)) { item(); } } finally { lock (_actions) { if (!_actions.IsEmpty) { ThreadPool.QueueUserWorkItem(s_worker, this); _hasWorker = true; } else { _hasWorker = false; } } } } }

stephentoub · 2018-03-07T21:31:59Z

I completely agree with your assessment. That's why I put the TODO in the code to dynamically calculate the count and suggested giving a way for the user to configure the number of schedulers (i.e. DoWork loops) like we do with libuv. I didn't mean to imply the number of schedulers doesn't matter. Even quick experimentation changing _numSchedulers values shows what a big impact it has

Dynamically calculating and spinning up and down workers is a huge amount of work. That's the main thing the thread pool does. And it's way too easy to tune something like this to a particular benchmark and then find that it falls down quickly on others.

halter73 · 2018-03-07T21:37:27Z

I shouldn't have used the word dynamic. I meant that we should consider basing the scheduler count on the number of virtual processors like we do for libuv instead of keeping _numSchedulers as a const with a value of 8. I didn't mean we should look into dynamically changing the number of schedulers in an already-running server instance.

I'll measure the performance of your SingleWorkerScheduler @stephentoub. The only reason for using two non-concurrent queues is that we've already successfully used that pattern in the libuv transport.

stephentoub · 2018-03-07T21:45:14Z

he only reason for using two non-concurrent queues is that we've already successfully used that pattern in the libuv transport.

Ok. Which has better throughput will depend on the use case. Worst case for the two queues one is that you end up in a pattern where every work item dequeue causes you to take the lock and switch queues (and contend with enqueuers), and is thus worse than the concurrent queue. Best case for the two queues one is you end up in a real batch/burst scenario where you're able to fill a queue and swap it, and then there's no per work item synchronization when dequeueing each, just the one for the whole batch.

stephentoub · 2018-03-07T21:46:28Z

I meant that we should consider basing the scheduler count on the number of virtual processors like we do for libuv instead of keeping _numSchedulers as a const with a value of 8.

Ok. I continue to worry that we're tuning all of this to artificial benchmarks, and that real workloads will suffer as a result. I could be off base, or maybe these workloads really are effectively stand-ins for what people will be doing.

halter73 · 2018-03-07T22:54:23Z

The ConcurrentQueue seems to perform similarly in the JSON benchmark on Windows while simplifying the code, so I updated the PR to use a ConcurrentQueue.

Json Windows

Sockets

Swapping Queues

[02:50:24.579] RequestsPerSecond:           376675
[02:50:24.579] Latency on load (ms):        0.9
[02:50:24.579] Max CPU (%):                 82
[02:50:24.579] WorkingSet (MB):             377
[02:50:24.579] Startup Main (ms):           380
[02:50:24.579] First Request (ms):          233.6
[02:50:24.580] Latency (ms):                0.7
[02:50:24.580] Socket Errors:               0
[02:50:24.580] Bad Responses:               0
[02:50:24.580] SDK:                         2.2.0-preview1-007522
[02:50:24.580] Runtime:                     2.1.0-preview2-26225-03
[02:50:24.580] ASP.NET Core:                2.1.0-preview2-30261

ConcurrentQueue

[02:49:30.009] RequestsPerSecond:           379339
[02:49:30.009] Latency on load (ms):        0.8
[02:49:30.009] Max CPU (%):                 82
[02:49:30.009] WorkingSet (MB):             372
[02:49:30.009] Startup Main (ms):           387
[02:49:30.009] First Request (ms):          232.8
[02:49:30.010] Latency (ms):                0.9
[02:49:30.010] Socket Errors:               0
[02:49:30.010] Bad Responses:               0
[02:49:30.010] SDK:                         2.2.0-preview1-007522
[02:49:30.010] Runtime:                     2.1.0-preview2-26225-03
[02:49:30.010] ASP.NET Core:                2.1.0-preview2-30261

benaadams · 2018-03-07T23:19:01Z

You can drop the lock with ConcurrentQueue?

public class CoalescingScheduler
{
    private static readonly WaitCallback _doWorkCallback = s => ((CoalescingScheduler)s).DoWork();

    private readonly ConcurrentQueue<Action> _actions = new ConcurrentQueue<Action>();

    private int _doingWork;

    public void Schedule(Action action)
    {
        _actions.Enqueue(action);

        if (Interlocked.Exchange(ref _doingWork, 1) == 0)
        {
            ThreadPool.QueueUserWorkItem(_doWorkCallback, this);
        }
    }

    private void DoWork()
    {
        ProcessQueue();
        // Swtich off flag
        Volatile.Write(ref _doingWork, 0);
        // Process any items added in race
        ProcessQueue();
    }

    private void ProcessQueue()
    {
        while (_actions.TryDequeue(out Action item))
        {
            item();
        }
    }
}

benaadams · 2018-03-07T23:21:42Z

Probably is a better way to resolve the unset race

stephentoub · 2018-03-07T23:22:17Z

You can drop the lock with ConcurrentQueue?

With that scheme, only if you're ok with there being multiple workers running for some period of time.

benaadams · 2018-03-07T23:43:32Z

With that scheme, only if you're ok with there being multiple workers running for some period of time.

Was exercise for reader... 😉

This better?

public class CoalescingScheduler
{
    private static readonly WaitCallback _doWorkCallback = s => ((CoalescingScheduler)s).DoWork();

    private readonly ConcurrentQueue<Action> _actions = new ConcurrentQueue<Action>();

    private int _doingWork;

    public void Schedule(Action action)
    {
        _actions.Enqueue(action);

        if (Interlocked.Exchange(ref _doingWork, 1) == 0)
        {
            ThreadPool.QueueUserWorkItem(_doWorkCallback, this);
        }
    }

    private void DoWork()
    {
        while (_actions.TryDequeue(out Action item))
        {
            item();
        }
        // Swtich off flag
        Volatile.Write(ref _doingWork, 0);

        // Process any items added in race
        while (Volatile.Read(ref _doingWork) == 0 && _actions.TryDequeue(out Action item))
        {
            item();
        }
    }
}

stephentoub · 2018-03-07T23:46:55Z

This better?

Similar issue. DoWork could have just checked _doingWork and found it to be 0, then lots of enqueuers come along and queue a bunch of items, and the first will end up queueing another consumer. Now the original consumer will be dequeueing and executing at least one work item concurrently with another consumer.

halter73 · 2018-03-08T00:32:37Z

I like the idea Ben. I don't see there being a huge issue with having work items for a given scheduler running in parallel as long as it's rare. Unfortunately it doesn't seem to perform better than locking, and locking is a bit simpler conceptually.

Json Windows

Sockets

lock free

[04:21:06.395] RequestsPerSecond:           373138
[04:21:06.395] Latency on load (ms):        0.9
[04:21:06.396] Max CPU (%):                 84
[04:21:06.396] WorkingSet (MB):             370
[04:21:06.396] Startup Main (ms):           395
[04:21:06.396] First Request (ms):          233.2
[04:21:06.396] Latency (ms):                0.8
[04:21:06.397] Socket Errors:               0
[04:21:06.397] Bad Responses:               0
[04:21:06.397] SDK:                         2.2.0-preview1-007522
[04:21:06.397] Runtime:                     2.1.0-preview2-26225-03
[04:21:06.397] ASP.NET Core:                2.1.0-preview2-30263

davidfowl · 2018-03-08T06:01:32Z

@halter73 get some plaintext numbers.

davidfowl · 2018-03-08T09:08:04Z

+            {
+                if (!_actions.IsEmpty)
+                {
+                    ThreadPool.QueueUserWorkItem(_doWorkCallback, this);


Is this to avoid blocking the thread pool thread for too long?

That's the theory behind it, but I didn't test the performance of just looping again. I suspect that would perform similarly.

davidfowl · 2018-03-08T09:08:34Z

+
+        private void DoWork()
+        {
+            while (_actions.TryDequeue(out Action item))


Do we need a max number of items to dequeue so we don't starve this thread?

davidfowl · 2018-03-08T15:53:53Z

@stephentoub

Ok. I continue to worry that we're tuning all of this to artificial benchmarks, and that real workloads will suffer as a result. I could be off base, or maybe these workloads really are effectively stand-ins for what people will be doing.

Agreed. If we end up needing to tweak this number for different workloads then I think we've failed. We need to run all of our benchmarks to make sure we're not missing anything glaring.

@halter73 If reads do complete synchronously, it's possible for a connection to delay/starve the other work items in a queue. Before that was happening with the IO thread itself, but now we've moved that into our queue.

halter73 · 2018-03-08T20:09:01Z

@halter73 If reads do complete synchronously, it's possible for a connection to delay/starve the other work items in a queue. Before that was happening with the IO thread itself, but now we've moved that into our queue.

Yes. But now it wouldn't be a single connection starving others, but a single scheduler which is responsible for multiple connections. And unlike connections, there's a fixed number of schedulers which will likely be kept under Environment.ProcessorCount.

Still, if this is a concern, having a max iteration count like I did in an earlier version of the PR doesn't seem to effect performance much.

halter73 · 2018-03-09T01:35:48Z

Pipelined plaintext performance on Windows is similar to #2366.

[05:34:44.537] RequestsPerSecond:           1795988
[05:34:44.537] Latency on load (ms):        2.3
[05:34:44.537] Max CPU (%):                 91
[05:34:44.537] WorkingSet (MB):             373
[05:34:44.538] Startup Main (ms):           377
[05:34:44.538] First Request (ms):          141.2
[05:34:44.538] Latency (ms):                0.6
[05:34:44.538] Socket Errors:               0
[05:34:44.538] Bad Responses:               0
[05:34:44.538] SDK:                         2.2.0-preview1-007522
[05:34:44.538] Runtime:                     2.1.0-preview2-26225-03
[05:34:44.538] ASP.NET Core:                2.1.0-preview2-30272

davidfowl · 2018-03-09T08:25:49Z

@halter73 can you get updated numbers? We got a new corefx build.

davidfowl · 2018-03-09T08:27:50Z

+        /// Defaults to half of <see cref="Environment.ProcessorCount" /> rounded down and clamped between 1 and 16.
+        /// </remarks>
+        //public int IOLoopCountCount { get; set; } = ProcessorThreadCount;
+        public int IOLoopCountCount { get; set; } = Math.Min(Environment.ProcessorCount, 16);


QueueCount?

Is it possible to run queueless?

halter73 · 2018-03-09T10:34:53Z

Json Windows Sockets

[02:15:16.387] RequestsPerSecond:           371892
[02:15:16.387] Latency on load (ms):        1
[02:15:16.387] Max CPU (%):                 82
[02:15:16.387] WorkingSet (MB):             371
[02:15:16.387] Startup Main (ms):           379
[02:15:16.388] First Request (ms):          235.2
[02:15:16.388] Latency (ms):                0.7
[02:15:16.388] Socket Errors:               0
[02:15:16.388] Bad Responses:               0
[02:15:16.388] SDK:                         2.2.0-preview1-007522
[02:15:16.388] Runtime:                     2.1.0-preview2-26308-01
[02:15:16.388] ASP.NET Core:                2.1.0-preview2-30277

Plaintext Windows Sockets

[02:34:14.925] RequestsPerSecond:           1831133
[02:34:14.925] Latency on load (ms):        2.4
[02:34:14.926] Max CPU (%):                 91
[02:34:14.926] WorkingSet (MB):             390
[02:34:14.926] Startup Main (ms):           380
[02:34:14.926] First Request (ms):          148.7
[02:34:14.926] Latency (ms):                0.6
[02:34:14.926] Socket Errors:               0
[02:34:14.926] Bad Responses:               0
[02:34:14.926] SDK:                         2.2.0-preview1-007522
[02:34:14.926] Runtime:                     2.1.0-preview2-26308-01
[02:34:14.926] ASP.NET Core:                2.1.0-preview2-30277

tmds · 2018-03-09T11:04:33Z

Do you have some thoughts what gets coalesced here?

This could be work items which are created from the epoll thread successively. Currently, these work items do the i/o operations (e.g. read/write from the socket). This PR may cause those operations to be serialized. It would be interesting to see a benchmark when the epoll thread performed the i/o operation and then created a work item. Then read/write operations would be serialized already on the epoll thread.

stephentoub · 2018-03-09T14:59:03Z

Yes. But now it wouldn't be a single connection starving others, but a single scheduler which is responsible for multiple connections. And unlike connections, there's a fixed number of schedulers which will likely be kept under Environment.ProcessorCount. Still, if this is a concern, having a max iteration count like I did in an earlier version of the PR doesn't seem to effect performance much.

Work items queued to these schedulers run user code, right? If so, how does limiting max iterations help if a single work item blocks for a period of time? Seems like it doing so will end up starving every other connection assigned to that scheduler.

Even worse, and this may be stretching things, but imagine code did:

httpClient.PostAsync(uriBackToSameServer).Wait();

That could then end up deadlocking many connections, right?

stephentoub · 2018-03-09T15:02:54Z

+            }
+        }
+
+        private class Work


Why a class rather than a struct?

Originally this was a struct like it is in LibuvThread where I copied this from. I was comparing the performance of queueing a struct vs. a class vs. an implicitly allocated closure. Whatever difference there is between the three was indistinguishable from the noise of the JSON and plaintext benchmarks so I forgot about it and left it as a class. I'll change it back to a struct though.

davidfowl · 2018-03-09T15:03:07Z

@stephentoub no user code runs on these schedulers, just our code (the code in the SocketConnection itself).

It's used to dispatch work from the IO thread this socket is associated with (on windows)

On linux it ends up being a double dispatch. @halter73 We could avoid the queue if we're already on a thread pool thread in the async SocketAwaitable.

stephentoub · 2018-03-09T15:03:29Z

+        {
+            var work = new Work
+            {
+                CallbackAdapter = (c, s) => ((Action<T>)c)((T)s),


If the adapter is the same for every work item, why take the space to store the delegate? I assume this will change once David's change goes through to make the scheduler state object instead of T?

We won't need this once state goes back to object.

Technically the CallbackAdapter changes as T does. My understanding is that the Action<object, object> assigned here is static but it's static per T meaning a new Action<object, object> gets initialized the first time this method gets called with a given T. It should be the equivalent to what we do here.

stephentoub · 2018-03-09T15:07:57Z

no user code runs on these schedulers, just our code (the code in the SocketConnection itself).

Ah, ok.

davidfowl

LGTM

vancem · 2018-03-12T20:53:23Z

I have a theory why dedicating threads to I/O and thread hoping for the request processing improves performance.

My theory is that it is a data-caching effect. Basically by having the I/O threads do pretty much nothing but I/O, they keep the code needed for that in their L1 cache. Thus I/O threads can run faster (because their L1 cache is 'precharged' with the data they need). Yes, you have the costs of handoff, but those happen only twice per request, and the actual data are at least in L2 so it is easy to come out ahead by specializing.

I want to think about whether we can test that this is what is happening, and whether we can leverage it more.

vancem · 2018-03-12T21:41:20Z

By the way both the libuv and the sockets (before or after) show that we only get to 85% CPU utilization.

libuv: [03:23:17.656] Max CPU (%): 85
after [03:19:41.107] Max CPU (%): 82

That is not where we want to be. We want to be at 100% (unless we know that we are network constrained?) I did not think you could do that with the JSON benchmark.

More things to look at.

halter73 · 2018-03-12T21:52:31Z

@vancem I agree that the perf improvement probably has something to do with the reduced context switching. I also think giving subsets of connections their own IO queues reduces contention.

vancem · 2018-03-12T23:02:05Z

I am not actually suggesting fewer context switches are the reason for the improved performance. In fact I would expect MORE context switching in the new approach. I am suggesting that it is because by ONLY doing the OS part of the read (which is a big chunk of the total processing), you allow those I/O threads to SPECIALIZE in doing just that (their L1 caches have all the OS code and data needed to do that), and thus run faster (since memory latency is often the limiting factor for CPU performance).

You can go further (as you have) and break the pipeline into more 'chunks' where each CPU can 'specialize in doing one chunk. however the problem is that the task scheduler is not 'smart' about scheduling these chunks, which will 'ruin' the cache that was built up (by the way you would also expect results to vary depending on the size of the machine caches (which doe vary from machine to machine)).

I am thinking about ways of both validating and maximizing the good effect here. Basically make the threadpool 'smart' about allowing threads to specialize in doing one particular thing.

benaadams · 2018-03-12T23:41:43Z

Was speculating on this; obv from a blackbox level... best I could come up with was

With immediate dispatching there's mostly likely a thread waiting for the IO so the data can be immediately handed over to user-space.

When the IO thread handles some of the request processing there isn't a thread waiting so the kernel needs to buffer the data then do an additional kernel->user mode copy when the thread comes back.

There would be an additional cost from thread switching and potentially cold caches from the additional dispatch; however the direct to user mode data might more than make up for this? Meltdown and Spectre probably haven't helped.

Just an idea...

vancem · 2018-03-13T18:49:07Z

It is easy enough to validate some of these things with ETW. We will know context switches and we can get DCache miss rates, and test in scnearios where more or less 'specialization' by thread/processor is possible. I am playing with this now...

halter73 requested a review from davidfowl March 7, 2018 11:26

halter73 commented Mar 7, 2018

View reviewed changes

stephentoub reviewed Mar 7, 2018

View reviewed changes

halter73 force-pushed the halter73/socket-coalesce branch from 60c1a0c to f7b7823 Compare March 7, 2018 22:48

halter73 force-pushed the halter73/socket-coalesce branch from f7b7823 to 45d0e42 Compare March 7, 2018 23:00

halter73 force-pushed the halter73/socket-coalesce branch from 45d0e42 to e2a16c0 Compare March 8, 2018 00:54

davidfowl reviewed Mar 8, 2018

View reviewed changes

halter73 force-pushed the halter73/socket-coalesce branch from e2a16c0 to 78dc643 Compare March 9, 2018 01:30

davidfowl mentioned this pull request Mar 9, 2018

Making performance improvements to sockets #2366

Closed

halter73 added 3 commits March 9, 2018 00:19

Use LibuvThread inspired schedulers in socket transport

0955360

Use ConcurrentQueue instead of swapping queues in CoalescingScheduler

50a37a4

Derive _numSchedulers from Environment.ProcessorCount

1d8ec43

halter73 force-pushed the halter73/socket-coalesce branch from 78dc643 to 1d8ec43 Compare March 9, 2018 08:20

davidfowl reviewed Mar 9, 2018

View reviewed changes

Explicitly allocate closures in CoalescingScheduler

3b9d42d

Make CoalescingScheduler optional

dc6fddd

halter73 force-pushed the halter73/socket-coalesce branch from cfb0250 to dc6fddd Compare March 9, 2018 10:39

stephentoub reviewed Mar 9, 2018

View reviewed changes

Respond to PR feedback

6a6fd10

davidfowl approved these changes Mar 9, 2018

View reviewed changes

Respond to more PR feedback

5466368

halter73 force-pushed the halter73/socket-coalesce branch from 9a9a8b8 to 5466368 Compare March 9, 2018 22:56

halter73 merged commit 71bff00 into dev Mar 9, 2018

halter73 deleted the halter73/socket-coalesce branch March 12, 2018 23:16

halter73 mentioned this pull request Apr 10, 2020

SocketSender/Receiver: also Inline continuations on Windows dotnet/aspnetcore#19394

Closed

halter73 mentioned this pull request Jun 28, 2024

Increase the default number of IO queues on larger machines dotnet/aspnetcore#56501

Merged

Conversation

halter73 commented Mar 7, 2018

Json Windows

Libuv

Sockets

Before

After

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidfowl commented Mar 7, 2018

Uh oh!

halter73 commented Mar 7, 2018

Uh oh!

benaadams commented Mar 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stephentoub commented Mar 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

halter73 commented Mar 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stephentoub Mar 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stephentoub commented Mar 7, 2018

Uh oh!

halter73 commented Mar 7, 2018

Uh oh!

stephentoub commented Mar 7, 2018

Uh oh!

stephentoub commented Mar 7, 2018

Uh oh!

halter73 commented Mar 7, 2018

Json Windows

Sockets

Swapping Queues

ConcurrentQueue

Uh oh!

benaadams commented Mar 7, 2018

Uh oh!

benaadams commented Mar 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stephentoub commented Mar 7, 2018

Uh oh!

benaadams commented Mar 7, 2018

Uh oh!

stephentoub commented Mar 7, 2018

Uh oh!

halter73 commented Mar 8, 2018

Json Windows

Sockets

lock free

Uh oh!

davidfowl commented Mar 8, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidfowl commented Mar 8, 2018

Uh oh!

halter73 commented Mar 8, 2018

Uh oh!

halter73 commented Mar 9, 2018

Uh oh!

davidfowl commented Mar 9, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

halter73 commented Mar 9, 2018

Json Windows Sockets

Plaintext Windows Sockets

benaadams commented Mar 7, 2018 •

edited

Loading

stephentoub commented Mar 7, 2018 •

edited

Loading

halter73 commented Mar 7, 2018 •

edited

Loading

stephentoub Mar 7, 2018 •

edited

Loading

benaadams commented Mar 7, 2018 •

edited

Loading

stephentoub commented Mar 9, 2018 •

edited

Loading

stephentoub Mar 9, 2018 •

edited

Loading

halter73 commented Mar 12, 2018 •

edited

Loading

benaadams commented Mar 12, 2018 •

edited

Loading