SocketAsyncEngine.Unix: improve performance of context lookup #36358
|
@davidfowl @sebastienros can you benchmark this to see if it matters in scenarios you care about? |
|
Linux x64_Release has an infrastructure-related failure: @dotnet-bot test corefx-ci (Linux x64_Release) please |
|
/azp run corefx-ci (Linux x64_Release) |
|
/azp run corefx-ci |
    try
    {
        bool shutdown = false;
        SocketAsyncContext[] contexts = new SocketAsyncContext[EventBufferCount];
Could we make _handleToContextMap a ConcurrentDictionary<IntPtr, SocketAsyncContext>? That would result in an allocation on every add, but adds are rare-ish, only when new sockets are added (right?), in which case there's already other allocation happening (e.g. the socket itself and all associated state), and we could avoid the lock on reads entirely, plus avoid this array allocation per event loop, and avoid needing to iterate through the events twice.
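The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the actual SocketAsyncEngine code: `FakeContext` and `HandleToContextMap` are hypothetical names standing in for SocketAsyncContext and the engine's map, and the method names are assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for SocketAsyncContext; not the real type.
public sealed class FakeContext
{
    public FakeContext(string name) => Name = name;
    public string Name { get; }
}

// Illustrative only: maps a native handle to its context.
public sealed class HandleToContextMap
{
    private readonly ConcurrentDictionary<IntPtr, FakeContext> _map =
        new ConcurrentDictionary<IntPtr, FakeContext>();

    // Rare path: runs when a socket is registered. Each add allocates a node.
    public bool TryRegister(IntPtr handle, FakeContext context) =>
        _map.TryAdd(handle, context);

    // Rare path: runs when a socket is unregistered.
    public bool TryUnregister(IntPtr handle) =>
        _map.TryRemove(handle, out _);

    // Hot path: runs once per event in the event loop and takes no lock.
    public bool TryLookup(IntPtr handle, out FakeContext context) =>
        _map.TryGetValue(handle, out context);
}
```

With this shape the event loop can resolve each event's context in a single pass, so no per-loop array of contexts is needed.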
I'll make a branch that implements this and we can benchmark both.
This branch uses ConcurrentDictionary: tmds@9cc7527. @stephentoub, you can add some review comments on that commit if you want.
Thanks. I prefer the ConcurrentDictionary version, but we should see what perf looks like for both.
|
@tmds Would you mind sharing the dlls with and without the changes you want to benchmark? If you send me an email or tweet I can create a shared folder for you to drop the files in. |
…or context lookup
|
@sebastienros, @tmds, were you guys able to get any benchmarking done? Anything I can help with? |
I sent the dlls to Sébastien. I haven't received benchmark results yet. |
|
I got these benchmark results from @sebastienros:
That is: a 0.17% increase with the ConcurrentDictionary, and a 0.4% increase with Dictionary + lock. [This benchmark is: Plaintext non-pipelined on Linux, with the latest runtime, aspnet and sdk. Each configuration was run 5 times; the highest and lowest results of each set were excluded and the remaining 3 were averaged.] |
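For reference, the averaging described above (drop the highest and lowest of the runs, average the rest) and the percentage comparison can be sketched as below. `BenchmarkStats` and its method names are made up for this example, and the code says nothing about the actual numbers measured.

```csharp
using System;
using System.Linq;

public static class BenchmarkStats
{
    // Drops the single highest and single lowest run, then averages the rest.
    // For 5 runs this averages the middle 3.
    public static double TrimmedMean(double[] runs)
    {
        if (runs == null || runs.Length < 3)
            throw new ArgumentException("Need at least 3 runs.", nameof(runs));

        double[] sorted = runs.OrderBy(x => x).ToArray();
        return sorted.Skip(1).Take(sorted.Length - 2).Average();
    }

    // Relative change of candidate versus baseline, in percent.
    public static double PercentChange(double baseline, double candidate) =>
        (candidate - baseline) / baseline * 100.0;
}
```

A caller would compute `TrimmedMean` once for the baseline runs and once for each candidate, then pass the two means to `PercentChange`.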
Thanks. How representative of access patterns do we think this test is? If it's representative, the dictionary+lock seems fine. But if we expect there may be other access patterns, I'd be tempted to suggest the concurrent dictionary route, since it won't ever have contention on reads, whereas the dictionary+lock does, and with a global lock. |
|
With these small gains, we can just go for the simpler implementation. I'm changing this to the ConcurrentDictionary version. |
36b7c23 to 9cc7527
|
Changed to the ConcurrentDictionary implementation. |
|
@stephentoub probably this is ok to merge? |
|
A regular Dictionary with an IntPtr key should be much faster than a ConcurrentDictionary, due to the optimizations it has around struct keys. And if the lock is changed from a static lock to an instance lock, shouldn't it generally be uncontended on receive (unlike the static lock)? |
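The alternative described in this comment might look something like the sketch below: a plain Dictionary guarded by a per-instance lock rather than a static one. `LockedHandleMap` is a hypothetical name, and this is an illustration of the idea rather than code from either branch.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: each instance owns its own lock, so separate instances
// never contend with each other the way they would on a static lock.
public sealed class LockedHandleMap<TContext> where TContext : class
{
    private readonly object _gate = new object();
    private readonly Dictionary<IntPtr, TContext> _map = new Dictionary<IntPtr, TContext>();

    // Rare path: registering a socket.
    public void Register(IntPtr handle, TContext context)
    {
        lock (_gate) { _map.Add(handle, context); }
    }

    // Rare path: unregistering a socket.
    public bool Unregister(IntPtr handle)
    {
        lock (_gate) { return _map.Remove(handle); }
    }

    // Hot path: every lookup still acquires and releases the lock,
    // even when no other thread is competing for it.
    public bool TryLookup(IntPtr handle, out TContext context)
    {
        lock (_gate) { return _map.TryGetValue(handle, out context); }
    }
}
```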
Doesn't matter if there's contention; the lock still adds non-trivial overheads...

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using System;
    using System.Collections.Generic;
    using System.Collections.Concurrent;

    [InProcess]
    [MemoryDiagnoser]
    public class Test
    {
        public static void Main()
        {
            BenchmarkRunner.Run<Test>();
        }

        public Test()
        {
            for (int i = 0; i < 1000; i++)
            {
                _cd.TryAdd((IntPtr)i, new object());
                _d.Add((IntPtr)i, new object());
            }
        }

        private ConcurrentDictionary<IntPtr, object> _cd = new ConcurrentDictionary<IntPtr, object>();
        private Dictionary<IntPtr, object> _d = new Dictionary<IntPtr, object>();

        [Benchmark]
        public bool Concurrent()
        {
            bool result = true;
            for (int i = 0; i < 1000; i++) result &= _cd.TryGetValue((IntPtr)i, out _);
            return result;
        }

        [Benchmark]
        public bool Locked()
        {
            bool result = true;
            for (int i = 0; i < 1000; i++) lock (_d) result &= _d.TryGetValue((IntPtr)i, out _);
            return result;
        }
    }
…or context lookup (dotnet/corefx#36358) Commit migrated from dotnet/corefx@1c881ce