This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Concurrent work queues, less native async_send transitions #522

Closed

benaadams wants to merge 2 commits into aspnet:dev from benaadams:work-queues

Conversation

@benaadams
Contributor

Process queues till complete
Remove the high contention locks
Reduce the calls to native async_send when not required (e.g. loop is currently processing).

From #519

Member


I'm pretty sure there is a race here.

What if we don't call uv_async_send because the loop is apparently not idle when in fact it is merely finishing up? It's possible that _workQueue.IsEmpty returns true, but before _loopIdle is actually set to true, new work is enqueued and WakeUpLoop is called. In that case the wake-up is skipped and the loop goes back to sleep with work still queued.
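
A minimal sketch of the shape in question, with the race window marked. _workQueue, _loopIdle and WakeUpLoop are the names used above; everything else (the class, the delegate standing in for uv_async_send) is illustrative, not Kestrel's actual code:

```csharp
using System;
using System.Collections.Concurrent;

// Sketch of a flag-gated wake-up; the native uv_async_send call is stood in
// for by a plain delegate.
class LoopWakeupSketch
{
    private readonly ConcurrentQueue<Action> _workQueue = new ConcurrentQueue<Action>();
    private readonly Action _uvAsyncSend;
    private volatile bool _loopIdle;

    public LoopWakeupSketch(Action uvAsyncSend)
    {
        _uvAsyncSend = uvAsyncSend;
    }

    public void WakeUpLoop(Action work)          // producer threads
    {
        _workQueue.Enqueue(work);
        if (_loopIdle)                           // (1) may read false while the loop
        {                                        //     is only just finishing up,
            _uvAsyncSend();                      //     so no wake-up is sent
        }
    }

    private void ProcessQueue()                  // loop thread
    {
        Action work;
        while (_workQueue.TryDequeue(out work))
        {
            work();
        }
        // Race window: a producer can enqueue and evaluate (1) between the
        // failed TryDequeue above and the write below, after which the loop
        // blocks with work still queued and no pending async_send.
        _loopIdle = true;
    }
}
```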

Contributor Author


Thought about this for a while; the only performant way I can think of is to hit it with a sledgehammer. Changed how OnPost works.

@halter73
Member

halter73 commented Jan 4, 2016

There are basically two things we need to determine before merging this PR. The first is whether using a concurrent queue is faster than swapping normal queues inside a lock when consuming and always locking when adding.

I think we should separately measure the impact of tracking _loopIdle in managed code to reduce native calls to async_send.
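
For reference, a minimal sketch of the locked-queue alternative described above - lock on every add, swap the whole queue out under the lock when consuming. Names are illustrative, not Kestrel's actual fields:

```csharp
using System;
using System.Collections.Generic;

// Sketch of "always lock when adding, swap under the lock when consuming".
class LockedQueueSketch
{
    private readonly object _workSync = new object();
    private Queue<Action> _workQueue = new Queue<Action>();

    public void Post(Action work)
    {
        lock (_workSync)                     // every producer contends here
        {
            _workQueue.Enqueue(work);
        }
    }

    private void ProcessQueue()              // loop thread
    {
        Queue<Action> queue;
        lock (_workSync)                     // one brief lock to take the whole batch
        {
            queue = _workQueue;
            _workQueue = new Queue<Action>();
        }
        while (queue.Count > 0)              // drained outside the lock
        {
            Action work = queue.Dequeue();
            work();
        }
    }
}
```

The swap keeps the consumer's critical section tiny, but every Post still serializes on _workSync - which is the contention benaadams points at below.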

@benaadams
Contributor Author

The queue issue is all the writes contending on the lock rather than the swapping, which is probably quite efficient - though the writes also contend with the swap.

The async_send issue is the managed->native transition being expensive, as it does stack prep-work; e.g. for Buffer.BlockCopy just calling its native counterpart (https://github.com/dotnet/coreclr/issues/2430#issuecomment-166594959):

"There doesn't appear to be any transition cost but Buffer::BlockCopy's function prolog takes 11.3%. Not surprising considering that it stores 7 registers to stack - that's 56 bytes of stores in a function that's supposed to copy, say, 16 bytes."

Also there's validation on the SafeHandles and the return values, and the JIT not being able to optimize the native call to anything better, inline it, etc.; and that's ignoring whatever libuv actually does with the call.

vs checking a bool :)
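
For context, roughly what that native boundary looks like in a plain P/Invoke-style declaration. The uv_async_send signature is libuv's; the library name and the binding style are illustrative (Kestrel binds libuv through its own wrappers), so treat this as a sketch of the cost being discussed rather than the actual code:

```csharp
using System;
using System.Runtime.InteropServices;

static class LibuvSketch
{
    // Every Post that wakes the loop crosses this managed->native boundary:
    // a P/Invoke frame, argument marshalling, no inlining, plus the callee's
    // own prologue - versus a single volatile bool read when the loop is
    // already known to be awake.
    [DllImport("libuv", CallingConvention = CallingConvention.Cdecl)]
    public static extern int uv_async_send(IntPtr handle);
}
```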

@benaadams
Contributor Author

i.e. on the swapping: if it were a single writer and single reader, rather than many writers and one reader, then the swap on a regular Queue would be faster than a ConcurrentQueue - but then the lock would also only need to exist on the swap, not on Add.

@halter73
Member

dev:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.73ms    5.44ms 239.17ms   94.31%
    Req/Sec    36.83k     1.86k   59.48k    85.32%
  11813916 requests in 10.10s, 1.45GB read
Requests/sec: 1169764.49
Transfer/sec:    147.26MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.79ms    6.16ms 290.01ms   96.14%
    Req/Sec    36.91k     2.87k   73.88k    95.12%
  11820562 requests in 10.10s, 1.45GB read
Requests/sec: 1170379.09
Transfer/sec:    147.33MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.53ms    4.48ms 274.42ms   96.42%
    Req/Sec    36.90k     1.74k   61.62k    85.45%
  11844142 requests in 10.10s, 1.46GB read
Requests/sec: 1172712.52
Transfer/sec:    147.63MB

benaadams/work-queue-merged:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.39ms    3.16ms  76.18ms   94.44%
    Req/Sec    37.03k     2.65k   78.47k    95.15%
  11860835 requests in 10.10s, 1.46GB read
Requests/sec: 1174346.96
Transfer/sec:    147.83MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.30ms    2.64ms 234.58ms   93.71%
    Req/Sec    37.06k     2.87k   76.10k    95.96%
  11864330 requests in 10.10s, 1.46GB read
Requests/sec: 1174694.86
Transfer/sec:    147.88MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.70ms    6.36ms 154.95ms   95.57%
    Req/Sec    37.04k     3.55k   97.62k    96.54%
  11843404 requests in 10.10s, 1.46GB read
Requests/sec: 1172695.07
Transfer/sec:    147.62MB

The removed locks do seem ever so slightly faster, and I personally don't see any race condition in this PR. Still, when I forwarded this PR to @GrabYourPitchforks (the best writer of lock-free code I know) to verify its correctness, his first response was: is it really worth removing the locks that make this obviously correct? I'm inclined to agree. The locks make me much more warm and fuzzy inside, since they're easier for me to reason about than access to a volatile variable.

halter73 closed this Jan 12, 2016
@benaadams
Contributor Author

Was this a comparison of current dev (faster) to the branch (older)?

Regardless, this would probably not be a sensible time to merge it*

*As I imagine people will freeze on the RTM in production, so as you and @davidfowl had mentioned previously: stability and security (or fixes for bad perf, rather than making great perf better 😁)

@halter73
Member

No. I merged this change into dev and then ran the test.

I'm glad you agree. Like I said, I think this change is good, but I don't want any uncertainty.

benaadams deleted the work-queues branch May 10, 2016 02:50
