Try make IOQueue auto-parallelizing #21873

Closed
tmds wants to merge 1 commit into dotnet:master from tmds:ioqueue_auto_parallelize

@tmds
Member

tmds commented May 15, 2020

Applies the technique from dotnet/runtime#35330 to IOQueue.
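
For readers without the linked PR open, here is a rough Python sketch of what "auto-parallelizing" a work queue can look like: a worker that manages to dequeue an item asks for one more worker before it starts processing, so the number of drainers grows with the backlog. This is an illustration only, not the C# diff in this PR; `AutoParallelQueue`, `run_demo` and the use of `ThreadPoolExecutor` are assumptions made for the example.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class AutoParallelQueue:
    """Hypothetical queue whose drain tasks request extra workers on demand."""

    def __init__(self, executor):
        self._executor = executor
        self._items = deque()          # append/popleft are thread-safe in CPython
        self._lock = threading.Lock()
        self._requested = False        # True while a drain task is scheduled but not started

    def schedule(self, callback, state):
        # Enqueue first, then make sure some worker will look at the queue.
        self._items.append((callback, state))
        self._request_worker()

    def _request_worker(self):
        with self._lock:
            if self._requested:
                return
            self._requested = True
        self._executor.submit(self._drain)

    def _drain(self):
        # This task has started, so a new one may be requested again.
        with self._lock:
            self._requested = False
        try:
            callback, state = self._items.popleft()
        except IndexError:
            return
        # An item was dequeued, so there may be more: request another
        # worker before processing, which is what parallelizes the drain.
        self._request_worker()
        while True:
            callback(state)
            try:
                callback, state = self._items.popleft()
            except IndexError:
                return


def run_demo(count, threads=4):
    """Schedule `count` items and return the processed values, sorted."""
    results = []
    done = threading.Semaphore(0)

    def work(n):
        results.append(n)
        done.release()

    with ThreadPoolExecutor(max_workers=threads) as executor:
        q = AutoParallelQueue(executor)
        for i in range(count):
            q.schedule(work, i)
        for _ in range(count):
            done.acquire()             # wait until every item has run
    return sorted(results)
```

Items can complete out of order once more than one worker is draining, so anything relying on strict FIFO execution would need extra care with a scheme like this.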
@ghost added the area-servers label May 15, 2020
@tmds
Member Author

tmds commented May 15, 2020

This is an experiment for benchmarking. I'm not sure what to expect.

cc @kouvel @adamsitnik @stephentoub @halter73 @davidfowl

@tmds
Member Author

tmds commented May 15, 2020

cc @benaadams

@adamsitnik
Member

> Applies the technique from dotnet/runtime#35330 to IOQueue.

I was thinking about it too :D

The other (and most probably a very stupid) idea I had was to try to have a scheduler that in the ctor would use reflection to access the internal field of ThreadPool that stores the work items in a ConcurrentQueue:

https://github.com/dotnet/runtime/blob/ec2209e7360cfae481c9f6df8540dccadb02dcb4/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPool.cs#L403

and implement the Schedule method as a simple call to enqueue of this CQ

Again, this is a very dirty idea ;)

@adamsitnik
Member

@tmds could you provide the modified .dll so I could run some benchmarks for you?

@stephentoub
Member

stephentoub commented May 15, 2020

> Again, this is a very dirty idea ;)

As an experiment it's totally fine. We will not ship that.

@dotnet deleted 4 comments from pr-benchmarks Bot May 15, 2020
@halter73
Member

@aspnet-hello benchmark

@pr-benchmarks

pr-benchmarks Bot commented May 15, 2020

Starting 'Default' pipelined plaintext benchmark with session ID '1279e01068334a1c99e7c62e8b24d597'. This could take up to 30 minutes...

@pr-benchmarks

pr-benchmarks Bot commented May 15, 2020

Baseline

stdout: Starting baseline run on '8675632723423f6ea2568c4f3cabec9a8364285a'...
[11:32:22.648] Using worker Wrk
[11:32:22.828] Running session '1279e01068334a1c99e7c62e8b24d597' with description 'Before'
[11:32:22.828] Starting scenario Default on benchmark server...
[11:32:22.828] POST http://10.0.0.9:5001/jobs {"DriverVersion":1,"ServerVersion":3,"Id":0,"Hardware":null,"HardwareVersion":null,"OperatingSystem":null,"KestrelThreadCount":null,"Scenario":"Default","Scheme":"Http","Port":5000,"Path":"/plaintext","Connections":0,"Threads":0,"ReadyStateText":"Application started.","IsConsoleApp":false,"AspNetCoreVersion":"Latest","RuntimeVersion":"Latest","SdkVersion":"5.0.100-preview.5.20258.4","UseMonoRuntime":false,"NoGlobalJson":false,"Database":0,"StartupMainMethod":"00:00:00","BuildTime":"00:00:00","PublishedSize":0,"ServerCounters":[],"Source":{"BranchOrCommit":"master","Repository":"https://github.com/aspnet/benchmarks.git","Project":"src/Benchmarks/Benchmarks.csproj","InitSubmodules":false,"DockerFile":null,"DockerImageName":null,"DockerLoad":null,"DockerCommand":null,"DockerContextDirectory":null,"DockerFetchPath":null,"LocalFolder":null,"SourceCode":null},"Arguments":null,"NoArguments":false,"State":"New","Url":null,"WebHost":"KestrelSockets","UseRuntimeStore":false,"Attachments":[],"BuildAttachments":[],"LastDriverCommunicationUtc":"2020-05-15T23:32:22.598588Z","DotNetTrace":false,"DotNetTraceProviders":null,"Collect":false,"CollectArguments":null,"PerfViewTraceFile":null,"CollectStartup":false,"CollectCounters":false,"BasePath":null,"ProcessId":0,"EnvironmentVariables":{},"BuildArguments":[],"NoClean":false,"Framework":null,"Error":null,"SelfContained":true,"BeforeScript":null,"AfterScript":null,"MemoryLimitInBytes":0,"CpuLimitRatio":0.0,"CpuSet":null,"Counters":{},"Measurements":[],"Metadata":[],"Endpoints":[],"Variables":null,"WaitForExit":false,"Timeout":0,"StartTimeout":"00:00:00","Options":{"DisplayOutput":false,"Fetch":false,"FetchOutput":null,"DownloadFiles":[],"TraceOutput":null,"DisplayBuild":false,"RequiredOperatingSystem":null,"RequiredArchitecture":null,"DiscardResults":false,"BuildFiles":[],"OutputFiles":[]},"Features":[]}...
[11:32:22.841] 202 Accepted
[11:32:22.842] Fetching job: http://10.0.0.9:5001/jobs/179
[11:32:22.842] GET http://10.0.0.9:5001/jobs/179...
[11:32:23.906] GET http://10.0.0.9:5001/jobs/179...
[11:32:23.915] Job has been selected by the server ...
[11:32:23.925] Interrupting due to an unexpected exception
[11:32:23.955] System.IO.DirectoryNotFoundException: Could not find a part of the path '/app/aspnetcore/artifacts/bin/Microsoft.AspNetCore.Server.Kestrel/Release/netcoreapp5.0'.
   at System.IO.Enumeration.FileSystemEnumerator`1.CreateDirectoryHandle(String path, Boolean ignoreNotFound)
   at System.IO.Enumeration.FileSystemEnumerator`1.Init()
   at System.IO.Enumeration.FileSystemEnumerator`1..ctor(String directory, Boolean isNormalized, EnumerationOptions options)
   at System.IO.Enumeration.FileSystemEnumerable`1..ctor(String directory, FindTransform transform, EnumerationOptions options, Boolean isNormalized)
   at System.IO.Enumeration.FileSystemEnumerableFactory.UserFiles(String directory, String expression, EnumerationOptions options)
   at System.IO.Directory.InternalEnumeratePaths(String path, String searchPattern, SearchTarget searchTarget, EnumerationOptions options)
   at System.IO.Directory.GetFiles(String path, String searchPattern, SearchOption searchOption)
   at BenchmarksDriver.Program.Run(Uri serverUri, Uri[] clientUris, String sqlConnectionString, ServerJob serverJob, String session, String description, Int32 iterations, Int32 exclude, String shutdownEndpoint, TimeSpan span, List`1 downloadFiles, Boolean fetch, String fetchDestination, Boolean collectR2RLog, String traceDestination, CommandOption outputFileOption, CommandOption sourceOption, CommandOption scriptFileOption, CommandOption markdownOption, CommandOption writeToFileOption, Nullable`1 requiredOperatingSystem, CommandOption archOption, CommandOption saveOption, CommandOption diffOption)
[11:32:23.955] Deleting scenario 'Default' on benchmark server...
[11:32:23.955] DELETE http://10.0.0.9:5001/jobs/179...
[11:32:23.956] 202 Accepted


stderr: Baseline benchmark run on '8675632723423f6ea2568c4f3cabec9a8364285a' failed.

PR


@davidfowl
Member

TFMS!

@halter73
Member

@aspnet-hello benchmark

@dotnet deleted 2 comments from pr-benchmarks Bot May 16, 2020
@pr-benchmarks

pr-benchmarks Bot commented May 16, 2020

Starting 'Default' pipelined plaintext benchmark with session ID 'f4f99e3ddb614ab3ae381a3270183a8d'. This could take up to 30 minutes...

@pr-benchmarks

pr-benchmarks Bot commented May 16, 2020

Baseline

Starting baseline run on '8675632723423f6ea2568c4f3cabec9a8364285a'...
RequestsPerSecond:           743,642
Max CPU (%):                 99
WorkingSet (MB):             88
Avg. Latency (ms):           3.41
Startup (ms):                486
First Request (ms):          121.24
Latency (ms):                0.41
Total Requests:              11,178,758
Duration: (ms)               15,030
Socket Errors:               25
Bad Responses:               0
Build Time (ms):             15,505
Published Size (KB):         120,816
SDK:                         5.0.100-preview.5.20258.4
Runtime:                     5.0.0-preview.6.20262.14
ASP.NET Core:                5.0.0-preview.5.20255.6


PR

Starting PR run on '778f8d5c3a8c90497fee6f65621544a1bed0ffde'...
| Description |     RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | Build Time (ms) | Published Size (KB) | First Request (ms) | Latency (ms) | Errors | Ratio |
| ----------- | ------- | ------- | ----------- | ----------------- | ------------ | --------------- | ------------------- | ------------------ | ------------ | ------ | ----- |
|      Before | 743,642 |      99 |          88 |              3.41 |          486 |           15505 |              120816 |             121.24 |         0.41 |     25 |  1.00 |
|       After | 743,872 |      99 |          89 |              3.13 |          455 |            5502 |              120816 |             124.48 |         0.41 |      0 |  1.00 |


@benaadams
Member

Is there a non-pipelined benchmark that can be triggered?

@tmds
Member Author

tmds commented May 18, 2020

> @tmds could you provide the modified .dll so I could run some benchmarks for you?

@adamsitnik here you go: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll.tar.gz

@halter73
Member

@aspnet-hello benchmark json

@dotnet deleted 2 comments from pr-benchmarks Bot May 18, 2020
@pr-benchmarks

pr-benchmarks Bot commented May 18, 2020

Starting 'json' pipelined plaintext benchmark with session ID '09c904d3d164426f9cfde9da065a433f'. This could take up to 30 minutes...

@pr-benchmarks

pr-benchmarks Bot commented May 18, 2020

Baseline

Starting baseline run on '8675632723423f6ea2568c4f3cabec9a8364285a'...
RequestsPerSecond:           622,644
Max CPU (%):                 99
WorkingSet (MB):             201
Avg. Latency (ms):           3.9
Startup (ms):                461
First Request (ms):          144.42
Latency (ms):                0.44
Total Requests:              9,358,924
Duration: (ms)               15,030
Socket Errors:               0
Bad Responses:               0
Build Time (ms):             5,504
Published Size (KB):         120,819
SDK:                         5.0.100-preview.5.20258.4
Runtime:                     5.0.0-preview.6.20264.1
ASP.NET Core:                5.0.0-preview.5.20255.6


PR

Starting PR run on '778f8d5c3a8c90497fee6f65621544a1bed0ffde'...
| Description |     RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | Build Time (ms) | Published Size (KB) | First Request (ms) | Latency (ms) | Errors | Ratio |
| ----------- | ------- | ------- | ----------- | ----------------- | ------------ | --------------- | ------------------- | ------------------ | ------------ | ------ | ----- |
|      Before | 622,644 |      99 |         201 |               3.9 |          461 |            5504 |              120819 |             144.42 |         0.44 |      0 |  1.00 |
|       After | 605,376 |      98 |         199 |              3.94 |          470 |            5502 |              120819 |             143.89 |         0.38 |      0 |  0.97 |


@benaadams
Member

I was looking at the traces and sendmsg is very slow (comparatively), so I thought it wasn't a good idea to have the sends on the same queue as the receives, where they block them.

However, I didn't have great success in separating them: #21981
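
To make the head-of-line concern concrete, here is a toy Python model of one FIFO consumer. The costs (a send taking 5 units, a receive 1 unit) are invented for illustration and are not taken from the traces; the model also ignores contention and core count, so it says nothing about why the separation in #21981 didn't pay off in practice.

```python
def completion_times(tasks):
    """Completion time of each task when one consumer drains `tasks` in FIFO order.

    `tasks` is a list of (kind, cost) pairs, all assumed to be queued at time 0.
    """
    clock = 0
    finished = []
    for kind, cost in tasks:
        clock += cost
        finished.append((kind, clock))
    return finished


def mean_receive_latency(finished):
    """Average completion time over the 'recv' entries only."""
    times = [t for kind, t in finished if kind == "recv"]
    return sum(times) / len(times)


# Hypothetical workload: slow sends interleaved with fast receives.
tasks = [("send", 5), ("recv", 1)] * 3

# Shared queue: receives wait behind every send ahead of them.
shared = mean_receive_latency(completion_times(tasks))

# Separate queue: receives only wait behind other receives.
receives_only = [t for t in tasks if t[0] == "recv"]
separate = mean_receive_latency(completion_times(receives_only))

print(f"shared queue:   {shared}")
print(f"separate queue: {separate}")
```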

@adamsitnik
Member

@tmds the results:

[image: benchmark results]

@tmds
Member Author

tmds commented May 19, 2020

On ARM this gives some nice results; on Citrine it is a regression.
I'll close this based on the Citrine regression.

@tmds closed this May 19, 2020
@tmds
Member Author

tmds commented May 19, 2020

An interesting observation:

Starting PR run on '778f8d5c3a8c90497fee6f65621544a1bed0ffde'...
| Description |     RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | Build Time (ms) | Published Size (KB) | First Request (ms) | Latency (ms) | Errors | Ratio |
| ----------- | ------- | ------- | ----------- | ----------------- | ------------ | --------------- | ------------------- | ------------------ | ------------ | ------ | ----- |
|      Before | 622,644 |      99 |         201 |               3.9 |          461 |            5504 |              120819 |             144.42 |         0.44 |      0 |  1.00 |
|       After | 605,376 |      98 |         199 |              3.94 |          470 |            5502 |              120819 |             143.89 |         0.38 |      0 |  0.97 |

In the After run we use less CPU (98% vs. 99%), yet RPS is lower.
A hypothesis: there is contention, and parallelizing doesn't help.
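
For reference, the Ratio column can be reproduced from the RPS figures in the bot tables. A small Python check (the helper name `rps_ratio` is just for this example):

```python
def rps_ratio(before, after):
    """Throughput of the PR run relative to the baseline run."""
    return after / before


# RPS values copied from the benchmark tables in this thread.
json_ratio = rps_ratio(before=622_644, after=605_376)
plaintext_ratio = rps_ratio(before=743_642, after=743_872)

for name, ratio in (("json", json_ratio), ("plaintext", plaintext_ratio)):
    change = (ratio - 1) * 100
    print(f"{name}: ratio {ratio:.2f}, change {change:+.2f}%")
```

So the json scenario loses a little under 3% throughput while plaintext is flat, which is the regression the close decision refers to.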

@amcasey added the area-networking label and removed the area-runtime label Jun 6, 2023
Labels

area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions

7 participants