Issue #17118 Loop unrolling in Span.CopyTo slow path by WinCPP · Pull Request #18435 · dotnet/corefx

WinCPP · 2017-04-15T19:14:10Z

Fixes #17118.

@shiftylogic @jkotas Kindly review the code change. I have merged the forward and reverse paths using a direction variable. I am not sure whether this would affect vectorization opportunities, if enabling vectorization was the intention behind having unrolled loops. Kindly advise. If it does, then I think the option will be to write separate blocks for forward and reverse copy. Thanks.

jkotas · 2017-04-15T19:18:05Z

This is a performance fix. Could you please measure performance numbers before and after for a few interesting cases, to verify that it is indeed making the code faster?

> I have merged the forward and reverse paths

You need to measure the performance impact of such a change. It is quite possible that this will make it slower than the trivial loop.

WinCPP · 2017-04-15T19:31:03Z

@jkotas Yup I will do it. Hmmm I understand it could impact performance. I was caught in a dilemma: potential perf loss on one side, and code duplication and compile-time size on the other...

I am thinking of the following combinations for performance testing:

Different spans: of int, of Guid, and of class object references.
Different span sizes: 2000 and 20000.
Three different algorithms: previous, new merged, and new split.
Each of the above combinations to be run at least 10 times (?)

For the perf testing, would specifying just the framework, as in case of build and testing, be sufficient?

I think I will require some time to gather this :) In a few hours from now, in the morning, I have to go out for the weekend. Hope it's fine if I continue from Monday onwards... Thanks!

ahsonkhan · 2017-04-17T23:42:16Z

> Each of above combinations to be run at least 10 times (?)

I would think 3-5x would be enough to see if there is a regression or improvement.

shiftylogic · 2017-04-19T17:53:24Z

@WinCPP Any results for how this impacts performance?

WinCPP · 2017-04-19T18:25:38Z

Hi @shiftylogic I'll be working on this. I have two other milestone 2.0 issues that I'm contributing to on priority. Can we hold off on this for a while? I want to work on this too... just that, this being for the 'future' milestone, I took the liberty of giving it a lower priority... Hope that is fine...

WinCPP · 2017-04-20T19:50:24Z

I have started working on this to gather performance data. I have been getting errors with the performance testing framework, but I think it is time I got the sandbox in shape.

I have the entire setup as per the 'performance testing' document in the repo wiki. When I run the msbuild command, the performance tests run, but at the end it fails with this message:

  [4/21/2017 1:15:06 AM][INF] Statistics written to "M:\corefx\bin\Windows_NT.AnyCPU.Release\System.Memory.Performance.Tests\netcoreapp\Perf-System.Memory.Performance.Tests.csv"
  E:\Program Files (x86)\Python36-32\python.exe: can't open file 'M:\corefx\Tools/Microsoft.BenchView.JSONFormat\tools\measurement.py': [Errno 2] No such file or directory
  Finished running tests.  End time= 1:15:07.83, Exit code = 2

I have run a repair of VS 2015 Community edition. I don't know what more is required to get measurement.py. Is there some way to force a download of the missing tools, if that is the case here?

@shiftylogic @jkotas @stephentoub @karelz Kindly advise.

karelz · 2017-04-21T00:16:04Z

@DrewScoggins @mellinoe can you please help troubleshoot the perf infra failures?

ahsonkhan · 2017-04-21T01:28:24Z

I did a clean build of corefx and tried to run the performance tests (following this):

D:\GitHub\Fork\corefx\src\System.Memory\tests>msbuild /t:BuildAndTest /p:Performance=true /p:ConfigurationGroup=Release /p:TargetOS=Windows_NT

I get the following errors:
CSC : error CS0006: Metadata file 'D:\GitHub\Fork\corefx\bin/runtime/netcoreapp-Windows_NT-Release-x64/xunit.core.dll' could not be found [D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj]
CSC : error CS0006: Metadata file 'D:\GitHub\Fork\corefx\bin/runtime/netcoreapp-Windows_NT-Release-x64/Xunit.NetCore.Extensions.dll' could not be found [D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj]
CSC : error CS0006: Metadata file 'D:\GitHub\Fork\corefx\bin/runtime/netcoreapp-Windows_NT-Release-x64/xunit.assert.dll' could not be found [D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj]
CSC : error CS0006: Metadata file 'D:\GitHub\Fork\corefx\bin/runtime/netcoreapp-Windows_NT-Release-x64/xunit.abstractions.dll' could not be found [D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj]
CSC : error CS0006: Metadata file 'D:\GitHub\Fork\corefx\bin/runtime/netcoreapp-Windows_NT-Release-x64/xunit.performance.core.dll' could not be found [D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj]
CSC : error CS0006: Metadata file 'D:\GitHub\Fork\corefx\bin/runtime/netcoreapp-Windows_NT-Release-x64/xunit.performance.api.dll' could not be found [D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj]
Done Building Project "D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj" (BuildAndTest target(s)) -- FAILED.

Build FAILED.

mellinoe · 2017-04-21T01:51:45Z

@ahsonkhan Have you done the "source build" (build.cmd) in release mode first? That error seems like the one you'd get if you didn't.

ahsonkhan · 2017-04-21T02:02:54Z

> Have you done the "source build" (build.cmd) in release mode first? That error seems like the one you'd get if you didn't.

No, I hadn't. I ran build.cmd -release. That resolved the issue I mentioned above.

I get this error now (and the testResults.xml doesn't exist in the bin directory):
RunTestsForProject:
D:\GitHub\Fork\corefx\bin/AnyOS.AnyCPU.Release/System.Memory.Tests/netcoreapp//RunTests.cmd D:\GitHub\Fork\corefx\bin/testhost/netcoreapp-Windows_NT-Release-x64/
Using D:\GitHub\Fork\corefx\bin\testhost\netcoreapp-Windows_NT-Release-x64\ as the test runtime folder.
Executing in D:\GitHub\Fork\corefx\bin\AnyOS.AnyCPU.Release\System.Memory.Tests\netcoreapp
Running tests... Start time: 18:59:36.10
Command(s):
D:\GitHub\Fork\corefx\bin\testhost\netcoreapp-Windows_NT-Release-x64\dotnet.exe PerfRunner.exe --perf:runid Perf
if exist Perf-System.Memory.Tests.xml (
py D:\GitHub\Fork\corefx\Tools/Microsoft.BenchView.JSONFormat\tools\measurement.py xunit Perf-System.Memory.Tests.xml --better desc --drop-first-value --append -o D:\GitHub\Fork\corefx\measurement.json
)
The application to execute does not exist: 'D:\GitHub\Fork\corefx\bin\AnyOS.AnyCPU.Release\System.Memory.Tests\netcoreapp\PerfRunner.exe'

Finished running tests. End time=18:59:36.11, Exit code = -2147450751
D:\GitHub\Fork\corefx\Tools\tests.targets(326,5): warning MSB3073: The command "D:\GitHub\Fork\corefx\bin/AnyOS.AnyCPU.Release/System.Memory.Tests/netcoreapp//RunTests.cmd D:\GitHub\Fork\corefx\bin/testhost/netcoreapp-Windows_NT-Release
-x64/" exited with code -2147450751. [D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj]
The previous error was converted to a warning because the task was called with ContinueOnError=true.
Build continuing because "ContinueOnError" on the task "Exec" is set to "true".
D:\GitHub\Fork\corefx\Tools\tests.targets(334,5): error : One or more tests failed while running tests from 'System.Memory.Tests' please check D:\GitHub\Fork\corefx\bin/AnyOS.AnyCPU.Release/System.Memory.Tests/netcoreapp/testResults.xml
for details! [D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj]
Done Building Project "D:\GitHub\Fork\corefx\src\System.Memory\tests\System.Memory.Tests.csproj" (BuildAndTest target(s)) -- FAILED.

Build FAILED.

WinCPP · 2017-04-21T04:29:27Z

@mellinoe ... about the issue that I'm facing: the folder "M:\corefx\Tools/Microsoft.BenchView.JSONFormat" itself doesn't exist. Looks like it didn't get downloaded from the build repo? Is there some way to force a download of the missing tools? A simple 'clean.cmd' followed by 'build.cmd' doesn't seem to pull it down...

DrewScoggins · 2017-04-21T16:43:55Z

I have a PR out to fix this issue. The main problem is that we did not have the calls to the tools that we use to upload the data to our results service completely hidden by the logging flag. This change should fix that. In the meantime you can get the tooling by using the command I pasted below. This will unblock you. You should replace %WORKSPACE% with the root of the CoreFX repo. Also of course ensure that you have a copy of nuget.exe.

C:\Tools\nuget.exe install Microsoft.BenchView.JSONFormat -Source http://benchviewtestfeed.azurewebsites.net/nuget -OutputDirectory "%WORKSPACE%\Tools" -Prerelease -ExcludeVersion

WinCPP · 2017-04-21T20:32:18Z

@DrewScoggins awesome! It solved my problem. I'm set to design and execute the perf tests.... Thanks!

By the way, I just wanted to point out that the performance testing steps document (here) mentions "...run from the tests directory." [This is the first line of the second paragraph of the Windows section under "Running the tests".] When the command is actually run in the 'tests' directory, it reports that 'PerfRunner.exe' does not exist... I think the line should say to run the msbuild command in the "tests\Performance" directory. Only then did I get the expected output. Thanks!

WinCPP · 2017-04-22T08:35:09Z

@jkotas @shiftylogic I need help with the command for running performance tests for the slow path. The following is the command that I'm running in the src\System.Memory\tests\performance directory.

msbuild /t:RebuildAndTest /p:Performance=true /p:ConfigurationGroup=Release /p:TargetGroup=netfx

This is a mix of the command mentioned on the performance test help page (here) and @jkotas 's comment on the issue page (here) about how to invoke the slow path, i.e. the netfx framework... Kindly let me know if the above command makes sense. I do not see the existing performance tests being executed when I issue the command. However, if I use the command on the performance test page (for Windows_NT OS), it runs and dumps statistics on the screen.

Kindly help.

WinCPP · 2017-04-23T05:27:55Z

@shiftylogic @jkotas @ahsonkhan @karelz @mellinoe Hope I am not causing inconvenience with this thread, but the perf test configuration for TargetGroup netfx is not going smoothly. From the 'developer guide' and 'project guidelines' documents, I figured out that I need to additionally specify /p:TargetOS=Windows_NT, so the full command for performance testing with the netfx framework should be (I think):

msbuild /t:RebuildAndTest /p:Performance=true /p:ConfigurationGroup=Release /p:TargetGroup=netfx /p:TargetOS=Windows_NT

With that, the performance run was triggered, but I got a new error related to dotnet.exe not being present in the testhost directory. A snippet of the relevant error lines is towards the end of this comment. In the folder D:\WinCPP\corefx\bin\testhost, the structure of the netfx-Windows_NT-Release-x64 folder is very different from that of netcoreapp-Windows_NT-Debug-x64. The latter has dotnet.exe, but the former (netfx) just has a dump of various assemblies, I think from the build.

I am now trying to figure out how to get dotnet.exe into the testhost\netfx* folder.

The console output of interest, as I mentioned above, is given below. The line of interest (dotnet.exe) is the 4th from the top.

RunTestsForProject:
  D:\WinCPP\corefx\bin/AnyOS.AnyCPU.Release/System.Memory.Performance.Tests/netfx//RunTests.cmd D:\WinCPP\corefx\bin/testhost/netfx-Windows_NT-Release-x64/
  Using D:\WinCPP\corefx\bin\testhost\netfx-Windows_NT-Release-x64\ as the test runtime folder.
  'D:\WinCPP\corefx\bin\testhost\netfx-Windows_NT-Release-x64\\dotnet.exe' is not recognized as an internal or external command,
  operable program or batch file.
  Executing in D:\WinCPP\corefx\bin\AnyOS.AnyCPU.Release\System.Memory.Performance.Tests\netfx\
  Running tests... Start time: 10:44:17.35
  Command(s):
  set DEVPATH=D:\WinCPP\corefx\bin\testhost\netfx-Windows_NT-Release-x64\
  D:\WinCPP\corefx\bin\testhost\netfx-Windows_NT-Release-x64\\dotnet.exe PerfRunner.exe --perf:runid Perf
  if exist Perf-System.Memory.Performance.Tests.xml (
  py D:\WinCPP\corefx\Tools/Microsoft.BenchView.JSONFormat\tools\measurement.py xunit Perf-System.Memory.Performance.Tests.xml --better desc --drop-first-value --append -o D:\WinCPP\corefx\measurement.json
  )
  Finished running tests.  End time=10:44:17.35, Exit code = 9009
D:\WinCPP\corefx\Tools\tests.targets(326,5): warning MSB3073: The command "D:\WinCPP\corefx\bin/AnyOS.AnyCPU.Release/System.Memory.Performance.Tests/netfx//RunTests.cmd D:\WinCPP\corefx\bin/testhost/netfx-Windows_NT-Release-x64/" exited
 with code 9009. [D:\WinCPP\corefx\src\System.Memory\tests\Performance\System.Memory.Performance.Tests.csproj]
  The previous error was converted to a warning because the task was called with ContinueOnError=true.
  Build continuing because "ContinueOnError" on the task "Exec" is set to "true".
D:\WinCPP\corefx\Tools\tests.targets(334,5): error : One or more tests failed while running tests from 'System.Memory.Performance.Tests' please check D:\WinCPP\corefx\bin/AnyOS.AnyCPU.Release/System.Memory.Performance.Tests/netfx/testRe
sults.xml for details! [D:\WinCPP\corefx\src\System.Memory\tests\Performance\System.Memory.Performance.Tests.csproj]
Done Building Project "D:\WinCPP\corefx\src\System.Memory\tests\Performance\System.Memory.Performance.Tests.csproj" (RebuildAndTest target(s)) -- FAILED.

karelz · 2017-04-23T05:57:03Z

@ahsonkhan @shiftylogic @KrzysztofCwalina can you please help with guidance how you guys do perf comparisons?

@DrewScoggins can you please try to get @WinCPP unblocked?

@WinCPP if you need to compare a couple of tests, try using BenchmarkDotNet for one-off perf test measurements ... to get yourself unblocked.

WinCPP · 2017-04-23T18:44:29Z

I tried collecting data by changing the existing normal test to print performance data. (I wanted to finish this round of perf data today... hence the shortcut...)

The data is towards the end of the comment. The keywords in the tables mean the following:

Existing - current implementation without loop unrolling, which has two separate loops for forward and reverse traversal respectively. Data sets (1) and (4).
Two loops - the 'Existing' forward and reverse loops modified with loop unrolling. Data sets (2) and (5).
One loop - combined loop with a direction variable to indicate forward or reverse index access (current version in the PR; it had a 'mul' operation in the generated IL). Data sets (3) and (6).
int vs value type data - each of the above was tested with a span of ints and with a span of a value type containing two ints, a long and a char; the respective data is in columns (a) and (b).
Loop direction - forward and reverse direction for the loops that copy data from the source span to the destination span, with the respective data in Table A and Table B. The forward loop is hit when the source span begins after the destination span, and the reverse loop is hit when the source span begins before the destination span.
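The direction rule in the last item can be illustrated with a minimal Python sketch. This is not the Span.CopyTo source; the function name `overlap_safe_copy` and the list-based buffer are assumptions made only for illustration of why the loop direction depends on which span starts first.

```python
def overlap_safe_copy(buf, src_start, dst_start, count):
    """Copy `count` elements within one buffer, tolerating overlap.

    Hypothetical helper: it mirrors the rule described in the thread,
    not the actual corefx implementation.
    """
    if src_start > dst_start:
        # Source begins after destination: walk forward so each source
        # element is read before the copy can overwrite it.
        for i in range(count):
            buf[dst_start + i] = buf[src_start + i]
    else:
        # Source begins before (or at) destination: walk in reverse
        # for the same reason.
        for i in range(count - 1, -1, -1):
            buf[dst_start + i] = buf[src_start + i]
    return buf


forward = overlap_safe_copy(list(range(10)), 2, 0, 6)
reverse = overlap_safe_copy(list(range(10)), 0, 2, 6)
print(forward)  # [2, 3, 4, 5, 6, 7, 6, 7, 8, 9]
print(reverse)  # [0, 1, 0, 1, 2, 3, 4, 5, 8, 9]
```

Walking in the wrong direction would overwrite source elements before they are read, which is why the slow path needs two traversal orders at all.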

It looks like the One Loop implementation that I pushed previously has no additional benefit and is in fact worse in some cases. So I will replace it with the Two Loops implementation.

Between the Existing and Two Loops implementations, I am not able to make a call. The latter appears to show better performance for a span of ints (data sets 2a, 5a vs 1a, 4a) but has negligible effect in the case of value types (data sets 2b, 5b vs 1b, 4b). @jkotas I would appreciate your input. Thanks!

+--------+-----------------------------------------------------+
|        |             Copy loop direction: Forward            |
|        |             Copy direction: Dest <- Src             |
|        |         Source starts later than destination        |
|        +-----------------+-----------------+-----------------+
|        |   Existing (1)  |  Two Loops (2)  |  One Loop (3)   |
|        +--------+--------+--------+--------+--------+--------+
|        |   (a)  |  (b)   |   (a)  |  (b)   |   (a)  |  (b)   |
|        +--------+--------+--------+--------+--------+--------+
|        | 161    | 486    | 152    | 483    | 162    | 487    |
|        | 148    | 399    | 139    | 394    | 149    | 400    |
|        | 153    | 466    | 144    | 496    | 154    | 519    |
|        | 153    | 455    | 145    | 429    | 155    | 437    |
|        | 154    | 451    | 144    | 445    | 154    | 438    |
|        | 153    | 449    | 144    | 440    | 154    | 447    |
|   T    | 153    | 442    | 144    | 436    | 156    | 479    |
|   A    | 154    | 424    | 144    | 433    | 156    | 521    |
|   B    | 153    | 502    | 144    | 495    | 154    | 435    |
|   L    | 154    | 444    | 146    | 457    | 156    | 438    |
|   E    | 153    | 435    | 144    | 433    | 156    | 436    |
|        | 153    | 437    | 145    | 434    | 155    | 434    |
|   A    | 153    | 446    | 144    | 430    | 154    | 438    |
|        | 154    | 438    | 147    | 420    | 154    | 411    |
|        | 153    | 433    | 144    | 485    | 156    | 434    |
|        | 155    | 435    | 146    | 433    | 155    | 439    |
|        | 153    | 435    | 145    | 426    | 155    | 439    |
|        | 153    | 441    | 145    | 429    | 154    | 435    |
|        | 153    | 477    | 145    | 426    | 155    | 524    |
|        | 152    | 433    | 145    | 522    | 155    | 431    |
+--------+--------+--------+--------+--------+--------+--------+
|  Mean  | 153    | 444.32 | 144.42 | 445.42 | 154.58 | 449.21 |
| StdDev |   1.34 |  20.7  |   1.53 |  30.88 |   1.53 |  34.41 |
+--------+--------+--------+--------+--------+--------+--------+

+--------+-----------------------------------------------------+
|        |             Copy loop direction: Reverse            |
|        |             Copy direction: Src -> Dest             |
|        |       Source starts earlier than destination        |
|        +-----------------+-----------------+-----------------+
|        |   Existing (4)  |  Two Loops (5)  |  One Loop (6)   |
|        +--------+--------+--------+--------+--------+--------+
|        |   (a)  |  (b)   |   (a)  |  (b)   |   (a)  |  (b)   |
|        +--------+--------+--------+--------+--------+--------+
|        | 157    | 415    | 147    | 403    | 154    | 402    |
|        | 167    | 410    | 143    | 384    | 150    | 398    |
|        | 164    | 500    | 149    | 520    | 156    | 513    |
|        | 162    | 550    | 150    | 540    | 153    | 434    |
|        | 162    | 448    | 148    | 437    | 153    | 518    |
|        | 163    | 511    | 150    | 457    | 156    | 551    |
|   T    | 162    | 599    | 149    | 430    | 155    | 451    |
|   A    | 163    | 472    | 149    | 458    | 155    | 469    |
|   B    | 163    | 455    | 149    | 442    | 154    | 441    |
|   L    | 162    | 451    | 149    | 467    | 154    | 480    |
|   E    | 164    | 485    | 149    | 476    | 154    | 540    |
|        | 163    | 499    | 150    | 447    | 154    | 451    |
|   B    | 163    | 451    | 149    | 447    | 153    | 448    |
|        | 162    | 454    | 149    | 444    | 154    | 445    |
|        | 162    | 445    | 149    | 428    | 154    | 443    |
|        | 162    | 443    | 149    | 515    | 154    | 444    |
|        | 163    | 511    | 149    | 436    | 154    | 457    |
|        | 162    | 447    | 149    | 441    | 155    | 462    |
|        | 163    | 449    | 149    | 439    | 154    | 438    |
|        | 163    | 445    | 149    | 441    | 154    | 443    |
+--------+--------+--------+--------+--------+--------+--------+
|  Mean  | 162.89 | 475    | 148.79 | 455.21 | 154    | 464.53 |
| StdDev |   1.17 |  43.61 |   1.44 |  35.39 |   1.26 |  38.17 |
+--------+--------+--------+--------+--------+--------+--------+
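The two summary rows at the bottom of each table are not explained in the comment. For the Existing, column (a) data in Table A, they match the mean and the population standard deviation of runs 2 to 20, i.e. with the first run dropped. That reading is an inference from the numbers, not something stated in the thread. A small Python sketch of the check, with illustrative names:

```python
import statistics

# Table A, Existing, column (a): the 20 raw measurements as posted.
existing_int_forward = [
    161, 148, 153, 153, 154, 153, 153, 154, 153, 154,
    153, 153, 153, 154, 153, 155, 153, 153, 153, 152,
]


def summarize(samples, drop_first=True):
    """Return (mean, population std dev), optionally skipping run 1.

    Dropping the first run is an assumption that happens to reproduce
    the posted summary rows; the comment does not say how they were made.
    """
    kept = samples[1:] if drop_first else samples
    return statistics.mean(kept), statistics.pstdev(kept)


mean, stdev = summarize(existing_int_forward)
print(f"{mean:.2f} {stdev:.2f}")  # 153.00 1.34
```

With standard deviations of roughly 1 to 2 on the int columns, the gap of about 9 between Existing and Two Loops in column (a) is well outside the run-to-run noise, while the value-type columns (b) are too noisy to separate.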

karelz · 2017-04-23T18:57:22Z

@WinCPP can you put your modifications / experiments into a gist or somewhere publicly accessible? It's super-useful when people want to double-check your changes or when they have idea they want to measure & build on top of your changes ...

jkotas · 2017-04-23T20:56:36Z

+1 Could you please share the exact source of the test? In particular, I would like to know the block size that you are using for the test.

WinCPP · 2017-04-24T03:01:44Z

Kindly refer to the following commit in another branch in my fork for the source. It has different CopyTo versions that I used for testing with renaming and also the test wrapper. I have added comments there to explain what I was trying to do. Thanks!

Commit link: WinCPP@8881ede

jkotas · 2017-04-24T16:59:03Z

CopyPerfTestWrapperBackward<T>(20000000, iterationCount, timeSpent);

You should measure different blockSizes. A 20M block won't fit into the cache, so the micro benchmark will likely be dominated by memory latency. That may explain why you are not seeing much difference between the different variants of the code.
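A sweep over block sizes could be shaped roughly like the Python sketch below. It only illustrates the structure of such a measurement; the real comparison would have to run the C# CopyTo variants, and the function names, the chosen sizes and the best-of-N policy are all assumptions rather than anything from the PR.

```python
import time


def time_copy(block_size, iterations=5):
    """Best-of-N wall-clock time to copy `block_size` bytes."""
    src = bytearray(block_size)
    dst = bytearray(block_size)
    best = float("inf")
    for _ in range(iterations):
        start = time.perf_counter()
        dst[:] = src  # the operation under test
        best = min(best, time.perf_counter() - start)
    return best


def sweep(block_sizes):
    """Map each block size to its best observed copy time in seconds."""
    return {size: time_copy(size) for size in block_sizes}


# Hypothetical sizes meant to straddle typical cache capacities.
for size, seconds in sweep([4 * 1024, 256 * 1024, 4 * 1024 * 1024]).items():
    print(f"{size:>10} bytes: {seconds * 1e9 / size:.3f} ns/byte")
```

Reporting time per byte makes the small, cache-resident sizes directly comparable with the large ones, which is where a memory-bound benchmark would hide differences between loop variants.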

DrewScoggins · 2017-04-24T18:23:59Z

It looks like, for actually collecting performance numbers, we are more or less unblocked by what @WinCPP did. As for the actual problem at hand, we have never tested or done any work to make performance tests run on any configuration other than the default, netcoreapp. It certainly would be possible to do the work to make the tests run under this configuration, but I am not sure what the benefits would be, nor do I have a good idea right now of the amount of work involved.

karelz · 2017-04-24T19:01:59Z

@DrewScoggins the benefits are obvious (at least to me). I thought this was always part of the work as well. Let's chat to poke more at the gaps of expectations here ...

shiftylogic · 2017-04-25T17:57:05Z

I took his code change and ran some performance numbers on it. Below are the results.

It appears that the unrolled version gets us ~15-20% (give or take with noise) for non-trivial sized buffers.

NOTE: Ignore the "Fast?" column. It only makes sense if I run the tests that include taking the fast path through (copy block).

Tag	Length	Base	Unrolled	Ratio	Fast?	Forward?
inside	16	14	15	1.07	False	True
overlap front	16	11	11	1.00	False	False
overlap back	16	14	13	0.93	False	True
covers head	16	10	13	1.30	False	False
covers tail	16	12	7	0.58	False	True
inside	256	89	51	0.57	False	True
overlap front	256	62	85	1.37	False	False
overlap back	256	54	44	0.81	False	True
covers head	256	53	46	0.87	False	False
covers tail	256	56	46	0.82	False	True
inside	2048	367	337	0.92	False	True
overlap front	2048	405	316	0.78	False	False
overlap back	2048	376	309	0.82	False	True
covers head	2048	372	316	0.85	False	False
covers tail	2048	376	314	0.84	False	True
inside	4096	753	654	0.87	False	True
overlap front	4096	791	608	0.77	False	False
overlap back	4096	2148	1869	0.87	False	True
covers head	4096	762	606	0.80	False	False
covers tail	4096	742	609	0.82	False	True
inside	16384	2850	2383	0.84	False	True
overlap front	16384	3149	2463	0.78	False	False
overlap back	16384	2884	2399	0.83	False	True
covers head	16384	2837	2294	0.81	False	False
covers tail	16384	2897	2396	0.83	False	True
inside	10485760	1864822	1541150	0.83	False	True
overlap front	10485760	2034064	1531378	0.75	False	False
overlap back	10485760	1963735	1504714	0.77	False	True
covers head	10485760	1837571	1587331	0.86	False	False
covers tail	10485760	1853799	1516896	0.82	False	True
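The Ratio column is Unrolled divided by Base, so values below 1.00 mean the unrolled version was faster. As a rough check of the ~15-20% figure, this Python sketch recomputes the ratios for the Length 16384 rows of the table; the variable names are illustrative only.

```python
# (tag, base, unrolled) for the Length 16384 rows, copied from the table.
rows_16384 = [
    ("inside", 2850, 2383),
    ("overlap front", 3149, 2463),
    ("overlap back", 2884, 2399),
    ("covers head", 2837, 2294),
    ("covers tail", 2897, 2396),
]

# Ratio as used in the table: unrolled time over base time.
ratios = [unrolled / base for _, base, unrolled in rows_16384]
rounded = [round(r, 2) for r in ratios]

# Average fraction of time saved by the unrolled version.
mean_saving = 1 - sum(ratios) / len(ratios)

print(rounded)               # [0.84, 0.78, 0.83, 0.81, 0.83]
print(f"{mean_saving:.1%}")  # 18.3%
```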

WinCPP · 2017-04-25T18:20:12Z

Sorry guys, got held up at work... @shiftylogic thanks for looking into it. Just asking: was the data collected using the "Two Loops" version of CopyTo on the other branch...? Based on further instructions, I will check that version into this PR and resolve the conflicts... Thanks!

karelz · 2017-04-25T18:27:34Z

FYI: We discussed with @DrewScoggins the need to have ability to run perf tests against Desktop / current NuGet packages targeting Desktop.
We concluded to:

Put it on the perf team backlog (@DrewScoggins can you please link the issue here when you create it?)
Update docs now saying, it doesn't work yet (@DrewScoggins please link the issue/PR/commit here as well, thanks!)

shiftylogic · 2017-04-25T18:32:51Z

@WinCPP No, these numbers are the one loop variant. I didn't test the two loop variant.

shiftylogic · 2017-04-25T18:36:05Z

If you provide me the snippet of code that does the two-loop variant, I can run those numbers quickly and compare.

Either way, you also need to fix the commit conflict due to the bug fix for overlap detection that was merged this morning. Shouldn't impact your actual change.

WinCPP · 2017-04-25T19:10:35Z

Ah! So two loop variant needs data to be generated...? @shiftylogic is your framework shared somewhere that I could use it? My test app is too ad hoc and requires a lot of manual data collation...

WinCPP · 2017-04-25T20:31:29Z

@shiftylogic Oops, our replies crossed each other, I just noticed. The other loop is here... (link)

So based on the outputs and the recommendation from you and @jkotas, I will pick up the approved implementation and check it in with the merge conflict resolution...

shiftylogic · 2017-04-26T00:49:46Z

It appears that the "two loop" variant of this change results in no performance gain at all. I'm digging into why this is, but the JIT generates better code for the "one loop" variant. I'm talking to the JIT team about what is causing this.

For now, can you please resolve the conflicts for the "one loop" variant and we can take this change. It gives us a decent perf bump.

WinCPP · 2017-04-26T19:26:11Z

@shiftylogic I have resolved the conflict and the builds have passed...

By the way, if it is not off-limits for me (IPR, etc.), would you mind sharing the gist of your discussion with the JIT team about the code generated for the "one / two loop" variants... I would love to read it. Thanks!

shiftylogic · 2017-04-26T22:13:59Z

The one-loop variant has a bit of extra math (via the direction variable) that causes the JIT to do CSE on the subexpressions (runCount + direction * n) into a temp which resulted in slightly better code generation. The JIT decided that CSE wasn't necessary in the two-loop variant and thus resulted in many extra instructions being generated.
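For readers who have not seen the change, the one-loop shape described here can be sketched in Python as follows. This is a hypothetical reconstruction for illustration only: the name `copy_one_loop_unrolled`, the unroll factor of 4 and the exact index expressions are assumptions, and Python has no JIT CSE, so the sketch shows only the structure of the `run_count + direction * n` indexing, not its performance effect.

```python
def copy_one_loop_unrolled(buf, src_start, dst_start, count):
    """Overlap-safe copy using one loop and a direction variable."""
    if src_start > dst_start:
        direction, run_count = 1, 0            # forward traversal
    else:
        direction, run_count = -1, count - 1   # reverse traversal

    remaining = count
    # Unrolled body: four elements per iteration, each index written
    # as run_count + direction * n.
    while remaining >= 4:
        i0 = run_count
        i1 = run_count + direction * 1
        i2 = run_count + direction * 2
        i3 = run_count + direction * 3
        buf[dst_start + i0] = buf[src_start + i0]
        buf[dst_start + i1] = buf[src_start + i1]
        buf[dst_start + i2] = buf[src_start + i2]
        buf[dst_start + i3] = buf[src_start + i3]
        run_count += direction * 4
        remaining -= 4

    # Tail: copy the leftover 0 to 3 elements one at a time.
    while remaining > 0:
        buf[dst_start + run_count] = buf[src_start + run_count]
        run_count += direction
        remaining -= 1
    return buf
```

The two-loop variant would instead duplicate the unrolled body once with increasing indices and once with decreasing indices, which removes the multiplication by `direction` but, per the explanation above, ended up producing more instructions from the JIT.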

danmoseley · 2017-04-27T00:08:56Z

@ahsonkhan if this is approved should you merge?

WinCPP · 2017-04-27T18:28:39Z

@shiftylogic thanks for sharing! I'm sure the intricacies must be very interesting... :) Thanks!

karelz · 2017-05-01T23:03:11Z

FYI: Here's the tracking issue #19200 for:

> FYI: We discussed with @DrewScoggins the need to have ability to run perf tests against Desktop / current NuGet packages targeting Desktop.
> We concluded to:
>
> Put it on the perf team backlog (@DrewScoggins can you please link the issue here when you create it?)

dnfclas added the cla-already-signed label Apr 15, 2017

karelz added the area-System.Memory label Apr 17, 2017

karelz assigned WinCPP, ahsonkhan, KrzysztofCwalina and shiftylogic Apr 17, 2017

WinCPP force-pushed the Issue-17118-2 branch 2 times, most recently from 95e5b7e to 1f64caf Compare April 18, 2017 02:44

WinCPP mentioned this pull request Apr 18, 2017

Issue #15622 New overload Dictionary.Remove #18109

Merged

Issue #17118 Loop unrolling in Span.CopyTo slow path

cbbe45d

WinCPP force-pushed the Issue-17118-2 branch from 1f64caf to cbbe45d Compare April 23, 2017 15:17

WinCPP referenced this pull request in WinCPP/corefx Apr 24, 2017

Performance testing

8881ede

Merge branch 'master' into Issue-17118-2

6c5ccf9

shiftylogic approved these changes Apr 26, 2017

ahsonkhan approved these changes Apr 26, 2017

ahsonkhan merged commit 10afd3f into dotnet:master Apr 27, 2017

karelz modified the milestone: 2.0.0 Apr 28, 2017

WinCPP deleted the Issue-17118-2 branch September 9, 2019 03:42
